Support scraping data from local files

Hi @roclark! Really like the project. What do you think of supporting local HTML files that have been downloaded from sports-reference in advance?

Could be nice to let users specify that they’ve pre-downloaded certain resources through some kind of API configuration, maybe with a mapping like {'some-resource-id': 'path_to_resource_page.html'}

After looking through the code a bit, maybe this could happen in utils.py with some new function that gets a document, choosing between PyQuery(url=x) and PyQuery(filename=x)?
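A minimal sketch of that suggestion, not the library's actual code. The helper names `resolve_source` and `get_document`, the `resource_id` argument, and the `local_pages` mapping are all hypothetical; only the `PyQuery(url=x)` and `PyQuery(filename=x)` calls come from the proposal above.

```python
import os


def resolve_source(resource_id, url, local_pages=None):
    """Return the PyQuery keyword arguments for one resource.

    `resource_id` and `local_pages` are hypothetical names used for
    illustration; they are not part of the library's API.
    """
    local_pages = local_pages or {}
    path = local_pages.get(resource_id)
    # Prefer the pre-downloaded file when the user listed one and it exists.
    if path and os.path.isfile(path):
        return {"filename": path}
    # Otherwise fall back to fetching the page over the network.
    return {"url": url}


def get_document(resource_id, url, local_pages=None):
    """Build a PyQuery document from a local file or from the URL."""
    # Imported lazily so resolve_source can be used without pyquery installed.
    from pyquery import PyQuery

    return PyQuery(**resolve_source(resource_id, url, local_pages))
```

Keeping the file-or-URL decision in its own function means the choice can be checked without touching the network, and a missing or mistyped path silently falls back to the URL rather than raising. Whether a silent fallback or an explicit error is preferable is a design choice left open here.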

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
roclark commented, Nov 6, 2018

I like your idea of the downloader, I think that would clear up a lot here! Also, allowing the user the option to specify local HTML files and downloading them if necessary would be helpful in order to minimize confusion as to which pages are needed for each class.

You are more than welcome to create a PR for this if you desire. I am focused on adding a few other features at the moment (including creating a website for a related project), so I don’t think I will be able to get to this immediately, but as mentioned, this is definitely something I see value in and would like to include. It just might take me a bit before I can call it complete. 😄

0 reactions
roclark commented, Nov 29, 2019

Hey @vesper8, thanks for the additional feedback! I think now is a good time to revisit this, and you make a great point on lowering the server load on their side. I will try and work on a way to incorporate this with one of the upcoming releases. I think in the utility module, I can create a way to route the pulling of the webpage, and get it from a local directory. I will work on this a bit and see if I can get something going!
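One way the routing described above could look, as a rough sketch rather than the maintainer's implementation. The function name `fetch_page`, its parameters, and the UTF-8 encoding are assumptions made for illustration; the `opener` parameter exists only so the download step can be swapped out.

```python
import os
import urllib.request


def fetch_page(url, cache_dir, filename, opener=urllib.request.urlopen):
    """Return page HTML, reading from cache_dir when a copy already exists.

    All names here are hypothetical; this is not the library's code.
    Pages are assumed to be UTF-8 encoded.
    """
    path = os.path.join(cache_dir, filename)
    if os.path.isfile(path):
        # Local copy found: no request is sent to the remote server.
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    # No local copy: download once and store it for later runs.
    with opener(url) as response:
        html = response.read().decode("utf-8")
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)
    return html
```

With this shape, each page is requested from the remote site at most once per cache directory, which is the server-load benefit mentioned in the comment.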

