Support scraping data from local files
Hi @roclark! Really like the project. What do you think of supporting local HTML files that have been downloaded from sports-reference in advance?
It could be nice to let users specify that they have pre-downloaded certain resources through some kind of API configuration, perhaps with a mapping like {'some-resource-id': 'path_to_resource_page.html'}.
After looking through the code a bit, maybe this could happen in utils.py with a new function that retrieves a document, choosing between PyQuery(url=x) and PyQuery(filename=x)?
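A rough sketch of what such a helper in utils.py might look like. The function names, the local_files mapping, and the resource-id scheme below are all hypothetical illustrations of the idea above, not part of the project:

```python
def resolve_source(resource_id, url, local_files=None):
    """Decide where a resource should come from.

    local_files is the proposed user-supplied mapping, e.g.
    {'some-resource-id': 'path_to_resource_page.html'}. Returns a
    (keyword, value) pair matching PyQuery's constructor keywords.
    """
    local_files = local_files or {}
    path = local_files.get(resource_id)
    if path:
        return ('filename', path)
    return ('url', url)


def retrieve_html_page(resource_id, url, local_files=None):
    """Build a PyQuery document from either a local file or the network."""
    # Imported here so the routing logic above stays testable without pyquery.
    from pyquery import PyQuery as pq
    kind, value = resolve_source(resource_id, url, local_files)
    # PyQuery accepts both url= and filename= keywords, covering both cases.
    return pq(**{kind: value})
```

Callers in the library would then pass the user's mapping through to this single entry point instead of constructing PyQuery objects directly.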
Issue Analytics
- State:
- Created 5 years ago
- Comments:5 (3 by maintainers)
I like your idea of the downloader; I think that would clear things up a lot here! Also, giving users the option to specify local HTML files, and downloading them only when necessary, would help minimize confusion as to which pages are needed for each class.
You are more than welcome to create a PR for this if you desire. I am focused on adding a few other features at the moment (including creating a website for a related project), so I don’t think I will be able to get to this immediately, but as mentioned, this is definitely something I see value in and would like to include. It just might take me a bit before I can call it complete. 😄
Hey @vesper8, thanks for the additional feedback! I think now is a good time to revisit this, and you make a great point about lowering the server load on their side. I will try to incorporate this into one of the upcoming releases. In the utility module, I can create a way to route the retrieval of the webpage and pull it from a local directory instead. I will work on this a bit and see if I can get something going!