Tracking injuries programmatically
Wow, a lot has happened over the holiday regarding the Complete List of Active Players, thanks to cminton and BurntSushi. The team rosters look very nice, but does anyone else notice the number of duplicate names? It seems like more than ever. These are just the ones on my existing Excel-generated injury list:
Jonathan Stewart, Brandon Marshall, Andre Smith, Chris Givens, A.J. Davis, Brandon Williams, Steve Williams, Chris Clemons, Zach Miller, Michael Smith
Currently I’m getting the injuries from http://www.pro-football-reference.com/years/2013/injuries.htm, where the name formats are a very good match. But because of the lack of a `gsisid`, simple name matching will be troublesome again this year. Ultimately, I believe my answer is http://www.nfl.com/injuries?week=1, which just became active this morning but has yet to be populated. I’m confident in being able to scrape this page with Excel. However, if the `gsisid` is present, it certainly won’t be listed in the table format, but rather obscured in the source code. Does anyone know how to scrape that type of page of all its contents?
Top GitHub Comments
An astute observation! In fact, that is precisely how I am discovering a mapping between profile ID and gsis ID. That is also how I would do it if I wanted a mapping to these new ESB ids. But that’s an additional request for each player, which would turn several thousand requests into 10,000+ requests. Owch.
Although, these are only `HEAD` requests, which retrieve only metadata (the response headers) and don’t actually download the full page. So they are much faster and more lightweight.
The HTTP redirect info you show later in your post is exactly what these `HEAD` requests are capturing. It’s just enough to say, “oh hey where did you move to? oh ok, that’s fine, now I’m done.”
Hacks, man. Hacks. They have legacy systems. Then they decide, “oh hey, we actually want to be able to cross reference data between these legacy systems.” A programmer speaks up and says, “Yes, we’ll need to rebuild some stuff in order to incorporate a more cohesive design.” Astute business man says, “Rebuild!?! WAT! No. Don’t you know this is a business and we need to make money! Long term advantages are irrelevant. WE WANT IT NOW!”
And so, the programmer adds just enough intermediary code for one legacy system to talk to another. This no doubt includes a table somewhere that maintains a correspondence between these different identifiers.
Sorry for the rant. It’s a sore point.
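Coming back to the `HEAD` requests for a moment, here is a rough sketch of what one looks like using only Python's standard library. The helper name `head_redirect_target` is made up for illustration and is not how `nflgame` is implemented; the point is only that a `HEAD` request returns the status code and the `Location` header of a redirect without downloading any page body.

```python
import http.client
from urllib.parse import urlsplit


def head_redirect_target(url, timeout=10):
    """Send one HEAD request and return (status, Location header or None).

    http.client never follows redirects on its own, so for a 301/302
    response the Location header tells us where the page moved to.
    (Hypothetical helper, for illustration only.)
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild the request target: path plus query string, if any.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Usage (the URL is a placeholder, not a real profile page):
#   status, location = head_redirect_target("http://example.com/old?id=123")
#   if status in (301, 302) and location:
#       print("moved to", location)
```

Whatever identifier is embedded in the redirect target would then have to be pulled out of `location`; the exact URL layout is not shown here because it depends on the site.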
Ha… Ha… If only.
Do not, I repeat DO NOT, try to discover a mapping between `F.Lastname` and a player’s real name. There is absolutely no consistency. There are many, many duplicates. And there’s just no point in doing it now that the JSON player database has full names for almost every single player.
If you want/need injuries imminently, the easiest approach would be to scrape that injury page you have, and match the name/team/position to data in `nflgame.players`. I actually suspect it will achieve very good results if you only care about current injuries.
I think I’m going to close this for now. I don’t really have any plans to add extra injury support to `nflgame`. If someone else would like to come along and add it, please open a new issue with ideas.
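
For anyone picking this up, the name/team/position matching suggested above could be sketched roughly as follows. This is only an illustration under assumptions: the player and injury records are plain dicts with made-up ids and placeholder team codes, standing in for whatever the player database and the scraped injury table actually provide, and the helpers `normalize`, `build_index` and `match_injuries` are hypothetical names.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase and drop punctuation/spaces so 'A.J. Davis' and 'AJ Davis' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def build_index(players):
    """Map (normalized name, team, position) -> list of player ids."""
    index = defaultdict(list)
    for player_id, info in players.items():
        key = (normalize(info["name"]), info["team"], info["position"])
        index[key].append(player_id)
    return index


def match_injuries(injuries, index):
    """Split injury rows into uniquely matched rows and rows needing manual review."""
    matched, unmatched = [], []
    for row in injuries:
        key = (normalize(row["name"]), row["team"], row["position"])
        ids = index.get(key, [])
        if len(ids) == 1:
            matched.append((ids[0], row))
        else:
            # No hit, or still ambiguous even with team and position.
            unmatched.append(row)
    return matched, unmatched


# Made-up sample data: ids and team codes are placeholders, not real values.
players = {
    "id-1": {"name": "Chris Clemons", "team": "AAA", "position": "DE"},
    "id-2": {"name": "Chris Clemons", "team": "BBB", "position": "S"},
    "id-3": {"name": "A.J. Davis", "team": "CCC", "position": "CB"},
}
injuries = [
    {"name": "Chris Clemons", "team": "BBB", "position": "S", "status": "Questionable"},
    {"name": "AJ Davis", "team": "CCC", "position": "CB", "status": "Out"},
    {"name": "Zach Miller", "team": "DDD", "position": "TE", "status": "Probable"},
]

matched, unmatched = match_injuries(injuries, build_index(players))
```

Keying on all three fields is what separates two players who share a name; rows that still match zero or several players are set aside rather than guessed at.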