Tracking injuries programmatically
Wow, a lot has happened over the holiday regarding the Complete List of Active Players, thanks to cminton and BurntSushi. The team rosters look very nice, but does anyone else notice the number of duplicate names? It seems like more than ever. These are just the ones on my existing Excel-generated injury list:
Jonathan Stewart, Brandon Marshall, Andre Smith, Chris Givens, A.J. Davis, Brandon Williams, Steve Williams, Chris Clemons, Zach Miller, Michael Smith
Currently I’m getting the injuries here: http://www.pro-football-reference.com/years/2013/injuries.htm. The name formats are a very good match. But because of the lack of a gsisid, simple name matching will be troublesome again this year. Ultimately, I believe my answer is here: http://www.nfl.com/injuries?week=1, which just became active this morning but has yet to be populated. I’m confident in being able to scrape this page with Excel. However, if the gsisid is present, it certainly won’t be listed in the table format, but rather obscured in the source code. Does anyone know how to scrape that type of page of all its contents?
- Created 10 years ago
- Comments:14 (12 by maintainers)
Top GitHub Comments
> These all go to the same place with the last one being the final destination.
An astute observation! In fact, that is precisely how I am discovering a mapping between profile ID and gsis ID. That is also how I would do it if I wanted a mapping to these new ESB ids. But that’s an additional request for each player, which would turn several thousand requests into 10,000+ requests. Owch.
Although, these are only HEAD requests, which retrieve just the response headers (metadata about the resource) and don’t actually download the full page. So they are much faster and more lightweight.
The HTTP redirect info you show later in your post is exactly what these HEAD requests are capturing. It’s just enough to say, “oh hey where did you move to? oh ok, that’s fine, now I’m done.”
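As a rough illustration of the idea, here is a minimal Python sketch of a single HEAD request that reads the redirect target without following it or downloading a body. It is not the code nflgame actually uses. The helper names and the `/<digits>/profile` path shape are assumptions made for the example, so adjust the pattern to whatever the redirect really looks like.

```python
import http.client
import re
from urllib.parse import urlsplit

# Status codes that carry a Location header pointing at the new URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_redirect_target(url, timeout=10):
    """Send one HEAD request and return the Location header, or None.

    http.client never follows redirects on its own, so this only asks
    "where did you move to?" and stops there.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        if resp.status in REDIRECT_CODES:
            return resp.getheader("Location")
        return None
    finally:
        conn.close()


def profile_id_from_location(location):
    """Pull a numeric ID out of a redirect target.

    Assumption: the target path looks like .../<digits>/profile.
    This shape is hypothetical and only serves the example.
    """
    match = re.search(r"/(\d+)/profile", location or "")
    return match.group(1) if match else None
```

Calling `profile_id_from_location(head_redirect_target(url))` for each player URL would give one ID per request, which is where the "one extra request per player" cost comes from.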
> How do they keep those straight on their site!
Hacks, man. Hacks. They have legacy systems. Then they decide, “oh hey, we actually want to be able to cross reference data between these legacy systems.” A programmer speaks up and says, “Yes, we’ll need to rebuild some stuff in order to incorporate a more cohesive design.” Astute business man says, “Rebuild!?! WAT! No. Don’t you know this is a business and we need to make money! Long term advantages are irrelevant. WE WANT IT NOW!”
And so, the programmer adds just enough intermediary code for one legacy system to talk to another. This no doubt includes a table somewhere that maintains a correspondence between these different identifiers.
Sorry for the rant. It’s a sore point.
> Is there a table anywhere on the nfl.com side that maps all of these together and shows the redirect to the profile ID page?
Ha… Ha… If only.
> There were a few players last year that didn’t follow the F.Lastname
Do not, I repeat DO NOT, try to discover a mapping between F.Lastname and a player’s real name. There is absolutely no consistency. There are many, many duplicates. And there’s just no point in doing it now that the JSON player database has full names for almost every single player.
If you want/need injuries imminently, the easiest approach would be to scrape that injury page you have, and match the name/team/position to data in nflgame.players. I actually suspect it will achieve very good results if you only care about current injuries.
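A minimal sketch of that name/team/position matching might look like the following. It uses plain dictionaries as a stand-in for the player database, so the field names (`name`, `team`, `position`, `gsis_id`) and the sample rows are illustrative assumptions rather than the real nflgame.players structure or real roster data. Team and position are consulted only when the name alone is ambiguous.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase, drop periods, and collapse whitespace: 'A.J. Davis' -> 'aj davis'."""
    return " ".join(name.lower().replace(".", "").split())


def build_index(players):
    """Group player records by normalized full name."""
    index = defaultdict(list)
    for player in players:
        index[normalize(player["name"])].append(player)
    return index


def match_injury(row, index):
    """Return the single player matching a scraped injury row, or None.

    Narrow by team, then by position, but only while more than one
    candidate shares the name.
    """
    candidates = index.get(normalize(row["name"]), [])
    for key in ("team", "position"):
        if len(candidates) <= 1:
            break
        candidates = [p for p in candidates if p[key] == row[key]]
    return candidates[0] if len(candidates) == 1 else None


# Illustrative sample rows only; the IDs are made up.
players = [
    {"name": "Brandon Marshall", "team": "CHI", "position": "WR", "gsis_id": "00-0000001"},
    {"name": "Brandon Marshall", "team": "DEN", "position": "LB", "gsis_id": "00-0000002"},
    {"name": "A.J. Davis", "team": "NO", "position": "CB", "gsis_id": "00-0000003"},
]
index = build_index(players)
injury = {"name": "Brandon Marshall", "team": "DEN", "position": "LB"}
matched = match_injury(injury, index)
```

Rows that stay ambiguous, or that match nothing, come back as `None`, so they can be reviewed by hand instead of being silently attached to the wrong player.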
I think I’m going to close this for now. I don’t really have any plans to add extra injury support to nflgame. If someone else would like to come along and add it, please open a new issue with ideas.