Add ability to manually specify expansions and fields
Normally, twarc aims to grab everything. But this seems to cause problems with the API when requests are too big, e.g. #449.
It would be good to have a manual override for the expansions and fields. The extra command-line parameters should align with the API (https://developer.twitter.com/en/docs/twitter-api/data-dictionary/using-fields-and-expansions) and should include:
--expansions "author_id,geo.place_id"
where the valid values are listed at: https://github.com/DocNow/twarc/blob/main/twarc/expansions.py#L16
Same for:
--user-fields
--tweet-fields
--media-fields
--poll-fields
--place-fields
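A minimal sketch of how these flags could be parsed and checked. It assumes plain argparse and a hard-coded set of Tweet expansions taken from Twitter's v2 fields-and-expansions documentation, not copied from twarc/expansions.py (which may differ). The program name and the comma_list(), build_parser() and parse() helpers are hypothetical.

```python
import argparse

# Assumed set of Tweet expansions (from Twitter's v2 docs, not from twarc's source).
VALID_EXPANSIONS = {
    "author_id",
    "referenced_tweets.id",
    "referenced_tweets.id.author_id",
    "in_reply_to_user_id",
    "entities.mentions.username",
    "attachments.media_keys",
    "attachments.poll_ids",
    "geo.place_id",
}


def comma_list(value: str) -> list[str]:
    """Split a comma-separated flag value, dropping blanks and stray spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="twarc-sketch")  # hypothetical name
    parser.add_argument("--expansions", type=comma_list)
    for kind in ("user", "tweet", "media", "poll", "place"):
        # argparse stores --user-fields as args.user_fields, and so on.
        parser.add_argument(f"--{kind}-fields", type=comma_list)
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.expansions is not None:
        unknown = sorted(set(args.expansions) - VALID_EXPANSIONS)
        if unknown:
            # parser.error() prints usage and exits with status 2.
            parser.error(f"unknown expansions: {', '.join(unknown)}")
    return args
```

For example, parse(["--expansions", "author_id,geo.place_id"]) yields args.expansions == ["author_id", "geo.place_id"], while an unknown value such as "bogus" aborts with an error. argparse is used here only to keep the sketch dependency-free; it is not meant to mirror twarc's actual CLI code.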
Ideally, it should also raise an error or set things automatically for you if you specify --poll-fields but fail to include attachments.poll_ids in --expansions. It would be nice to parse and validate these for the user, but if that’s too complicated and cumbersome, a simple check and a warning should be enough.
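The check described above could look roughly like this. It is a sketch under an assumed mapping from Twitter's v2 documentation (poll fields need attachments.poll_ids, media fields need attachments.media_keys, place fields need geo.place_id, user fields need at least one user-producing expansion); the check_expansions() helper and its auto_fix switch are hypothetical, and fields are keyed by names like "poll_fields".

```python
# Assumed mapping (from Twitter's v2 docs, not from twarc): each *-fields option
# only has an effect when at least one of these expansions is also requested.
REQUIRED_EXPANSIONS = {
    "poll_fields": ("attachments.poll_ids",),
    "media_fields": ("attachments.media_keys",),
    "place_fields": ("geo.place_id",),
    "user_fields": (
        "author_id",
        "in_reply_to_user_id",
        "referenced_tweets.id.author_id",
        "entities.mentions.username",
    ),
}


def check_expansions(
    fields: dict[str, list[str]], expansions: list[str], auto_fix: bool = False
) -> tuple[list[str], list[str]]:
    """Return (expansions, problems); with auto_fix, add missing expansions instead."""
    expansions = list(expansions)  # never mutate the caller's list
    problems = []
    for flag, needed in REQUIRED_EXPANSIONS.items():
        if not fields.get(flag):
            continue  # option not used, nothing to check
        if any(e in expansions for e in needed):
            continue  # at least one qualifying expansion is present
        if auto_fix:
            expansions.append(needed[0])  # pick the first qualifying expansion
        else:
            option = "--" + flag.replace("_", "-")
            problems.append(
                f"{option} is ignored unless --expansions includes one of: "
                + ", ".join(needed)
            )
    return expansions, problems
```

With auto_fix=False this is the "check and a warning" fallback: the caller can print the returned problems, or turn them into errors. With auto_fix=True it takes the "set things automatically" route. Tweet fields have no entry because they apply to the primary tweets without any expansion.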
Issue Analytics
- Created 2 years ago
- Comments: 9 (9 by maintainers)
Top GitHub Comments
How are we feeling about this now?
Based on a bit more hands-on work with the API, the only thing I’d really want to turn off is the context annotations, so that I can collect data faster. Maybe, instead of full customisability, an off-by-default --exclude-context-annotations flag to support 500 results per page would cover most of this?
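A rough sketch of what such a flag might do to the request parameters. It assumes, as the comment implies, that a search page is capped at 100 results while context_annotations is requested and at 500 otherwise; check both numbers against Twitter's current search documentation. DEFAULT_TWEET_FIELDS and search_params() are hypothetical; only tweet.fields and max_results are real API query parameters.

```python
# Hypothetical default field list; twarc's real defaults may differ.
DEFAULT_TWEET_FIELDS = [
    "author_id",
    "context_annotations",
    "created_at",
    "entities",
    "lang",
    "public_metrics",
]


def search_params(exclude_context_annotations: bool = False) -> dict[str, object]:
    """Build tweet.fields and max_results for a search request."""
    if exclude_context_annotations:
        # Dropping context_annotations is assumed to lift the page size to 500.
        fields = [f for f in DEFAULT_TWEET_FIELDS if f != "context_annotations"]
        max_results = 500
    else:
        fields = list(DEFAULT_TWEET_FIELDS)
        max_results = 100  # assumed cap while context_annotations is requested
    return {"tweet.fields": ",".join(fields), "max_results": max_results}
```

Because the flag only removes one field and raises the page size, everything else twarc requests stays the same, which is the appeal of this narrower option over full customisability.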
I’m a hard disagree on this one right now. We’re really only a few months into early access; I think it’s a little premature to be working around Twitter’s API instability, especially since that has an impact on downstream plugins.
Also, I live 15000km from most of the internet, so 503s aren’t exactly rare 😉