Should twarc filter write limit warnings to output?
Using filter mode like so: twarc filter aleppo --log aleppo_filter.log > aleppo_filter.json, I collected a few million tweets and noticed a number of issues with the utilities, mainly ids.py and deduplicate.py, when I combine that filter output with search output. I got lots of uhoh: 'id_str' messages from ids.py, and two different counts:
$ wc -l aleppo_filter.json filter_ids.txt
6136161 aleppo_filter.json
6133958 filter_ids.txt
12270119 total
I added a couple of lines to ids.py to print out the offending line numbers, and printed out an offending line:
$ sed -n '4502042p' aleppo_filter.json
{"limit": {"track": 2530, "timestamp_ms": "1482168932301"}}
That’s from going over 1% of the overall stream, right? If so, should that be in the output of the filter stream when using twarc filter aleppo --log aleppo_filter.log > aleppo_filter.json?
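In the meantime, one possible workaround is to strip the limit notices before handing the file to the utilities. The sketch below is only an assumption about how that could look, not something twarc ships: tweets_only is a hypothetical name, and it treats any object that has a limit key and no id_str as a notice. Everything else is passed through unchanged.

```python
import json


def tweets_only(lines, notices):
    """Yield raw JSON lines that look like tweets.

    Limit notices are appended to the `notices` list instead of being yielded.
    Hypothetical helper for illustration; not part of twarc.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        obj = json.loads(stripped)
        if "limit" in obj and "id_str" not in obj:
            notices.append(obj["limit"])
        else:
            yield stripped


# Illustrative data only: two tweets with a limit notice between them.
stream = [
    '{"id_str": "1", "text": "first"}',
    '{"limit": {"track": 2530, "timestamp_ms": "1482168932301"}}',
    '{"id_str": "2", "text": "second"}',
]
limit_notices = []
kept = list(tweets_only(stream, limit_notices))
print(len(kept), "tweets,", len(limit_notices), "limit notices")
```

Because it is a generator, the same function can be fed an open file handle for a multi-gigabyte file and the kept lines written out one at a time. Keeping the notices in a separate list means the information about undelivered tweets is not lost.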
Issue Analytics
- Created 7 years ago
- Comments:12 (12 by maintainers)
Top GitHub Comments
Sure, that would be great.
…I could clean it up and put it in utils if you think that would be useful.