Should twarc filter write limit warnings to output?

Using filter mode like so: twarc filter aleppo --log aleppo_filter.log > aleppo_filter.json, I collected a few million tweets and noticed a number of issues with the utilities, mainly ids.py and deduplicate.py, when I combined that filter output with search output. I got lots of uhoh: 'id_str' errors from ids.py, and two different counts:

$ wc -l aleppo_filter.json filter_ids.txt
    6136161 aleppo_filter.json
    6133958 filter_ids.txt
   12270119 total
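A count mismatch like this can be traced with an ids extractor that reports, instead of failing on, lines that have no id_str. The sketch below is a minimal illustration that assumes one JSON object per line; it is not twarc's ids.py, and the function name extract_ids and its report callback are hypothetical.

```python
import json
import sys
from typing import Callable, Iterable, Iterator, Optional


def _to_stderr(message: str) -> None:
    print(message, file=sys.stderr)


def extract_ids(
    lines: Iterable[str], report: Optional[Callable[[str], None]] = None
) -> Iterator[str]:
    """Yield the id_str of each tweet line.

    Hypothetical helper, not twarc's ids.py. Lines without an id_str
    (for example streaming API limit notices) are reported with their
    1-based line number instead of raising KeyError.
    """
    if report is None:
        report = _to_stderr
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines, but keep counting them
        obj = json.loads(line)
        if isinstance(obj, dict) and "id_str" in obj:
            yield obj["id_str"]
            continue
        if isinstance(obj, dict):
            detail = "keys: " + ",".join(sorted(obj))
        else:
            detail = "type: " + type(obj).__name__
        report(f"line {lineno}: no id_str ({detail})")
```

Run over the filter file (for example by passing sys.stdin), each reported line number can be handed to sed -n 'Np' to inspect the offending line.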

I added a couple of lines to ids.py to print the offending line numbers, then printed one of the offending lines:

$ sed -n '4502042p' aleppo_filter.json 
{"limit": {"track": 2530, "timestamp_ms": "1482168932301"}}

That’s from going over 1% of the overall stream, right? If so, should it end up in the output of the filter stream when using twarc filter aleppo --log aleppo_filter.log > aleppo_filter.json?
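If limit notices do end up in the output, they can be separated from tweets after the fact. This is an illustrative sketch, not twarc code: classify_lines and max_undelivered are hypothetical names. max_undelivered also rests on an assumption, namely our reading of Twitter's streaming API documentation that the track value in a limit notice is a running total of undelivered tweets for the current connection. If that holds, the largest value seen is only a lower bound on missed tweets, since the count restarts whenever the connection does.

```python
import json
from typing import Iterable, Iterator, Tuple, Union


def classify_lines(lines: Iterable[str]) -> Iterator[Tuple[str, Union[str, dict]]]:
    """Yield ("tweet", raw_line) or ("limit", notice) for each non-blank line.

    Hypothetical helper, not part of twarc. It is a generator, so a
    multi-million-line file never has to be held in memory at once.
    """
    for line in lines:
        if not line.strip():
            continue
        obj = json.loads(line)
        if isinstance(obj, dict) and "limit" in obj:
            yield "limit", obj["limit"]
        else:
            yield "tweet", line.rstrip("\n")


def max_undelivered(notices: Iterable[dict]) -> int:
    """Largest "track" value seen in limit notices (0 if there are none).

    Assumes "track" is cumulative per connection, which makes the result
    a lower bound on undelivered tweets rather than an exact total.
    """
    return max((notice.get("track", 0) for notice in notices), default=0)
```

Writing the "tweet" lines to one file and the notices to another would give ids.py and deduplicate.py a file containing only tweets, assuming limit notices are the only non-tweet messages in the stream output.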

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction
edsu commented, Jan 29, 2017

Sure, that would be great.

1 reaction
ruebot commented, Jan 29, 2017

…I could clean it up and put it in utils if you think that would be useful.
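For the original problem of combining filter and search output, a deduplication pass that skips non-tweet lines is one option. This is a hypothetical sketch, not twarc's deduplicate.py and not the utility mentioned in the comment above; dedupe_tweets is an illustrative name, and it keeps the first copy of each id_str it encounters.

```python
import json
from typing import Iterable, Iterator


def dedupe_tweets(*sources: Iterable[str]) -> Iterator[str]:
    """Yield each tweet line once, keyed on id_str, across several sources.

    Hypothetical sketch, not twarc's utils/deduplicate.py. Lines without
    an id_str (such as streaming "limit" notices) are skipped, and the
    first occurrence of each id wins.
    """
    seen = set()
    for source in sources:
        for line in source:
            line = line.strip()
            if not line:
                continue
            obj = json.loads(line)
            tweet_id = obj.get("id_str") if isinstance(obj, dict) else None
            if tweet_id is None or tweet_id in seen:
                continue  # non-tweet message or duplicate
            seen.add(tweet_id)
            yield line
```

Passing the filter file before the search file means overlapping tweets are kept in their filter-stream form. The seen set grows with the number of unique tweets, so memory use scales with the size of the combined collection.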

Top Results From Across the Web

twarc.Client - Read the Docs
Twarc allows you to retrieve data from the Twitter API. Each method is an iterator that runs to completion, and handles rate limiting so...

twarc - PyPI
twarc is a command line tool and Python library for archiving Twitter JSON data. Each tweet is represented as a JSON object that...

Twarc there it is! - Nick Ruest
Twarc there it is! Nick Ruest York University Data Love-In Vancouver, Canada February 14, 2018. Workshop Overview.

Getting no data in full archive search using Twarc
Hi All, I tried using twarc to further get the results that I want but am still facing issues. Every time I run...

Harvesting Twitter Data with twarc - The Carpentries Incubator
twarc allows you to request specific data based on keywords, hashtags, events, and other areas of interest. Once you have your dataset, twarc...
