'twarc search' retrieving slightly different results compared to 'rtweet::search_tweets()'?
Hi all,
Trying to get off the ground with the Twitter API and doing some sanity checks for basic calls across different API wrappers. I really like twarc and rtweet, and it has been a pretty straightforward experience so far! Before running larger calls, I’m trying to wrap my head around the basic functionality of the API, especially around geocoded calls. As my baseline, I am trying to pull all the tweets in a two-mile area around College Park, MD, in the last 7 days.
Here is the twarc command I am using:
twarc search --geocode 38.987202,-76.945999,2mi > tweets.jsonl
This returned 3594 records. Shortly thereafter, I tried the following call in rtweet:
library(rtweet)

# Search tweets within 2 miles of College Park, MD
rt2mi <- search_tweets(
  geocode = "38.987202,-76.945999,2mi",
  retryonratelimit = TRUE,
  n = 10000
)
Which returns about 3610 tweets.
Surface-level checks indicate that they are indeed pulling the same tweets, but I’m coming up a bit short using twarc.
Are there any obvious reasons for this discrepancy? Or perhaps I’m missing something about the underlying Twitter API? I am using the same API keys and access tokens.
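One way to make the surface-level check more precise would be to compare tweet IDs directly. The following is only a minimal sketch, assuming the tweet IDs from each tool have already been exported one per line to two text files. The file names `twarc_ids.txt` and `rtweet_ids.txt` and the helper `missing_from_first` are hypothetical and not part of either package.

```shell
# Print IDs that appear in the second file but not in the first.
# Both files are assumed to hold one tweet ID per line (hypothetical exports).
missing_from_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # comm requires sorted input; -u also drops duplicate IDs
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    # -1 hides lines unique to the first file, -3 hides lines common to both,
    # leaving only the lines unique to the second file
    comm -13 "$a" "$b"
    rm -f "$a" "$b"
}

# Example usage: list IDs that rtweet returned but twarc did not
if [ -f twarc_ids.txt ] && [ -f rtweet_ids.txt ]; then
    missing_from_first twarc_ids.txt rtweet_ids.txt
fi
```

Swapping the two arguments would list the IDs found only by twarc instead.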
Top GitHub Comments
If you figure out a good way to read line-oriented JSON from R, please let us know, as it’s a question that has come up here periodically, e.g. #322
Yes, that’s a good question. The twarc command line outputs each tweet on a line so you can count the number of lines in the file. On Unix/OSX you can:
Does that make sense?
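As an illustration of the line-counting idea above, here is a minimal sketch. It is not necessarily the command from the original comment. It assumes the output file is named `tweets.jsonl`, as in the question, and the helper name `count_tweets` is hypothetical.

```shell
# twarc writes one JSON object (one tweet) per line,
# so the number of lines in the file is the number of tweets.
count_tweets() {
    # Reading from stdin makes wc print only the count, without the file name
    wc -l < "$1"
}

# Example usage (assumes tweets.jsonl exists in the current directory)
if [ -f tweets.jsonl ]; then
    count_tweets tweets.jsonl
fi
```

The result can then be compared against the number of rows rtweet reports for the same search.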