question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

conversations module "sleep" inconsistencies

See original GitHub issue

Hi there, I feel like I’m running into a similar issue as described here: https://twittercommunity.com/t/inconsistent-rate-limit-academic-research-full-archive-search/162928/18?u=igorbrigadir

and here: https://github.com/DocNow/twarc/pull/578.

I too am fetching all tweets related to a conversation id using the twarc2 command: Twarc2 conversations --archive input_conversation_ids.txt output_conversation_tweets.jsonl

But, I’m finding that it is doing far fewer than the 300 requests per 15 minutes that my academic Twitter API account has been allotted.

I’m using the latest version of twarc 2.8.2

Here is some of the log that I’m seeing (Notice that at 10:14:24 it stops…and then doesn’t restart until 11:23:27 for no clear reason.

2022-01-01 10:04:32,148 INFO fetching conversation 290902007206772736
2022-01-01 10:04:32,148 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:290902007206772736', 'max_results': 100}}
2022-01-01 10:04:32,174 WARNING rate limit exceeded: sleeping 591.8250887393951 secs
2022-01-01 10:14:24,003 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:290902007206772736', 'max_results': 100}}
2022-01-01 10:14:24,117 INFO Retrieved an empty page of results.
2022-01-01 10:14:24,117 INFO No more results for search conversation_id:290902007206772736.
2022-01-01 11:23:27,550 INFO fetching conversation 290902009131966464
2022-01-01 11:23:27,551 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:290902009131966464', 'max_results': 100}}
2022-01-01 11:23:28,605 INFO Retrieved an empty page of results.
2022-01-01 11:23:28,605 INFO No more results for search conversation_id:290902009131966464.
2022-01-01 11:23:28,607 INFO fetching conversation 290902010599964672

I’m also getting a lot of these warnings about "overlong sleep interval"s:

2022-01-01 15:53:56,688 WARNING Detected overlong sleep interval - is your system clock accurate? An accurate system time is needed to calculate how long to sleep for, and data collection might be slowed.
2022-01-01 15:53:56,688 WARNING rate limit exceeded: sleeping 901 secs
2022-01-01 16:08:57,693 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291011973909467136', 'max_results': 100}}
2022-01-01 16:08:57,795 INFO Retrieved an empty page of results.
2022-01-01 16:08:57,795 INFO No more results for search conversation_id:291011973909467136.
2022-01-01 16:08:57,795 INFO fetching conversation 291018004416839680
2022-01-01 16:08:57,796 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291018004416839680', 'max_results': 100}}
2022-01-01 16:08:57,820 WARNING Detected overlong sleep interval - is your system clock accurate? An accurate system time is needed to calculate how long to sleep for, and data collection might be slowed.
2022-01-01 16:08:57,820 WARNING rate limit exceeded: sleeping 901 secs
2022-01-01 16:23:58,834 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291018004416839680', 'max_results': 100}}
2022-01-01 16:23:58,967 INFO Retrieved an empty page of results.
2022-01-01 16:23:58,967 INFO No more results for search conversation_id:291018004416839680.
2022-01-01 16:23:58,968 INFO fetching conversation 291062323647500288
2022-01-01 16:23:58,968 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291062323647500288', 'max_results': 100}}
2022-01-01 16:23:58,992 WARNING Detected overlong sleep interval - is your system clock accurate? An accurate system time is needed to calculate how long to sleep for, and data collection might be slowed.
2022-01-01 16:23:58,992 WARNING rate limit exceeded: sleeping 901 secs

Any help would be appreciated as at this rate, I won’t put this dataset together within any reasonable amount of time.

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Comments:21 (11 by maintainers)

github_iconTop GitHub Comments

2reactions
jonlee112commented, Jan 5, 2022

Here’s the log after roughly 1000 requests. So far, only seeing the non-900 second warnings every 300 requests. Not seeing 10 second warnings. Not seeing back-to-back long sleeps.

So far looking good…😃 sample_log (twarc-dev1).log

2reactions
SamHamescommented, Jan 5, 2022

Oh wait, looking more closely at the log, it is definitely a bug, we’re just making too many calls… responses with no results are the most notable place, probably because they’re returning extremely fast.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Social conversation and its relationship to sleep behavior ...
In a 2-step mixed-methods design, this study explores conversations among college students about sleep and how these conversations relate to sleep-related ...
Read more >
Sleep, Alertness, and Fatigue Education in Residency ...
The goal of the SAFER program is to increase knowledge and awareness about sleep and fatigue among medical students and residents, and to...
Read more >
Building on Campaigns with Conversations - NCEMCH
This series of learning modules is designed for a range of health professionals, ... who interact with families on topics of safe sleep...
Read more >
Sleep Inconsistency and Markers of Inflammation - Frontiers
Objective: Poor sleep is associated with higher levels of inflammatory biomarkers. Conventionally, higher average time awake, lower average ...
Read more >
Module A HSC English Advanced - Textual Conversations ...
We'll use a handy example ( Sleeping Beauty!) to explain some key ideas in the Textual Conversations module. Along the way, engage with...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found