How to scrape posts for large IG accounts in Python and avoid 429 Too Many Requests
I am trying to download the 10 most recent posts from profiles on Instagram using the Python Instaloader package. Some of these profiles are quite large and have a lot of likes and comments. For these posts, I keep getting a 429: Too Many Requests error. I understand that Instagram has a limit of 200 requests per hour, and I have read Instaloader's troubleshooting page as well as a number of GitHub issues (774, 1006, 944, 802, 822, etc.) and Stack Overflow questions. Unfortunately, I'm having trouble finding a solution in general, and especially for Python, as it seems most people use this tool on the command line.
First, I understand Instagram's limit is 200 requests per hour. What does this mean? What constitutes a request? If a post has 2,000 likes, is each like a request, meaning it would take 10 hours to get 2,000 likes without exceeding the limit and triggering a 429?
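The arithmetic behind this question can be made concrete. The sketch below is only a back-of-the-envelope estimate: it takes the 200 requests per hour figure above at face value, and the page size of 50 items per request is a purely hypothetical number chosen for illustration, not a documented Instagram or Instaloader value.

```python
import math


def hours_to_fetch(items: int, items_per_request: int, requests_per_hour: int) -> float:
    """Estimate the hours needed to fetch `items` under an hourly request budget."""
    requests_needed = math.ceil(items / items_per_request)
    return requests_needed / requests_per_hour


# Worst case from the question: every like costs one request.
print(hours_to_fetch(2000, items_per_request=1, requests_per_hour=200))   # 10.0

# Hypothetical alternative: likes arrive in pages of 50 per request.
print(hours_to_fetch(2000, items_per_request=50, requests_per_hour=200))  # 0.2
```

Which of the two cases is closer to reality depends on how many likes or comments a single query returns, which is exactly what the question asks.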
Second, I am wondering how I can most efficiently abide by these limitations. I have been trying to use the RateController to set custom scraping intervals that stay within the limit, as in the code block below, but I keep getting a 429.
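`import random
import time
from instaloader import Instaloader, RateController

class MyRateController(RateController):
    def sleep(self, secs):
        # Ignores the requested duration and sleeps a random 30-120 s instead.
        wait_time = random.uniform(30, 120)
        time.sleep(wait_time)

    def count_per_sliding_window(self, query_type):
        return 20`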
To instantiate the class, I call:
`L = Instaloader(rate_controller=lambda ctx: MyRateController(ctx))`
My implementation makes sense to me because the documentation says that count_per_sliding_window “return[s] how many requests of the given type can be done within a sliding window of 11 minutes.” So if I set the value to 20, I read this as 20 requests every 11 minutes, or about 110 requests an hour, which is less than 200. Unfortunately, I still get continuous errors as shown below:
`Too many queries in the last time. Need to wait 614 seconds, until 22:31.
Too many queries in the last time. Need to wait 566 seconds, until 22:31.
Too many queries in the last time. Need to wait 488 seconds, until 22:31.
Too many queries in the last time. Need to wait 426 seconds, until 22:31.`
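To make the "sliding window" reading concrete, here is a minimal, standard-library-only sketch of a sliding-window budget. It is not Instaloader's implementation; the class name `SlidingWindowBudget` and its methods are hypothetical, and timestamps are passed in explicitly so the behavior can be checked without real waiting.

```python
from collections import deque


class SlidingWindowBudget:
    """Allow at most `limit` requests in any `window` seconds (hypothetical helper)."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.stamps = deque()  # timestamps of requests still inside the window

    def wait_needed(self, now: float) -> float:
        """Return how many seconds to wait before another request is allowed."""
        # Drop timestamps that have slid out of the window.
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            return 0.0
        # The oldest request has to leave the window first.
        return self.stamps[0] + self.window - now

    def record(self, now: float) -> None:
        """Register a request made at time `now`."""
        self.stamps.append(now)


budget = SlidingWindowBudget(limit=20, window=11 * 60)
for second in range(20):            # 20 requests fired one second apart
    budget.record(float(second))
print(budget.wait_needed(20.0))     # 640.0: the first request leaves the window at t=660
print(round(20 * 60 / 11))          # 109: long-run requests per hour at this setting
```

Under this model, a quick burst of 20 requests uses up the whole window and forces a wait of roughly ten minutes. That would at least be consistent with the waits of around 600 seconds reported here, if Instaloader's controller behaves similarly.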
Generally my intended process is the sequence below:
- create Instaloader instance
- load session file
- profile.get_posts()
- for post in posts:
  - post.get_likes()
  - post.get_comments()
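The steps above could be sketched in Python roughly as follows. The function names `collect_engagement` and `main`, and the parameters `login_user` and `target`, are hypothetical; the Instaloader calls are the ones named in the list, plus `Profile.from_username` and `load_session_from_file` for the setup. This is a sketch of the intended process, not a fix for the 429 errors.

```python
from itertools import islice


def collect_engagement(posts, limit=10):
    """Gather liker usernames and comment texts for the first `limit` posts."""
    results = []
    for post in islice(posts, limit):
        results.append({
            "shortcode": post.shortcode,
            # Both calls return iterators, so the lists are built while looping over them.
            "likers": [profile.username for profile in post.get_likes()],
            "comments": [comment.text for comment in post.get_comments()],
        })
    return results


def main(login_user: str, target: str):
    # Imported here so the helper above stays usable without the package installed.
    import instaloader

    L = instaloader.Instaloader()
    L.load_session_from_file(login_user)  # session file saved by an earlier login
    profile = instaloader.Profile.from_username(L.context, target)
    return collect_engagement(profile.get_posts(), limit=10)
```

Keeping the loop in a helper that only needs objects with `shortcode`, `get_likes()` and `get_comments()` makes it possible to exercise the logic with stand-in objects before pointing it at a real account.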
I also commonly get the error message shown below instead of the one previously shown:
`JSON Query to graphql/query: 429 Too Many Requests [retrying; skip with ^C]
Number of requests within last 10/11/20/22/30/60 minutes grouped by type:
other: 1 1 1 1 1 1
37479f2b8209594dde7facb0d904896a: 1 1 1 1 1 1
2b0673e0dc4580674a88d426fe00ea90: 1 1 1 1 1 1
- 1cb6ec562846122743b61e492c85999f: 1 1 1 1 1 1
Instagram responded with HTTP error "429 - Too Many Requests". Please do not run multiple instances of Instaloader in parallel or within short sequence. Also, do not use any Instagram App while Instaloader is running.
The request will be retried in 666 seconds, at 18:02.`
Ultimately, my goal is to answer two questions:
- What constitutes a request using this API?
- How can I retrieve likes and comments for Instagram accounts with a lot of engaged followers in Python? The pseudocode I showed above works well for smaller accounts, but I keep getting 429 errors for larger ones. I would love to find a way to programmatically grab the largest chunk of likes/comments possible at a time, then wait the minimum amount of time before grabbing again, and so on until I have all of the information I need.
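The "grab a chunk, wait, grab again" loop described in the second question might look like the sketch below. The helper `drain_in_chunks` is hypothetical, and the chunk size and pause are placeholders rather than known-good values. Note that it counts items, not requests, so it only matches the rate limit if the number of items fetched per request is known.

```python
import time
from itertools import islice


def drain_in_chunks(items, chunk_size, pause_seconds, sleep=time.sleep):
    """Yield lists of up to `chunk_size` items, pausing between chunks."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk
        if len(chunk) < chunk_size:
            return  # source exhausted, so there is no need to wait again
        sleep(pause_seconds)


# Example with a plain range standing in for an iterator of likes,
# and a recording function standing in for time.sleep.
pauses = []
for chunk in drain_in_chunks(range(5), chunk_size=2, pause_seconds=30, sleep=pauses.append):
    print(chunk)
print(pauses)
```

Passing the sleep function as a parameter keeps the loop testable; in real use the default `time.sleep` applies.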
I sincerely apologize for the poor formatting of my code here; this is the first time I’ve posted a GitHub issue. Please let me know if anything can be clarified further. Thank you so much for your time! @aandergr @Thammus
Issue Analytics
- Created 2 years ago
- Reactions: 4
- Comments: 20
Huh? I see the code, but I don’t see any hint of which file this code should be inside, or how it integrates with instaloader. It’s obviously not a standalone python program, and it doesn’t seem to import anything from instaloader.
@estatistics Sorry for answering after so long. Here’s the code which I have used!