
How to scrape posts for large IG accounts in python and avoid 429 Too Many Requests

See original GitHub issue

I am trying to download the most recent 10 posts from profiles on Instagram using the Python Instaloader package. Some of these profiles are quite large, with many likes and comments, and for their posts I keep getting a 429: Too Many Requests error. I understand that Instagram has a limit of 200 requests per hour, and I have read Instaloader's troubleshooting page, scoured the depths of GitHub issues (including 774, 1006, 944, 802, 822, etc.) and Stack Overflow, and unfortunately I am still having trouble finding a solution, especially for Python, as it seems most people use this tool on the command line.

First, I understand Instagram's limit is 200 requests per hour. What does this mean, and what constitutes a request? If a post has 2,000 likes, is each like a request, meaning it would take 10 hours to fetch 2,000 likes without exceeding the limit?

Second, I am wondering how I can most efficiently abide by these limits. I have been trying to subclass RateController to set custom scraping intervals, as in the code block below, but I keep getting a 429.

import random
import time

from instaloader import Instaloader, RateController

class MyRateController(RateController):

    def sleep(self, secs: float):
        # ignore the suggested wait time; sleep a random 30-120 seconds instead
        wait_time = random.uniform(30, 120)
        time.sleep(wait_time)

    def count_per_sliding_window(self, query_type: str) -> int:
        return 20

To instantiate the class, I call:

L = Instaloader(rate_controller=lambda ctx: MyRateController(ctx))

My implementation makes sense to me because the documentation says that count_per_sliding_window "return[s] how many requests of the given type can be done within a sliding window of 11 minutes." So if I set the value to 20, that should allow 20 requests every 11 minutes, or about 110 requests an hour, which is below 200. Unfortunately I still get continuous errors, as shown below:

Too many queries in the last time. Need to wait 614 seconds, until 22:31.
Too many queries in the last time. Need to wait 566 seconds, until 22:31.
Too many queries in the last time. Need to wait 488 seconds, until 22:31.
Too many queries in the last time. Need to wait 426 seconds, until 22:31.

Generally, my intended process is the sequence below (a runnable sketch follows the list):

  • create instaloader instance
  • load session file
  • profile.get_posts()
  • for post in posts:
    • post.get_likes()
    • post.get_comments()
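
A minimal sketch of that sequence, reusing MyRateController from above; the account name target_account, the session owner my_username, and the cap of 10 posts are placeholder assumptions:

import itertools

from instaloader import Profile

L = Instaloader(rate_controller=lambda ctx: MyRateController(ctx))
L.load_session_from_file("my_username")  # session file created beforehand with `instaloader --login`

profile = Profile.from_username(L.context, "target_account")
for post in itertools.islice(profile.get_posts(), 10):  # 10 most recent posts
    likers = [liker.username for liker in post.get_likes()]
    comments = [(c.owner.username, c.text) for c in post.get_comments()]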

I also commonly get the error message shown below instead of the one previously shown:

JSON Query to graphql/query: 429 Too Many Requests [retrying; skip with ^C]
Number of requests within last 10/11/20/22/30/60 minutes grouped by type:
 other: 1 1 1 1 1 1
 37479f2b8209594dde7facb0d904896a: 1 1 1 1 1 1
 2b0673e0dc4580674a88d426fe00ea90: 1 1 1 1 1 1
 1cb6ec562846122743b61e492c85999f: 1 1 1 1 1 1
Instagram responded with HTTP error "429 - Too Many Requests". Please do not run multiple instances of Instaloader in parallel or within short sequence. Also, do not use any Instagram App while Instaloader is running. The request will be retried in 666 seconds, at 18:02.

Ultimately my goal is to answer two questions:

  1. What constitutes a request when using this API?
  2. How can I retrieve likes and comments for Instagram accounts with many engaged followers in Python? The pseudocode above works well for smaller accounts, but I keep getting 429 errors for larger ones. I would love to find a way to programmatically grab the largest chunk of likes/comments possible at a time, then wait the minimum amount of time before grabbing again, and so on until I have all of the information I need (a rough backoff sketch follows this list).
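
One generic way to approximate that grab-then-wait loop is to drain each iterator while catching Instaloader's connection errors and backing off before resuming. This is only a sketch under assumptions: the wait times are guesses, and whether an interrupted iterator can be resumed mid-page depends on Instaloader internals (Instaloader also retries internally before the exception ever reaches this code).

import time
from instaloader.exceptions import ConnectionException

def collect_with_backoff(iterator, max_retries=5):
    # Drain an Instaloader iterator (likes, comments, ...), sleeping and
    # retrying whenever a connection error such as a 429 bubbles up.
    # Items fetched before the error are kept.
    items = []
    retries = 0
    it = iter(iterator)
    while True:
        try:
            items.append(next(it))
            retries = 0
        except StopIteration:
            return items
        except ConnectionException:
            retries += 1
            if retries > max_retries:
                raise
            time.sleep(60 * retries)  # crude linear backoff; tune to taste

With that helper, collect_with_backoff(post.get_likes()) would replace the bare iteration in the sketch above.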

I sincerely apologize for my poor formatting of the code on here; this is the first time I've posted a GitHub issue. Please let me know if anything needs further clarification. Thank you so much for your time! @aandergr @Thammus

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 4
  • Comments: 20

Top GitHub Comments

1 reaction
m-sundman commented, Jul 28, 2022

Huh? I see the code, but I don't see any hint of which file this code should be in, or how it integrates with Instaloader. It's obviously not a standalone Python program, and it doesn't seem to import anything from instaloader.

1 reaction
Aditya-Rajgor commented, May 30, 2022

@estatistics Sorry for answering after so long. Here’s the code which I have used!

import time  # for adding a random delay between requests
import random
import tqdm  # for keeping track of the progress
from concurrent.futures import ThreadPoolExecutor

def chunker(size, l):
    """Yield successive chunks of `size` items from the list `l`."""
    for i in range(0, len(l), size):
        yield l[i:i + size]

def thread_executor(l, threads=30, chunk=100):
    """l is the list of profiles, threads is the number of worker threads,
    and chunk is how many profiles to process between sleeps."""

    global result_whole
    with ThreadPoolExecutor(max_workers=threads) as exe, tqdm.tqdm(total=len(l)) as prog:
        result_whole = []
        start = time.time()
        for batch in chunker(chunk, l):
            start_chunk = time.time()
            # sleep for a random delay of 30 to 50 seconds between chunks
            sl = round(random.uniform(30, 50), 3)
            print('sleeping for', sl)
            time.sleep(sl)

            # get_user_meta (defined elsewhere) fetches one profile's details;
            # its responses are collected into result_whole
            result = exe.map(get_user_meta, batch, [prog] * len(batch))
            result_whole.extend(list(result))

            end_chunk = time.time()
            print('chunk time', end_chunk - start_chunk, "seconds")

        end = time.time()
        print(f"taken time for {len(l)} profiles", end - start, "seconds")
Read more comments on GitHub.
