
How to scrape posts for large IG accounts in python and avoid 429 Too Many Requests

See original GitHub issue

I am trying to download the most recent 10 posts from profiles on Instagram using the Python Instaloader package. Some of these profiles are quite large, with many likes and comments, and for their posts I keep getting a 429: Too Many Requests error. I understand that Instagram has a limit of 200 requests per hour, and I have read Instaloader's troubleshooting page, scoured the depths of GitHub issues (including 774, 1006, 944, 802, 822, etc.) and Stack Overflow, and unfortunately I am still having trouble finding a solution, especially for Python, as it seems most people use this tool on the command line.

First, I understand Instagram's limit is 200 requests per hour. What does this mean, and what constitutes a request? If a post has 2,000 likes, is each like a request, meaning it would take 10 hours to fetch 2,000 likes without exceeding the limit?

Second, I am wondering how I can most efficiently abide by these limits. I have been trying to subclass RateController to set custom scraping intervals, as in the code block below, but I keep getting a 429.

import random
import time

from instaloader import Instaloader, RateController

class MyRateController(RateController):

    def sleep(self, secs: float):
        # ignore the suggested wait time; sleep a random 30-120 seconds instead
        wait_time = random.uniform(30, 120)
        time.sleep(wait_time)

    def count_per_sliding_window(self, query_type: str) -> int:
        return 20

To instantiate the class, I call:

L = Instaloader(rate_controller=lambda ctx: MyRateController(ctx))

My implementation makes sense to me because the documentation says that count_per_sliding_window "return[s] how many requests of the given type can be done within a sliding window of 11 minutes." So if I set the value to 20, that should allow 20 requests every 11 minutes, or about 110 requests an hour, which is below 200. Unfortunately I still get continuous errors, as shown below:

Too many queries in the last time. Need to wait 614 seconds, until 22:31.
Too many queries in the last time. Need to wait 566 seconds, until 22:31.
Too many queries in the last time. Need to wait 488 seconds, until 22:31.
Too many queries in the last time. Need to wait 426 seconds, until 22:31.

Generally, my intended process is the sequence below (a runnable sketch follows the list):

  • create instaloader instance
  • load session file
  • profile.get_posts()
  • for post in posts:
    • post.get_likes()
    • post.get_comments()
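
A minimal sketch of that sequence, reusing MyRateController from above; the account name target_account, the session owner my_username, and the cap of 10 posts are placeholder assumptions:

import itertools

from instaloader import Profile

L = Instaloader(rate_controller=lambda ctx: MyRateController(ctx))
L.load_session_from_file("my_username")  # session file created beforehand with `instaloader --login`

profile = Profile.from_username(L.context, "target_account")
for post in itertools.islice(profile.get_posts(), 10):  # 10 most recent posts
    likers = [liker.username for liker in post.get_likes()]
    comments = [(c.owner.username, c.text) for c in post.get_comments()]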

I also commonly get the error message shown below instead of the one previously shown:

JSON Query to graphql/query: 429 Too Many Requests [retrying; skip with ^C]
Number of requests within last 10/11/20/22/30/60 minutes grouped by type:
 other: 1 1 1 1 1 1
 37479f2b8209594dde7facb0d904896a: 1 1 1 1 1 1
 2b0673e0dc4580674a88d426fe00ea90: 1 1 1 1 1 1
 1cb6ec562846122743b61e492c85999f: 1 1 1 1 1 1
Instagram responded with HTTP error "429 - Too Many Requests". Please do not run multiple instances of Instaloader in parallel or within short sequence. Also, do not use any Instagram App while Instaloader is running. The request will be retried in 666 seconds, at 18:02.

Ultimately my goal is to answer two questions:

  1. What constitutes a request when using this API?
  2. How can I retrieve likes and comments for Instagram accounts with many engaged followers in Python? The pseudocode above works well for smaller accounts, but I keep getting 429 errors for larger ones. I would love to find a way to programmatically grab the largest chunk of likes/comments possible at a time, then wait the minimum amount of time before grabbing again, and so on until I have all of the information I need (a rough backoff sketch follows this list).
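
One generic way to approximate that grab-then-wait loop is to drain each iterator while catching Instaloader's connection errors and backing off before resuming. This is only a sketch under assumptions: the wait times are guesses, and whether an interrupted iterator can be resumed mid-page depends on Instaloader internals (Instaloader also retries internally before the exception ever reaches this code).

import time
from instaloader.exceptions import ConnectionException

def collect_with_backoff(iterator, max_retries=5):
    # Drain an Instaloader iterator (likes, comments, ...), sleeping and
    # retrying whenever a connection error such as a 429 bubbles up.
    # Items fetched before the error are kept.
    items = []
    retries = 0
    it = iter(iterator)
    while True:
        try:
            items.append(next(it))
            retries = 0
        except StopIteration:
            return items
        except ConnectionException:
            retries += 1
            if retries > max_retries:
                raise
            time.sleep(60 * retries)  # crude linear backoff; tune to taste

With that helper, collect_with_backoff(post.get_likes()) would replace the bare iteration in the sketch above.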

I sincerely apologize for my poor formatting of the code on here; this is the first time I've posted a GitHub issue. Please let me know if anything needs further clarification. Thank you so much for your time! @aandergr @Thammus

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 4
  • Comments: 20

Top GitHub Comments

1 reaction
m-sundman commented, Jul 28, 2022

Huh? I see the code, but I don't see any hint of which file this code should be in, or how it integrates with Instaloader. It's obviously not a standalone Python program, and it doesn't seem to import anything from instaloader.

1 reaction
Aditya-Rajgor commented, May 30, 2022

@estatistics Sorry for answering after so long. Here’s the code which I have used!

import time  # for adding a random delay between requests
import random
import tqdm  # for keeping track of the progress
from concurrent.futures import ThreadPoolExecutor

def chunker(size, l):
    """Yield successive chunks of `size` items from the list `l`."""
    for i in range(0, len(l), size):
        yield l[i:i + size]

def thread_executor(l, threads=30, chunk=100):
    """l is the list of profiles, threads is the number of worker threads,
    and chunk is how many profiles to process between sleeps."""

    global result_whole
    with ThreadPoolExecutor(max_workers=threads) as exe, tqdm.tqdm(total=len(l)) as prog:
        result_whole = []
        start = time.time()
        for batch in chunker(chunk, l):
            start_chunk = time.time()
            # sleep for a random delay of 30 to 50 seconds between chunks
            sl = round(random.uniform(30, 50), 3)
            print('sleeping for', sl)
            time.sleep(sl)

            # get_user_meta (defined elsewhere) fetches one profile's details;
            # its responses are collected into result_whole
            result = exe.map(get_user_meta, batch, [prog] * len(batch))
            result_whole.extend(list(result))

            end_chunk = time.time()
            print('chunk time', end_chunk - start_chunk, "seconds")

        end = time.time()
        print(f"taken time for {len(l)} profiles", end - start, "seconds")
Read more comments on GitHub.
