Unable to collect posts beyond a certain number due to Temporary Block

Hi,

First of all, this is a really great tool, so thank you very much for your work! I want to scrape some private groups. However, every time I try, I get the message "You are Temporarily Blocked" after scraping anywhere from 100 to 9000 posts, even though the group I’m trying to scrape has far more posts. I have tested alt accounts too. Is there any way to keep Facebook from blocking me so quickly each time? Or is there a way to continue from where I left off after being blocked?

Furthermore, I’m using "allow_extra_requests": True since I want to download all photos at maximum quality. Could you add get_photos for groups to speed up scraping, or is there another way to get the link to the first photo (at maximum quality) of every post faster, without using allow_extra_requests, which is slow?

Issue Analytics

State:
Created 2 years ago
Reactions: 1
Comments: 18 (2 by maintainers)

Top GitHub Comments

4 reactions

neon-ninja commented, Jun 13, 2021

Increasing posts_per_page might help, as then you’d make fewer requests. Adding some time.sleep lines might help reduce the rate at which you’re making requests.

Yes, you can continue from a pagination URL, by passing the URL as the start_url argument to get_posts. These pagination URLs can be seen in the logs if you have debug logging enabled, or you can pass a callback function as request_url_callback to get_posts to handle extracting these pagination URLs. Here’s some sample code:

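import time
from facebook_scraper import exceptions, get_posts, set_cookies

results = []
start_url = None  # most recent pagination URL; None means start from the first page

def handle_pagination_url(url):
    # Passed to get_posts as request_url_callback; remembers the latest pagination URL
    global start_url
    start_url = url

set_cookies("cookies.txt")
while True:
    try:
        # After a ban, this resumes from the last recorded pagination URL
        for post in get_posts("Nintendo", page_limit=None, start_url=start_url, request_url_callback=handle_pagination_url):
            results.append(post)
            print(len(results))
        print("All done")
        break
    except exceptions.TemporarilyBanned:
        print("Temporarily banned, sleeping for 10m")
        time.sleep(600)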

Note: https://github.com/kevinzg/facebook-scraper/commit/f3c8948ae04414932899686c89e696306f37ce1f simplifies this code a bit by making it possible to pass a start_url of None.
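
The sample keeps the latest pagination URL only in memory, so it is lost if the script itself exits. A minimal sketch of saving it to disk follows, assuming a plain text file is good enough; save_checkpoint, load_checkpoint and the file name are hypothetical helpers for illustration, not part of facebook_scraper.

```python
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, url: str) -> None:
    """Store the latest pagination URL (hypothetical helper, not a library API)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, then rename it over the target,
    # so an interrupted run never leaves a half-written checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(url)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> Optional[str]:
    """Return the saved pagination URL, or None if there is nothing to resume from."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read().strip() or None
    except FileNotFoundError:
        return None
```

One way to wire this in would be to pass lambda url: save_checkpoint("checkpoint.txt", url) as request_url_callback and load_checkpoint("checkpoint.txt") as start_url. On a first run that value is None, which relies on the commit linked in the note.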

AFAIK, Facebook only provides the high resolution image URL if you click on the photo, which involves an extra request for each photo.
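
Since each high resolution photo costs an extra request, pacing the loop may matter more when allow_extra_requests is enabled. Here is a rough sketch of the time.sleep suggestion as a reusable wrapper; throttled is a hypothetical helper, and the default batch size and delay are guesses rather than values known to avoid a block.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(items: Iterable[T], every: int = 50, delay: float = 5.0,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items unchanged, pausing `delay` seconds after every `every` items.

    Hypothetical helper: `sleep` is injectable so the pacing can be tested
    without actually waiting.
    """
    for count, item in enumerate(items, start=1):
        yield item
        # This runs when the consumer asks for the next item, so the pause
        # happens before the underlying iterator is advanced again.
        if count % every == 0:
            sleep(delay)
```

Wrapping the iterator, as in for post in throttled(get_posts(...)), would leave the rest of the loop unchanged. Posts arrive a page at a time, so the pause lines up with the actual requests only approximately.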

2 reactions

neon-ninja commented, Jun 2, 2021

Probably nothing to worry about, so long as that image URL extraction worked.