:290: UserWarning: Facebook served mbasic/noscript content unexpectedly on [url]
Thanks for a fantastic library!
I have been having some issues downloading posts from groups. I got a TemporarilyBanned exception a few days ago, and now it seems that every time I run my code, each post iteration gives me the following error:
:290: UserWarning: Facebook served mbasic/noscript content unexpectedly on [url]
This runs for a while, with what seems like every post giving the same error while no data gets extracted, until I receive the TemporarilyBanned exception again. I am logged in to Facebook with a cookie, and the script is running on a Raspberry Pi running Raspbian.
Edit 1
Running a simpler version of the script works much better, so I assume my problem has something to do with trying to re-run the script from the last reached post, which I handle with the request_url_callback function and the resume_info.json file.
Edit 2
Actually, it turns out removing the callback code had no effect on the problem. Removing the 'cookie.json' part, though, removes the "Facebook served mbasic" error, but still gives no data.
Edit 3
I realized I made a stupid syntax mistake and managed to fix the error. I am still getting the warning, but I am also getting data now.
The following is my code:
from time import sleep
import json
from pathlib import Path
from hashlib import shake_256

import facebook_scraper
from facebook_scraper import get_posts


def request_url_callback(url):
    """
    Takes care of broken downloads by saving the last cursor of a specific
    group_id to a json file.
    """
    resume_info[group_id] = url
    with open('resume_info.json', 'w') as f:
        json.dump(resume_info, f)


# Groups for download. Should be a dict with keys="group name" and values="group id"
with open('groups.json', 'r') as f:
    groups = json.load(f)

# Take care of the download path
dl_path = Path('downloads')
dl_path.mkdir(exist_ok=True)

# Keep a list of already done targets
if Path('done.txt').exists():
    with open('done.txt', 'r') as f:
        done = f.read().splitlines()
else:
    done = list()

# Resume info
if Path('resume_info.json').exists():
    with open('resume_info.json', 'r') as f:
        resume_info = json.load(f)
else:
    resume_info = dict()

# Main loop
for group_name, group_id in groups.items():
    while True:
        try:
            if group_id in done:
                break
            print(group_name)
            config = {
                'group': group_id,
                'pages': None,
                'cookies': 'cookie.json',
                'request_url_callback': request_url_callback,
                'options': {'comments': True},
                # resume_info is a dict, so use .get() rather than getattr()
                'start_url': resume_info.get(group_id, None)
            }
            posts = list()
            for post in get_posts(**config):
                keys = ['post_id', 'text', 'time',
                        'likes', 'comments', 'shares']
                post_data = {key: post[key] for key in keys}
                post_data['user_id'] = shake_256(
                    str(post['user_id']).encode('utf-8')).hexdigest(15)
                if post["comments_full"]:
                    post_data["comments_text"] = [
                        {
                            "comment": com["comment_text"],
                            "replies": [reply["comment_text"] for reply in com.get("replies", list())],
                        }
                        for com in post["comments_full"]
                    ]
                posts.append(post_data)
                # post is a dict, so index with .get() rather than getattr()
                print(post.get('time', None))
            with open(f"downloads/{group_id}.json", 'w') as f:
                json.dump(posts, f, default=str)
            done.append(group_id)
            # Append a newline so splitlines() separates the ids on reload
            with open("done.txt", 'a') as f:
                f.write(group_id + '\n')
            break
        except facebook_scraper.exceptions.TemporarilyBanned:
            print("Temporarily banned, sleeping for 10m")
            sleep(600)
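The except branch above retries at a fixed 10-minute interval, which keeps hitting Facebook at a constant rate if the ban persists. One common alternative (a sketch of my own, not part of facebook_scraper) is exponential backoff, where each consecutive TemporarilyBanned doubles the wait up to a cap:

```python
def backoff_delays(base=600, factor=2, max_delay=14400):
    """Yield sleep durations that double after each failed attempt, capped at max_delay."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, max_delay)
```

In the main loop you would create `delays = backoff_delays()` before the try block and replace `sleep(600)` with `sleep(next(delays))`, resetting the generator after a successful pass.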
Issue Analytics: created 2 years ago · 6 comments (1 by maintainers)
Top GitHub Comments
The scraper will request additional pages of friends until it has at least the number of friends you've asked for. As there are several friends per page, you might get a few more than you asked for.
@neon-ninja I am confused. I am not sure I know what exactly the "friends" parameter does in get_profile, since if I pass "1" it returns the info of ~11 mutual friends of a user who is a friend of mine. So I am confused about the difference between passing an integer number or True.
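The page-granular behaviour described above, "keep fetching pages until you have at least N items", can be sketched generically. The page size and fetch function here are made up for illustration; they are not facebook_scraper internals:

```python
def fetch_at_least(n, fetch_page):
    """Collect items page by page until at least n items are gathered.

    Because items arrive in whole pages, the result may contain more
    than n items -- which is why asking for 1 friend can return ~11.
    """
    items = []
    page_no = 0
    while len(items) < n:
        page = fetch_page(page_no)
        if not page:
            break  # no more pages available
        items.extend(page)
        page_no += 1
    return items

# Hypothetical page source yielding 11 friends per page
friends_pages = [[f"friend_{i}" for i in range(p * 11, (p + 1) * 11)]
                 for p in range(3)]
result = fetch_at_least(1, lambda p: friends_pages[p] if p < len(friends_pages) else [])
# Asking for 1 friend still returns the whole first page of 11
```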