Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

:290: UserWarning: Facebook served mbasic/noscript content unexpectedly on [url]

See original GitHub issue

Thanks for a fantastic library!

I have been having some issues downloading posts from groups. I got a TemporarilyBanned exception a few days ago, and now it seems that every time I run my code, each post iteration gives me the following error:

:290: UserWarning: Facebook served mbasic/noscript content unexpectedly on [url]

This runs for a while, with what seems like every post giving the same warning while no data gets extracted, until I receive the TemporarilyBanned exception again. I am logged in to Facebook with a cookie, and the script is running on a Raspberry Pi running Raspbian.

Edit 1: Running a simpler version of the script works much better, so I assume my problem has something to do with trying to re-run the script from the last reached post, which I handle with the request_url_callback function and the resume_info.json file.

Edit 2: Actually, it turns out removing the callback stuff had no effect on the problem. Removing the 'cookie.json' part does remove the "Facebook served mbasic" warning, but still gives no data.

Edit 3: I realized I made a stupid syntax mistake and managed to fix the error. I am still getting the warning, but I am also getting data now.

The following is my code:

from time import sleep
import json
from pathlib import Path
from hashlib import shake_256
import facebook_scraper
from facebook_scraper import get_posts


def request_url_callback(url):
    """
    Takes care of broken downloads by saving last cursor of specific
    group_id to to a json file.
    """
    resume_info[group_id] = url
    with open('resume_info.json', 'w') as f:
        json.dump(resume_info, f)


# Groups for download. Should be a dict with keys="group name" and values="group id"
with open('groups.json', 'r') as f:
    groups = json.load(f)

# Take care of download path
dl_path = Path('downloads')
dl_path.mkdir(exist_ok=True)

# Keep list of already done targets
if Path('done.txt').exists():
    with open('done.txt', 'r') as f:
        done = f.read().splitlines()
else:
    done = list()

# Resume info
if Path('resume_info.json').exists():
    with open('resume_info.json', 'r') as f:
        resume_info = json.load(f)
else:
    resume_info = dict()

# Main loop
for group_name, group_id in groups.items():
    while True:
        try:
            if group_id in done:
                break
            print(group_name)

            config = {
                'group': group_id,
                'pages': None,
                'cookies': 'cookie.json',
                'request_url_callback': request_url_callback,
                'options': {'comments': True},
                'start_url': resume_info.get(group_id)  # resume_info is a dict, so use .get()
            }

            posts = list()
            for post in get_posts(**config):
                keys = ['post_id', 'text', 'time',
                        'likes', 'comments', 'shares']
                post_data = {key: post[key] for key in keys}
                post_data['user_id'] = shake_256(
                    str(post['user_id']).encode('utf-8')).hexdigest(15)
                if post["comments_full"]:
                    post_data["comments_text"] = [
                        {
                            "comment": com["comment_text"],
                            "replies": [reply["comment_text"] for reply in com.get("replies", list())],
                        }
                        for com in post["comments_full"]
                    ]
                posts.append(post_data)
                print(post.get('time'))  # post is a dict, so use .get()
            with open(f"downloads/{group_id}.json", 'w') as f:
                json.dump(posts, f, default=str)
            done.append(group_id)
            with open(f"done.txt", 'a') as f:
                f.write(group_id)
            break
        except facebook_scraper.exceptions.TemporarilyBanned:
            print("Temporarily banned, sleeping for 10m")
            sleep(600)
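Sleeping a fixed 10 minutes after every ban can repeatedly run into the same rate limit. One gentler pattern (a sketch of my own, not part of facebook-scraper) is to double the wait after each consecutive ban, up to a cap:

```python
def ban_backoff_seconds(consecutive_bans, base=600, cap=3600):
    """Sleep time after the n-th consecutive TemporarilyBanned:
    600s, 1200s, 2400s, then capped at 3600s."""
    return min(base * 2 ** (consecutive_bans - 1), cap)

# The except clause in the main loop would then become something like:
#     except facebook_scraper.exceptions.TemporarilyBanned:
#         bans += 1
#         print(f"Temporarily banned, sleeping for {ban_backoff_seconds(bans)}s")
#         sleep(ban_backoff_seconds(bans))
# (reset bans to 0 after a successful group download)
```

This keeps short outages cheap while giving Facebook progressively longer cool-down periods when bans keep recurring.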

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
neon-ninja commented, Jul 1, 2021

The scraper will request additional pages of friends until it has at least the number of friends you've asked for. As there are several friends per page, you might get a few more than you asked for.
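The per-page behaviour described above can be illustrated with a toy pager (hypothetical names, not facebook-scraper's internals): whole pages are fetched until the running total reaches the requested count, so the result can overshoot by up to one page.

```python
def collect_at_least(wanted, pages):
    """Request whole pages until we have at least `wanted` items.
    `pages` is an iterable of lists, one list per fetched page."""
    collected = []
    for page in pages:
        collected.extend(page)          # a page is always taken in full
        if len(collected) >= wanted:    # stop once the target is reached
            break
    return collected

# Asking for 1 friend still returns the whole first page:
friends = collect_at_least(1, [["a", "b", "c"], ["d"]])  # -> ["a", "b", "c"]
```

This is why `friends=1` can return ~11 results: the first page already holds more than one friend, and the scraper never slices a page down to the exact count.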

0 reactions
LightMoon commented, Jul 1, 2021

@neon-ninja I am confused. I am not sure I know what exactly the "friends" parameter does in get_profile, since if I pass "1" it returns the info of ~11 mutual friends of the user, who is a friend of mine. So I am confused about the difference between passing an integer and passing True.

Read more comments on GitHub >

Top Results From Across the Web

Unable to collect posts beyond a certain number due ... - GitHub
Yes, you can continue from a pagination url, by passing the url as ... UserWarning: Facebook served mbasic/noscript content unexpectedly on ...
Read more >
Troubleshoot Link Failures | Meta Business Help Center
A "link failure(s)" status means that when people click on the website link included in your ad they may have been redirected to...
Read more >
My url is blocked for not following community guideline
I've used the debugger, put Fb DNS txt in my DNS, Deleted my website content, check if my domain was blacklisted. Nothing. There...
Read more >
About URL parameters | Meta Business Help Center - Facebook
Learn more about how to use URL parameters to identify where your ad traffic is coming from and understand the effectiveness of your...
Read more >
Facebook Help Center
How do I get a link (URL) to report a piece of content? To show us where the abusive content is located on...
Read more >
