:290: UserWarning: Facebook served mbasic/noscript content unexpectedly on [url]
Thanks for a fantastic library!
I have been having some issues downloading posts from groups. I got a TemporarilyBanned exception a few days ago, and now it seems that every time I run my code, each post iteration gives me the following error:
:290: UserWarning: Facebook served mbasic/noscript content unexpectedly on [url]
This runs for a while, with what seems like every post giving the same error while no data gets extracted, until I receive the TemporarilyBanned exception again. I am logged in to Facebook with a cookie, and the script is running on a Raspberry Pi running Raspbian.
Edit 1
Running a simpler version of the script works much better, so I assume my problem has something to do with trying to re-run the script from the last reached post, which I handle with the request_url_callback function and the resume_info.json file.
Edit 2
Actually, it turns out removing the callback code had no effect on the problem. Removing the 'cookie.json' part, though, removes the "Facebook served mbasic" error, but still gives no data.
Edit 3
I realized I made a stupid syntax mistake and managed to fix the error. I am still getting the warning, but I am also getting data now.
The following is my code:
from time import sleep
import json
from pathlib import Path
from hashlib import shake_256

import facebook_scraper
from facebook_scraper import get_posts


def request_url_callback(url):
    """
    Takes care of broken downloads by saving the last cursor of a specific
    group_id to a json file.
    """
    resume_info[group_id] = url
    with open('resume_info.json', 'w') as f:
        json.dump(resume_info, f)


# Groups for download. Should be a dict with keys="group name" and values="group id"
with open('groups.json', 'r') as f:
    groups = json.load(f)

# Take care of the download path
dl_path = Path('downloads')
dl_path.mkdir(exist_ok=True)

# Keep a list of already done targets
if Path('done.txt').exists():
    with open('done.txt', 'r') as f:
        done = f.read().splitlines()
else:
    done = list()

# Resume info
if Path('resume_info.json').exists():
    with open('resume_info.json', 'r') as f:
        resume_info = json.load(f)
else:
    resume_info = dict()

# Main loop
for group_name, group_id in groups.items():
    while True:
        try:
            if group_id in done:
                break
            print(group_name)
            config = {
                'group': group_id,
                'pages': None,
                'cookies': 'cookie.json',
                'request_url_callback': request_url_callback,
                'options': {'comments': True},
                # resume_info is a dict, so use .get() rather than getattr()
                'start_url': resume_info.get(group_id, None)
            }
            posts = list()
            for post in get_posts(**config):
                keys = ['post_id', 'text', 'time',
                        'likes', 'comments', 'shares']
                post_data = {key: post[key] for key in keys}
                post_data['user_id'] = shake_256(
                    str(post['user_id']).encode('utf-8')).hexdigest(15)
                if post["comments_full"]:
                    post_data["comments_text"] = [
                        {
                            "comment": com["comment_text"],
                            "replies": [reply["comment_text"] for reply in com.get("replies", list())],
                        }
                        for com in post["comments_full"]
                    ]
                posts.append(post_data)
                # post is a dict, so index with .get() rather than getattr()
                print(post.get('time', None))
            with open(f"downloads/{group_id}.json", 'w') as f:
                json.dump(posts, f, default=str)
            done.append(group_id)
            # Append a newline so splitlines() separates the ids on reload
            with open("done.txt", 'a') as f:
                f.write(group_id + '\n')
            break
        except facebook_scraper.exceptions.TemporarilyBanned:
            print("Temporarily banned, sleeping for 10m")
            sleep(600)
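The except branch above retries at a fixed 10-minute interval, which keeps hitting Facebook at a constant rate if the ban persists. One common alternative (a sketch of my own, not part of facebook_scraper) is exponential backoff, where each consecutive TemporarilyBanned doubles the wait up to a cap:

```python
def backoff_delays(base=600, factor=2, max_delay=14400):
    """Yield sleep durations that double after each failed attempt, capped at max_delay."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, max_delay)
```

In the main loop you would create `delays = backoff_delays()` before the try block and replace `sleep(600)` with `sleep(next(delays))`, resetting the generator after a successful pass.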
Issue Analytics: created 2 years ago · 6 comments (1 by maintainers)
Top GitHub Comments
The scraper will request additional pages of friends until it has at least the number of friends you've asked for. As there are several friends per page, you might get a few more than you asked for.
@neon-ninja I am confused. I am not sure I know what exactly the "friends" parameter does in get_profile, since if I pass "1" it returns the info of ~11 mutual friends of a user who is a friend of mine. So I am confused about the difference between passing an integer number or True.
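The page-granular behaviour described above, "keep fetching pages until you have at least N items", can be sketched generically. The page size and fetch function here are made up for illustration; they are not facebook_scraper internals:

```python
def fetch_at_least(n, fetch_page):
    """Collect items page by page until at least n items are gathered.

    Because items arrive in whole pages, the result may contain more
    than n items -- which is why asking for 1 friend can return ~11.
    """
    items = []
    page_no = 0
    while len(items) < n:
        page = fetch_page(page_no)
        if not page:
            break  # no more pages available
        items.extend(page)
        page_no += 1
    return items

# Hypothetical page source yielding 11 friends per page
friends_pages = [[f"friend_{i}" for i in range(p * 11, (p + 1) * 11)]
                 for p in range(3)]
result = fetch_at_least(1, lambda p: friends_pages[p] if p < len(friends_pages) else [])
# Asking for 1 friend still returns the whole first page of 11
```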