[Question] Do you plan to extend get_profile so it also returns the About section for pages?
I am using this library for a research project and I want to thank you for all your hard work 😃
I have a few URLs whose profile info (About section) I want to retrieve. The problem is that I don't know what type of account each URL belongs to (a page or a regular user). Thankfully, using your get_posts function I can get the user_id, but if it's a page, I can't use your get_profile on it.
It works fine for User accounts.
The format I am passing to get_posts is:
"unique name of the page"/posts/"post id"
Other questions:
- I am also trying to get the user_id from a post in the following form:
https://www.facebook.com/photo.php?fbid=495386170902530&set=pb.100012934538675.-2207520000..&type=3
But it tells me:
HTTPError: 404 Client Error: Not Found for url: https://m.facebook.com/photo.php?fbid=495386170902530&set=pb.100012934538675.-2207520000..&type=3/
I am just using the code from the examples.
Example of my code:
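from facebook_scraper import get_posts, get_profile
# cookie_path is defined elsewhere (path to a cookies file)
for post in get_posts("elisamartinezfuentes/posts/1190292224745251", cookies=cookie_path, options={"allow_extra_requests": True}):
    account = post['user_id']
    print(get_profile(account, cookies=cookie_path))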
TL;DR: I need to get the public About section info for Pages, Groups, and Users.
I apologize if this is not the correct format for an issue.
Top GitHub Comments
What information are you trying to extract from a page's about section that get_page_info doesn't extract? Do you have a sample page where it doesn't work? I've mostly just tested it with Nintendo's page.
When extracting a single post directly, you should signal to the scraper that the URL represents one post, so that it doesn't try to paginate. The argument for this is post_urls. This code:
Outputs:
This function works by finding a sample post and extracting the JSON-LD, which seems to vary depending on the post. Try this - https://github.com/kevinzg/facebook-scraper/commit/58629bc6f5474a3e59a2a90eff3aca439296e8ca
outputs
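For the case where you don't know in advance whether an account is a user or a page, one possible approach is to try get_profile first and fall back to get_page_info. The following is only a rough sketch, not something the library provides: fetch_about and scrape_about_for_post are hypothetical helper names, and it assumes that get_profile either raises or returns an empty result when given a page, and that get_page_info accepts the user_id returned by get_posts. Neither assumption is confirmed in this thread.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Fetcher = Callable[[str], Dict[str, Any]]


def fetch_about(
    account: str, fetchers: Iterable[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Try each (label, fetcher) pair in order and return the first non-empty result."""
    for label, fetch in fetchers:
        try:
            info = fetch(account)
        except Exception:
            # Assumption: a fetcher raises when the account is of the wrong type.
            continue
        if info:
            return label, info
    # No fetcher produced anything usable.
    return None, {}


def scrape_about_for_post(post_url: str, cookie_path: str) -> None:
    """Hypothetical wrapper: fetch one post, then look up its author's About info."""
    # Imported lazily so fetch_about itself has no dependency on the library.
    from facebook_scraper import get_page_info, get_posts, get_profile

    # post_urls tells the scraper this is a single post, so it does not paginate.
    for post in get_posts(post_urls=[post_url], cookies=cookie_path):
        account = post["user_id"]
        kind, info = fetch_about(
            account,
            [
                ("user", lambda a: get_profile(a, cookies=cookie_path)),
                ("page", lambda a: get_page_info(a, cookies=cookie_path)),
            ],
        )
        print(kind, info)
```

Calling scrape_about_for_post with your post URL and cookies file would print which lookup succeeded ("user", "page", or None) along with whatever data it returned. Keeping the fallback logic in fetch_about, separate from the scraper calls, makes it easy to add another lookup (for example, one for groups) to the list later.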