Issue with cookies / Facebook language
Hello!
I had quite a weird issue: your script would scrape any group, but it was very slow, and every retrieved post field pulled in a lot of unnecessary text. For example, the post_text field would start with the post text, but then also contain the text of the next post and of the next 30 posts, the group name, and pretty much all the text that appears on the webpage (plus a lot of human-unreadable text).
I actually resolved it by going to Facebook, setting the language to English (US), and re-exporting the cookies. I have absolutely no idea why, but it worked!
Thought I’d share! In case anyone runs into the issue.
EDIT:
Actually, it did not fix the issue… I am still getting a lot of extra text.
As an example: when scraping group 712266335452208, this is the post_text output of the first post:
Please report all non-room related posts
We can keep it clean togetherPlease report all non-room related posts
We can keep it clean together
Like
13
2 Comments
Like
Show more reactions
Comment
Share
Top posts
View TimelineAdd to GroupInvite to Event
Cancel
Where are you?
Cancel
Detecting location...
There was an error detecting your current location.
Please make sure Location Services is enabled in your browser, and that facebook.com has permission to use them. You can still search for a place, but the search will not be as accurate.
Add a Place
Required
Category
I am currently at this location
Places are public and can be seen by anyone. If you add a place with someone's personal information, ask them first.
Add
Cancel
Who are you with?
Done
Cancel
What are you doing?
Cancel
Done
Tag Photo
Done
Cancel
Zoekt kamer in Amsterdam Community
Post
Please add content to your status before posting
Write Something
Please include a written review with your ratingYou’re doing great!
Loading preview...
×
With
At
×
Photo
Zoekt kamer in Amsterdam Community
Posting something for sale?
Make this a sale post to highlight important info, such as price and photos.
Post item for sale
Post
(function(){var n=now_inl();requireLazy(["__bigPipe"],function(bigPipe){bigPipe.beforePageletArrive("MRoot",n);})})(); requireLazy(["__bigPipe"],(function(bigPipe){bigPipe.onPageletArrive({displayResources:["c4ExI9N","1eyqQBr",
...
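For anyone trying to check whether their own output is affected, here is a rough sketch of a heuristic that flags a post whose text seems to contain page chrome or inline JavaScript. The marker strings are simply copied from the sample output above, and the function name and the length threshold are my own assumptions, not part of the scraper.

```python
# Hypothetical marker strings, copied from the sample output above.
# They are illustrative only and not part of any library.
CHROME_MARKERS = (
    "requireLazy(",
    "Detecting location...",
    "Show more reactions",
    "Please add content to your status before posting",
)


def looks_polluted(post_text: str, max_len: int = 5000) -> bool:
    """Return True if the text seems to contain page chrome or inline JavaScript."""
    # An unusually long text is suspicious on its own (threshold is arbitrary).
    if len(post_text) > max_len:
        return True
    # Otherwise look for strings that belong to the page UI, not to a post.
    return any(marker in post_text for marker in CHROME_MARKERS)


clean = "Please report all non-room related posts\nWe can keep it clean together"
dirty = clean + "\nLike\nShow more reactions\nComment\nShare"
print(looks_polluted(clean))  # False
print(looks_polluted(dirty))  # True
```

This only detects the symptom; it does not say anything about the cause.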
Issue Analytics
- Created 2 years ago
- Comments:17
Top GitHub Comments
OK, I resolved the issue! It was painfully simple… I reinstalled the packages from requirements.txt and now everything runs fine. I was under the impression that cloning from GitHub installed the requirements.txt packages automatically? I have issues all the time with using both conda and pip, so I guess that didn't help here.
Regardless, thank you so much for your help!
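For reference, cloning a repository only copies the files; the dependencies still have to be installed separately, typically with `pip install -r requirements.txt` in the same environment that runs the script. Below is a minimal sketch for checking which packages listed in a requirements file are not visible to the current interpreter, which may help when conda and pip are mixed. The helper names are hypothetical, the parsing only handles simple lines (name plus optional version pin), and it checks presence only, not versions.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def requirement_names(lines):
    """Extract bare package names from simple requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks and options such as "-r other.txt"
        # Keep only the leading name, dropping extras, pins and markers.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that the current interpreter cannot find."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


# Assumes a requirements.txt in the current directory; does nothing otherwise.
req_file = Path("requirements.txt")
if req_file.exists():
    names = requirement_names(req_file.read_text().splitlines())
    print("Missing:", missing_packages(names))
```

Running this with the same Python that runs the scraper matters: a package installed into a different conda environment will still show up as missing.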
My browser remembers 2 Facebook accounts. Maybe that causes interference? I will try to remove the cookies, log in, and try again. I'll let you know what happens.
EDIT: Doesn't solve the issue, unfortunately.