Scrape does not get full post when there are 2 layers of <See more...>
@neon-ninja When a post has long text (post_text) with a ‘double layer’ of ‘See more’ links that need to be clicked, the extractor only manages to get the first layer. What I have tested, using facebook-scraper==0.2.42 from git master:
- Using 2 different accounts (with 2 different cookies) in Chrome and also Firefox. I used EditThisCookie in Chrome and Cookie Quick Manager in Firefox
- Using both the Windows CLI and a .py script
- With --encoding utf-8 and without encoding.
For the CLI I used this command:
facebook-scraper --filename najibFullPost1.csv --pages 5 najibrazak -c C:\\Users\\insane\\Desktop\\NajibRazak\\cookies.json -v --encoding utf-8
The output for 1 layer of See more is fine. But if there are two layers, it only captures the first layer:
1 Layer output
2 layer output
I have read about others facing this issue, but none of them seem to have solved it.
By using
>>> from facebook_scraper import get_posts, enable_logging
>>> import logging
>>> import pprint
>>> enable_logging(logging.DEBUG)
>>> for post in get_posts(post_urls=[10157944979490952]):
...     print(post['text'])
...
it returns the correct post text, but the CLI with a username does not.
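Since fetching a single post through post_urls returns the full text, one possible workaround (not confirmed anywhere in this thread) is to re-fetch each listed post individually and keep the longer text. The sketch below is only an illustration: `fill_full_text` and `fetch_single` are hypothetical names, and the demo uses fake data so it runs without network access. The commented lines show how it might be wired to `get_posts`, assuming the `pages` and `cookies` arguments behave as in the CLI call above.

```python
def fill_full_text(listing_posts, fetch_single):
    """Yield copies of the listed posts, using the text from a per-post
    fetch whenever that text is longer than the listing's text.

    fetch_single is a hypothetical callable: post_id -> iterable of post dicts.
    """
    for post in listing_posts:
        merged = dict(post)
        post_id = post.get("post_id")
        if post_id is not None:
            single = next(iter(fetch_single(post_id)), None)
            listing_text = post.get("text") or ""
            if single and len(single.get("text") or "") > len(listing_text):
                merged["text"] = single["text"]
        yield merged


# Demo with fake data (stand-ins for scraper output, not real posts).
listing = [
    {"post_id": 1, "text": "First layer only"},
    {"post_id": 2, "text": "Already complete"},
]
fake_single_fetch = {
    1: [{"post_id": 1, "text": "First layer only, plus the second layer"}],
    2: [{"post_id": 2, "text": "Already complete"}],
}
result = list(fill_full_text(listing, lambda pid: fake_single_fetch[pid]))
for post in result:
    print(post["post_id"], post["text"])

# Possible wiring with the real library (needs network access and cookies):
# from facebook_scraper import get_posts
# listing = get_posts("najibrazak", pages=5, cookies="cookies.json")
# full = fill_full_text(
#     listing,
#     lambda pid: get_posts(post_urls=[pid], cookies="cookies.json"),
# )
```

This costs one extra request per post, so it is slower than a plain page scrape.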
Side note: I have a problem where the output file has an empty line between each record (row). I fixed it by adding newline='' to the open() call:
with open(filename, 'w', encoding=encoding, newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(list_of_posts)
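The blank rows come from newline translation: the csv writer already ends each row with \r\n, and a text file opened on Windows without newline='' turns the \n into \r\n again, producing \r\r\n. The sketch below reproduces this on any platform by passing newline='\r\n' to mimic the Windows default; `write_posts` and `raw_bytes` are hypothetical helper names and the rows are made-up sample data.

```python
import csv
import os
import tempfile


def write_posts(path, rows, newline):
    """Write a list of dicts as CSV, opening the file with the given newline mode."""
    with open(path, "w", encoding="utf-8", newline=newline) as output_file:
        writer = csv.DictWriter(output_file, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def raw_bytes(rows, newline):
    """Write the rows to a temporary file and return the exact bytes on disk."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        write_posts(path, rows, newline)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


rows = [{"post_id": "1", "text": "first"}, {"post_id": "2", "text": "second"}]

# newline="\r\n" mimics the Windows default: every "\n" is rewritten as "\r\n".
translated = raw_bytes(rows, "\r\n")
# newline="" disables translation, so the csv module's own "\r\n" is kept as-is.
untranslated = raw_bytes(rows, "")

print(translated)
print(untranslated)
```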
Issue Analytics
- Created 2 years ago
- Comments:23
Top GitHub Comments
Ok, I think I see the problem. For me, the HTML is
but for you, it’s
which
(?<=…\s)<a href="([^"]+)
does not match, as data-gt precedes the href. This regex can be simplified. Try this commit on Git master: https://github.com/kevinzg/facebook-scraper/commit/e7b2a50cb39ecccd66d43e0a8ff66b65f9e75311
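To illustrate the mismatch described above, here is a minimal sketch. Both HTML strings and both patterns are hypothetical stand-ins: the actual markup from this comment and the regex in the linked commit are not reproduced here. It only shows why a pattern that expects href directly after `<a ` fails once another attribute such as data-gt comes first, and how allowing arbitrary attributes before href avoids that.

```python
import re

# Hypothetical markup: the same link, with and without a data-gt attribute.
html_plain = '<a href="/story.php?story_fbid=1&amp;id=2">More</a>'
html_with_attr = (
    '<a data-gt="{&quot;tn&quot;:&quot;*s&quot;}" '
    'href="/story.php?story_fbid=1&amp;id=2">More</a>'
)

# Strict: href must be the first attribute of the tag.
strict = re.compile(r'<a href="([^"]+)')
# Tolerant: any attributes (anything except ">") may come before href.
tolerant = re.compile(r'<a\b[^>]*?\bhref="([^"]+)')


def more_url(pattern, html):
    """Return the captured href, or None when the pattern does not match."""
    match = pattern.search(html)
    return match.group(1) if match else None


print(more_url(strict, html_plain))
print(more_url(strict, html_with_attr))
print(more_url(tolerant, html_plain))
print(more_url(tolerant, html_with_attr))
```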