Scrape does not get full post when there is 2 layers of <See more...>


@neon-ninja When a post has long text (post_text) hidden behind a "double layer" of "See more" links that need to be clicked, the extractor only manages to get the first layer. What I have tested (facebook-scraper==0.2.42, from git master):

Two different accounts (with two different sets of cookies) in Chrome and also Firefox. I used EditThisCookie in Chrome and Cookie Quick Manager in Firefox.
Both the Windows CLI and a .py script.
With --encoding utf-8 and without an encoding option.

For the CLI I used this command: facebook-scraper --filename najibFullPost1.csv --pages 5 najibrazak -c C:\\Users\\insane\\Desktop\\NajibRazak\\cookies.json -v --encoding utf-8

The output for one layer of "See more" is fine, but if there are two layers it only captures the first layer:

1 Layer output

2 layer output

I have read about others facing this issue, but nothing there seems to solve the problem.

Using

>>> from facebook_scraper import get_posts, enable_logging
>>> import logging
>>> import pprint
>>> enable_logging(logging.DEBUG)
>>> for post in get_posts(post_urls=[10157944979490952]):
...     print(post['text'])
...

it returns the correct post text, but the CLI with a username does not.

Side note: the output file had an empty line between each record (row). I fixed it by adding newline='':

with open(filename, 'w', encoding=encoding, newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(list_of_posts)
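
The newline='' fix in the side note can be tried in isolation. The following is a minimal sketch, not the library's own code: write_rows_to_csv, read_rows_from_csv, and the sample data are hypothetical names made up for illustration. It relies on the behaviour documented for Python's csv module, which writes its own '\r\n' line endings and expects the file to be opened with newline='' so they are not translated a second time (the cause of the blank rows on Windows).

```python
import csv
import os
import tempfile


def write_rows_to_csv(rows, filename, encoding="utf-8"):
    """Hypothetical helper: write a list of dicts to CSV without blank rows."""
    keys = list(rows[0].keys())
    # newline='' stops Python from translating the csv module's own '\r\n'
    # line endings again, which on Windows would produce '\r\r\n'.
    with open(filename, "w", encoding=encoding, newline="") as output_file:
        dict_writer = csv.DictWriter(output_file, keys)
        dict_writer.writeheader()
        dict_writer.writerows(rows)


def read_rows_from_csv(filename, encoding="utf-8"):
    """Hypothetical helper: read the rows back as a list of dicts."""
    with open(filename, "r", encoding=encoding, newline="") as input_file:
        return list(csv.DictReader(input_file))


# Made-up sample data; the first post contains an embedded line break.
sample_posts = [
    {"post_id": "1", "text": "first line\nsecond line"},
    {"post_id": "2", "text": "short post"},
]

path = os.path.join(tempfile.mkdtemp(), "posts.csv")
write_rows_to_csv(sample_posts, path)
print(read_rows_from_csv(path))
```

Opening the file with newline='' on the reading side as well keeps line breaks embedded in quoted post text intact.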

Issue Analytics

Created 2 years ago
Comments: 23

Top GitHub Comments

1 reaction

neon-ninja commented, Jun 14, 2021

Ok, I think I see the problem. For me, the HTML is

<p> 1. i-Sinar dan i-Lestari juga… <a href="/story.php?story_fbid=10157944979490952&amp;id=157851205951&amp;_ft_=mf_story_key.10157944979490952%3Atop_level_post_id.10157944979490952%3Atl_objid.10157944979490952%3Acontent_owner_id_new.157851205951%3Athrowback_story_fbid.10157944979490952%3Apage_id.157851205951%3Astory_location.4%3Astory_attachment_style.photo%3Atds_flgs.3%3Aott.AX-KtQoVMZIEDTeL&amp;__tn__=%2C%3B" data-gt="{&quot;tn&quot;:&quot;,;&quot;}">More</a></p>

but for you, it’s

<p>
       1. i-Sinar dan i-Lestari juga…
       <a data-gt="{&quot;tn&quot;:&quot;,;&quot;}" href="/story.php?story_fbid=10157944979490952&amp;id=157851205951&amp;_ft_=mf_story_key.10157944979490952%3Atop_level_post_id.10157944979490952%3Atl_objid.10157944979490952%3Acontent_owner_id_new.157851205951%3Athrowback_story_fbid.10157944979490952%3Apage_id.157851205951%3Astory_location.4%3Astory_attachment_style.photo%3Atds_flgs.3%3Aott.AX-KtQoVMZIEDTeL&amp;__tn__=%2C%3B">
        More
       </a>
      </p>

which (?<=…\s)<a href="([^"]+) does not match, because data-gt precedes the href. This regex can be simplified. Try this: https://github.com/kevinzg/facebook-scraper/commit/e7b2a50cb39ecccd66d43e0a8ff66b65f9e75311
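
The attribute-order problem described above can be reproduced with shortened HTML. This is only a sketch: OLD_PATTERN is the regex quoted in the comment, while TOLERANT_PATTERN is a hypothetical alternative written for illustration and is not necessarily what the linked commit does. The hrefs and data-gt values are abbreviated stand-ins for the real ones, and find_more_link is a made-up helper name.

```python
import re

# Regex quoted in the comment: requires href to be the first attribute of <a>
# and exactly one whitespace character between the ellipsis and the tag.
OLD_PATTERN = re.compile(r'(?<=…\s)<a href="([^"]+)')

# Hypothetical, more tolerant pattern (an assumption, not the library's code):
# any amount of whitespace after the ellipsis, any attributes before href.
TOLERANT_PATTERN = re.compile(r'…\s*<a\b[^>]*?\shref="([^"]+)"')

# Shortened stand-ins for the two HTML variants shown in the comment.
href_first = (
    '<p>1. i-Sinar dan i-Lestari juga… '
    '<a href="/story.php?story_fbid=1" data-gt="{}">More</a></p>'
)
data_gt_first = (
    '<p>1. i-Sinar dan i-Lestari juga… '
    '<a data-gt="{}" href="/story.php?story_fbid=1">More</a></p>'
)
pretty_printed = (
    '<p>\n  1. i-Sinar dan i-Lestari juga…\n'
    '  <a data-gt="{}" href="/story.php?story_fbid=1">\n   More\n  </a>\n</p>'
)


def find_more_link(pattern, html):
    """Return the captured href of the 'More' link, or None if no match."""
    match = pattern.search(html)
    return match.group(1) if match else None


for name, html in [
    ("href first", href_first),
    ("data-gt first", data_gt_first),
    ("pretty-printed", pretty_printed),
]:
    print(name, find_more_link(OLD_PATTERN, html), find_more_link(TOLERANT_PATTERN, html))
```

In this sketch the old pattern finds the link only in the first variant, while the tolerant one finds it in all three. For anything beyond a quick check, an HTML parser is usually a safer choice than a regex for locating the link.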

1 reaction

neon-ninja commented, Jun 14, 2021

Git master
