Using pagination URLs always returns the same posts
Hi again! I'm testing the start_url and request_url_callback parameters so that I can continue scraping after a temporary ban, but after a ban the scraper always seems to return the same posts. For example, using this code:
```python
import time
from facebook_scraper import *

start_url = None

def handle_pagination_url(url):
    # Meant to remember the latest pagination URL so a retry can resume from it
    start_url = url

set_cookies("cookies.json")
row = 1
while True:
    try:
        for post in get_posts(
            "Nintendo",
            pages=None,
            start_url=start_url,
            request_url_callback=handle_pagination_url,
            options={"allow_extra_requests": False, "comments": False,
                     "reactors": False, "posts_per_page": 200},
            timeout=120,
        ):
            if row % 9 == 0:
                # Simulate a temporary ban every 9 posts to test resuming
                raise exceptions.TemporarilyBanned
            print(post.get('post_url', '0'))
            row += 1
        print("All done")
        break
    except exceptions.TemporarilyBanned:
        row += 1
        print("Temporarily banned, sleeping for 1")
        time.sleep(2)
```
I get the same posts again and again. The same thing happens with every group or page I try; with this page the order stays the same (see the output below), while with groups the same posts come back in a different order.
https://facebook.com/Nintendo/posts/4217919734959114
https://facebook.com/Nintendo/posts/4217774244973663
https://facebook.com/Nintendo/posts/4214518608632560
https://facebook.com/Nintendo/posts/4214033132014441
https://facebook.com/Nintendo/posts/4194934713924283
https://facebook.com/Nintendo/posts/4193752747375813
https://facebook.com/Nintendo/posts/4191173794300375
https://facebook.com/Nintendo/posts/4188174317933656
Temporarily banned, sleeping for 1
https://facebook.com/Nintendo/posts/4217919734959114
https://facebook.com/Nintendo/posts/4217774244973663
https://facebook.com/Nintendo/posts/4214518608632560
https://facebook.com/Nintendo/posts/4214033132014441
https://facebook.com/Nintendo/posts/4194934713924283
https://facebook.com/Nintendo/posts/4193752747375813
https://facebook.com/Nintendo/posts/4191173794300375
https://facebook.com/Nintendo/posts/4188174317933656
Temporarily banned, sleeping for 1
https://facebook.com/Nintendo/posts/4217919734959114
https://facebook.com/Nintendo/posts/4217774244973663
https://facebook.com/Nintendo/posts/4214518608632560
https://facebook.com/Nintendo/posts/4214033132014441
https://facebook.com/Nintendo/posts/4194934713924283
https://facebook.com/Nintendo/posts/4193752747375813
https://facebook.com/Nintendo/posts/4191173794300375
https://facebook.com/Nintendo/posts/4188174317933656
Thanks in advance!
Issue Analytics
- Created 2 years ago
- Comments: 7 (2 by maintainers)
Top GitHub Comments
By 200 posts, I was referring to your code that sets "posts_per_page": 200. However, this setting only works for pages, not groups, so with groups you'll get the default of 20-40 posts per page, and you can't change that.
Pagination is triggered only when necessary: when you've consumed all of the posts on the current page. The scraper makes this transition between pages as seamless as possible for you. Because a web request is made only when requesting each page, you will by definition only get a TemporarilyBanned exception when a new page is requested, not while processing posts on the same page.
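The behavior described above can be illustrated with a toy model. This sketch does not use facebook_scraper: fake_get_posts, the PAGES dict and the local TemporarilyBanned class are hypothetical stand-ins. It assumes, as in this model, that the callback receives each page URL just before that page is requested and that a ban can only be raised by that request. Under those assumptions, resuming from the last recorded URL re-requests exactly the page that failed, so no post is yielded twice.

```python
from typing import Callable, Iterator, List, Optional, Set


class TemporarilyBanned(Exception):
    """Hypothetical stand-in for facebook_scraper's TemporarilyBanned."""


# Hypothetical data: page URL -> (posts on that page, next page URL).
PAGES = {
    "page1": (["p1", "p2", "p3"], "page2"),
    "page2": (["p4", "p5", "p6"], "page3"),
    "page3": (["p7", "p8"], None),
}


def fake_get_posts(start_url: Optional[str],
                   request_url_callback: Callable[[str], None],
                   fail_on: Set[str]) -> Iterator[str]:
    """Toy paginator: one request per page, callback invoked before each request."""
    url = start_url or "page1"
    while url is not None:
        request_url_callback(url)
        if url in fail_on:
            fail_on.discard(url)  # in this model each page is banned at most once
            raise TemporarilyBanned(url)
        posts, next_url = PAGES[url]  # the only "web request" for this page
        yield from posts
        url = next_url


def scrape_all(fail_on: Set[str]) -> List[str]:
    """Retry loop that resumes from the last page URL seen by the callback."""
    state = {"start_url": None}  # mutable holder instead of a global
    collected: List[str] = []

    def remember(url: str) -> None:
        state["start_url"] = url

    while True:
        try:
            for post in fake_get_posts(state["start_url"], remember, fail_on):
                collected.append(post)
            return collected
        except TemporarilyBanned:
            # A real script would sleep here before retrying.
            continue


print(scrape_all({"page2", "page3"}))
# ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8']
```

By contrast, the test in the question raises the exception in the middle of a page, after start_url already points at that page, so in this model a resumed run would start that page over and repeat its first posts.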
You should probably increase your time.sleep beyond 60 seconds; in my experience, temporary bans last much longer than that. Otherwise, your code looks fine to me.
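One way to follow the advice to wait longer is an exponential backoff, where each consecutive ban doubles the wait. This is a minimal sketch; backoff_delays and wait_after_ban are hypothetical helpers, and the 60-second start, doubling factor and one-hour cap are assumed values rather than known ban durations.

```python
import time
from typing import Callable, List


def backoff_delays(initial: float = 60.0, factor: float = 2.0,
                   cap: float = 3600.0, attempts: int = 6) -> List[float]:
    """Exponential backoff schedule in seconds, each delay capped at `cap`."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


def wait_after_ban(attempt: int,
                   sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for the delay matching a 0-based retry attempt and return it."""
    delay = backoff_delays(attempts=attempt + 1)[-1]
    sleep(delay)
    return delay


print(backoff_delays())  # [60.0, 120.0, 240.0, 480.0, 960.0, 1920.0]
```

In a retry loop, the attempt counter would be incremented on each TemporarilyBanned and reset to 0 once posts are flowing again; the injectable sleep argument only exists to make the helper easy to test.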
@roma-glushko Thanks for the code! You are amazing! 🙌 @neon-ninja I tested the code above, and it seems like pagination is triggered every 21 posts? Here is the output below
So, when is pagination triggered? And if it's triggered every 200 posts, is there a way to trigger it more often? When a temporary ban occurs before 200 posts, start_url stays the same and the same posts are scraped again. Finally, I use this code to continue scraping data after I've been banned. Is that correct? I'm sorry for the spam, but I really want to fix this problem! Thanks again!
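To check where pagination actually happens (for example, the 21-post interval mentioned above), one option is to count posts between callback invocations. PaginationTracker is a hypothetical helper, not part of facebook_scraper, and it assumes request_url_callback is called once per page request, including the first. Its on_request method would be passed as request_url_callback and on_post called for every yielded post; the demo below simulates those calls instead of contacting Facebook.

```python
from typing import List, Optional, Tuple


class PaginationTracker:
    """Records how many posts had been consumed when each page was requested."""

    def __init__(self) -> None:
        self.posts_seen = 0
        # (page URL, number of posts seen before that page was requested)
        self.requests: List[Tuple[str, int]] = []

    def on_request(self, url: str) -> None:
        # Intended to be passed as request_url_callback.
        self.requests.append((url, self.posts_seen))

    def on_post(self) -> None:
        self.posts_seen += 1

    @property
    def last_url(self) -> Optional[str]:
        """Most recently requested page URL, usable as start_url when resuming."""
        return self.requests[-1][0] if self.requests else None

    def page_sizes(self) -> List[int]:
        """Posts yielded per page; the last entry may be an incomplete page."""
        counts = [n for _, n in self.requests] + [self.posts_seen]
        return [b - a for a, b in zip(counts, counts[1:])]


# Simulated run: three page requests yielding 21, 21 and 5 posts.
tracker = PaginationTracker()
for url, n_posts in [("page-1", 21), ("page-2", 21), ("page-3", 5)]:
    tracker.on_request(url)
    for _ in range(n_posts):
        tracker.on_post()

print(tracker.page_sizes())  # [21, 21, 5]
print(tracker.last_url)      # page-3
```

If the measured sizes for a group fall in the 20-40 range described by the maintainer, that would be consistent with posts_per_page having no effect there.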