google scholar pagination not returning final results page
I am using the pagination method with the google_scholar engine to return all results for a search term. When I iterate over the paginator in a for loop and collect the pages in a list, the final page of results is missing: iteration stops at the penultimate page (code snippet and terminal output below).
import serpapi
import os
from loguru import logger
from dotenv import load_dotenv
load_dotenv()
search_string = '"Singer Instruments" PhenoBooth'
# Pagination allows iterating through all pages of results
logger.info("Initialising search through serpapi")
search = serpapi.GoogleSearch(
    {
        "engine": "google_scholar",
        "q": search_string,
        "api_key": os.getenv("SERPAPI_KEY"),
        "as_ylo": 1900,
    }
)
pages = search.pagination(start=0, page_size=20)
# get dict for each page of results and store in list
results_list = []
page_number = 1
for page in pages:
    logger.info(f"Retrieving results page {page_number}")
    results_list.append(page)
    page_number += 1
gscholar_results = results_list[0]["search_information"]["total_results"]
print(f"results reported by google scholar: {gscholar_results}")
paper_count = 0
for page in results_list:
    for paper in page["organic_results"]:
        paper_count += 1
print(f"number of papers in results: {paper_count}")
If I check my searches on serpapi.com, results are generated for all pages (see below for an example in code). So the results do exist; the final page just never comes out of the pagination iterator for some reason.
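Until the iterator is fixed, one possible workaround is to page manually and store each page before deciding whether to continue. The sketch below is not the library's code: `collect_all_pages` and its `fetch_page` callable are hypothetical names, and it assumes each response dict carries `organic_results` plus a `serpapi_pagination` object that contains a `next` entry only when more pages exist.

```python
from typing import Any, Callable, Dict, List


def collect_all_pages(
    fetch_page: Callable[[int], Dict[str, Any]],
    page_size: int = 20,
    max_pages: int = 100,
) -> List[Dict[str, Any]]:
    """Collect every page of results, including the last one.

    fetch_page(offset) is a hypothetical callable that returns one
    response dict for the given result offset.
    """
    pages: List[Dict[str, Any]] = []
    offset = 0
    for _ in range(max_pages):  # hard cap so a bad response cannot loop forever
        page = fetch_page(offset)
        if not page.get("organic_results"):
            break  # nothing on this page, so there is nothing to keep
        pages.append(page)  # store the page BEFORE checking for a next page
        # Assumption: a missing "next" entry marks the final page.
        if "next" not in page.get("serpapi_pagination", {}):
            break
        offset += page_size
    return pages
```

With the client from the snippet above, `fetch_page` could build a `serpapi.GoogleSearch` whose parameters include `"start": offset` and `"num": page_size`, then return its `get_dict()` result. Each call is a separate API request, so the `max_pages` cap also limits how many searches are consumed.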
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:9 (3 by maintainers)
Same issue here with Google News. The paginator misses the last page.
Hi @jvmvik @ilyazub, is there any progress on a fix? I am currently using a modified version of the package in my code, but I need to finalise the code for others to use by next Thursday, before I leave my current job, and I would like to hand it over running on an official release of this package.