google scholar pagination not returning final results page

I am using the pagination method with the Google Scholar engine to return all results for a search term. When I iterate over the pagination with a for loop and put each page of results in a list, the final page of results is never returned; the iterator stops at the penultimate page (code snippet and terminal output below).

import serpapi
import os
from loguru import logger
from dotenv import load_dotenv

load_dotenv()

search_string = '"Singer Instruments" PhenoBooth'

# Pagination allows iterating through all pages of results
logger.info("Initialising search through serpapi")
search = serpapi.GoogleSearch(
    {
        "engine": "google_scholar",
        "q": search_string,
        "api_key": os.getenv("SERPAPI_KEY"),
        "as_ylo": 1900,
    }
)
pages = search.pagination(start=0, page_size=20)

# get dict for each page of results and store in list
results_list = []
page_number = 1
for page in pages:
    logger.info(f"Retrieving results page {page_number}")
    results_list.append(page)
    page_number += 1

gscholar_results = results_list[0]["search_information"]["total_results"]
print(f"results reported by google scholar: {gscholar_results}")

paper_count = 0
for page in results_list:
    for paper in page["organic_results"]:
        paper_count += 1

print(f"number of papers in results: {paper_count}")

[Screenshot: terminal output, 2021-07-30]

If I check my searches on serpapi.com, results are being generated for all pages (see the screenshot below). So the problem is not that the final page isn't generated; it's just not coming out of the pagination iterator for some reason.

[Screenshot: searches listed on serpapi.com, 2021-07-30]
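
A possible workaround until the iterator is fixed is to step through the result offsets by hand and stop only when a page comes back empty. The sketch below is not the library's implementation. `iter_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real request so the example runs offline. In real use, `fetch_page` could wrap something like `serpapi.GoogleSearch({... "start": start, "num": page_size}).get_dict()`, assuming the engine accepts `start` and `num` as offset and page-size parameters.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_pages(
    fetch_page: Callable[[int], Page],
    page_size: int = 20,
    max_pages: int = 100,
) -> Iterator[Page]:
    """Yield result pages until one comes back without organic results.

    fetch_page is a hypothetical callable that takes a result offset
    and returns one page of results as a dict.
    """
    for page_index in range(max_pages):
        page = fetch_page(page_index * page_size)
        # An absent or empty "organic_results" list marks the end.
        if not page.get("organic_results"):
            break
        yield page


def fake_fetch(start: int) -> Page:
    """Stand-in for the API: pretends the search has 45 results in total."""
    total = 45
    stop = min(start + 20, total)
    return {"organic_results": [{"title": f"paper {i}"} for i in range(start, stop)]}


pages = list(iter_pages(fake_fetch))
paper_count = sum(len(page["organic_results"]) for page in pages)
print(f"pages retrieved: {len(pages)}")
print(f"number of papers in results: {paper_count}")
```

Because the loop only stops on an empty page, it always makes one request more than the number of pages that contain results. `max_pages` is a safety cap against looping forever if the end is never detected.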

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions
kikohs commented, Aug 6, 2022

Same issue here with Google News. The paginator misses the last page.

2 reactions
samuelhaysom commented, Sep 20, 2021

Hi @jvmvik @ilyazub, is there any progress on a fix? I am currently using a modified version of the package in my code, but I need to finalise the code for others to use by next Thursday, before I leave my current job, and I want to be using an official release of this package when I hand over.
