
Pagination iterator doesn't work for APIs with token-based pagination


For several APIs, parsing serpapi_pagination.next is the only way to update params_dict with the correct values. Incrementing params.start won’t work for Google Scholar Profiles, Google Maps, or YouTube.

https://github.com/serpapi/google-search-results-python/blob/ed7797c132d80613080b11b99f5b137bbeb5c3f5/serpapi/pagination.py#L26-L27

Google Scholar Profiles

The Google Scholar Profiles API has pagination.next_page_token instead of serpapi_pagination.next.

pagination.next is a next page URI like https://serpapi.com/search.json?after_author=0QICAGE___8J&engine=google_scholar_profiles&hl=en&mauthors=label%3Asecurity where after_author is set to next_page_token.

Google Maps

In Google Maps Local Results API there’s only serpapi_pagination.next with a URI like https://serpapi.com/search.json?engine=google_maps&ll=%4040.7455096%2C-74.0083012%2C14z&q=Coffee&start=20&type=search

YouTube

In YouTube Search Engine Results API there’s serpapi_pagination.next_page_token similar to Google Scholar Profiles. serpapi_pagination.next is a URI with sp parameter set to next_page_token.

@jvmvik What do you think about parsing serpapi_pagination.next in Pagination#__next__?

- self.start += self.page_size
+ self.client.params_dict.update(dict(parse.parse_qsl(parse.urlsplit(result['serpapi_pagination']['next']).query)))
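As a minimal sketch of what that one-liner does, here it is pulled out into a standalone helper and applied to the two next-page URLs quoted above. The helper name `params_from_next_url` is made up for illustration and is not part of the library.

```python
from urllib import parse


def params_from_next_url(next_url):
    """Return the query parameters of a next-page URL as a plain dict.

    Hypothetical helper: it mirrors the expression proposed in the diff above.
    parse_qsl also percent-decodes the values.
    """
    return dict(parse.parse_qsl(parse.urlsplit(next_url).query))


# Next-page URLs copied from the examples in this issue.
scholar_next = (
    "https://serpapi.com/search.json?after_author=0QICAGE___8J"
    "&engine=google_scholar_profiles&hl=en&mauthors=label%3Asecurity"
)
maps_next = (
    "https://serpapi.com/search.json?engine=google_maps"
    "&ll=%4040.7455096%2C-74.0083012%2C14z&q=Coffee&start=20&type=search"
)

scholar_params = params_from_next_url(scholar_next)
maps_params = params_from_next_url(maps_next)

print(scholar_params["after_author"])  # token-based: the token arrives as after_author
print(maps_params["start"])            # offset-based: the offset arrives as start
```

Because the values come straight from the URL, the same update covers both token-based and offset-based engines without the client knowing which one it is talking to.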

Here’s an example of endless pagination of Google Scholar Authors (scraped 190 pages and manually stopped).


Top GitHub Comments


ilyazub commented, Sep 14, 2021

Could we improve the consistency between search engines on the backend or the client?

It depends on the target website: we mirror their query parameters. But consistency on the SerpApi backend should be improved too. For example, the response for the google_scholar_profiles engine contains pagination but no serpapi_pagination.

Currently, a reliable way to consume pagination across all search engines on the client is to use result['serpapi_pagination']['next'].
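A rough sketch of a client-side loop built on that idea follows. `fetch` is a hypothetical callable (params dict in, response dict out) standing in for whatever performs the request, the fallback to a top-level `pagination` key is an assumption based on the google_scholar_profiles note above, and `max_pages` is an added guard against pagination that never ends.

```python
from urllib import parse


def paginate(fetch, params, max_pages=None):
    """Yield result pages, following the `next` URL until it is missing.

    `fetch` is a hypothetical callable: params dict -> response dict.
    """
    params = dict(params)  # work on a copy, leave the caller's dict alone
    pages = 0
    while max_pages is None or pages < max_pages:
        result = fetch(params)
        yield result
        pages += 1
        # Assumption: prefer serpapi_pagination, fall back to pagination.
        block = result.get("serpapi_pagination") or result.get("pagination") or {}
        next_url = block.get("next")
        if not next_url:
            break  # last page reached
        # Take every query parameter of the next-page URL as the new params.
        params.update(dict(parse.parse_qsl(parse.urlsplit(next_url).query)))
```

Usage would look like `for page in paginate(my_fetch, {"engine": "youtube", "search_query": "coffee"}, max_pages=10): ...`, where `my_fetch` wraps the actual HTTP call.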


jvmvik commented, Sep 13, 2021

Sorry for the long wait on this. I was working with a couple of wrong assumptions. 1- page start and num are always supported, or translated if needed. 2- start = f(x * num)

1- On top of the suggestion above, I can implement a mapping table per search engine.

    # set default
    self.start_key = "start"
    self.num_key = "num"
    self.end_key = "end"

    # override per search engine
    if engine == BAIDU_ENGINE:
      self.start_key = "pn"
      self.num_key = "rn"

So response.next can be parsed for most of the search engines, except Google Scholar.
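One way to picture the mapping table is a dictionary keyed by engine name, sketched below. The engine string "baidu", the constant names, and both helpers are illustrative assumptions rather than the library's code, and the offset step only makes sense for engines that paginate by offset.

```python
# Default parameter names, as in the snippet above.
DEFAULT_KEYS = {"start": "start", "num": "num", "end": "end"}

# Per-engine overrides. Only Baidu is listed because it is the only
# override mentioned in this thread; the "baidu" key is an assumed identifier.
ENGINE_KEYS = {
    "baidu": {"start": "pn", "num": "rn"},
}


def pagination_keys(engine):
    """Return the pagination parameter names to use for an engine."""
    keys = dict(DEFAULT_KEYS)
    keys.update(ENGINE_KEYS.get(engine, {}))
    return keys


def next_offset_params(engine, params, page_size):
    """Return a copy of params advanced by one page (offset-based engines only)."""
    keys = pagination_keys(engine)
    current = int(params.get(keys["start"], 0))
    updated = dict(params)
    updated[keys["start"]] = current + page_size
    updated[keys["num"]] = page_size
    return updated
```

A table like this keeps the per-engine differences in one place, but it still has to be maintained by hand for each engine, which is the trade-off against parsing the next-page URL.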

2- The way the offset is returned by SerpApi is not consistent. Google takes start=0, num=20

page 1 : start=0
page 2 : start=20
page 3 : start=40

see: tests/test_google_search.py#test_paginate in branch: 2.5.0

eBay takes pn=0, rn=20

page 1 : start=2
page 2 : start=3
page 3 : start=4

see: tests/test_ebay_search.py#test_paginate in branch: 2.5.0

https://github.com/serpapi/google-search-results-python/pull/new/2.5.0

Could we improve the consistency between search engines on the backend or the client? Or do we even care? The user might not switch back and forth between search engines.
