Pagination iterator doesn't work for APIs with token-based pagination
For several APIs, parsing serpapi_pagination.next is the only way to update params_dict with the correct values. Incrementing params.start won't work for Google Scholar Profiles, Google Maps, or YouTube.
Google Scholar Profiles
The Google Scholar Profiles API has pagination.next_page_token instead of serpapi_pagination.next. pagination.next is a next-page URI like https://serpapi.com/search.json?after_author=0QICAGE___8J&engine=google_scholar_profiles&hl=en&mauthors=label%3Asecurity where after_author is set to next_page_token.
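A sketch of what the client has to do for this engine. It assumes a response shaped as described above, with a pagination object that holds next and next_page_token. The sample dict is built from the URI quoted above and is not a captured API response.

```python
from urllib.parse import parse_qsl, urlsplit

# Illustrative google_scholar_profiles response fragment (not a real capture).
result = {
    "pagination": {
        "next_page_token": "0QICAGE___8J",
        "next": "https://serpapi.com/search.json?after_author=0QICAGE___8J"
                "&engine=google_scholar_profiles&hl=en&mauthors=label%3Asecurity",
    }
}

# The query string of pagination.next already carries every parameter
# needed for the next page, including after_author.
next_page = dict(parse_qsl(urlsplit(result["pagination"]["next"]).query))

# after_author holds the same value as next_page_token; there is no start.
assert next_page["after_author"] == result["pagination"]["next_page_token"]
print(next_page["mauthors"])  # prints label:security
```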
Google Maps
In the Google Maps Local Results API, there's only serpapi_pagination.next, with a URI like https://serpapi.com/search.json?engine=google_maps&ll=%4040.7455096%2C-74.0083012%2C14z&q=Coffee&start=20&type=search
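In this case the next page is still offset-based (start=20). However, the offset comes from the URI, not from whatever page size the client assumes. A small sketch that reads it back with the standard library's urllib.parse:

```python
from urllib.parse import parse_qsl, urlsplit

# serpapi_pagination.next as quoted above for the Google Maps Local Results API.
next_url = (
    "https://serpapi.com/search.json?engine=google_maps"
    "&ll=%4040.7455096%2C-74.0083012%2C14z&q=Coffee&start=20&type=search"
)

next_page = dict(parse_qsl(urlsplit(next_url).query))

# Values come back percent-decoded, ready to reuse as request params.
print(next_page["ll"])     # prints @40.7455096,-74.0083012,14z
print(next_page["start"])  # prints 20
```

Merging next_page into the existing params keeps q, ll and type consistent with what the API expects for the following request.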
YouTube
In the YouTube Search Engine Results API, there's serpapi_pagination.next_page_token, similar to Google Scholar Profiles. serpapi_pagination.next is a URI with the sp parameter set to next_page_token.
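The issue does not quote a YouTube URI, so the sketch below uses a hypothetical one. The token value EXAMPLE_TOKEN and the search_query parameter are placeholders, and next_page_params is an illustrative helper, not part of the client library. The sketch shows that the token travels in sp and that there is no offset to increment.

```python
from urllib.parse import parse_qsl, urlsplit


def next_page_params(pagination: dict) -> dict:
    """Query parameters for the next page, taken from pagination["next"]."""
    return dict(parse_qsl(urlsplit(pagination["next"]).query))


# Hypothetical serpapi_pagination for engine=youtube; the token and query
# values are placeholders, not real API output.
serpapi_pagination = {
    "next_page_token": "EXAMPLE_TOKEN",
    "next": "https://serpapi.com/search.json?engine=youtube"
            "&search_query=coffee&sp=EXAMPLE_TOKEN",
}

params = next_page_params(serpapi_pagination)
# sp carries the token; there is no start offset to increment.
print(params["sp"] == serpapi_pagination["next_page_token"])  # prints True
print("start" in params)  # prints False
```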
@jvmvik What do you think about parsing serpapi_pagination.next in Pagination#__next__?
- self.start += self.page_size
+ self.client.params_dict.update(dict(parse.parse_qsl(parse.urlsplit(result['serpapi_pagination']['next']).query)))
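A fuller sketch of the proposed change, under stated assumptions. NextLinkPagination and FakeClient are hypothetical names. The client is only assumed to expose a mutable params_dict and a get_dict() method that returns the parsed response. The iterator stops when a page has no serpapi_pagination.next. An optional max_pages cap guards against result sets that never seem to end.

```python
from urllib.parse import parse_qsl, urlsplit


class NextLinkPagination:
    """Hypothetical iterator that follows serpapi_pagination.next.

    `client` is anything with a mutable `params_dict` and a `get_dict()`
    method returning the parsed JSON response for the current params.
    """

    def __init__(self, client, max_pages=None):
        self.client = client
        self.max_pages = max_pages  # optional safety cap for endless results
        self.pages_seen = 0
        self.done = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.done or (self.max_pages is not None and self.pages_seen >= self.max_pages):
            raise StopIteration
        result = self.client.get_dict()
        self.pages_seen += 1
        next_url = (result.get("serpapi_pagination") or {}).get("next")
        if next_url:
            # Instead of `self.start += self.page_size`, copy the next page's
            # parameters (start, sp, after_author, ...) from the URI.
            self.client.params_dict.update(dict(parse_qsl(urlsplit(next_url).query)))
        else:
            self.done = True  # last page: return it, then stop on the next call
        return result


class FakeClient:
    """Stand-in client serving canned pages keyed by the 'sp' token."""

    def __init__(self, pages):
        self.pages = pages
        self.params_dict = {"engine": "youtube", "search_query": "coffee"}

    def get_dict(self):
        return self.pages[self.params_dict.get("sp", "")]


BASE = "https://serpapi.com/search.json?engine=youtube&search_query=coffee"
pages = {
    "": {"id": 1, "serpapi_pagination": {"next": BASE + "&sp=T2"}},
    "T2": {"id": 2, "serpapi_pagination": {"next": BASE + "&sp=T3"}},
    "T3": {"id": 3},
}
print([page["id"] for page in NextLinkPagination(FakeClient(pages))])  # prints [1, 2, 3]
```

The loop stops when the next link is missing instead of comparing start against an end value. That is what lets the same loop work for both offset-based and token-based engines.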
Here’s an example of endless pagination of Google Scholar Authors (scraped 190 pages and manually stopped).
Issue created 2 years ago; 5 comments (2 by maintainers).
It depends on the target website: we mirror their query parameters. But consistency on the SerpApi backend should be improved too. For example, the response for the google_scholar_profiles engine contains pagination but no serpapi_pagination.
Currently, a reliable way to consume pagination across all search engines on the client is to use result['serpapi_pagination']['next'].
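Until the backend is consistent, a client could fall back from serpapi_pagination to pagination. The helper below is hypothetical: find_next_url is not part of the client library. Its key order simply prefers serpapi_pagination when both keys are present.

```python
from typing import Optional


def find_next_url(result: dict) -> Optional[str]:
    """Return the next-page URI from either pagination object, if any.

    Most engines put it under serpapi_pagination; google_scholar_profiles
    is described above as returning only pagination.
    """
    for key in ("serpapi_pagination", "pagination"):
        next_url = (result.get(key) or {}).get("next")
        if next_url:
            return next_url
    return None


print(find_next_url({"serpapi_pagination": {"next": "https://serpapi.com/search.json?start=10"}}))
print(find_next_url({"pagination": {"next": "https://serpapi.com/search.json?after_author=X"}}))
print(find_next_url({"search_metadata": {}}))  # prints None
```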
Sorry for the long wait on this. I was working from a couple of wrong assumptions:
1- page start and num are always supported (or translated if needed);
2- start = f(x * num).
1- On top of the suggestion above, I can implement a mapping table per search engine. That way, response.next can be parsed for most search engines except Google Scholar.
2- The way the offset is returned by SerpApi is not consistent. Google takes start=0, num=20, while eBay takes pn=0, rn=20.
See tests/test_ebay_search.py#test_paginate in branch 2.5.0:
https://github.com/serpapi/google-search-results-python/pull/new/2.5.0
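A per-engine mapping table, as proposed above, could translate a page index into each engine's own parameters. In this sketch, the names start/num and pn/rn come from the comment. Two further points are assumptions, not confirmed behavior: that Google's start is a result offset and that eBay's pn is a page number. OFFSET_PARAMS and page_params are hypothetical names.

```python
# Hypothetical per-engine table of offset and page-size parameter names.
# The Google and eBay names come from the comment above; how each engine
# counts (result offset vs page number) is an assumption, not confirmed.
OFFSET_PARAMS = {
    "google": {"offset": "start", "size": "num", "counts": "results"},
    "ebay": {"offset": "pn", "size": "rn", "counts": "pages"},
}


def page_params(engine: str, page_index: int, page_size: int) -> dict:
    """Translate a 0-based page index into engine-specific query params."""
    spec = OFFSET_PARAMS[engine]
    if spec["counts"] == "results":
        offset = page_index * page_size  # start = page * num
    else:
        offset = page_index              # page number, independent of size
    return {spec["offset"]: offset, spec["size"]: page_size}


print(page_params("google", 2, 20))  # prints {'start': 40, 'num': 20}
print(page_params("ebay", 2, 20))    # prints {'pn': 2, 'rn': 20}
```

A table like this still relies on the client knowing each engine's counting rule. Following serpapi_pagination.next avoids that.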
Could we improve the consistency between search engines on the backend or in the client? Or do we even care? The user might not switch back and forth between search engines.