Pagination Support
Right now, pagination only works with selection queries that have an ORDER BY clause:
SELECT foo, bar FROM myTable
WHERE baz > 20
ORDER BY bar DESC
LIMIT 50, 100
With LIMIT a, b, the selection results are paginated starting from the a-th result, and at most b results are returned (here a = 50 and b = 100).
The query below, an aggregation with GROUP BY, does not paginate its results:
SELECT count(*), foo, bar FROM myTable
WHERE baz > 20
GROUP BY foo
ORDER BY bar DESC
LIMIT 50, 100
Issue Analytics
- State:
- Created 3 years ago
- Reactions:16
- Comments:12 (8 by maintainers)
Top GitHub Comments
@kishoreg as discussed offline, I will take this up
At LinkedIn, we have started working on pagination as a priority, given the multiple requests we have received internally.
At a high level, our customers want to run queries in Pinot that can potentially return a large response, and they want the ability to paginate that response as multiple result sets (with the size of each result set dictated by the user's app).
The current pagination implementation in Pinot (even though it only covers selection queries) is sub-optimal: it treats each request as a fresh query, executes the query again for every pagination window, discards the results outside the window, and returns only the results within the M, N window the user asked for.
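To make the cost concrete, here is a minimal sketch of that re-execute-and-discard behavior. It is an in-memory illustration only, not Pinot code: `run_full_query`, `fetch_window`, the `executions` counter, and the two-column row layout are all hypothetical names chosen for this example.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (baz, foo): a hypothetical two-column row

executions = 0  # counts how many times the full query is run


def run_full_query(table: List[Row]) -> List[Row]:
    """Stand-in for one full query execution: filter, then ORDER BY baz DESC."""
    global executions
    executions += 1
    matching = [row for row in table if row[0] > 20]
    return sorted(matching, key=lambda row: row[0], reverse=True)


def fetch_window(table: List[Row], offset: int, count: int) -> List[Row]:
    """Naive pagination: re-run everything, keep rows [offset, offset + count)."""
    full_result = run_full_query(table)        # executed from scratch on every call
    return full_result[offset:offset + count]  # rows outside the window are discarded


table = [(i, f"foo{i}") for i in range(100)]
page_1 = fetch_window(table, 0, 10)
page_2 = fetch_window(table, 10, 10)
```

Fetching two pages runs the full filter and sort twice, so the work grows with the number of pages requested rather than with the size of the result.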
The main thing to note about pagination is that it has to be treated as a single query. Our customers don’t want to run a one-off pagination query (OFFSET M, FETCH N) where M and N are completely random; in that case it is not possible to reason about the results, and it is hard for the user even to decide on M as a one-off starting point. The result of a random pagination query adds no value for users, since they want to look at the entire result as a continuous stream of results / pages / batches, with the option to stop at any time.
So, the semantics we want to provide are: “I want to fetch 10 million records from Pinot for a query, and I want to fetch 100K at a time”. The customer will typically start with M as 0, keep N fixed (say at 10K or 100K), and page through the results with multiple calls from their app, changing only M on each call (and potentially refreshing the UI with the results Pinot returns on each call).
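The client-side access pattern described here can be sketched roughly as follows. This is an assumption-laden illustration, not a Pinot client API: `paginate` and `fetch_page` are hypothetical names, and the backend is faked with an in-memory list.

```python
from typing import Callable, Iterator, List


def paginate(fetch_page: Callable[[int, int], List[int]],
             page_size: int) -> Iterator[List[int]]:
    """Yield consecutive pages: M starts at 0, N stays fixed, M advances by N."""
    offset = 0  # M
    while True:
        page = fetch_page(offset, page_size)  # one call per page
        if not page:
            return                            # nothing left to fetch
        yield page
        if len(page) < page_size:
            return                            # short page means the result is exhausted
        offset += page_size                   # only M changes between calls


# Hypothetical backend: 25 result rows held in memory.
results = list(range(25))


def fake_fetch(offset: int, count: int) -> List[int]:
    return results[offset:offset + count]


pages = list(paginate(fake_fetch, 10))
```

Because `paginate` is a generator, the caller can stop iterating at any page, which matches the "stop anytime" requirement above.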
I think we should look at the pagination problem from this perspective, as opposed to a random one-off pagination query. We are trying to tackle the problem from this angle. Detailed design discussion is in progress.
Some more thoughts slightly related to this:
One problem is that users who run such queries may tend to think that pagination support means they can run “any” query in Pinot, however long-running, and that Pinot is guaranteed to finish it and return results. This can easily cause an OOM (out of memory) and bring down the cluster.
Pinot is unlikely to enter the territory of running very long-running queries and returning a complete, 100% accurate result by spilling to disk and avoiding OOM at all times. Presto should be used for those cases.
However, for some of our users (who are OK with multi-second latency and prefer a slightly more accurate response for GROUP BY queries), we want to consider, as a follow-up / next phase, enhancing Pinot's support for queries that return large responses and/or process / aggregate more than the usual amount of data. We want to do this by performing some of the memory-intensive query execution operations in off-heap (direct) memory. This, along with the ability to paginate a large response, more or less fulfills the requirements we are seeing in production.