Implement efficient pagination helpers using search_after


Currently, pagination has to be done manually, either via slicing (which can be inefficient for deep pagination) or using search_after (0), which can be complex. I propose introducing several new methods on Search objects:

def get_page(self, page_no):
    """
    use slicing to get the `page_no` page and return a response (it will execute your search)
    """

def get_next_page(self, last_hit, step=1):
    """
    use `search_after` and return a response for the page that is `step` pages after the one ending in `last_hit`
    """

def get_previous_page(self, first_hit, step=1):
    """
    similar to get_next_page but will have to reverse the order first to be able to use search_after
    """

There would also be helper methods on Response to retrieve first_hit and last_hit (self.hits[0].meta.sort and self.hits[-1].meta.sort), and to use those directly to call get_next_page/get_previous_page.
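As a rough illustration of the next/previous logic described above, here is a minimal sketch that works on plain request-body dicts rather than on Search objects. The function names (`reverse_sort`, `next_page_body`, `previous_page_body`) are hypothetical and not part of any library, and the sketch assumes every sort clause has the simple form `{"field": "asc"}` or `{"field": "desc"}`.

```python
import copy


def reverse_sort(sort):
    """Flip the direction of every sort clause.

    Assumption: each clause is a one-key dict such as {"created": "asc"}.
    """
    flipped = []
    for clause in sort:
        (field, order), = clause.items()
        flipped.append({field: "desc" if order == "asc" else "asc"})
    return flipped


def next_page_body(body, last_hit_sort, size=10):
    """Return a copy of `body` that asks for the hits after `last_hit_sort`."""
    new_body = copy.deepcopy(body)
    new_body["search_after"] = list(last_hit_sort)
    new_body["size"] = size
    # search_after replaces offset-based paging, so drop any "from" offset.
    new_body.pop("from", None)
    return new_body


def previous_page_body(body, first_hit_sort, size=10):
    """Return a copy of `body` that asks for the hits before `first_hit_sort`.

    The sort is reversed so that search_after walks backwards; the caller
    has to reverse the returned hits to restore the original order.
    """
    new_body = next_page_body(body, first_hit_sort, size)
    new_body["sort"] = reverse_sort(body["sort"])
    return new_body
```

The `last_hit_sort` and `first_hit_sort` arguments correspond to the `meta.sort` values of the last and first hit on the current page. The original body is never mutated, so the same base query can be reused for both directions.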

0 - https://www.elastic.co/guide/en/elasticsearch/reference/6.1/search-request-search-after.html

Or do people think this should be a separate object/module altogether? Is there anything I am missing (number of pages, direct jump to the last/first page)?

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 2
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
shotecorps commented, Sep 26, 2019

When using search_after, we need to choose a unique sort key, and choosing one is not straightforward. The _id field is not recommended because it is not a doc_values field, and when a shard is large (for example, close to 50 GB), sorting by _id performs poorly compared to the default _doc sort. _doc is not suitable either, because it is not unique for each document.
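To see why the sort key has to be unique, here is a small in-memory model, not Elasticsearch itself. It assumes search_after returns only hits whose sort values are strictly greater than the supplied ones; the `search_after` and `paginate` functions and the sample documents are made up for illustration.

```python
def search_after(docs, key, after=None, size=2):
    """In-memory stand-in: return up to `size` docs sorted by `key`,
    keeping only those whose key is strictly greater than `after`."""
    ordered = sorted(docs, key=key)
    if after is not None:
        ordered = [d for d in ordered if key(d) > after]
    return ordered[:size]


def paginate(docs, key, size=2):
    """Walk all pages and collect the ids that were actually returned."""
    seen, after = [], None
    while True:
        page = search_after(docs, key, after, size)
        if not page:
            return seen
        seen.extend(d["id"] for d in page)
        # The sort values of the last hit become the next cursor.
        after = key(page[-1])


docs = [
    {"id": "a", "ts": 1},
    {"id": "b", "ts": 2},
    {"id": "c", "ts": 2},
    {"id": "d", "ts": 2},
    {"id": "e", "ts": 3},
]

# Sorting on the timestamp alone: documents that tie with the cursor are skipped.
non_unique = paginate(docs, key=lambda d: (d["ts"],))

# Adding a unique tiebreaker field makes every document reachable.
unique = paginate(docs, key=lambda d: (d["ts"], d["id"]))
```

In this model, `non_unique` misses the documents that share a timestamp with the last hit of a page, while `unique` visits all five. The same reasoning is why a field with one distinct value per document is needed as a tiebreaker in the sort.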

0 reactions
drpump commented, Apr 13, 2018

Thanks, so I have 3 solutions I could implement:

  1. Paginate first N (10,000 or other max) and last N records (reverse search) and throw an exception for those in between.
  2. Use the forward/reverse search with search_after and do a lazy fetch of records in between (i.e. get first N, get last N, get next N if required, get next-to-last N etc). Some accuracy issues, but not significant for a large number of records.
  3. Retrieve only the metadata for all matches using the scroll API and paginate on the resulting array. In Rails, due to integration with ActiveRecord, I can retrieve each match from my DB rather than going back to ES. This has memory and latency implications for my app, although a background fetch would probably make it perform OK. Again, there are some accuracy issues because the scroll results may no longer be current, but they are not significant.

All are client side solutions. I’d need to implement a new searcher class in Rails or monkey patch the elasticsearch gems. Doable if perhaps a bit messy.
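The first option in the list above can be sketched as a small planning helper, written here in Python to match the proposal rather than in Ruby. The function name `plan_page`, its return shape, and the default window of 10,000 are assumptions for illustration; it only decides whether a page is reachable by a forward or a reversed offset query.

```python
def plan_page(page_no, page_size, total, max_window=10_000):
    """Decide how to fetch page `page_no` (0-based) out of `total` hits.

    Returns a dict with the offset to use and whether the sort must be
    reversed. Raises ValueError for pages reachable from neither end.
    """
    start = page_no * page_size
    if page_no < 0 or start >= total:
        raise IndexError("page out of range")
    end = min(start + page_size, total)
    size = end - start
    if end <= max_window:
        # Reachable from the front with a normal offset.
        return {"reverse": False, "from": start, "size": size}
    if total - start <= max_window:
        # Reachable from the back: query with reversed sort, then
        # reverse the returned hits on the client.
        return {"reverse": True, "from": total - end, "size": size}
    raise ValueError("page is outside both the first and last window")
```

With 25,000 hits and 100 per page, pages 0 to 99 are served forward, pages 150 to 249 are served from the reversed query, and the pages in between raise an error, which is the trade-off the first option accepts.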

