Elasticsearch with Scroll and Slice in NEST to retrieve a large volume of data in parallel

I am writing NEST code to retrieve a large volume of data from Elasticsearch. Right now I am using the Scroll API to fetch all records from the cluster synchronously. Below is the code snippet.

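    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Nest;

    // Initial search: opens a scroll context that is kept alive for 1 minute between calls
    var response = elasticClient.Search<IndexType>(s => s
        .Source(sf => sf
            .Includes(i => i
                .Fields(f => f.DateTime)
            )
        )
        .Scroll("1m")
        .From(0)
        .Size(9999)
        .Query(q => q
            .DateRange(r => r
                .Field(f => f.DateTime)
                .GreaterThanOrEquals(new DateTime(2017, 01, 01))
                .LessThan(new DateTime(2017, 04, 01))
            )
        )
        .Sort(q => q.Ascending(u => u.DateTime))
    );

    List<IndexType> allData = new List<IndexType>();

    // Keep scrolling until a page comes back empty
    while (response.Documents.Any())
    {
        allData.AddRange(response.Documents);

        // The type argument must match the one used in Search<IndexType>
        response = elasticClient.Scroll<IndexType>("1m", response.ScrollId);
    }

    // Release the scroll context on the cluster once all pages have been read
    elasticClient.ClearScroll(c => c.ScrollId(response.ScrollId));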

Instead of the while loop (fetching about 10,000 records per batch until all documents are fetched), is there a mechanism to do this asynchronously or in parallel, so that I don't have to wait for every iteration?
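One way to parallelize this is NEST's ScrollAll helper, whose parameters are discussed in the comments below. It performs a sliced scroll: the query is split into numberOfSlices independent scrolls that are read concurrently. The following is a minimal sketch, not the thread's confirmed answer. It assumes NEST 7.x, a reachable cluster behind an IElasticClient, and a hypothetical IndexType class with a DateTime property. The page size of 1000 and the 10-minute cap are arbitrary illustrative values.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using Nest;

// Hypothetical document type standing in for the asker's IndexType.
public class IndexType
{
    public DateTime DateTime { get; set; }
}

public static class ScrollAllSketch
{
    // Retrieves every matching document with a sliced scroll.
    // numberOfSlices and maxDegreeOfParallelism are supplied by the caller (assumptions).
    public static List<IndexType> RetrieveAll(
        IElasticClient client, int numberOfSlices, int maxDegreeOfParallelism)
    {
        // Pages from different slices may be handled concurrently, so use a thread-safe collection.
        var documents = new ConcurrentBag<IndexType>();

        var observable = client.ScrollAll<IndexType>("1m", numberOfSlices, sc => sc
            .MaxDegreeOfParallelism(maxDegreeOfParallelism)
            .Search(s => s
                .Source(sf => sf.Includes(i => i.Fields(f => f.DateTime)))
                .Size(1000) // page size per slice (illustrative value)
                .Query(q => q
                    .DateRange(r => r
                        .Field(f => f.DateTime)
                        .GreaterThanOrEquals(new DateTime(2017, 01, 01))
                        .LessThan(new DateTime(2017, 04, 01))
                    )
                )
            )
        );

        // Blocks until every slice is exhausted; the TimeSpan bounds the total run time.
        observable.Wait(TimeSpan.FromMinutes(10), response =>
        {
            foreach (var document in response.SearchResponse.Documents)
                documents.Add(document);
        });

        return documents.ToList();
    }
}
```

Because the slices are read concurrently, documents come back in no particular order, so the original ascending sort is left out. If order matters, sort the returned list afterwards, for example with OrderBy(d => d.DateTime).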

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
Mpdreamz commented, Feb 4, 2021

The sweet spot for numberOfSlices should be the number of shards at play.

numberOfSlices should ideally be <= the number of shards; see the official documentation for the reasoning behind this:

https://www.elastic.co/guide/en/elasticsearch/reference/7.10/paginate-search-results.html#slice-scroll

It may exceed the number of shards, but in that case Elasticsearch has to build and maintain an additional slice filter, which is slow on the first calls and costs memory.

Since this is IO-intensive, the default for maxDegreeOfParallelism is usually the sanest choice. At a minimum, it should be the same as numberOfSlices.
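The two guidelines above (one slice per shard, and a degree of parallelism at least equal to the number of slices) can be written down as a small helper. This is a hypothetical function, not part of NEST. It assumes the shard count of the target index is already known, for example from the index settings, and it caps a larger slice request at the shard count to avoid the extra slice filter mentioned above.

```csharp
using System;

// Hypothetical helper encoding the rule of thumb from the comment above; not a NEST API.
public static class SliceSettings
{
    public static (int NumberOfSlices, int MaxDegreeOfParallelism) Choose(
        int shardCount, int? requestedSlices = null, int? requestedParallelism = null)
    {
        if (shardCount < 1)
            throw new ArgumentOutOfRangeException(nameof(shardCount), "An index has at least one shard.");

        // Default to one slice per shard; cap larger requests at the shard count.
        int slices = Math.Min(requestedSlices ?? shardCount, shardCount);
        if (slices < 1)
            throw new ArgumentOutOfRangeException(nameof(requestedSlices), "At least one slice is required.");

        // Parallelism should at minimum match the number of slices.
        int parallelism = Math.Max(requestedParallelism ?? slices, slices);

        return (slices, parallelism);
    }
}
```

The resulting pair would be passed as the numberOfSlices argument of ScrollAll and to MaxDegreeOfParallelism() on its descriptor.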

1 reaction
RK-Rahul commented, Apr 2, 2018

Hey, thanks for your reply. Now I have some doubts about the ScrollAll parameters. How should we decide the number of slices? Is there a standard rule for choosing it? I have read somewhere that the ideal number should be equal to the number of shards.

The same goes for MaxDegreeOfParallelism(): how can I decide on this number?
