Elastic Search with Scroll and Slice with NEST to retrieve large volume data parallel

I am writing NEST code to retrieve a large volume of data from Elasticsearch. Right now I am using the Scroll feature to fetch all records from the cluster synchronously. Below is the code snippet.

var response = elasticClient.Search<IndexType>(s => s
    .Source(sf => sf
        .Includes(i => i
            .Fields(f => f.DateTime)
        )
    )
    .Scroll("1m")   // keep the scroll context alive for one minute between calls
    .From(0)
    .Size(9999)     // documents returned per batch
    .Query(q => q
        .DateRange(r => r
            .Field(f => f.DateTime)
            .GreaterThanOrEquals(new DateTime(2017, 01, 01))
            .LessThan(new DateTime(2017, 04, 01))
        )
    )
    .Sort(q => q.Ascending(u => u.DateTime))
);

List<IndexType> allData = new List<IndexType>();

// Keep scrolling until a batch comes back empty.
while (response.Documents.Any())
{
    allData.AddRange(response.Documents);

    // Fetch the next batch with the same document type as the initial search.
    response = elasticClient.Scroll<IndexType>("1m", response.ScrollId);
}

Instead of the while loop (which fetches records in batches of roughly 10,000 until all documents are retrieved), is there any mechanism to do this asynchronously or in parallel, so that I don't have to wait for each iteration to finish?

Issue Analytics

State:
Created 5 years ago
Reactions:1
Comments:5 (2 by maintainers)

Top GitHub Comments

Mpdreamz commented, Feb 4, 2021

The sweet spot for numberOfSlices should be the number of shards at play.

numberOfSlices should ideally be less than or equal to the number of shards; see the official documentation for the reasoning behind this:

https://www.elastic.co/guide/en/elasticsearch/reference/7.10/paginate-search-results.html#slice-scroll

It may exceed the number of shards but in that case an additional filter has to be maintained.

Since this is I/O intensive, the default for maxDegreeOfParallelism is usually the sanest choice. It should at a minimum be the same as numberOfSlices.
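
For reference, here is a minimal sketch of how these two parameters might be passed to NEST's ScrollAll helper, reusing the date-range query from the question. The node URL, the index name `my-index`, the shard count of 5, the batch size of 1000, and the shape of `IndexType` are all assumptions made for illustration, and the exact response types differ slightly between NEST major versions (this follows the 6.x/7.x shape).

```csharp
using System;
using System.Collections.Concurrent;
using Nest;

// Hypothetical document type: only the DateTime field used in the question is modelled.
public class IndexType
{
    public DateTime DateTime { get; set; }
}

public static class ScrollAllSketch
{
    public static void Main()
    {
        // Assumed node URL and index name; replace with your own.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("my-index");
        var elasticClient = new ElasticClient(settings);

        // Assumption: the index has 5 primary shards, so one slice per shard.
        const int numberOfSlices = 5;

        // Slices are consumed concurrently, so collect into a thread-safe bag
        // rather than a List<T>.
        var allData = new ConcurrentBag<IndexType>();

        var scrollAll = elasticClient.ScrollAll<IndexType>("1m", numberOfSlices, sc => sc
            .MaxDegreeOfParallelism(numberOfSlices)
            .Search(s => s
                .Source(sf => sf.Includes(i => i.Fields(f => f.DateTime)))
                .Size(1000) // batch size per slice per scroll call (assumed value)
                .Query(q => q
                    .DateRange(r => r
                        .Field(f => f.DateTime)
                        .GreaterThanOrEquals(new DateTime(2017, 1, 1))
                        .LessThan(new DateTime(2017, 4, 1))))));

        // Blocks until every slice is exhausted or the timeout elapses.
        scrollAll.Wait(TimeSpan.FromMinutes(10), response =>
        {
            foreach (var document in response.SearchResponse.Documents)
            {
                allData.Add(document);
            }
        });

        Console.WriteLine($"Fetched {allData.Count} documents");
    }
}
```

Because batches from different slices arrive interleaved, the collected documents are not globally ordered by DateTime. If the ascending order from the original query matters, sort `allData` after the scroll completes.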

RK-Rahul commented, Apr 2, 2018

Hey, thanks for your reply. I now have some questions about the ScrollAll parameters. How should we decide the number of slices? Is there a standard rule for choosing it? I have read somewhere that the ideal number is equal to the number of shards.

The same goes for MaxDegreeOfParallelism(): how should I decide this number?
