Elasticsearch Scroll and Slice with NEST to retrieve large volumes of data in parallel
I am writing NEST code to retrieve a large volume of data from Elasticsearch. Right now I am using the Scroll feature to fetch all records from the cluster synchronously. Below is the code snippet:
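// Assumes: using System; using System.Collections.Generic; using System.Linq; using Nest;
var response = elasticClient.Search<IndexType>(s => s
    .Source(sf => sf
        .Includes(i => i
            .Fields(
                f => f.DateTime
            )
        )
    )
    .Scroll("1m")   // keep the scroll context alive for 1 minute between requests
    .From(0)
    .Size(9999)     // documents per batch
    .Query(q => q
        .DateRange(r => r
            .Field(f => f.DateTime)
            .GreaterThanOrEquals(new DateTime(2017, 01, 01))
            .LessThan(new DateTime(2017, 04, 01))
        )
    )
    .Sort(q => q.Ascending(u => u.DateTime))
);

List<IndexType> allData = new List<IndexType>();
// Stop on an error response as well as on an empty page.
while (response.IsValid && response.Documents.Any())
{
    allData.AddRange(response.Documents);
    // Use the same document type as the initial Search<IndexType>() call.
    response = elasticClient.Scroll<IndexType>("1m", response.ScrollId);
}

// Free the server-side scroll context once every page has been read.
if (!string.IsNullOrEmpty(response.ScrollId))
    elasticClient.ClearScroll(c => c.ScrollId(response.ScrollId));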
Now, instead of the while loop (which fetches roughly 10,000 records per batch until all documents are fetched), is there any mechanism to do this asynchronously or in parallel so that I don't have to wait for every iteration?
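The parallel mechanism the comments discuss is NEST's `ScrollAll` helper, which runs several sliced scrolls concurrently and hands each page to a callback as it arrives. The following is a minimal sketch, assuming NEST 6.x/7.x, a reachable cluster, and a client whose default index (or type mapping) points at the data. The `IndexType` class below is a hypothetical stand-in with only a `DateTime` property, and the `SlicedScrollExample.FetchAll` method, the page size of 1000 and the `ManualResetEvent`-based blocking are illustrative choices, not code from the issue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.ExceptionServices;
using System.Threading;
using Nest;

// Hypothetical document type standing in for the question's IndexType.
public class IndexType
{
    public DateTime DateTime { get; set; }
}

public static class SlicedScrollExample
{
    // Reads all matching documents with NEST's ScrollAll helper, which opens
    // `numberOfSlices` sliced scrolls and pages through them concurrently.
    // Assumes the client's default index (or a mapping for IndexType) is configured.
    public static List<IndexType> FetchAll(ElasticClient client, int numberOfSlices)
    {
        // Pages from different slices are delivered on different threads,
        // so collect them in a thread-safe bag.
        var allData = new ConcurrentBag<IndexType>();

        var observable = client.ScrollAll<IndexType>("1m", numberOfSlices, sc => sc
            .MaxDegreeOfParallelism(numberOfSlices) // read all slices at the same time
            .Search(s => s
                .Size(1000) // documents per slice per round trip (illustrative value)
                .Source(sf => sf.Includes(i => i.Fields(f => f.DateTime)))
                .Query(q => q
                    .DateRange(r => r
                        .Field(f => f.DateTime)
                        .GreaterThanOrEquals(new DateTime(2017, 01, 01))
                        .LessThan(new DateTime(2017, 04, 01))
                    )
                )
            )
        );

        using (var done = new ManualResetEvent(false))
        {
            ExceptionDispatchInfo failure = null;

            var observer = new ScrollAllObserver<IndexType>(
                onNext: page =>
                {
                    foreach (var document in page.SearchResponse.Documents)
                        allData.Add(document);
                },
                onError: e =>
                {
                    failure = ExceptionDispatchInfo.Capture(e);
                    done.Set();
                },
                onCompleted: () => done.Set());

            observable.Subscribe(observer);
            done.WaitOne();   // block until every slice is exhausted or an error occurs
            failure?.Throw(); // rethrow any error raised while scrolling
        }

        // Slices finish in no particular order, so re-sort if order matters.
        return allData.OrderBy(d => d.DateTime).ToList();
    }
}
```

For example, `SlicedScrollExample.FetchAll(new ElasticClient(new Uri("http://localhost:9200")), 5)` would read the date range with five slices. Because the sliced scrolls run concurrently, the total time is closer to that of the slowest slice than to the sum of all batches, as it is in the sequential loop.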
Issue created 5 years ago; 5 comments (2 by maintainers).
Top GitHub Comments
Follow-up question:
Hey, thanks for your reply. Now I have doubts regarding the ScrollAll parameters. How can we decide the number of slices? Is there any standard rule for this number? I have read somewhere that the ideal number should be equal to the number of shards. The same goes for MaxDegreeOfParallelism(): how can I decide this number as well?

Reply:
The sweet spot for numberOfSlices should be the number of shards at play; numberOfSlices should ideally be <= the number of shards. See the official documentation for the reasoning behind this: https://www.elastic.co/guide/en/elasticsearch/reference/7.10/paginate-search-results.html#slice-scroll
It may exceed the number of shards, but in that case Elasticsearch has to build an additional slice filter, which is slow on the first calls and costs extra memory.
Since this is IO intensive, the default for maxDegreeOfParallelism is usually the sanest default. It should at a minimum be the same as the numberOfSlices.
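The rule of thumb from the reply (one slice per shard, never more slices than shards, and parallelism at least equal to the slice count) can be written down as a small helper. This is a hypothetical sketch: `ScrollAllSettings` and its methods are not part of NEST, and the shard count is assumed to be known already (for example from the index settings).

```csharp
using System;

// Hypothetical helper (not part of NEST) encoding the thread's rule of thumb
// for configuring ScrollAll.
public static class ScrollAllSettings
{
    // One slice per shard is the sweet spot; never use more slices than shards.
    // requestedSlices <= 0 means "no preference".
    public static int ChooseNumberOfSlices(int shardCount, int requestedSlices = 0)
    {
        if (shardCount < 1)
            throw new ArgumentOutOfRangeException(nameof(shardCount), "An index has at least one shard.");
        return requestedSlices <= 0 ? shardCount : Math.Min(requestedSlices, shardCount);
    }

    // Parallelism defaults to the slice count and is never allowed to drop below it,
    // so every slice can be read concurrently.
    public static int ChooseMaxDegreeOfParallelism(int numberOfSlices, int? requested = null)
    {
        if (numberOfSlices < 1)
            throw new ArgumentOutOfRangeException(nameof(numberOfSlices));
        return Math.Max(requested ?? numberOfSlices, numberOfSlices);
    }
}
```

The two results would be passed as the `numberOfSlices` argument of `ScrollAll` and to `.MaxDegreeOfParallelism(...)` respectively; for a five-shard index with no explicit preference, both come out as 5.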