[Documentation Request] Document how to stream results of large search queries
Description
Search queries can return a large volume of data, and c.e.c.e.core.search.HitsMetadata#hits() is a List rather than a kind of Iterator, suggesting that the results are all loaded in memory rather than being streamed as with the java.sql.ResultSet API.
Could we have documentation on best practices to stream large search responses?
The Jenkins OpenTelemetry Plugin has implemented an Iterator abstraction that progressively retrieves the results of an Elasticsearch query in small chunks using the Point in Time API (code here), but it may not be the most efficient solution.
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
A SQL ResultSet is a sequence of independent messages sent over the wire, and lazy iteration is possible there. A JSON response is a tree where properties can come in arbitrary order. We therefore have to read the entire tree to be able to return a consistent response, so a search response's hits are all loaded in memory, and this really is a List.

The approach taken in the OpenTelemetry plugin is the right one: it uses the Point in Time API to split a large result set into smaller chunks and provides an Iterator on top of it.

This approach is what we plan to use to provide a PointInTimeHelper. Bear with us, it's on our roadmap, and documentation will come with this helper. However, we will not write documentation explaining how to use the Point In Time API specifically from the Java API client, as it would not add much value compared to the corresponding Elasticsearch API doc and would have to be trashed once the PointInTimeHelper is there.

Documentation on SearchRequest.pit has been added to the specification and is now in the javadoc.
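
Until such a helper exists, the chunk-and-iterate idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the planned PointInTimeHelper and not the plugin's code: the names ChunkedIterator, Page and fetchPage are hypothetical, and the page-fetching function stands in for a real search call (for example a request that sets SearchRequest.pit and passes the sort values of the last hit as the search_after cursor). An empty page is assumed to signal the end of the result set.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of the Elasticsearch Java client):
 * exposes results fetched one page at a time as a lazy Iterator.
 */
final class ChunkedIterator<T, C> implements Iterator<T> {

    /** One page of results plus the cursor to request the following page. */
    static final class Page<T, C> {
        private final List<T> items;
        private final C nextCursor;

        Page(List<T> items, C nextCursor) {
            this.items = items;
            this.nextCursor = nextCursor;
        }

        List<T> items() { return items; }
        C nextCursor() { return nextCursor; }
    }

    // Assumption: called with null for the first page, then with the
    // cursor returned by the previous page. In a real client this would
    // run a search against an open point in time.
    private final Function<C, Page<T, C>> fetchPage;
    private final Deque<T> buffer = new ArrayDeque<>();
    private C cursor;           // null until the first page has been read
    private boolean exhausted;  // true once an empty page has been seen

    ChunkedIterator(Function<C, Page<T, C>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public boolean hasNext() {
        // Only fetch when the current chunk is fully consumed, so at most
        // one page of hits is held in memory at a time.
        if (buffer.isEmpty() && !exhausted) {
            Page<T, C> page = fetchPage.apply(cursor);
            if (page.items().isEmpty()) {
                exhausted = true;
            } else {
                buffer.addAll(page.items());
                cursor = page.nextCursor();
            }
        }
        return !buffer.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.removeFirst();
    }
}
```

A caller would open the point in time before constructing the iterator and close it once iteration finishes or fails; that lifecycle is deliberately left out of the sketch.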