[Documentation Request] Document how to stream results of large search queries


Description

Search queries can return a large volume of data, and c.e.c.e.core.search.HitsMetadata#hits() is a List rather than a kind of Iterator, which suggests that the results are all loaded in memory rather than streamed, as they are with the java.sql.ResultSet API.

Could we have documentation on best practices to stream large search responses?

The Jenkins OpenTelemetry Plugin has implemented an abstraction of Iterator that progressively retrieves small chunks of an Elasticsearch query using the Point in Time API (code here), but it may not be the most efficient solution.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
swallez commented, Mar 8, 2022

A SQL ResultSet is a sequence of independent messages sent over the wire, and lazy iteration is possible there. A JSON response is a tree where properties can come in arbitrary order. So we have to read the entire tree to be able to return a consistent response, and so a search response’s hits are all loaded in memory, and this is really a List.

The approach taken in the OpenTelemetry plugin is the right one, by using the Point in Time API to split a large result set into smaller chunks and provide an Iterator on top of it.

This approach is what we plan to use to provide a PointInTimeHelper. Bear with us, it’s on our roadmap, and documentation will come with this helper. However, we will not write documentation explaining how to use the Point In Time API specifically from the Java API client, as it would not add much value compared to the corresponding Elasticsearch API doc and would have to be trashed once the PointInTimeHelper is there.
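
The chunked Iterator pattern described in this comment can be sketched roughly as follows. This is an illustration only, not the plugin's code and not the planned PointInTimeHelper: `ChunkedIterator`, `Page`, and `fetchPage` are hypothetical names, and the backend is faked with an in-memory list. In a real client, `fetchPage` would presumably wrap one search call that passes the point-in-time id together with the sort values of the previous page's last hit (`search_after`), and the point in time would need to be closed once iteration is done.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical helper: iterates over results one page at a time.
// C = cursor type (e.g. the sort values of the last hit), T = hit type.
class ChunkedIterator<C, T> implements Iterator<T> {

    // One page of hits plus the cursor for the next page (null = last page).
    static final class Page<C, T> {
        final List<T> items;
        final C nextCursor;

        Page(List<T> items, C nextCursor) {
            this.items = items;
            this.nextCursor = nextCursor;
        }
    }

    // Assumption: fetches one page given the previous cursor (null for the first page).
    private final Function<C, Page<C, T>> fetchPage;
    private Iterator<T> current = Collections.emptyIterator();
    private C cursor = null;
    private boolean exhausted = false;

    ChunkedIterator(Function<C, Page<C, T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page only once the current one is fully consumed,
        // so at most one page is held in memory at a time.
        if (!current.hasNext() && !exhausted) {
            Page<C, T> page = fetchPage.apply(cursor);
            current = page.items.iterator();
            cursor = page.nextCursor;
            exhausted = page.items.isEmpty() || cursor == null;
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    public static void main(String[] args) {
        // In-memory stand-in for the search backend; the cursor is just an offset here.
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6, 7);
        int pageSize = 3;
        Iterator<Integer> it = new ChunkedIterator<Integer, Integer>(offset -> {
            int from = offset == null ? 0 : offset;
            int to = Math.min(from + pageSize, data.size());
            return new Page<Integer, Integer>(data.subList(from, to), to < data.size() ? to : null);
        });
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Callers see a plain `Iterator`, while each page is still a fully parsed response, which is consistent with the point above that a single JSON response has to be read entirely before it can be returned.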

0 reactions
swallez commented, May 5, 2022

Documentation on SearchRequest.pit has been added to the specification and is now in the javadoc.

