[Documentation Request] Document how to stream results of large search queries

Description

Search queries can return a large volume of data, and c.e.c.e.core.search.HitsMetadata#hits() is a List rather than a kind of Iterator, which suggests that the results are all loaded in memory rather than streamed, as they are with the java.sql.ResultSet API.

Could we have documentation on best practices to stream large search responses?

The Jenkins OpenTelemetry Plugin has implemented an Iterator abstraction that progressively retrieves small chunks of an Elasticsearch query's results using the Point in Time API (code here), but it may not be the most efficient solution.

Issue Analytics

Created: 2 years ago
Comments: 5 (3 by maintainers)

Top GitHub Comments

swallez commented, Mar 8, 2022

A SQL ResultSet is a sequence of independent messages sent over the wire, and lazy iteration is possible there. A JSON response is a tree where properties can come in arbitrary order. So we have to read the entire tree to be able to return a consistent response, and so a search response’s hits are all loaded in memory, and this is really a List.

The approach taken in the OpenTelemetry plugin is the right one, by using the Point in Time API to split a large result set into smaller chunks and provide an Iterator on top of it.
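The chunk-then-iterate idea described above can be sketched without any Elasticsearch dependency. This is a minimal illustration, not the plugin's code and not part of the Java API client: `ChunkedIterator`, `Page`, and `overList` are hypothetical names. The page-fetching function is a stand-in for a search call that would, under the Point in Time approach, reuse the same PIT and pass the previous page's cursor (for example, the last hit's sort values) to get the next chunk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical sketch: exposes a large result set as an Iterator by
 * fetching one small page at a time. T is the hit type, C the cursor type.
 */
final class ChunkedIterator<T, C> implements Iterator<T> {

    /** One page of hits plus the cursor for the next page (null when done). */
    static final class Page<T, C> {
        final List<T> hits;
        final C nextCursor;

        Page(List<T> hits, C nextCursor) {
            this.hits = hits;
            this.nextCursor = nextCursor;
        }
    }

    // Assumption: in real use this function would issue one search request.
    private final Function<C, Page<T, C>> fetchPage;
    private Iterator<T> current = Collections.emptyIterator();
    private C cursor;
    private boolean exhausted;

    ChunkedIterator(C firstCursor, Function<C, Page<T, C>> fetchPage) {
        this.cursor = firstCursor;
        this.fetchPage = fetchPage;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page only when the current one is used up.
        while (!current.hasNext() && !exhausted) {
            Page<T, C> page = fetchPage.apply(cursor);
            current = page.hits.iterator();
            cursor = page.nextCursor;
            exhausted = page.hits.isEmpty() || cursor == null;
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    /** In-memory stand-in for a paged search; the cursor is a list offset. */
    static <T> ChunkedIterator<T, Integer> overList(List<T> data, int pageSize) {
        return new ChunkedIterator<T, Integer>(0, offset -> {
            int end = Math.min(offset + pageSize, data.size());
            List<T> hits = new ArrayList<>(data.subList(offset, end));
            Integer next = null;
            if (end < data.size()) {
                next = end;
            }
            return new Page<T, Integer>(hits, next);
        });
    }
}
```

Only one page is held in memory at a time, so the caller can loop with `hasNext()` and `next()` as it would over any other Iterator. A real implementation would also need to close the point in time when iteration ends or is abandoned.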

This approach is what we plan to use to provide a PointInTimeHelper. Bear with us, it’s on our roadmap, and documentation will come with this helper. However, we will not write documentation explaining how to use the Point In Time API specifically from the Java API client, as it would not add much value compared to the corresponding Elasticsearch API doc and would have to be trashed once the PointInTimeHelper is there.

swallez commented, May 5, 2022

Documentation on SearchRequest.pit has been added to the specification and is now in the javadoc.
