Async delete by query API


I have an application that uses delete by query via repository query methods, e.g. long deleteByAssetType(String assetType); and this works great. But some indexes are now large enough that this operation can time out, throwing an exception.

After looking at the ES API I realised there is an async option when using delete by query, and the Spring docs list some return types that could match this async feature: Future<T>, CompletableFuture<T> and ListenableFuture. So I implemented a repository method with a suitable signature, e.g.

@Async
Future<Void> deleteByAssetType(String assetType);

However, it doesn’t seem to implement the async behaviour. It still calls org.elasticsearch.client.RestHighLevelClient#deleteByQuery, whereas I expected a call to org.elasticsearch.client.RestHighLevelClient#deleteByQueryAsync. Would it be possible to add support for this kind of method signature, so that the correct call is generated automatically and the returned Future blocks until the ES task has finished?

Or maybe I’m using the API wrong?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

2 reactions
sothawo commented, Aug 26, 2021

Task support would have to be added through a method such as DocumentOperations.submitDelete(Query) that returns a Task object. No future or async stuff there. And ClusterOperations would need all the methods for reading tasks or waiting for them (with a timeout).

To have this in a repository you would need a custom repository implementation that first submits the query and then waits for the tasks.
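The submit-then-wait flow described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class SubmitAndWait and its method are hypothetical and are not part of Spring Data Elasticsearch or the Elasticsearch client. In a real custom repository implementation, the submit step would hand the delete by query to the cluster as a task, and the completion check would query the Elasticsearch task API.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: none of these names exist in Spring Data Elasticsearch.
final class SubmitAndWait {

    /**
     * Submits a long-running operation, then polls until it reports completion
     * or the timeout elapses.
     *
     * @param submit       starts the operation and returns a task handle (e.g. a task id)
     * @param isCompleted  checks whether the task behind the handle has finished
     * @param pollInterval pause between two completion checks
     * @param timeout      maximum total time to wait
     * @return true if the task completed in time, false on timeout or interruption
     */
    static <T> boolean submitAndWait(Supplier<T> submit, Predicate<T> isCompleted,
                                     Duration pollInterval, Duration timeout) {
        T task = submit.get();
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (isCompleted.test(task)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // waited long enough; the task may still be running
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

Because the waiting happens in short polling requests rather than one long-lived request, no single call has to stay open for the full duration of the delete.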

As for the roadmap: there is none. Spring Data Elasticsearch is a community-driven project, so open issues get done when somebody has the time to do them. The problem is that there are only a few people contributing, mostly when it’s a change they need. I do what I can, but I am doing this in my free time and not as a job. Currently I am working on the new clients, and on the possibility of using Spring Data Elasticsearch with both Elasticsearch and OpenSearch.

I created issue #1910. If you find time and want to contribute, comment on it to align on what needs to be done, and so others can see that the issue is being worked on.

1 reaction
sothawo commented, Aug 21, 2021

As the linked documentation states:

The following table lists the return types generally supported by Spring Data repositories. However, consult the store-specific documentation for the exact list of supported return types, because some types listed here might not be supported in a particular store.

Spring Data Elasticsearch does not support these return types.

Elasticsearch’s RestHighLevelClient has asynchronous versions for almost every operation, not only for the delete by query. Spring Data Elasticsearch does not use these. Why? I cannot tell you the reasons because I came to this project after the first support for the REST client was implemented. This was in 3.2 and in the very same version the reactive implementation was added. So there probably was no need to add support for the async calls when providing a reactive and asynchronous version as well.

Would the async version solve your problem? Probably not. I dug into the Elasticsearch code to see the difference between the async and non-async code, and there is nearly none. The non-async code uses basically the same async code to send the request to the cluster, but then does a get() to wait for the response data to return. So you’d get the timeout using the async version as well.

What could be done? There is the possibility for the client to not wait for the result (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-task-api) but to immediately return task information, which can then be checked later for completion. For this we’d need to call the org.elasticsearch.client.RestHighLevelClient#submitDeleteByQueryTask method, which is currently not supported in Spring Data Elasticsearch, and we’d need to provide an implementation of the task API (https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html), which we currently do not have.
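To illustrate why switching to the async variant would not help, here is a minimal stand-alone sketch. It assumes a hypothetical SlowClient class (not the Elasticsearch client) whose blocking method simply waits on the same future the async method returns, with a request timeout that applies to both. It needs Java 9 or later for CompletableFuture.orTimeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical stand-in for a client; NOT the Elasticsearch API.
final class SlowClient {
    private final long workMillis;    // how long the simulated delete takes
    private final long timeoutMillis; // simulated request timeout shared by both variants

    SlowClient(long workMillis, long timeoutMillis) {
        this.workMillis = workMillis;
        this.timeoutMillis = timeoutMillis;
    }

    // "Async" variant: returns immediately; the future fails if the timeout fires first.
    CompletableFuture<Long> deleteAsync() {
        return CompletableFuture.<Long>supplyAsync(() -> {
            try {
                Thread.sleep(workMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 42L; // pretend number of deleted documents
        }).orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // "Sync" variant: the very same async call, followed by a blocking wait.
    long deleteBlocking() {
        return deleteAsync().join();
    }

    // Returns true if the call fails because of the simulated timeout.
    static boolean timesOut(Supplier<Long> call) {
        try {
            call.get();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }
}
```

With new SlowClient(500, 50), both deleteBlocking() and deleteAsync().join() fail with the same timeout. The async variant only changes where the caller sees the failure, not whether it happens.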

You could try to set a higher socket timeout value in the configuration (see https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#elasticsearch.clients.configuration).

Or perhaps you can reduce the index size by splitting an index into several indices by date or some other criteria, using an alias or wildcard when searching, and on deletion you could do separate delete requests.
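As a rough sketch of that idea, assume indices are split by month and named with a hypothetical scheme such as assets-2021-08. The PartitionedDelete class and the deleteInIndex callback below are illustrative only; in a real application the callback would run the delete by query against a single index.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical helper; the index naming scheme is an assumption.
final class PartitionedDelete {

    // Builds one index name per month in the inclusive range, e.g. "assets-2021-08".
    static List<String> monthlyIndices(String prefix, YearMonth from, YearMonth to) {
        List<String> names = new ArrayList<>();
        for (YearMonth month = from; !month.isAfter(to); month = month.plusMonths(1)) {
            names.add(prefix + "-" + month); // YearMonth.toString() yields yyyy-MM
        }
        return names;
    }

    // Issues one delete per index so each request covers less data; returns the total deleted.
    static long deleteEach(List<String> indices, ToLongFunction<String> deleteInIndex) {
        long total = 0;
        for (String index : indices) {
            total += deleteInIndex.applyAsLong(index);
        }
        return total;
    }
}
```

Each request then touches only one smaller index, which makes it less likely that any single call runs into the timeout.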

Sorry that I don’t have a better solution at the moment.
