Async delete by query API
I have an application where I use delete by query via repository query methods, e.g. long deleteByAssetType(String assetType);
This works great, but some indexes are now large enough that the operation can time out and throw an exception.
After looking at the ES API I realised there is an async option when using delete by query, and the Spring docs list some return types that could match this async feature: Future<T>, CompletableFuture<T> and ListenableFuture. So I implemented a repository method with a suitable signature, e.g.
@Async
Future<Void> deleteByAssetType(String assetType);
However, it doesn’t seem to implement the async behaviour. It still calls org.elasticsearch.client.RestHighLevelClient#deleteByQuery, whereas I expected a call to org.elasticsearch.client.RestHighLevelClient#deleteByQueryAsync.
Would it be possible to add support for this kind of method signature, so that the correct call is generated automatically and the returned Future blocks until the ES task has finished?
Or maybe I’m using the API wrong?
Issue Analytics
- Created 2 years ago
- Comments:5 (1 by maintainers)
Task support would have to be added by a method DocumentOperations.submitDelete(Query) that returns a Task object. No future or async stuff there. And ClusterOperations would need to have all the methods added to read tasks or wait for them (with a timeout).
To have this in a repository you would need a custom repository implementation that first submits the query and then waits for the tasks.
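The submit-then-wait flow described above could be sketched roughly as follows. This is only an illustration in plain Java: TaskOperations, FakeTaskOperations and SubmitAndWaitDelete are hypothetical names that do not exist in Spring Data Elasticsearch, and the in-memory fake stands in for a real cluster so that the example runs on its own.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the operations discussed above.
// None of these names exist in Spring Data Elasticsearch.
interface TaskOperations {
    String submitDelete(String query);   // returns a task id immediately
    boolean isCompleted(String taskId);  // asks whether the task has finished
}

// In-memory fake: a task reports "completed" once it has been polled
// pollsUntilDone times. It replaces a real cluster for this sketch.
class FakeTaskOperations implements TaskOperations {
    private final Map<String, Integer> remainingPolls = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();
    private final int pollsUntilDone;

    FakeTaskOperations(int pollsUntilDone) {
        this.pollsUntilDone = pollsUntilDone;
    }

    @Override
    public String submitDelete(String query) {
        String id = "node-1:" + ids.incrementAndGet();
        remainingPolls.put(id, pollsUntilDone);
        return id;
    }

    @Override
    public boolean isCompleted(String taskId) {
        // Each poll decrements the counter; the task is done at zero.
        return remainingPolls.merge(taskId, -1, Integer::sum) <= 0;
    }
}

// Custom repository fragment: submit first, then poll until done or timed out.
class SubmitAndWaitDelete {
    private final TaskOperations tasks;

    SubmitAndWaitDelete(TaskOperations tasks) {
        this.tasks = tasks;
    }

    // Returns true if the task completed, false on timeout or interruption.
    boolean deleteAndWait(String query, Duration timeout, Duration pollInterval) {
        String taskId = tasks.submitDelete(query);
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!tasks.isCompleted(taskId)) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // the task keeps running, we just stop waiting
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SubmitAndWaitDelete repo = new SubmitAndWaitDelete(new FakeTaskOperations(3));
        boolean done = repo.deleteAndWait("assetType:pump",
                Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("completed: " + done);
    }
}
```

The waiting timeout here is chosen by the caller and is independent of any socket timeout, since each poll is a short request of its own. A timeout only means the caller stopped waiting; the delete itself would still be running on the cluster.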
As for the roadmap: there is none. Spring Data Elasticsearch is a community driven project, so open issues are done when somebody has the time to do it. The problem is there are only a few people contributing, mostly when it’s a change they need. I do what I can, but I am doing this in my free time and not as a job. Currently I am working on the new clients, and the possibility to use Spring Data Elasticsearch with Elasticsearch and OpenSearch.
I created issue #1910. If you find time and want to contribute, comment on that issue to align on the things that need to be done, and so others can see that the issue is being worked on.
As the linked documentation states:
Spring Data Elasticsearch does not support these return types.
Elasticsearch’s RestHighLevelClient has asynchronous versions for almost every operation, not only for delete by query. Spring Data Elasticsearch does not use these. Why? I cannot tell you the reasons, because I came to this project after the first support for the REST client was implemented. This was in 3.2, and in the very same version the reactive implementation was added. So there probably was no need to add support for the async calls when a reactive and asynchronous version was being provided as well.
Would the async version solve your problem? Probably not. I worked my way into the Elasticsearch code to see the difference between the async and non-async code, and there is nearly none. The non-async code uses basically the same async code to send the request to the cluster but then does a get() to wait for the response data to return. So you’d get the timeout using the async version as well.
What could be done? There is the possibility for the client to not wait for the result (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-task-api) but to immediately return task information which can then be checked later for completion. For this we’d need to call the org.elasticsearch.client.RestHighLevelClient#submitDeleteByQueryTask method, which is currently not supported in Spring Data Elasticsearch, and we’d need to provide an implementation of the task API (https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html), which we currently do not have.
You could try to set a higher socket timeout value in the configuration (see https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#elasticsearch.clients.configuration).
Or perhaps you can reduce the index size by splitting an index into several indices by date or some other criteria, using an alias or wildcard when searching, and on deletion you could do separate delete requests.
Sorry that I don’t have a better solution at the moment.
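To illustrate the earlier point that the blocking call is essentially the async call followed by a get(), here is a toy model in plain Java. It is not the Elasticsearch client code: sendAsync, deleteSync and deleteAsync are made-up names, the "cluster" is a sleeping thread, and the timeout is modelled with CompletableFuture (orTimeout needs Java 9 or later).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SyncOverAsyncDemo {

    // Toy transport: pretends the cluster needs workMillis to finish the delete.
    static CompletableFuture<Long> sendAsync(long workMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(workMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 42L; // made-up number of deleted documents
        });
    }

    // "Blocking" variant: the same async send, followed by a bounded get().
    static String deleteSync(long workMillis, long timeoutMillis) {
        try {
            return "deleted " + sendAsync(workMillis).get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timeout";
        } catch (InterruptedException | ExecutionException e) {
            return "error";
        }
    }

    // "Async" variant: the same send and the same time limit; the timeout
    // is simply delivered through the future instead of being thrown.
    static CompletableFuture<String> deleteAsync(long workMillis, long timeoutMillis) {
        return sendAsync(workMillis)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .thenApply(n -> "deleted " + n)
                .exceptionally(t -> {
                    Throwable cause = (t instanceof CompletionException && t.getCause() != null)
                            ? t.getCause() : t;
                    return cause instanceof TimeoutException ? "timeout" : "error";
                });
    }

    public static void main(String[] args) {
        // Slow work with a short limit: both variants report the timeout.
        System.out.println(deleteSync(300, 20));
        System.out.println(deleteAsync(300, 20).join());
        // Fast work with a generous limit: both variants succeed.
        System.out.println(deleteSync(5, 2000));
        System.out.println(deleteAsync(5, 2000).join());
    }
}
```

In this model the only difference between the two variants is where the caller learns about the outcome, which is why switching to the async call would not by itself make a long-running delete finish within the limit.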