[FEATURE] Parameter for specifying BULK API operation
Is your feature request related to a problem? Please describe.
Currently, I’ve managed to get updates working by specifying a transform that takes an ID field from the source and uses it as the document ID for Elasticsearch, and then loads the whole doc as the source:
elasticdump \
  --input="test.json" \
  --output="http://host:port/index" \
  --transform="doc._source=Object.assign({},doc)" \
  --transform="doc._id=doc._source['id']"
The input JSON file contains documents with the following structure:
{"id":"1", "updated_field":"updated_value"}
The problem is that this approach overwrites the entire source of the document. I’d like to update only the fields provided in the input JSON, and leave any fields not present in the input unchanged in Elasticsearch.
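For reference, this is roughly the difference in the Bulk API request body (a sketch of standard Elasticsearch bulk NDJSON, not actual Elasticdump output): the “index” action replaces the whole document, while the “update” action takes a partial document wrapped in "doc" and merges only the listed fields:
{"index":{"_id":"1"}}
{"id":"1","updated_field":"updated_value"}
{"update":{"_id":"1"}}
{"doc":{"updated_field":"updated_value"}}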
Describe the solution you’d like
I took a look at the Elasticdump source code and found the part of _data.js that apparently builds the bulk payload. The Bulk API action appears to be hard-coded as “index”. It would be great if we could swap out the “index” in the actionMeta variable for a parameter.
This would allow users to choose the “update” Bulk API operation, for example, which updates only the provided fields.
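A minimal sketch of what that parameterization could look like; the function and option names here are illustrative only and do not reflect Elasticdump’s actual internals:
// Hypothetical sketch: build the bulk action metadata and body for one document.
// "buildBulkLines" and "bulkApiOperation" are illustrative names, not Elasticdump's API.
function buildBulkLines (doc, bulkApiOperation = 'index') {
  // The action type ("index" today) becomes a parameter with the same default.
  const actionMeta = { [bulkApiOperation]: { _index: doc._index, _id: doc._id } };
  // The Bulk API "update" action expects a partial document wrapped in { doc: ... },
  // while "index" takes the full source as-is.
  const body = bulkApiOperation === 'update' ? { doc: doc._source } : doc._source;
  return JSON.stringify(actionMeta) + '\n' + JSON.stringify(body) + '\n';
}
// Example:
// buildBulkLines({ _index: 'index', _id: '1', _source: { updated_field: 'updated_value' } }, 'update')
// => '{"update":{"_index":"index","_id":"1"}}\n{"doc":{"updated_field":"updated_value"}}\n'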
Describe alternatives you’ve considered
I tried to examine the code, but I couldn’t find a way to implement this myself.
Give some examples of implementations
An optional flag --bulkApiOperation that defaults to “index” (so the default behavior stays the same as-is) but can be set to “update” to allow bulk updates using Elasticdump.
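A usage sketch of the proposed flag (hypothetical, since the flag doesn’t exist yet), mirroring the command above:
elasticdump \
  --input="test.json" \
  --output="http://host:port/index" \
  --transform="doc._source=Object.assign({},doc)" \
  --transform="doc._id=doc._source['id']" \
  --bulkApiOperation=update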
Tested and working locally with both inserts (index) and updates.
Get the latest release and try again.