
[FEATURE] Parameter for specifying BULK API operation

See original GitHub issue

Is your feature request related to a problem? Please describe.

Currently, I’ve managed to get updates working by specifying a transform that takes an ID field from the source and uses it as the document ID in Elasticsearch, and loads the document as the source:

elasticdump \
    --input "test.json" \
    --output="http://host:port/index" \
    --transform="doc._source=Object.assign({},doc)" \
    --transform="doc._id=doc._source['id']" 

The data.json file contains the following structure:

{"id":"1", "updated_field":"updated_value"}

The problem is that this approach overwrites the entire source of the document. I’d like to update only the fields provided in the JSON, and have fields that are not present in the JSON remain unchanged in Elasticsearch.
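
To make the desired behavior concrete, here are the single-document equivalents of the two operations (a rough sketch using curl and the Elasticsearch 7+ URL style; host, port, and index name are placeholders):

# Full index: the body replaces the stored document entirely,
# so any field not included here is lost.
curl -X PUT "http://host:port/index/_doc/1" \
    -H 'Content-Type: application/json' \
    -d '{"id":"1", "updated_field":"updated_value"}'

# Partial update: only the fields under "doc" are merged into the
# stored document; everything else keeps its existing value.
curl -X POST "http://host:port/index/_update/1" \
    -H 'Content-Type: application/json' \
    -d '{"doc": {"updated_field":"updated_value"}}'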

Describe the solution you’d like

I took a look at the source code for Elasticdump and found the part of _data.js that apparently builds the payload. The Bulk API action appears to be hard-coded as “index”. It would be great if the “index” in the actionMeta variable could be swapped out for a parameter.

This would allow users to specify the “update” Bulk API operation, for example, which updates only the fields provided in each document.
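
For reference, this is roughly what that change means at the Bulk API level: the action line currently emitted uses “index”, while an “update” action wraps the partial document in a “doc” object (a hand-written sketch of the bulk request body, not Elasticdump’s actual output):

# "index" action: the document on the following line replaces the stored document
POST /index/_bulk
{"index":{"_id":"1"}}
{"id":"1", "updated_field":"updated_value"}

# "update" action: only the fields under "doc" are merged into the stored document
POST /index/_bulk
{"update":{"_id":"1"}}
{"doc":{"updated_field":"updated_value"}}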

Describe alternatives you’ve considered

I tried to examine the code but couldn’t find a way to implement this myself.

Give some examples of implementations

An optional flag --bulkApiOperation that defaults to “index” (so the default behavior stays the same) but can be set to “update” to allow bulk updates with Elasticdump.
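
Hypothetical usage, assuming the flag were implemented exactly as proposed (the flag name comes from this proposal, and the update path would presumably also need to wrap the source in a “doc” object for the Bulk API):

elasticdump \
    --input "test.json" \
    --output="http://host:port/index" \
    --transform="doc._source=Object.assign({},doc)" \
    --transform="doc._id=doc._source['id']" \
    --bulkApiOperation="update"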

Additional context

Add any other context or screenshots about the feature request here.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 8

Top GitHub Comments

1 reaction
gustavom2998 commented, Aug 10, 2022

Tested and working locally with inserts (index) and update.

0 reactions
ferronrsmith commented, Aug 9, 2022

Get the latest and try again

Read more comments on GitHub >

Top Results From Across the Web

Bulk API 2.0 and Bulk API Developer Guide
Any data operation that includes more than 2,000 records is a good ... A Bulk API job specifies which object is being processed...

Use Bulk API 2.0 Unit | Salesforce Trailhead
A job specifies the type of operation and data object you're working with. ... REST API uses, which means that Bulk API supports...

Bulk API | Elasticsearch Guide [8.5] | Elastic
To make the result of a bulk operation visible to search using the refresh parameter, you must have the maintenance or manage index...

Supporting bulk operations in REST APIs - mscharhag
Bulk (or batch) operations are used to perform an action on more than one resource in single request. In this post we will...

Salesforce Bulk API support for Query Operation - IBM
A new stage GUI category by name Bulk Mode is provided for the Access Method Job Property. This will be enabled only for...
