[SUPPORT] Update all records based on key rather than using preCombineKey field


If a batch contains multiple updates for the same key, Hudi updates the dataset with only the latest value, as determined by the field configured as the preCombine field. I want instead to apply all the records in the batch, matching on the key alone, rather than keeping only the latest one.

Let's say the current snapshot of the table is:

K1, V1, F1, F2, F3, F4, F5

where K1 is the record key (unique key), V1 is the preCombine field, and F1 to F5 are other column fields.

If a batch comes with the following values:

K1, V2, F1’, F2’
K1, V3, F3’
K1, V4, F5’

Since I am using the version as my preCombine field, Hudi always applies only the latest modification, and my final record looks like:

K1, V4, null, null, null, null, F5’

Instead, I want all of those updates to be applied to my existing snapshot, so that the result looks like:

K1, V4, F1’, F2’, F3’, F4, F5’

Please advise on the correct way to do this.
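The merge behavior requested above can be sketched in plain Python. This is only an illustration of the desired semantics, not Hudi code: the function name `apply_partial_updates` and the column names `key`, `version`, and `f1` to `f5` are hypothetical stand-ins for K1, V1, and F1 to F5. Updates are applied in version order, and a field is overwritten only when the update actually carries a value for it.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def apply_partial_updates(snapshot: Record, batch: List[Record],
                          version_field: str = "version") -> Record:
    """Fold every update in the batch onto the snapshot, oldest version first.

    Fields that are missing or None in an update leave the existing value alone.
    """
    merged = dict(snapshot)
    for update in sorted(batch, key=lambda r: r[version_field]):
        for field, value in update.items():
            if value is not None:
                merged[field] = value
    return merged


# Current snapshot: K1, V1, F1, F2, F3, F4, F5
snapshot = {"key": "K1", "version": 1,
            "f1": "F1", "f2": "F2", "f3": "F3", "f4": "F4", "f5": "F5"}

# Incoming batch with three partial updates for the same key
batch = [
    {"key": "K1", "version": 2, "f1": "F1'", "f2": "F2'"},
    {"key": "K1", "version": 3, "f3": "F3'"},
    {"key": "K1", "version": 4, "f5": "F5'"},
]

result = apply_partial_updates(snapshot, batch)
print(result)
```

With the sample data, the result keeps F4 from the snapshot, takes the highest version, and picks up every changed field from the three updates.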

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

nsivabalan commented, Mar 22, 2022

@arunb2w: Yes, we don’t have a partial update payload patch merged yet. @YannByron: Sounds good, thanks!

nsivabalan commented, Mar 10, 2022

Yes, you can try OverwriteNonDefaultsWithLatestAvroPayload. Let us know how it goes.
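A minimal sketch of how that payload class might be selected through the Spark datasource write options, written as a Python dict. The table name `my_table` and the column names `key` and `version` are assumptions for illustration; only the option keys and the payload class name come from Hudi. As far as I understand, this payload skips incoming fields whose value equals the schema default (typically null) when merging with the stored record, but duplicates within a single batch are still reduced to the latest record by the preCombine field first, so it may not cover the multi-update batch described in the question on its own.

```python
# Hudi write options selecting OverwriteNonDefaultsWithLatestAvroPayload.
# "my_table", "key" and "version" are hypothetical names for illustration.
hudi_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "key",
    "hoodie.datasource.write.precombine.field": "version",
    "hoodie.datasource.write.payload.class":
        "org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload",
}

# With an existing Spark DataFrame `df` and a table path `base_path`
# (both assumed, not defined here), the options would be passed like this:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```

One way to work within that limitation is to collapse each key's updates into a single record before the write, so the batch contains one merged row per key.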

