Handle the case that RewriteFiles and RowDelta commit the transaction at the same time
Since we've introduced row-level deletes in format v2, we may encounter the problem that RewriteFiles and RowDelta commit transactions at the same time.
Assume that we have an Iceberg table test with the following events:
INSERT <1, 'AAA'>
INSERT <2, 'BBB'>
DELETE <1, 'AAA'>
At timestamp t1, someone starts a rewrite action to rewrite the whole table.
At timestamp t2, someone starts another transaction to update the rows in table test:
DELETE <2, 'BBB'>
At timestamp t3, the update txn (which started at t2) commits successfully.
At timestamp t4, the rewrite action commits successfully.
Finally, the table will have one row <2, 'BBB'>, while in fact it should have no rows. That's an unexpected bug after introducing format v2, and we will need a solution to handle it.
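To make the race concrete, here is a minimal sketch of the two conflicting commits against the Iceberg Java API. The names table, rewrittenFile, compactedFile and deleteFileForRow2 are hypothetical handles (the table, the data file being compacted, the compaction output, and the delete file produced for <2, 'BBB'>) and are not part of the original report:

import java.util.Set;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.RewriteFiles;
import org.apache.iceberg.Table;

static void reproduceRace(Table table, DataFile rewrittenFile, DataFile compactedFile, DeleteFile deleteFileForRow2) {
  // t1: a maintenance job plans a rewrite (compaction) of the whole table.
  RewriteFiles rewrite = table.newRewrite()
      .rewriteFiles(Set.of(rewrittenFile), Set.of(compactedFile));

  // t2-t3: a concurrent writer deletes <2, 'BBB'> via a row delta and commits first.
  table.newRowDelta()
      .addDeletes(deleteFileForRow2)
      .commit();

  // t4: the rewrite commits afterwards; the compacted file gets a newer sequence
  // number than the delete file, so the delete no longer applies to the rewritten
  // rows and <2, 'BBB'> becomes visible again.
  rewrite.commit();
}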
Issue Analytics
- Created: 3 years ago
- Reactions: 8
- Comments: 17 (15 by maintainers)
If we plan to maintain the sequence number like that, then the semantics of the sequence number will be: if there are real updates/inserts/deletes to the table data set, we should increase the sequence number; otherwise, we should keep the same sequence number for the rewrite action. The previous semantics were: if any txn is committed to the Iceberg table, then the sequence number should be increased.
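To make the difference concrete with the scenario above (illustrative numbers only): suppose the base snapshot that the rewrite reads from has sequence number 1, and the RowDelta at t3 commits its delete files with sequence number 2. Under the previous semantics, the rewrite at t4 would commit with sequence number 3, so the deletes at sequence number 2 no longer apply to the compacted file and <2, 'BBB'> reappears. Under the proposed semantics, the rewrite keeps sequence number 1, so the deletes at sequence number 2 still apply to the rewritten file.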
@RussellSpitzer If we just fail the rewrite operation as @stevenzwu said, then yes, there should be no data-semantics issue. The failure could be acceptable for batch jobs: we could just fail the rewrite operation if someone writes a few new updates/deletes into the base Iceberg table. A batch update/delete job is a rare operation, so the probability of a conflict between the rewrite operation and an update operation is small.
But in the streaming case, it is a different story. Because the streaming job is always running and keeps committing delta changes to the Iceberg table periodically (say every minute), the probability of a conflict between the rewrite operation and an update operation is very large, so failing the rewrite operation and redoing the whole expensive job is unacceptable in this case. I'm trying to propose a solution where we don't have to redo the rewrite operation. The simplest way is to follow the current optimistic concurrency control: the rewrite operation could just grab the table lock again and re-commit the generated files to the Iceberg table with the original sequence number from the base snapshot, so the delete files will always apply to the rewritten data files. But this does not work for position delete files, because once we rewrite the data files all of the row offsets change, so those delete files can no longer be applied to the rewritten files. Luckily, the streaming update/upsert job won't produce any position delete files that delete rows in old data files, so that should be OK.
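A minimal sketch of that idea, assuming a rewriteFiles overload that also accepts the data sequence number to assign to the new files (if the Iceberg version in use does not expose such an overload, read this as pseudocode for the proposal); rewrittenFiles and compactedFiles are hypothetical sets of the data files removed and added by the rewrite:

import java.util.Set;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;

static void rewriteKeepingSequenceNumber(Table table, Set<DataFile> rewrittenFiles, Set<DataFile> compactedFiles) {
  // Capture the sequence number of the base snapshot the rewrite is planned from,
  // before any concurrent RowDelta commits.
  long baseSequenceNumber = table.currentSnapshot().sequenceNumber();

  // ... the expensive rewrite work happens here; meanwhile a concurrent RowDelta
  // may commit new delete files, which get higher sequence numbers ...

  // Commit the compacted files with the original sequence number instead of a new
  // one, so equality delete files committed in between still apply to them.
  // Position deletes would still not apply, because the row offsets have changed.
  table.newRewrite()
      .rewriteFiles(rewrittenFiles, compactedFiles, baseSequenceNumber)
      .commit();
}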
(btw, I provided a unit test to demonstrate why the data semantics will be an issue if we just try to reuse the files from the rewrite operation here: https://github.com/apache/iceberg/pull/2303/commits/27e47a95d2761fa36d7724f8eac21b75b91f280c#diff-e573276c8dbbcd32f174f0e778334aeea106c67a19efabfefedf0afa4779a499R155-R218; the linked assertion breaks because the rewrite action makes <2, B> visible again.)