Incorrect table definition after executing rollback_to_snapshot procedure
Apache Iceberg version
0.14.0 (latest release)
Query engine
Spark
Please describe the bug 🐞
Spark returns the latest table definition even after the rollback_to_snapshot procedure has been executed.
Steps to reproduce
> CREATE TABLE test USING iceberg AS SELECT 1 c1;
> ALTER TABLE test ADD COLUMN c2 int;
> INSERT INTO test VALUES (1, 1);
> SELECT * FROM iceberg_test.default.test.snapshots;
2022-08-19 07:32:29.499 2770581293596517273 ...
2022-08-19 07:32:50.006 6893045681966948046 ...
> DESC iceberg_test.default.test.snapshot_id_2770581293596517273;
c1 int
# Partitioning
Not partitioned
> CALL iceberg_test.system.rollback_to_snapshot('default.test', 2770581293596517273);
> DESC iceberg_test.default.test;
c1 int
c2 int
The result is the same even after I executed REFRESH TABLE iceberg_test.default.test after rollback_to_snapshot.
Relates to https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1660895079836159 and Trino issue https://github.com/trinodb/trino/issues/13699
Issue Analytics
- State:
- Created a year ago
- Comments:8 (7 by maintainers)
Top GitHub Comments
This isn’t a bug. Metadata and data updates are intended to be separate, although I can see why there are cases where you’d assume that they are not.
If you update the schema and commit data in a single job, then it isn’t unreasonable to assume the schema change would be rolled back. But if I concurrently add a column while someone else commits, then a rollback should be independent. Expectations can go both ways.
While expectations differ, Iceberg never rolls back to a previous schema because that operation is unsafe. For example, if someone deletes a required column and then tries to roll that back, there may have been data written without that column. You can recover the column, but you need to make it optional (or in the future, set a read default).
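Since the schema is not rolled back, getting the earlier table definition back in the reproduction above needs an explicit schema change after the rollback. A minimal sketch, assuming the catalog, table, and snapshot ID from the issue, and assuming nothing else depends on the c2 column:

```sql
-- Roll the table data back to the first snapshot (same call as in the issue).
CALL iceberg_test.system.rollback_to_snapshot('default.test', 2770581293596517273);

-- The rollback leaves the current schema untouched, so remove the column
-- that was added after that snapshot as a separate, explicit step.
ALTER TABLE iceberg_test.default.test DROP COLUMN c2;

-- Check the resulting table definition.
DESC iceberg_test.default.test;
```

Dropping a column is a metadata-only change in Iceberg, which is consistent with the point above that schema changes and data commits are handled separately.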
From an end-user perspective, there is a difference between:
- querying the table state at a given snapshot: at least in Trino, this uses the “schema current at that time”, so it includes columns that have been dropped since then
- querying the table state after rollback_to_snapshot: if this uses the current schema, it doesn’t include columns that have been dropped since the snapshot
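The two read paths can be compared side by side in Spark. This is only a sketch that reuses the `snapshot_id_` identifier form and the snapshot ID from the reproduction steps in the issue. Which columns each query returns depends on the schema the engine resolves for it.

```sql
-- Read the table as of a specific snapshot. In the issue, DESC on this
-- identifier showed only c1, i.e. the schema at the time of the snapshot.
SELECT * FROM iceberg_test.default.test.snapshot_id_2770581293596517273;

-- Read the current table state. After rollback_to_snapshot this still
-- resolves the current schema (c1 and c2 in the issue).
SELECT * FROM iceberg_test.default.test;
```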
Now consider an example.
As a user, I would expect to see the order_data column back in my table. Per this issue, I understand this wouldn’t be the case. As a user, I would call that data loss (and so a bug).
cc @alexjo2144 @electrum