Core: Metadata min/max stats nulled after updating partition spec and rewriting manifests
Apache Iceberg version
0.14.0 (latest release)
Query engine
Spark
Please describe the bug 🐞
If we create a table with format-version = 2 and replace the partition spec after table creation, we are unable to read data from the table after executing rewrite_manifests.
If we look at the metadata after the manifest rewrite, the min/max stats for the partition have been nulled in both the manifest and the manifest list. As a result, queries on the table that use metadata pruning return no results.
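To illustrate why nulled bounds make queries come back empty, here is a minimal toy model in Python. It is not Iceberg's evaluator: `FieldSummary` and `might_match` are hypothetical names, and the pruning rule (a summary with no lower or upper bound is treated as holding no non-null values, so the manifest is skipped) is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSummary:
    """Hypothetical stand-in for a partition field summary in a manifest list."""
    lower: Optional[int]
    upper: Optional[int]


def might_match(summary: FieldSummary, value: int) -> bool:
    """Return True if a manifest could contain rows where the field equals value."""
    # Assumed rule: missing bounds mean "no non-null values", so skip the manifest.
    if summary.lower is None or summary.upper is None:
        return False
    return summary.lower <= value <= summary.upper


healthy = FieldSummary(lower=1, upper=10)
nulled = FieldSummary(lower=None, upper=None)

print(might_match(healthy, 5))  # True: the manifest is scanned
print(might_match(nulled, 5))   # False: the manifest is pruned, so no rows come back
```

Under this assumed rule the data files are still present, but every manifest is pruned before it is read, which matches the symptom of queries returning no results.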
The issue does not appear if the table has the correct partition spec from creation, or if using format-version = 1.
See the attached script for steps to reproduce, and similar steps to show that it works correctly with format-version 1 tables, or when the partition spec is specified on table creation and not modified afterwards.
We are using Spark 3.3.0 with Iceberg 0.14.0, and can reproduce the issue easily with a few steps in spark-shell on a newly created table, as in the attached script.
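The attached script is not included here. As a rough sketch of the kind of steps described, the following Python helper builds a sequence of Spark SQL statements. Everything specific in it is an assumption: the catalog and table name `local.db.sample`, the schema, the use of `ADD PARTITION FIELD` to change the spec, and the order of the insert relative to the spec change. It also assumes a Spark session configured with an Iceberg catalog and the Iceberg SQL extensions.

```python
from typing import List


def repro_statements(table: str = "local.db.sample") -> List[str]:
    """Build hypothetical SQL steps; table name and schema are illustrative only."""
    catalog, name = table.split(".", 1)
    return [
        # 1. Create an unpartitioned format-version 2 table.
        f"CREATE TABLE {table} (id BIGINT, category STRING) "
        "USING iceberg TBLPROPERTIES ('format-version'='2')",
        # 2. Change the partition spec after creation (assumed form of the change).
        f"ALTER TABLE {table} ADD PARTITION FIELD category",
        # 3. Write some data.
        f"INSERT INTO {table} VALUES (1, 'a'), (2, 'b')",
        # 4. Rewrite the manifests.
        f"CALL {catalog}.system.rewrite_manifests('{name}')",
        # 5. Query with a filter on the partition column.
        f"SELECT * FROM {table} WHERE category = 'a'",
    ]


def run(spark, table: str = "local.db.sample") -> None:
    """Execute the steps on an existing SparkSession (not created here)."""
    for statement in repro_statements(table):
        spark.sql(statement).show()
```

Calling `run(spark)` from a PySpark shell would execute the steps in order. Whether this exact sequence triggers the bug is not confirmed by the report.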
Issue Analytics
- Created a year ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
This has been fixed in https://github.com/apache/iceberg/pull/5691, thanks @rdblue. And thanks @dotjdk for reporting, much appreciated. Otherwise we wouldn’t have caught this 🐛
Found the root issue:
It turns out that we select all the partition fields (the old and new ones), but we only update the statistics on the current partition keys, and this one is null. PR follows.
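The root cause described above can be pictured with a small toy model in Python. This is a sketch under assumptions, not Iceberg's code and not the fix from the PR: `summarize`, `old_key`, and `new_key` are hypothetical names, and the model only mimics "select every partition field, but compute statistics for a subset of them".

```python
from typing import Dict, List, Optional, Tuple

Bounds = Optional[Tuple[int, int]]


def summarize(
    all_fields: List[str],
    fields_to_update: List[str],
    rows: List[Dict[str, int]],
) -> Dict[str, Bounds]:
    """Create a summary slot for every field, but fill bounds only for some."""
    summaries: Dict[str, Bounds] = {name: None for name in all_fields}
    for name in fields_to_update:
        values = [row[name] for row in rows if name in row]
        if values:
            summaries[name] = (min(values), max(values))
    return summaries


ALL_FIELDS = ["old_key", "new_key"]   # fields from the old and the new spec
CURRENT_FIELDS = ["new_key"]          # fields in the current spec only
rows = [{"old_key": 1}, {"old_key": 7}]  # partition tuples written under the old spec

# Updating only the current fields leaves every bound null.
print(summarize(ALL_FIELDS, CURRENT_FIELDS, rows))
# {'old_key': None, 'new_key': None}

# Updating all selected fields keeps the bounds that the data actually has.
print(summarize(ALL_FIELDS, ALL_FIELDS, rows))
# {'old_key': (1, 7), 'new_key': None}
```

In the first call the summary has a slot for each selected field, but none of the slots gets bounds, which is the shape of the nulled min/max stats reported in the issue.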