Core: Metadata min/max stats nulled after updating partition spec and rewriting manifests

Apache Iceberg version

0.14.0 (latest release)

Query engine

Spark

Please describe the bug 🐞

If we create a table with format-version = 2 and replace the partition spec after table creation, we are unable to read data from the table after running rewrite_manifests.

If we look at the metadata after the manifest rewrite, the min/max stats for the partition have been nulled in both the manifest and the manifest list. As a result, queries on the table that rely on metadata pruning return no results.
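
One way to spot the symptom is to scan the rows of the table's `manifests` metadata table for partition summaries whose bounds are missing. The sketch below is only an illustration: it assumes each row exposes `path` and `partition_summaries` (with `lower_bound` and `upper_bound` per summary), as that metadata table does, and the helper name `manifests_with_null_bounds` is hypothetical.

```python
from typing import Any, Iterable, List


def manifests_with_null_bounds(rows: Iterable[Any]) -> List[str]:
    """Return the paths of manifests with a partition summary missing a bound.

    `rows` can be plain dicts or collected Spark rows; anything that supports
    row["path"] and row["partition_summaries"] works.
    """
    flagged = []
    for row in rows:
        summaries = row["partition_summaries"] or []
        # A missing lower or upper bound means this field cannot be range-pruned.
        if any(s["lower_bound"] is None or s["upper_bound"] is None for s in summaries):
            flagged.append(row["path"])
    return flagged
```

A null bound is not always a bug: a partition field that only holds null values also has no bounds, so a flagged manifest is a hint to investigate rather than proof of this issue.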

The issue does not appear if the table has the correct partition spec from creation, or if it uses format-version = 1.

See the attached script for steps to reproduce, along with similar steps showing that it works correctly with format-version 1 tables, or when the partition spec is specified at table creation and not modified afterwards.

We are using Spark 3.3.0 with Iceberg 0.14.0 and can easily reproduce the issue in a few steps in the Spark shell on a newly created table, as in the attached script.

script.scala.zip
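
The attached script is not reproduced on this page. As a rough sketch of the steps described above, the sequence might look like the following. Everything specific here is an assumption: the catalog name `local`, the table `db.stats_repro`, its columns, the chosen partition transforms, and the helper names are all hypothetical, and the original script may differ. It also assumes a Spark session with the Iceberg runtime and SQL extensions enabled.

```python
from typing import List


def repro_statements(catalog: str = "local", table: str = "db.stats_repro") -> List[str]:
    """Build the SQL steps; catalog, table, and column names are hypothetical."""
    fq = f"{catalog}.{table}"
    return [
        # 1. Create a format-version 2 table with an initial partition spec.
        f"CREATE TABLE {fq} (id BIGINT, category STRING) USING iceberg "
        f"PARTITIONED BY (category) TBLPROPERTIES ('format-version' = '2')",
        # 2. Replace the partition field after creation (needs Iceberg's SQL extensions).
        f"ALTER TABLE {fq} REPLACE PARTITION FIELD category WITH bucket(4, id)",
        # 3. Write a single row under the new spec.
        f"INSERT INTO {fq} VALUES (1, 'a')",
        # 4. Rewrite the manifests through the stored procedure.
        f"CALL {catalog}.system.rewrite_manifests('{table}')",
        # 5. Read the per-manifest partition summaries (lower/upper bounds).
        f"SELECT path, partition_summaries FROM {fq}.manifests",
    ]


def run_repro(spark, catalog: str = "local", table: str = "db.stats_repro"):
    """Run every step through a Spark-style `sql` method and return the last result."""
    result = None
    for statement in repro_statements(catalog, table):
        result = spark.sql(statement)
    return result
```

With a real session, `run_repro(spark).show(truncate=False)` would print the partition summaries after the rewrite. Repeating the steps with `'format-version' = '1'`, or with the final spec declared at creation, gives the comparison cases mentioned above.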

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

Fokko commented, Sep 2, 2022

This has been fixed in https://github.com/apache/iceberg/pull/5691, thanks @rdblue. And thanks @dotjdk for reporting, much appreciated. Otherwise we wouldn’t have caught this 🐛

Fokko commented, Aug 29, 2022

Found the root issue: [image]

It turns out that we select all the partition fields (the old and new ones), but we only update the statistics on the current partition keys, and this one is null. PR follows.
