Error happened after deleting a partitioned column

Error message:

 Caused by: java.lang.NullPointerException: Cannot find source column: 3
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:953) ~[iceberg-bundled-guava-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:503) ~[iceberg-api-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpecParser.buildFromJsonFields(PartitionSpecParser.java:155) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpecParser.fromJson(PartitionSpecParser.java:78) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:357) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:288) ~[iceberg-core-0.13.2.jar:na]

The metadata JSON file contains the schemas, partition-specs, and sort-orders, but nothing links a partition spec to the schema it was defined against. As a result, deleting a column that was once used for partitioning raises an error when the historical partition-specs are rebuilt, because the source-id can no longer be found in the current schema. I think a schema-id should be added to the JSON of each partition spec. Part of the metadata file:

    "last-column-id":3,
    "current-schema-id":1,
    "schemas":[
        {
            "type":"struct",
            "schema-id":0,
            "fields":[
                {
                    "id":1,
                    "name":"name1",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":2,
                    "name":"name2",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":3,
                    "name":"name3",
                    "required":false,
                    "type":"string"
                }
            ]
        },
        {
            "type":"struct",
            "schema-id":1,
            "fields":[
                {
                    "id":1,
                    "name":"name1",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":2,
                    "name":"name2",
                    "required":false,
                    "type":"string"
                }
            ]
        }
    ],
    "default-spec-id":1,
    "partition-specs":[
        {
            "spec-id":0,
            "fields":[
                {
                    "name":"name3",
                    "transform":"identity",
                    "source-id":3,
                    "field-id":1000
                }
            ]
        },
        {
            "spec-id":1,
            "fields":[

            ]
        }
    ],
    "last-partition-id":1000

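The mismatch described above can be checked directly against the metadata JSON. The following is a minimal sketch, not Iceberg's own logic: the helper names (`top_level_ids`, `dangling_partition_sources`, `schemas_containing`) are hypothetical, the excerpt is trimmed and wrapped in braces so it parses on its own, and only top-level fields are considered (nested structs are ignored).

```python
import json

# Trimmed copy of the metadata excerpt, wrapped in braces so it parses as a
# standalone JSON object (the wrapping is an assumption made for this sketch).
METADATA_JSON = """
{
  "current-schema-id": 1,
  "schemas": [
    {"type": "struct", "schema-id": 0, "fields": [
      {"id": 1, "name": "name1", "required": false, "type": "string"},
      {"id": 2, "name": "name2", "required": false, "type": "string"},
      {"id": 3, "name": "name3", "required": false, "type": "string"}
    ]},
    {"type": "struct", "schema-id": 1, "fields": [
      {"id": 1, "name": "name1", "required": false, "type": "string"},
      {"id": 2, "name": "name2", "required": false, "type": "string"}
    ]}
  ],
  "default-spec-id": 1,
  "partition-specs": [
    {"spec-id": 0, "fields": [
      {"name": "name3", "transform": "identity", "source-id": 3, "field-id": 1000}
    ]},
    {"spec-id": 1, "fields": []}
  ]
}
"""


def top_level_ids(schema):
    """Collect the ids of a schema's top-level fields (hypothetical helper)."""
    return {field["id"] for field in schema["fields"]}


def dangling_partition_sources(metadata):
    """List (spec-id, field name, source-id) for every partition field whose
    source column is missing from the current schema."""
    current = next(
        s for s in metadata["schemas"]
        if s["schema-id"] == metadata["current-schema-id"]
    )
    known = top_level_ids(current)
    return [
        (spec["spec-id"], field["name"], field["source-id"])
        for spec in metadata["partition-specs"]
        for field in spec["fields"]
        if field["source-id"] not in known
    ]


def schemas_containing(metadata, source_id):
    """List the schema-ids that still define the given column id."""
    return [
        s["schema-id"] for s in metadata["schemas"]
        if source_id in top_level_ids(s)
    ]


metadata = json.loads(METADATA_JSON)
print(dangling_partition_sources(metadata))  # [(0, 'name3', 3)]
print(schemas_containing(metadata, 3))       # [0]
```

The first result shows that spec 0 points at column id 3, which the current schema (schema-id 1) no longer has. The second shows that schema 0 still defines it, which is the kind of link a schema-id on each partition spec would make explicit.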
Top GitHub Comments

lvyanquan commented, Aug 2, 2022

@lvyanquan:

Do you have a test case or sample SQL to reproduce this? I want to know when we build the historical partition-specs.

We can reproduce this error with the following SQL (Spark 3.2, Iceberg 0.13 or 0.14), where prod is the name of the catalog:

CREATE TABLE prod.db.sample (id bigint, data string, category string) USING iceberg PARTITIONED BY (category) TBLPROPERTIES('format-version' = '2');

ALTER TABLE prod.db.sample DROP PARTITION FIELD category;

ALTER TABLE prod.db.sample DROP COLUMN category;

Even when I deleted this column using the Java API, I got a NullPointerException when using this table.

Fokko commented, Sep 21, 2022

Hey all, I have a PR ready: https://github.com/apache/iceberg/pull/5707 It no longer looks up the historical columns.
