
Error happened after deleting a partitioned column


Error message:

 Caused by: java.lang.NullPointerException: Cannot find source column: 3
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:953) ~[iceberg-bundled-guava-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:503) ~[iceberg-api-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpecParser.buildFromJsonFields(PartitionSpecParser.java:155) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpecParser.fromJson(PartitionSpecParser.java:78) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:357) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:288) ~[iceberg-core-0.13.2.jar:na]

The JSON metadata file contains the schemas, partition specs, and sort orders, but nothing links a partition spec to the schema it was created against. As a result, after a partitioned column is deleted, parsing the metadata fails while building the historical partition specs, because a spec's source-id cannot be found in the current schema. I think a schema-id should be added to each partition spec in the JSON. Part of the metadata file:

    "last-column-id":3,
    "current-schema-id":1,
    "schemas":[
        {
            "type":"struct",
            "schema-id":0,
            "fields":[
                {
                    "id":1,
                    "name":"name1",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":2,
                    "name":"name2",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":3,
                    "name":"name3",
                    "required":false,
                    "type":"string"
                }
            ]
        },
        {
            "type":"struct",
            "schema-id":1,
            "fields":[
                {
                    "id":1,
                    "name":"name1",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":2,
                    "name":"name2",
                    "required":false,
                    "type":"string"
                }
            ]
        }
    ],
    "default-spec-id":1,
    "partition-specs":[
        {
            "spec-id":0,
            "fields":[
                {
                    "name":"name3",
                    "transform":"identity",
                    "source-id":3,
                    "field-id":1000
                }
            ]
        },
        {
            "spec-id":1,
            "fields":[

            ]
        }
    ],
    "last-partition-id":1000
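To make the failure concrete, here is a minimal Java sketch that models the metadata fragment above with plain records. `SpecBindingSketch`, `PartitionField`, `Spec`, `missingSourceIds`, and `bindStrict` are hypothetical names, not Iceberg classes. The sketch resolves every partition spec against the current schema (schema 1), which is where the stack trace shows the parser failing, and then resolves spec 0 against schema 0, which is what the proposed `schema-id` link would allow. A `schema-id` on partition specs is the issue author's proposal, not a field in the metadata shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the metadata fragment (not Iceberg's real classes).
 * Shows why resolving every partition spec against the current schema fails once a
 * previously partitioned column has been dropped.
 */
public class SpecBindingSketch {

    record PartitionField(String name, String transform, int sourceId, int fieldId) {}

    record Spec(int specId, List<PartitionField> fields) {}

    // schema-id -> field ids present in that schema (from the "schemas" array)
    static final Map<Integer, Set<Integer>> SCHEMAS = Map.of(
            0, Set.of(1, 2, 3),   // name1, name2, name3
            1, Set.of(1, 2));     // name3 (id 3) has been dropped

    static final int CURRENT_SCHEMA_ID = 1;

    // From the "partition-specs" array: spec 0 partitions by name3, spec 1 is unpartitioned
    static final List<Spec> SPECS = List.of(
            new Spec(0, List.of(new PartitionField("name3", "identity", 3, 1000))),
            new Spec(1, List.of()));

    /** Returns the source ids of a spec that do not exist in the given schema. */
    static List<Integer> missingSourceIds(Spec spec, int schemaId) {
        Set<Integer> columns = SCHEMAS.get(schemaId);
        List<Integer> missing = new ArrayList<>();
        for (PartitionField field : spec.fields()) {
            if (!columns.contains(field.sourceId())) {
                missing.add(field.sourceId());
            }
        }
        return missing;
    }

    /** Strict binding that mimics the message in the reported stack trace. */
    static void bindStrict(Spec spec, int schemaId) {
        List<Integer> missing = missingSourceIds(spec, schemaId);
        if (!missing.isEmpty()) {
            throw new NullPointerException("Cannot find source column: " + missing.get(0));
        }
    }

    public static void main(String[] args) {
        // Behavior described in the issue: every spec is bound to the current schema.
        for (Spec spec : SPECS) {
            try {
                bindStrict(spec, CURRENT_SCHEMA_ID);
                System.out.println("spec " + spec.specId() + " vs current schema: ok");
            } catch (NullPointerException e) {
                System.out.println("spec " + spec.specId() + " vs current schema: " + e.getMessage());
            }
        }
        // The proposal: if spec 0 recorded "schema-id": 0, it could be bound to that schema instead.
        bindStrict(SPECS.get(0), 0);
        System.out.println("spec 0 vs schema 0: ok");
    }
}
```

Running `main` prints a "Cannot find source column: 3" line for spec 0, "ok" for spec 1, and "ok" when spec 0 is bound to schema 0.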

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

lvyanquan commented, Aug 2, 2022 (1 reaction)

@lvyanquan:

Do you have a test case or sample SQL to reproduce this? I want to know when we build the historical partition specs.

We can reproduce this error with the following SQL (Spark 3.2, Iceberg 0.13 or 0.14), where prod is the catalog name:

CREATE TABLE prod.db.sample (id bigint, data string, category string) USING iceberg PARTITIONED BY (category) TBLPROPERTIES ('format-version' = '2');

ALTER TABLE prod.db.sample DROP PARTITION FIELD category;

ALTER TABLE prod.db.sample DROP COLUMN category;

Even when I delete this column using the Java API instead, I get a NullPointerException when using this table.

Fokko commented, Sep 21, 2022

Hey all, I have a PR ready: https://github.com/apache/iceberg/pull/5707
It no longer looks up the historical columns.
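The comment above only says that the PR stops looking up historical columns. As a rough illustration of that idea, and not the code in PR 5707, the hypothetical `loadSpecs` helper below keeps older partition specs as parsed, without resolving their `source-id` against the current schema. Still validating the default spec is an assumption of this sketch, on the reasoning that new writes use the default spec.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of "not looking up historical columns" (not the code from the PR):
 * only the default spec is checked against the current schema; older specs are kept as parsed.
 */
public class LenientSpecLoading {

    record PartitionField(String name, String transform, int sourceId, int fieldId) {}

    record Spec(int specId, List<PartitionField> fields) {}

    /** Returns the ids of the loaded specs; throws only if the default spec references a missing column. */
    static List<Integer> loadSpecs(List<Spec> specs, int defaultSpecId, Set<Integer> currentColumns) {
        List<Integer> loaded = new ArrayList<>();
        for (Spec spec : specs) {
            if (spec.specId() == defaultSpecId) {
                // Assumption of this sketch: the default spec is used for new writes,
                // so its source columns must still exist in the current schema.
                for (PartitionField field : spec.fields()) {
                    if (!currentColumns.contains(field.sourceId())) {
                        throw new IllegalStateException("Cannot find source column: " + field.sourceId());
                    }
                }
            }
            // Historical specs are kept without resolving source-id against the current schema.
            loaded.add(spec.specId());
        }
        return loaded;
    }

    public static void main(String[] args) {
        List<Spec> specs = List.of(
                new Spec(0, List.of(new PartitionField("name3", "identity", 3, 1000))),
                new Spec(1, List.of()));
        // Metadata from the issue: default-spec-id is 1, current schema has field ids 1 and 2.
        System.out.println("loaded specs: " + loadSpecs(specs, 1, Set.of(1, 2)));
    }
}
```

With the metadata from the issue, both specs load and `main` prints `loaded specs: [0, 1]`; only making spec 0 the default would trigger the missing-column error.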
