Error after deleting a partitioned column
See original GitHub issue. Error message:
Caused by: java.lang.NullPointerException: Cannot find source column: 3
at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:953) ~[iceberg-bundled-guava-0.13.2.jar:na]
at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:503) ~[iceberg-api-0.13.2.jar:na]
at org.apache.iceberg.PartitionSpecParser.buildFromJsonFields(PartitionSpecParser.java:155) ~[iceberg-core-0.13.2.jar:na]
at org.apache.iceberg.PartitionSpecParser.fromJson(PartitionSpecParser.java:78) ~[iceberg-core-0.13.2.jar:na]
at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:357) ~[iceberg-core-0.13.2.jar:na]
at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:288) ~[iceberg-core-0.13.2.jar:na]
The metadata file's JSON contains the schemas, partition-specs, and sort-orders, but there is no link between a partition spec and the schema it was created against. Deleting a partitioned column therefore raises an error while rebuilding the historical partition specs, because the spec's source-id can no longer be found in the current schema. I think a schema-id should be added to the JSON of each partition spec. Part of the metadata file:
"last-column-id":3,
"current-schema-id":1,
"schemas":[
{
"type":"struct",
"schema-id":0,
"fields":[
{
"id":1,
"name":"name1",
"required":false,
"type":"string"
},
{
"id":2,
"name":"name2",
"required":false,
"type":"string"
},
{
"id":3,
"name":"name3",
"required":false,
"type":"string"
}
]
},
{
"type":"struct",
"schema-id":1,
"fields":[
{
"id":1,
"name":"name1",
"required":false,
"type":"string"
},
{
"id":2,
"name":"name2",
"required":false,
"type":"string"
}
]
}
],
"default-spec-id":1,
"partition-specs":[
{
"spec-id":0,
"fields":[
{
"name":"name3",
"transform":"identity",
"source-id":3,
"field-id":1000
}
]
},
{
"spec-id":1,
"fields":[
]
}
],
"last-partition-id":1000
Issue Analytics
- Created a year ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
We can reproduce this error with the following SQL (Spark 3.2, Iceberg 0.13 or 0.14); prod is the name of the catalog:
CREATE TABLE prod.db.sample (id bigint, data string, category string) USING iceberg PARTITIONED BY (category) TBLPROPERTIES('format-version' = '2');
ALTER TABLE prod.db.sample DROP PARTITION FIELD category;
ALTER TABLE prod.db.sample DROP COLUMN category;
Even though I deleted this column using the Java API, I hit the NullPointerException when using this table.
Hey all, I have a PR ready: https://github.com/apache/iceberg/pull/5707. It no longer looks up the historical columns.
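The direction the PR describes (no longer looking up historical columns) can be sketched in the same simulation style. This is a hedged illustration of the idea, not the actual PR code: rather than failing when a historical spec references a dropped column, keep the field with its recorded name and source-id and tolerate a missing schema entry.

```python
def build_spec_lenient(spec, schema):
    """Bind a partition spec without requiring every source-id to exist
    in the current schema; dropped columns resolve to None."""
    bound = []
    for field in spec["fields"]:
        source = schema.get(field["source-id"])  # None if the column was dropped
        bound.append({**field, "source-name": source})
    return bound

schema_v1 = {1: "name1", 2: "name2"}  # name3 (column id 3) was dropped
spec0 = {"spec-id": 0, "fields": [{"name": "name3", "transform": "identity",
                                   "source-id": 3, "field-id": 1000}]}
print(build_spec_lenient(spec0, schema_v1))
```

With this approach, loading the table metadata succeeds even after the partitioned column is deleted; the historical spec simply carries a field whose source column is no longer resolvable.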