Delete/Update fails for tables with more than 1000 columns

Apache Iceberg version

1.1.0 (latest release)

Query engine

None

Please describe the bug 🐞

When a table has more than 1000 columns, update and delete operations on a V2 table fail because the column field IDs collide with the partition field IDs, which start at 1000.

Caused by: java.lang.IllegalArgumentException: Multiple entries with same key: 1001=q581316 and 1001=_partition.month
 at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:376)
 at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:370)
 at org.apache.iceberg.relocated.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:153)
 at org.apache.iceberg.relocated.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:115)
 at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap$Builder.buildOrThrow(ImmutableMap.java:574)
 at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:538)
 at org.apache.iceberg.types.IndexByName.byId(IndexByName.java:80)
 at org.apache.iceberg.types.TypeUtil.indexNameById(TypeUtil.java:161)
 at org.apache.iceberg.Schema.lazyIdToName(Schema.java:167)
 at org.apache.iceberg.Schema.<init>(Schema.java:108)
 at org.apache.iceberg.Schema.<init>(Schema.java:91)
 at org.apache.iceberg.Schema.<init>(Schema.java:83)
 at org.apache.iceberg.Schema.<init>(Schema.java:79)
 at org.apache.iceberg.mr.hive.IcebergAcidUtil.createFileReadSchemaWithVirtualColums(IcebergAcidUtil.java:89)
 at org.apache.iceberg.mr.mapreduce.IcebergInputFormat$IcebergRecordReader.readSchema(IcebergInputFormat.java:497)
 at org.apache.iceberg.mr.mapreduce.IcebergInputFormat$IcebergRecordReader.initialize(IcebergInputFormat.java:258)
 at org.apache.iceberg.mr.mapred.AbstractMapredIcebergRecordReader.<init>(AbstractMapredIcebergRecordReader.java:40)
 at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat$MapredIcebergRecordReader.<init>(MapredIcebergInputFormat.java:89)
 at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat.getRecordReader(MapredIcebergInputFormat.java:79)
 at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getRecordReader(HiveIcebergInputFormat.java:170)
 at org.apache.hadoop.hive.ql.io.RecordReaderWrapper.create(RecordReaderWrapper.java:72)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:461)
 ... 27 more
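
The failure can be reproduced in miniature without Iceberg. The sketch below is not Iceberg code: it assumes data columns get sequential field IDs starting at 1 and that partition fields get IDs starting at 1000, then builds a single id-to-name index that rejects duplicate keys, as the immutable map builder in the trace does. The class name, method names, and the `col` and `_partition.` naming are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldIdCollisionDemo {
    // Assumption: mirrors the start value described in the bug report.
    static final int PARTITION_DATA_ID_START = 1000;

    /**
     * Builds one id -> name index over data columns and partition fields.
     * Assumes data columns are numbered 1..columnCount (hypothetical layout).
     */
    static Map<Integer, String> indexById(int columnCount, String... partitionFields) {
        Map<Integer, String> index = new LinkedHashMap<>();
        for (int id = 1; id <= columnCount; id++) {
            put(index, id, "col" + id);
        }
        for (int i = 0; i < partitionFields.length; i++) {
            put(index, PARTITION_DATA_ID_START + i, "_partition." + partitionFields[i]);
        }
        return index;
    }

    // Rejects a second entry for the same id instead of silently overwriting it.
    private static void put(Map<Integer, String> index, int id, String name) {
        String previous = index.putIfAbsent(id, name);
        if (previous != null) {
            throw new IllegalArgumentException(
                "Multiple entries with same key: " + id + "=" + previous + " and " + id + "=" + name);
        }
    }

    public static void main(String[] args) {
        // 999 columns stay below the partition ID range, so the index builds.
        System.out.println(indexById(999, "year", "month").size());

        // 1500 columns reach into the partition ID range, so the build fails.
        try {
            indexById(1500, "year", "month");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, any table whose highest column ID reaches 1000 or more fails as soon as a read schema combines data columns with partition fields.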

Issue Analytics

Created 9 months ago
Comments: 5 (2 by maintainers)

Top GitHub Comments

ayushtkn commented, Dec 6, 2022

One option is to make this configurable, or to increase it to at least 10K:

 // IDs for partition fields start at 1000
 private static final int PARTITION_DATA_ID_START = 1000;

gaborkaszab commented, Dec 7, 2022

@TuroczyX The agreement here is that there is no need to make this configurable and hardcoding to 10k is enough. See PR: https://github.com/apache/iceberg/pull/6369
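
For a rough sense of what the hardcoded change buys, here is a small sketch of the arithmetic. It assumes a flat schema whose column IDs run sequentially from 1, which real tables need not follow (nested fields and dropped columns also consume IDs). The class and method names are hypothetical and not part of Iceberg.

```java
public class PartitionIdRangeCheck {
    /** Highest column ID that stays clear of the partition ID range. */
    static int maxSafeColumnId(int partitionIdStart) {
        return partitionIdStart - 1;
    }

    /** True when the highest column ID reaches into the partition ID range. */
    static boolean collides(int highestColumnId, int partitionIdStart) {
        return highestColumnId >= partitionIdStart;
    }

    public static void main(String[] args) {
        int highestColumnId = 1500; // hypothetical wide table

        System.out.println(collides(highestColumnId, 1000));   // old start value
        System.out.println(collides(highestColumnId, 10_000)); // proposed start value
        System.out.println(maxSafeColumnId(10_000));
    }
}
```

Under that assumption, a hardcoded 10k start moves the limit rather than removing it: a table whose column IDs reach 10000 would hit the same collision.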