
[SUPPORT] Error upserting bucketType UPDATE for partition

See original GitHub issue

Hello,

Hudi version: 0.7.0, Spark: 3.0.1, EMR: 6.2.0

Spark submit:

spark-submit --deploy-mode cluster \
  --conf spark.executor.cores=5 \
  --conf spark.executor.memoryOverhead=3000 \
  --conf spark.executor.memory=32g \
  --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --jars s3://dl/lib/spark-daria_2.12-0.38.2.jar \
  --packages org.apache.spark:spark-avro_2.12:2.4.4,org.apache.hudi:hudi-spark-bundle_2.12:0.7.0 \
  --class TableProcessorWrapper \
  s3://dl/code/projects/data_projects/batch_processor_engine/batch-processor-engine_2.12-3.0.1_0.5.jar courier_api_group02

Hudi options:

  • hoodie.table.name -> order
  • hoodie.datasource.write.table.name -> order
  • hoodie.datasource.write.operation -> upsert
  • hoodie.datasource.write.recordkey.field -> id
  • hoodie.datasource.write.precombine.field -> LineCreatedTimestamp
  • hoodie.datasource.write.partitionpath.field -> created_year_month_brt_partition
  • hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.SimpleKeyGenerator
  • hoodie.datasource.write.hive_style_partitioning -> true
  • hoodie.upsert.shuffle.parallelism -> 50
  • hoodie.parquet.max.file.size -> 134217728
  • hoodie.parquet.block.size -> 67108864
  • hoodie.parquet.small.file.limit -> 67108864
  • hoodie.copyonwrite.record.size.estimate -> 1024
  • hoodie.datasource.hive_sync.enable -> true
  • hoodie.datasource.hive_sync.database -> raw_courier_api_hudi
  • hoodie.datasource.hive_sync.table -> order
  • hoodie.datasource.hive_sync.jdbcurl -> jdbc:hive2://emr:10000
  • hoodie.datasource.hive_sync.partition_fields -> created_year_month_brt_partition
  • hoodie.datasource.hive_sync.partition_extractor_class -> org.apache.hudi.hive.MultiPartKeysValueExtractor
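For reference, an upsert with an options map like this is normally wired up roughly as in the sketch below. The SparkSession setup and the source read are placeholders added for illustration; only the option values and the table base path (s3://dl/courier_api/order, taken from the stack trace) come from this issue.

// Minimal sketch of a Hudi upsert using the options listed above.
// The source path and DataFrame are hypothetical placeholders.
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("order-upsert-sketch").getOrCreate()

val hudiOptions: Map[String, String] = Map(
  "hoodie.table.name"                               -> "order",
  "hoodie.datasource.write.table.name"              -> "order",
  "hoodie.datasource.write.operation"               -> "upsert",
  "hoodie.datasource.write.recordkey.field"         -> "id",
  "hoodie.datasource.write.precombine.field"        -> "LineCreatedTimestamp",
  "hoodie.datasource.write.partitionpath.field"     -> "created_year_month_brt_partition",
  "hoodie.datasource.write.keygenerator.class"      -> "org.apache.hudi.keygen.SimpleKeyGenerator",
  "hoodie.datasource.write.hive_style_partitioning" -> "true",
  "hoodie.upsert.shuffle.parallelism"               -> "50"
  // ... plus the parquet sizing and hive_sync options from the list above
)

// Placeholder for the incoming batch; replace with the job's real source read.
val df = spark.read.json("s3://dl/raw/courier_api/order/")   // hypothetical path

df.write
  .format("hudi")
  .options(hudiOptions)
  .mode(SaveMode.Append)
  .save("s3://dl/courier_api/order")   // table base path from the stack trace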

Error:

diagnostics: User class threw exception: java.lang.Exception: Error on Table: order, Error Message: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 28.0 failed 4 times, most recent failure: Lost task 7.3 in stage 28.0 (TID 530, ip-10-0-29-119.us-west-2.compute.internal, executor 5): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :7
 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:279)
 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:135)
 at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
 at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:889)
 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:889)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
 at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
 at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1388)
 at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
 at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
 at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
 at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at org.apache.spark.scheduler.Task.run(Task.scala:127)
 at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
 at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:308)
 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:299)
 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:272)
 … 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
 at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
 at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
 … 31 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:192)
 at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
 … 32 more
Caused by: org.apache.hudi.exception.HoodieException: operation has failed
 at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:247)
 at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
 at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
 at org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:277)
 at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
 at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 … 3 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3://dl/courier_api/order/created_year_month_brt_partition=202012/a71490e9-d2e7-4ecf-b48a-6b7046770841-0_43-11441-0_20210131205623.parquet
 at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
 at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
 at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
 at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
 at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
 at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 … 4 more
Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
 at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
 at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
 at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:341)
 at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
 at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
 at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
 at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
 at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
 at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
 at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
 … 11 more

Driver stacktrace:
 at jobs.TableProcessor.start(TableProcessor.scala:101)
 at TableProcessorWrapper$.$anonfun$main$2(TableProcessorWrapper.scala:23)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
 at scala.util.Success.$anonfun$map$1(Try.scala:255)
 at scala.util.Success.map(Try.scala:213)
 at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
 at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
 at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
 at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

 ApplicationMaster host: ip-10-0-19-128.us-west-2.compute.internal
 ApplicationMaster RPC port: 45559
 queue: default
 start time: 1612127355095
 final status: FAILED
 tracking URL: http://ip-10-0-29-186.us-west-2.compute.internal:20888/proxy/application_1612125097081_0004/
 user: hadoop

I have attached the inflight commit: 20210131210931.inflight.zip

Screenshots attached: "Captura de Tela 2021-01-31 às 18 40 18" and "Captura de Tela 2021-01-31 às 18 39 28".
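For readers hitting the same trace: the innermost cause (java.lang.UnsupportedOperationException on PlainValuesDictionary$PlainLongDictionary thrown from Dictionary.decodeToBinary) is generally what you see when the Avro read schema asks for a string/binary column that an older Parquet file stored as a long, i.e. a field's type changed between commits. A quick way to check is to compare the schema of the file named in the trace with the schema of the incoming batch. A minimal spark-shell sketch, assuming the S3 paths are reachable from the cluster (the source path below is a placeholder, not taken from this issue):

// spark-shell sketch: look for a column whose physical type in the old file
// differs from the type the new batch carries (e.g. long vs. string).
// The file path is copied verbatim from the stack trace above.
val badFile = "s3://dl/courier_api/order/created_year_month_brt_partition=202012/" +
  "a71490e9-d2e7-4ecf-b48a-6b7046770841-0_43-11441-0_20210131205623.parquet"

// Schema of the file already in the Hudi table
spark.read.parquet(badFile).printSchema()

// Schema of the batch being upserted -- replace with the job's real source;
// this path is a hypothetical placeholder.
spark.read.json("s3://dl/raw/courier_api/order/").printSchema()

If the types do differ, keeping the field's original type in the incoming data (or rewriting the affected files) is the usual way out.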

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 18 (9 by maintainers)

Top GitHub Comments

1 reaction
rubenssoto commented on Feb 3, 2021

@nsivabalan I only have this problem in one table, so it would be good if it works in the future, but for now it's fine.

thanks for asking, you are the best! 😃

0 reactions
nsivabalan commented on Aug 31, 2021

Closing this due to no activity and since we could not reproduce. Please re-open if you are still having issues.

Read more comments on GitHub >

Top Results From Across the Web

Error upserting bucketType UPDATE for partition #, No value ...
HoodieUpsertException : Error upserting bucketType UPDATE for partition :5 at com.uber.hoodie.table.HoodieCopyOnWriteTable.
Read more >
[SUPPORT] Error upserting bucketType UPDATE for partition
HoodieUpsertException: Error upserting bucketType UPDATE for partition :7 at org.apache.hudi.table.action.commit.
Read more >
[#HUDI-301] Failed to update a non-partition MOR table
We met this exception when trying to update a field for a non-partition MOR table. org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType ...
Read more >
org.apache.hudi.exception.HoodieUpsertException: Failed to ...
HoodieUpsertException: Failed to upsert for commit time - PySpark Unable to ... back to azure storage I am getting the following error.
Read more >
HoodieKeyException Is Reported When Data Is ... - Huawei Cloud
Is it possible to use a nullable field that contains null records as a primary key when creating a Hudi table? No. HoodieKeyException will...
Read more >
