[SUPPORT] Failed Job - doing partition and writing data - in Hudi 0.11.0
Describe the problem you faced
We're seeing some odd behaviour since upgrading from Hudi 0.10.0-amzn (EMR 6.6.0) to Hudi 0.11.0-amzn (EMR 6.7.0): Deltastreamer runs still appear to succeed overall, but each run now contains a failed Spark job.
Is this important, and what does this failure mean here? It seems to be just a column stats issue, which I guess is not a hard failure; is there action we can take to fix it? For reference, we enabled:
hoodie.metadata.enable=true
hoodie.metadata.index.column.stats.enable=true
hoodie.metadata.index.bloom.filter.enable=true
in the upgrade.
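For completeness, this is roughly how those keys are wired in (a sketch; the file name is illustrative, not our exact setup). The same keys can also be passed individually on the Deltastreamer command line via --hoodie-conf:

```properties
# deltastreamer.properties (illustrative name): supplied via --props,
# or each key can be passed individually with --hoodie-conf.
hoodie.metadata.enable=true
hoodie.metadata.index.column.stats.enable=true
hoodie.metadata.index.bloom.filter.enable=true
```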
Job aborted due to stage failure: Task 5 in stage 69.0 failed 10 times, most recent failure: Lost task 5.9 in stage 69.0 (TID 9119) (ip-10-0-35-115.eu-west-1.compute.internal executor 3): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :5
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:133)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to s3://<REDACTED>/.hoodie/metadata/column_stats/.col-stats-0000_20220809191455283.log.3_5-69-9119
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:410)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
... 28 more
Caused by: java.lang.IllegalStateException: Writing multiple records with same key not supported for org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
at org.apache.hudi.common.util.ValidationUtils.checkState(ValidationUtils.java:67)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:136)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
... 31 more
Driver stacktrace:
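If the failure really is isolated to the column_stats partition of the metadata table, as the failing log file path above suggests, one workaround we are considering (a sketch, not a confirmed fix) is to disable only the column stats index while keeping the metadata table itself enabled:

```properties
# Possible mitigation (unconfirmed): keep the metadata table on, but turn
# off the column stats index whose log block append is failing above.
hoodie.metadata.enable=true
hoodie.metadata.index.column.stats.enable=false
```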
To Reproduce
Steps to reproduce the behavior:
Unknown at this time.
Expected behavior
No error occurs in the Spark job.
Environment Description
- Hudi version : 0.11.0-amzn
- Spark version : 3.2.1-amzn
- Hive version : 3.1.3
- Hadoop version : 3.2.1
- Storage (HDFS/S3/GCS…) : S3
- Running on Docker? (yes/no) : no
Top GitHub Comments
Thanks for confirming.
Regarding enabling debug logs: I know the usual way to enable them for any Spark job.
Sample log4j props file:
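A minimal sketch of such a file, assuming the log4j 1.x properties format used by Spark on EMR; the console appender wiring is illustrative:

```properties
# Sample log4j.properties fragment: raise Hudi logging to DEBUG while
# keeping everything else at the default INFO level.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.apache.hudi=DEBUG
```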
Once again, this turned into errors when leaving the metadata table enabled:
Logs from the first executor failure shown above: stderr.gz
Logs from the second executor failure shown above: stderr2.gz