
[SUPPORT] Failed Job - doing partition and writing data - in Hudi 0.11.0


Describe the problem you faced

We’ve been seeing some odd behaviour since upgrading from Hudi 0.10.0-amzn (EMR 6.6.0) to Hudi 0.11.0-amzn (EMR 6.7.0): Deltastreamer runs appear to succeed overall, but each now contains a failed Spark job.

[Two Spark UI screenshots showing the failed job]

Is this important? What does this failure mean here? It seems to be just a column-stats issue, which I guess is not a hard fail; can we take action to fix it? For reference, as part of the upgrade we enabled the settings below (see the sketch after the list for how they can be supplied):

  • hoodie.metadata.enable=true
  • hoodie.metadata.index.column.stats.enable=true
  • hoodie.metadata.index.bloom.filter.enable=true
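
For illustration only (not from the original report), these settings could be supplied to a Deltastreamer run via repeated --hoodie-conf flags; the bucket and table name here are hypothetical, and the source/schema-provider options are omitted:

# Sketch: enabling the metadata table and its indexes on a Deltastreamer job.
# The jar location matches EMR's layout; adjust for other environments.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /usr/lib/hudi/hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --target-base-path s3://my-bucket/my_table \
  --target-table my_table \
  --hoodie-conf hoodie.metadata.enable=true \
  --hoodie-conf hoodie.metadata.index.column.stats.enable=true \
  --hoodie-conf hoodie.metadata.index.bloom.filter.enable=true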

Job aborted due to stage failure: Task 5 in stage 69.0 failed 10 times, most recent failure: Lost task 5.9 in stage 69.0 (TID 9119) (ip-10-0-35-115.eu-west-1.compute.internal executor 3): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :5
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:133)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to s3://<REDACTED>/.hoodie/metadata/column_stats/.col-stats-0000_20220809191455283.log.3_5-69-9119
	at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:410)
	at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
	at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
	... 28 more
Caused by: java.lang.IllegalStateException: Writing multiple records with same key not supported for org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
	at org.apache.hudi.common.util.ValidationUtils.checkState(ValidationUtils.java:67)
	at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:136)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
	at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
	... 31 more

Driver stacktrace:
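
Editor’s note: the root cause above is the column_stats partition of the metadata table rejecting duplicate keys while serializing an HFile data block. One possible mitigation while this is investigated (an assumption on our part, not a fix confirmed in this thread) is to turn the column-stats index back off:

# Hypothetical mitigation sketch: disable only the column-stats index,
# since the failing append targets .hoodie/metadata/column_stats/.
hoodie.metadata.index.column.stats.enable=false
# If failures persist, the metadata table can be disabled entirely:
# hoodie.metadata.enable=false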

To Reproduce

Steps to reproduce the behavior:

Unknown at this time

Expected behavior

No error occurs in the Spark job.

Environment Description

  • Hudi version : 0.11.0-amzn

  • Spark version : 3.2.1-amzn

  • Hive version : 3.1.3

  • Hadoop version : 3.2.1

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : no

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
nsivabalan commented, Aug 15, 2022

Thanks for confirming.

Regarding enabling debug logs, the usual way to enable them for any Spark job is:

--conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties" --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties"

Sample log4j props file

log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# hudi entries
log4j.logger.org.apache.hudi=DEBUG
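
If the properties file is only present on the driver node, one common pattern (our addition, not part of the original comment) is to ship it to the executors with --files so a relative file: path resolves inside each executor’s working directory:

# Assumes client deploy mode: the driver reads the absolute local path,
# while --files copies log4j.properties into each executor's working dir.
spark-submit \
  --files /home/hadoop/log4j.properties \
  --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties" \
  --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=file:log4j.properties" \
  ...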
0 reactions
Limess commented, Oct 16, 2022

Once again, this turned into errors when leaving the metadata table enabled:

[Screenshot, 2022-10-16: Spark UI “Details for Stage 7 (Attempt 0)” for delta-streamer-articles_hudi_copy_on_write]

Logs from the first executor failure shown above: stderr.gz

Logs from the second executor failure shown above: stderr2.gz
