
[SUPPORT] Failed Job - doing partition and writing data - in Hudi 0.11.0


Describe the problem you faced

We’ve been seeing some odd behaviour since upgrading from Hudi 0.10.0-amzn (EMR 6.6.0) to Hudi 0.11.0-amzn (EMR 6.7.0): Deltastreamer runs appear to succeed overall, but each now contains a failed Spark job.

[Two Spark UI screenshots showing the failed job]

Is this important? What does this failure mean here? It seems to be just a column-stats issue, which I guess is not a hard fail; can we take action to fix it? For reference, as part of the upgrade we enabled the settings below (see the sketch after the list for how they can be supplied):

  • hoodie.metadata.enable=true
  • hoodie.metadata.index.column.stats.enable=true
  • hoodie.metadata.index.bloom.filter.enable=true
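
For illustration only (not from the original report), these settings could be supplied to a Deltastreamer run via repeated --hoodie-conf flags; the bucket and table name here are hypothetical, and the source/schema-provider options are omitted:

# Sketch: enabling the metadata table and its indexes on a Deltastreamer job.
# The jar location matches EMR's layout; adjust for other environments.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /usr/lib/hudi/hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --target-base-path s3://my-bucket/my_table \
  --target-table my_table \
  --hoodie-conf hoodie.metadata.enable=true \
  --hoodie-conf hoodie.metadata.index.column.stats.enable=true \
  --hoodie-conf hoodie.metadata.index.bloom.filter.enable=true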

Job aborted due to stage failure: Task 5 in stage 69.0 failed 10 times, most recent failure: Lost task 5.9 in stage 69.0 (TID 9119) (ip-10-0-35-115.eu-west-1.compute.internal executor 3): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :5
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:133)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to s3://<REDACTED>/.hoodie/metadata/column_stats/.col-stats-0000_20220809191455283.log.3_5-69-9119
	at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:410)
	at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
	at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
	... 28 more
Caused by: java.lang.IllegalStateException: Writing multiple records with same key not supported for org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
	at org.apache.hudi.common.util.ValidationUtils.checkState(ValidationUtils.java:67)
	at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:136)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
	at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
	... 31 more

Driver stacktrace:
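
Editor’s note: the root cause above is the column_stats partition of the metadata table rejecting duplicate keys while serializing an HFile data block. One possible mitigation while this is investigated (an assumption on our part, not a fix confirmed in this thread) is to turn the column-stats index back off:

# Hypothetical mitigation sketch: disable only the column-stats index,
# since the failing append targets .hoodie/metadata/column_stats/.
hoodie.metadata.index.column.stats.enable=false
# If failures persist, the metadata table can be disabled entirely:
# hoodie.metadata.enable=false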

To Reproduce

Steps to reproduce the behavior:

Unknown at this time

Expected behavior

No error occurs in the Spark job.

Environment Description

  • Hudi version : 0.11.0-amzn

  • Spark version : 3.2.1-amzn

  • Hive version : 3.1.3

  • Hadoop version : 3.2.1

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : no

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
nsivabalan commented, Aug 15, 2022

Thanks for confirming.

Regarding enabling debug logs, the usual way to enable them for any Spark job is:

--conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties" --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties"

Sample log4j props file

log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# hudi entries
log4j.logger.org.apache.hudi=DEBUG
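
If the properties file is only present on the driver node, one common pattern (our addition, not part of the original comment) is to ship it to the executors with --files so a relative file: path resolves inside each executor’s working directory:

# Assumes client deploy mode: the driver reads the absolute local path,
# while --files copies log4j.properties into each executor's working dir.
spark-submit \
  --files /home/hadoop/log4j.properties \
  --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties" \
  --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=file:log4j.properties" \
  ...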
0 reactions
Limess commented, Oct 16, 2022

Once again, this turned into errors when leaving the metadata table enabled:

[Screenshot, 2022-10-16: Spark UI “Details for Stage 7 (Attempt 0)” for delta-streamer-articles_hudi_copy_on_write]

Logs from the first executor failure shown above: stderr.gz

Logs from the second executor failure shown above: stderr2.gz
