[SUPPORT] insert_overwrite_table failed on archiving
Describe the problem you faced

Configs:
hoodie.cleaner.incremental.mode -> true
hoodie.insert.shuffle.parallelism -> 20
hoodie.datasource.write.precombine.field -> daas_internal_ts
hoodie.clean.automatic -> false
hoodie.datasource.write.operation -> insert_overwrite_table
hoodie.datasource.write.recordkey.field -> guid
hoodie.table.name -> xxxxx
hoodie.datasource.write.table.type -> MERGE_ON_READ
hoodie.datasource.write.hive_style_partitioning -> true
hoodie.consistency.check.enabled -> true
hoodie.cleaner.policy -> KEEP_LATEST_FILE_VERSIONS
hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.keep.max.commits -> 3
hoodie.cleaner.commits.retained -> 1
hoodie.keep.min.commits -> 2
hoodie.datasource.write.partitionpath.field ->
hoodie.compact.inline.max.delta.commits -> 1
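For context, this is roughly how the options above are passed on each write (a minimal sketch; df, basePath, and the use of SaveMode.Append are assumptions, not copied from our job):

import org.apache.spark.sql.SaveMode

df.write.format("hudi")
  .option("hoodie.datasource.write.operation", "insert_overwrite_table")
  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
  .option("hoodie.datasource.write.recordkey.field", "guid")
  .option("hoodie.datasource.write.precombine.field", "daas_internal_ts")
  .option("hoodie.datasource.write.partitionpath.field", "") // non-partitioned table
  .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.ComplexKeyGenerator")
  .option("hoodie.table.name", "xxxxx")
  .mode(SaveMode.Append)  // assumption: append mode on every batch
  .save(basePath)         // basePath: the table's S3 path (assumption)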
After writing 5 times, we got the error below:
21/03/23 09:29:38 ERROR MonitoringUtils$: sendFailureMetric - One of the batch jobs failed
org.apache.hudi.exception.HoodieCommitException: Failed to archive commits
at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:322)
at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:138)
at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:426)
at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:188)
at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:110)
at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:442)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at jp.ne.paypay.daas.dataloader.writer.HudiDataWriter.write(HudiDataWriter.scala:272)
at jp.ne.paypay.daas.dataloader.writer.HudiDataWriter.insertOverrideTable(HudiDataWriter.scala:161)
at jp.ne.paypay.daas.dataloader.FileSystemJob$.mainProcedure(FileSystemJob.scala:107)
at jp.ne.paypay.daas.dataloader.FileSystemJob$.main(FileSystemJob.scala:38)
at jp.ne.paypay.daas.dataloader.FileSystemJob.main(FileSystemJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: java.lang.IllegalArgumentException: Positive number of partitions required
at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:119)
at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:97)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:253)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:253)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:945)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:361)
at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:360)
at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
at org.apache.hudi.client.common.HoodieSparkEngineContext.map(HoodieSparkEngineContext.java:73)
at org.apache.hudi.client.ReplaceArchivalHelper.deleteReplacedFileGroups(ReplaceArchivalHelper.java:72)
at org.apache.hudi.table.HoodieTimelineArchiveLog.deleteReplacedFileGroups(HoodieTimelineArchiveLog.java:341)
at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:303)
... 36 more
What we found: the first replacecommit does not seem to have any partitionToReplaceFileIds, which archiving requires. We verified this in the spark-shell:
import org.apache.hudi.common.model.HoodieReplaceCommitMetadata
import org.apache.hudi.common.table.HoodieTableMetaClient

// client: the meta client for the table (constructor may differ across Hudi versions)
val client = new HoodieTableMetaClient(spark.sparkContext.hadoopConfiguration, basePath)
client.getCommitTimeline.getInstants.forEach { instant =>
  val metadata = HoodieReplaceCommitMetadata.fromBytes(
    client.getActiveTimeline.getInstantDetails(instant).get, classOf[HoodieReplaceCommitMetadata])
  println(s"${instant.toString}: size : ${metadata.getPartitionToReplaceFileIds.size}")
}
Output:
[20210323080718__replacecommit__COMPLETED]: size : 0
[20210323081449__replacecommit__COMPLETED]: size : 1
[20210323082046__replacecommit__COMPLETED]: size : 1
[20210323082758__replacecommit__COMPLETED]: size : 1
[20210323084004__replacecommit__COMPLETED]: size : 1
[20210323085044__replacecommit__COMPLETED]: size : 1
[20210323085823__replacecommit__COMPLETED]: size : 1
[20210323090550__replacecommit__COMPLETED]: size : 1
[20210323091700__replacecommit__COMPLETED]: size : 1
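The stack trace shows the failure comes from ReplaceArchivalHelper.deleteReplacedFileGroups fanning the replaced file groups out over a Spark RDD; with an empty partitionToReplaceFileIds the requested parallelism appears to be 0, which Spark rejects. A minimal spark-shell sketch of the underlying Spark behavior, with no Hudi involved:

// Parallelizing with numSlices = 0 fails in ParallelCollectionRDD.slice
// with "Positive number of partitions required" -- the same error as above.
val replaced = Seq.empty[String] // stands in for the empty replaced-file-group list
spark.sparkContext.parallelize(replaced, replaced.size).collect()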
To Reproduce
Steps to reproduce the behavior:
- Set the configs listed above
- Write to the table 5 times with insert_overwrite_table (see the sketch after these steps)
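A hedged sketch of the driver loop (writeHudi is a hypothetical wrapper around the df.write call shown above; the generated columns just match our record key and precombine fields):

// Hypothetical loop: per the behavior we observed, the 5th
// insert_overwrite_table write triggers archival and fails.
(1 to 5).foreach { _ =>
  val df = spark.range(100).selectExpr(
    "uuid() as guid", "current_timestamp() as daas_internal_ts")
  writeHudi(df) // hypothetical: the df.write ... .save(basePath) call above
}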
Expected behavior
Archiving should handle a replacecommit with an empty partitionToReplaceFileIds, and the write should succeed.
Environment Description
- Hudi version : 0.7.0
- Spark version : 2.4.4
- Hive version : x
- Hadoop version : 3.1.0
- Storage (HDFS/S3/GCS…) : S3
- Running on Docker? (yes/no) : no
Top GitHub Comments
@ssdong Thanks for opening the PR! Closing this issue now
Hi @satishkotha @jsbali! I’ve created the pull request for this issue. I observed a few more things along the way and have tried my best to clarify them; hopefully the PR description is detailed enough. Let me know. Thanks!
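The PR itself isn't shown here, but one plausible shape of the fix (purely a hedged sketch, not necessarily what the PR does) is to skip the Spark fan-out when a replacecommit replaced no file groups:

// Hypothetical guard: never ask Spark for 0 partitions when there is
// nothing to delete. deleteFileGroup is a stand-in for the per-file-group
// delete that ReplaceArchivalHelper performs.
def deleteReplacedFileGroups(replacedFileIds: Seq[String],
                             deleteFileGroup: String => Boolean): Boolean = {
  if (replacedFileIds.isEmpty) {
    true // empty replacecommit: nothing to fan out, archival can proceed
  } else {
    spark.sparkContext
      .parallelize(replacedFileIds, replacedFileIds.size)
      .map(deleteFileGroup)
      .collect()
      .forall(identity)
  }
}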