
[SUPPORT] insert_overwrite_table failed on archiving


Tips before filing an issue

  • Have you gone through our FAQs?

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

Configs

hoodie.cleaner.incremental.mode -> true
hoodie.insert.shuffle.parallelism -> 20
hoodie.datasource.write.precombine.field -> daas_internal_ts
hoodie.clean.automatic -> false
hoodie.datasource.write.operation -> insert_overwrite_table
hoodie.datasource.write.recordkey.field -> guid
hoodie.table.name -> xxxxx
hoodie.datasource.write.table.type -> MERGE_ON_READ
hoodie.datasource.write.hive_style_partitioning -> true
hoodie.consistency.check.enabled -> true
hoodie.cleaner.policy -> KEEP_LATEST_FILE_VERSIONS
hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.keep.max.commits -> 3
hoodie.cleaner.commits.retained -> 1
hoodie.keep.min.commits -> 2
hoodie.datasource.write.partitionpath.field -> 
hoodie.compact.inline.max.delta.commits -> 1
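
For context, here is a minimal sketch of how these options feed the DataFrame write. The input DataFrame name and S3 base path are placeholders, not from the original job; the option keys and values mirror the configs listed above.

import org.apache.spark.sql.SaveMode

// Sketch only: `inputDf` and the base path are illustrative placeholders.
inputDf.write
  .format("hudi")
  .option("hoodie.table.name", "xxxxx")
  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
  .option("hoodie.datasource.write.operation", "insert_overwrite_table")
  .option("hoodie.datasource.write.recordkey.field", "guid")
  .option("hoodie.datasource.write.precombine.field", "daas_internal_ts")
  .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.ComplexKeyGenerator")
  .option("hoodie.datasource.write.partitionpath.field", "")
  .option("hoodie.keep.min.commits", "2")
  .option("hoodie.keep.max.commits", "3")
  .option("hoodie.cleaner.commits.retained", "1")
  .mode(SaveMode.Append)
  .save("s3://<bucket>/<table-base-path>")  // each run is one "write"; the 5th fails while archiving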

After writing 5 times, we got the error below:

21/03/23 09:29:38 ERROR MonitoringUtils$: sendFailureMetric - One of the batch jobs failed
org.apache.hudi.exception.HoodieCommitException: Failed to archive commits
	at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:322)
	at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:138)
	at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:426)
	at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:188)
	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:110)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:442)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
	at jp.ne.paypay.daas.dataloader.writer.HudiDataWriter.write(HudiDataWriter.scala:272)
	at jp.ne.paypay.daas.dataloader.writer.HudiDataWriter.insertOverrideTable(HudiDataWriter.scala:161)
	at jp.ne.paypay.daas.dataloader.FileSystemJob$.mainProcedure(FileSystemJob.scala:107)
	at jp.ne.paypay.daas.dataloader.FileSystemJob$.main(FileSystemJob.scala:38)
	at jp.ne.paypay.daas.dataloader.FileSystemJob.main(FileSystemJob.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: java.lang.IllegalArgumentException: Positive number of partitions required
	at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:119)
	at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:97)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:253)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:253)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:945)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
	at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:361)
	at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:360)
	at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
	at org.apache.hudi.client.common.HoodieSparkEngineContext.map(HoodieSparkEngineContext.java:73)
	at org.apache.hudi.client.ReplaceArchivalHelper.deleteReplacedFileGroups(ReplaceArchivalHelper.java:72)
	at org.apache.hudi.table.HoodieTimelineArchiveLog.deleteReplacedFileGroups(HoodieTimelineArchiveLog.java:341)
	at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:303)
	... 36 more

What we found: it seems the first commit doesn't have any partitionToReplaceFileIds, which is required when archiving files. The following spark-shell snippet prints the partitionToReplaceFileIds size for each replacecommit:

// `client` is assumed to be the table's HoodieTableMetaClient.
client.getCommitTimeline.getInstants.forEach(instant => {
  val metadata = HoodieReplaceCommitMetadata.fromBytes(
    client.getActiveTimeline.getInstantDetails(instant).get,
    classOf[HoodieReplaceCommitMetadata])
  // Count the partitions carrying replaced file ids for this replacecommit.
  val partitionToReplaceFileIds = metadata.getPartitionToReplaceFileIds()
  println(s"${instant.toString}: size : ${partitionToReplaceFileIds.size}")
})

// Exiting paste mode, now interpreting.

[20210323080718__replacecommit__COMPLETED]: size : 0
[20210323081449__replacecommit__COMPLETED]: size : 1
[20210323082046__replacecommit__COMPLETED]: size : 1
[20210323082758__replacecommit__COMPLETED]: size : 1
[20210323084004__replacecommit__COMPLETED]: size : 1
[20210323085044__replacecommit__COMPLETED]: size : 1
[20210323085823__replacecommit__COMPLETED]: size : 1
[20210323090550__replacecommit__COMPLETED]: size : 1
[20210323091700__replacecommit__COMPLETED]: size : 1
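
This lines up with the "Positive number of partitions required" cause in the stack trace: when archiving hits the replacecommit whose partitionToReplaceFileIds is empty, the list of replaced file groups appears to be parallelized with zero partitions. A minimal spark-shell illustration (not Hudi code) of that same Spark failure:

// Parallelizing any collection with 0 slices raises the exception seen
// under HoodieSparkEngineContext.map in the stack trace above.
spark.sparkContext.parallelize(Seq.empty[String], numSlices = 0).collect()
// java.lang.IllegalArgumentException: Positive number of partitions required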


To Reproduce

Steps to reproduce the behavior:

  1. Set the configs listed above.
  2. Write to the table 5 times with the insert_overwrite_table operation.

Expected behavior

Archiving of replacecommits should succeed; the write should not fail with HoodieCommitException: Failed to archive commits.

Environment Description

  • Hudi version : 0.7.0

  • Spark version : 2.4.4

  • Hive version : x

  • Hadoop version : 3.1.0

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : no


Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 23 (23 by maintainers)

Top GitHub Comments

n3nash commented, Apr 8, 2021 (2 reactions)

@ssdong Thanks for opening the PR! Closing this issue now

ssdong commented, Apr 7, 2021 (0 reactions)

Hi @satishkotha @jsbali! I've created the pull request for this issue. I observed a few more things along the way and did my best to clarify them in what I hope is a detailed enough PR description. Let me know. Thanks!
