[SUPPORT] SaveMode.Append fails on renamed hudi tables
Describe the problem you faced
Hello team, we recently upgraded from emr-5.30.2 to emr-5.31.1 and noticed failures in our pipelines that do incremental appends to Hudi tables.
Issue: SaveMode.Append throws an exception and fails on renamed Hudi tables. This affects Hudi 0.6 and above.
To Reproduce
Steps to reproduce the behavior:
- Create a Hudi table with an S3 path
- Rename the table using `spark.sql(s"ALTER TABLE $oldTableName RENAME TO $newTableName")`
- Use Spark `df.write` with `mode("append")` to save into `newTableName`
- An exception is thrown
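The steps above can be sketched as follows (a minimal reproduction sketch, not a verbatim pipeline: the S3 path, table names, and the DataFrame `df` are hypothetical, and it assumes a Spark 2.4 session with the Hudi 0.6 bundle on the classpath):

```scala
import org.apache.spark.sql.SaveMode
import org.apache.hudi.DataSourceWriteOptions

val basePath = "s3://my-bucket/hudi/my_table"  // hypothetical table path

// 1. Create the Hudi table; the table name is persisted into
//    <basePath>/.hoodie/hoodie.properties as hoodie.table.name.
df.write.format("hudi")
  .option("hoodie.table.name", "old_table")
  .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, "old_table")
  .mode(SaveMode.Overwrite)
  .save(basePath)

// 2. Rename the table; this updates the metastore entry only,
//    hoodie.properties still holds "old_table".
spark.sql("ALTER TABLE old_table RENAME TO new_table")

// 3. Append under the new name; fails with
//    "hoodie table with name <old_table> already exists at s3://..."
df.write.format("hudi")
  .option("hoodie.table.name", "new_table")
  .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, "new_table")
  .mode(SaveMode.Append)
  .save(basePath)
```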
Expected behavior
SaveMode.Append works for renamed tables when the new table name is passed via `DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> newTableName`.
Environment Description (EMR-5.31.1)

- Hudi version : 0.6
- Spark version : 2.4.6
- Hive version : 2.3.7
- Hadoop version : 2.10.0
- Storage (HDFS/S3/GCS…) : S3
- Running on Docker? (yes/no) : No
Additional context
Related code : HoodieSparkSqlWriter.scala#L295
`HoodieTableConfig.tableName` is read from the `.hoodie/hoodie.properties` file. When the table is renamed with Spark SQL, only the metastore entry changes; `hoodie.properties` still holds the old name, so `HoodieSparkSqlWriter` fails because it expects the existing table name from `HoodieTableConfig` to match the new table name.
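The failing check behaves roughly like the following (a simplified sketch for illustration, not the actual Hudi source; the method and parameter names are approximations of `handleSaveModes` at HoodieSparkSqlWriter.scala#L295):

```scala
import org.apache.spark.sql.SaveMode

// Simplified sketch: in Append mode the writer compares the table name
// recorded in .hoodie/hoodie.properties against the name supplied in the
// write options, and throws when they differ.
def handleSaveModes(mode: SaveMode, tablePath: String,
                    existingTableName: String,   // from .hoodie/hoodie.properties
                    configuredTableName: String  // from the write options
                   ): Unit = {
  if (mode == SaveMode.Append && existingTableName != configuredTableName) {
    // After ALTER TABLE ... RENAME, existingTableName still holds the old
    // name, so every append under the new name lands here.
    throw new RuntimeException(
      s"hoodie table with name $existingTableName already exists at $tablePath")
  }
}
```

A possible (untested) workaround following from this: after the rename, update `hoodie.table.name` in `.hoodie/hoodie.properties` to the new table name so the stored name matches the one passed in the write options.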
Stacktrace
org.apache.hudi.exception.HoodieException: hoodie table with name <old_table_name> already exists at s3://<table-path>
at org.apache.hudi.HoodieSparkSqlWriter$.handleSaveModes(HoodieSparkSqlWriter.scala:297)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:109)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230)
Issue Analytics

- State:
- Created 2 years ago
- Comments: 5 (3 by maintainers)
@ranjitha-shenoy I suspect so too. This may be a bug in Hudi 0.6, and we can’t ship a bugfix for this old version.
@YannByron I have not been able to test it with Hudi 0.10, but I believe the introduction of the table name check at HoodieSparkSqlWriter.scala#L295 caused this issue of not being able to append to renamed tables.