
[SUPPORT] SaveMode.Append fails on renamed hudi tables

See original GitHub issue

Describe the problem you faced

Hello team, we recently upgraded from emr-5.30.2 to emr-5.31.1 and noticed failures in our pipelines that do incremental appends to Hudi tables.

Issue: SaveMode.Append throws an exception and fails on renamed Hudi tables; this affects Hudi 0.6 and above.

To Reproduce

Steps to reproduce the behavior (a Scala sketch of these steps follows the list):

  1. Create a Hudi table at an S3 path
  2. Rename the table using spark.sql(s"ALTER TABLE $oldTableName RENAME TO $newTableName")
  3. Use Spark df.write with mode("append") to save into newTableName
  4. An exception is thrown
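
A minimal, hypothetical sketch of these steps in Scala, assuming a Hive-enabled SparkSession named spark and a DataFrame df whose schema matches the table; the bucket, path, and table names are illustrative, not taken from the original report:

import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode

val basePath     = "s3://my-bucket/hudi/events"  // hypothetical S3 table path
val oldTableName = "events_v1"                   // hypothetical table names
val newTableName = "events"

// Initial write creates the table; .hoodie/hoodie.properties records oldTableName.
df.write.format("hudi")
  .option(HoodieWriteConfig.TABLE_NAME, oldTableName)
  .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, oldTableName)
  .mode(SaveMode.Overwrite)
  .save(basePath)

// The rename only updates the Hive metastore; hoodie.properties is left untouched.
spark.sql(s"ALTER TABLE $oldTableName RENAME TO $newTableName")

// Incremental append under the new name now fails with
// "hoodie table with name <old_table_name> already exists at <table-path>".
df.write.format("hudi")
  .option(HoodieWriteConfig.TABLE_NAME, newTableName)
  .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, newTableName)
  .mode(SaveMode.Append)
  .save(basePath)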

Expected behavior

SaveMode.Append works on renamed tables when the new table name is supplied via DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> $newTableName.

Environment Description (EMR-5.31.1)

  • Hudi version : Hudi 0.6

  • Spark version : 2.4.6

  • Hive version : 2.3.7

  • Hadoop version : 2.10.0

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : No

Additional context

Related code: HoodieSparkSqlWriter.scala#L295

HoodieTableConfig.tableName is set from the .hoodie/hoodie.properties file. When the table is renamed with Spark SQL, that file is not updated, so HoodieSparkSqlWriter still compares the old table name recorded in HoodieTableConfig against the new table name and rejects the write.
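
As a rough illustration of the failure mode, here is a simplified sketch in Scala, not the actual Hudi source; the method signature and variable names are assumptions:

import org.apache.spark.sql.SaveMode
import org.apache.hudi.exception.HoodieException

// Simplified sketch of the check described above: on Append, the table name
// persisted in .hoodie/hoodie.properties at creation time is compared with the
// name the writer was invoked with. Since ALTER TABLE ... RENAME never rewrites
// hoodie.properties, the two names diverge and the write is aborted.
def handleSaveModes(mode: SaveMode, tablePath: String,
                    existingTableName: String,   // read from hoodie.properties
                    requestedTableName: String): Unit = {
  if (mode == SaveMode.Append && existingTableName != requestedTableName) {
    throw new HoodieException(
      s"hoodie table with name $existingTableName already exists at $tablePath")
  }
}

In other words, after the rename hoodie.properties still contains hoodie.table.name=<old_table_name>, which is why the exception below reports the old name.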

Stacktrace

org.apache.hudi.exception.HoodieException: hoodie table with name <old_table_name> already exists at s3://<table-path>
  at org.apache.hudi.HoodieSparkSqlWriter$.handleSaveModes(HoodieSparkSqlWriter.scala:297)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:109)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
  at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
  at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
  at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230)

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
YannByron commented, Jan 26, 2022

@ranjitha-shenoy I also guess so. Maybe this is a bug in Hudi 0.6, and we can’t patch a bugfix for this old version.

0 reactions
ranjitha-shenoy commented, Jan 25, 2022

@YannByron I have not been able to test it with Hudi 0.10, but I believe the table name check introduced in HoodieSparkSqlWriter.scala#L295 is what started the issue of not being able to append to renamed tables.

Top Results From Across the Web

Writing Data | Apache Hudi
Note: After the initial creation of a table, this value must stay consistent when writing to (updating) the table using the Spark SaveMode.Append...

Schema Evolution - Apache Hudi
Schema evolution allows users to easily change the current schema of a Hudi table to adapt to the data that is changing over...

FAQs | Apache Hudi
When querying/reading data, Hudi just presents itself as a json-like hierarchical table, everyone is used to querying using Hive/Spark/Presto over Parquet/Json/ ...

All Configurations | Apache Hudi
Comma separated list of file paths to read within a Hudi table. ... .option(HoodieWriteConfig.TABLE_NAME, tableName) .mode(SaveMode.Append) .save(basePath);

FAQs - Apache Hudi
Your current job is rewriting entire table/partition to deal with updates, ... Even for append-only data streams, Hudi supports key based ...
