Error when upgrading to Hudi 0.12.0 from 0.9.0
We are using Spark 3.1.2 with Hudi 0.9.0 in our application, with AWS S3 for storage and the AWS Glue Catalog to expose the ingested data. After a source data change, some incoming records now carry null values in a column that already exists in the table schema (the schema was built from earlier records that had values for this column). Based on some of the reported issues (e.g., HUDI-4276), we identified that upgrading to Hudi 0.12.0 should resolve this. However, after the upgrade we are facing the error below. Can you please provide info on what is causing this issue? (Only POM version changes have been made, no code changes.)
Error:
org.apache.spark.sql.adapter.Spark3_1Adapter
java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark3_1Adapter
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.hudi.SparkAdapterSupport.sparkAdapter(SparkAdapterSupport.scala:39)
at org.apache.hudi.SparkAdapterSupport.sparkAdapter$(SparkAdapterSupport.scala:29)
at org.apache.hudi.HoodieSparkUtils$.sparkAdapter$lzycompute(HoodieSparkUtils.scala:65)
at org.apache.hudi.HoodieSparkUtils$.sparkAdapter(HoodieSparkUtils.scala:65)
at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:150)
at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:540)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:178)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:183)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
pom.xml
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.12.12</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.12</artifactId>
  <version>3.1.2</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>3.1.2</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.12</artifactId>
  <version>3.1.2</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark3-bundle_2.12</artifactId>
  <version>0.12.0</version>
</dependency>
<dependency>
  <groupId>org.mongodb.spark</groupId>
  <artifactId>mongo-spark-connector_2.12</artifactId>
  <version>3.0.1</version>
</dependency>
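
The ClassNotFoundException points at the bundle choice rather than the upgrade itself: Hudi 0.12.0 publishes Spark-version-specific bundles, and hudi-spark3-bundle_2.12 is built against a newer Spark line (3.2+), so it does not contain the org.apache.spark.sql.adapter.Spark3_1Adapter class that Hudi resolves at runtime on Spark 3.1. A minimal sketch of the dependency swap for Spark 3.1.2, assuming the rest of the POM stays as above:

<!-- Hudi 0.12.0 bundle built for the Spark 3.1.x line; unlike
     hudi-spark3-bundle_2.12, it ships Spark3_1Adapter. -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark3.1-bundle_2.12</artifactId>
  <version>0.12.0</version>
</dependency>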
Issue Analytics
- Created a year ago
- Comments: 8 (3 by maintainers)
Top Results From Across the Web

Troubleshooting - Apache Hudi
This error generally occurs when the schema has evolved in backwards incompatible way by deleting some column 'col1' and we are trying to...

Older Releases | Apache Hudi
This is a bug fix only release and no special migration steps needed when upgrading from 0.5.2. If you are upgrading from earlier...

Release 0.12.1 - Apache Hudi
Release 0.12.1 (docs). Migration Guide. This release (0.12.1) does not introduce any new table version, thus no migration is needed if you are...

Release 0.12.0 | Apache Hudi
Please take note of the following updates before upgrading to Hudi 0.12.0. ... After 0.9.0, due to some refactoring, fallback partition changed to...

Release 0.10.0 | Apache Hudi
With 0.10.0, we have made some foundational fix to metadata table and so as part of upgrade, any existing metadata table is cleaned...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@nsivabalan EMR 6.8 supports only Hudi 0.11.1, but we needed 0.12.0 to resolve the null-value issue, so I upgraded Spark to 3.3.0 and used the Hudi Spark bundle below to get it working. Thanks for your inputs.
@navbalaraman The partition-field issue was recently fixed by https://github.com/apache/hudi/pull/7132. Please open a new issue if it still persists.
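
The exact bundle snippet from the first comment was not captured above; for Spark 3.3.0 it is presumably the Spark 3.3 build of the same release. A sketch, assuming that is the bundle the commenter meant:

<!-- Assumed: Hudi 0.12.0 bundle for the Spark 3.3.x line, matching the
     commenter's Spark 3.3.0 upgrade; the original snippet is missing. -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark3.3-bundle_2.12</artifactId>
  <version>0.12.0</version>
</dependency>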