
Error when upgrading to Hudi 0.12.0 from 0.9.0

See original GitHub issue

We are using Spark 3.1.2 with Hudi 0.9.0 in our application, with AWS S3 and the AWS Glue Catalog used to store and expose the ingested data. After a source data change, some of the new records now arrive with null values in a column that already exists in the table schema (the schema was built from earlier records that had values in that column). Based on some of the reported issues (e.g. HUDI-4276), we identified that this could be resolved by upgrading to Hudi 0.12.0. When upgrading Hudi we are facing the error below. Can you please provide info on what is causing this issue? (Only pom version changes have been done, no code changes.)
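
For context, the error is thrown from an ordinary DataFrame write into Hudi (see the DataFrameWriter.save frames at the bottom of the trace). A simplified sketch of the kind of call in our job is below; the table name, key fields, source, and S3 paths are illustrative, not the real values:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("hudi-ingest").getOrCreate()
// `df` stands in for the incoming batch; the source path is hypothetical.
val df = spark.read.json("s3://my-bucket/incoming/")

df.write
  .format("hudi")
  .option("hoodie.table.name", "events")                    // illustrative table name
  .option("hoodie.datasource.write.recordkey.field", "id")  // illustrative record key
  .option("hoodie.datasource.write.precombine.field", "ts") // illustrative precombine field
  .mode(SaveMode.Append)
  .save("s3://my-bucket/hudi/events")                       // illustrative base path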

Error:

org.apache.spark.sql.adapter.Spark3_1Adapter
java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark3_1Adapter
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at org.apache.hudi.SparkAdapterSupport.sparkAdapter(SparkAdapterSupport.scala:39)
	at org.apache.hudi.SparkAdapterSupport.sparkAdapter$(SparkAdapterSupport.scala:29)
	at org.apache.hudi.HoodieSparkUtils$.sparkAdapter$lzycompute(HoodieSparkUtils.scala:65)
	at org.apache.hudi.HoodieSparkUtils$.sparkAdapter(HoodieSparkUtils.scala:65)
	at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:150)
	at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:540)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:178)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:183)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)

pom.xml

<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.12.12</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.12</artifactId>
  <version>3.1.2</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>3.1.2</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.12</artifactId>
  <version>3.1.2</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark3-bundle_2.12</artifactId>
  <version>0.12.0</version>
</dependency>
<dependency>
  <groupId>org.mongodb.spark</groupId>
  <artifactId>mongo-spark-connector_2.12</artifactId>
  <version>3.0.1</version>
</dependency>
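
Worth noting: starting with Hudi 0.12.0 the Spark bundles are published per Spark minor version, and the generic hudi-spark3-bundle above no longer targets Spark 3.1, which is why Spark3_1Adapter cannot be found on the classpath. For a setup that stays on Spark 3.1.2, the matching artifact would be the Spark-3.1-specific bundle; a sketch of that dependency, assuming no other changes:

<!-- Sketch: the Spark-3.1-matched Hudi bundle, for staying on Spark 3.1.2 -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark3.1-bundle_2.12</artifactId>
  <version>0.12.0</version>
</dependency>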

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments:8 (3 by maintainers)

Top GitHub Comments

1 reaction
navbalaraman commented, Nov 4, 2022

@nsivabalan EMR 6.8 supports only 0.11.1, but we needed to get on 0.12.0 to have the null-value issue resolved. So I upgraded Spark to 3.3.0 and used the Hudi Spark bundle below to get it working. Thanks for your inputs.

<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark3.3-bundle_2.12</artifactId>
  <version>0.12.0</version>
</dependency>
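
Since that bundle is built against Spark 3.3, the Spark dependencies in the pom have to move in lockstep; a sketch of the matching change (same provided scope as before; spark-core and spark-hive would be bumped the same way):

<!-- Sketch: Spark artifacts bumped to 3.3.0 to match the Spark-3.3 bundle -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>3.3.0</version>
  <scope>provided</scope>
</dependency>
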
0 reactions
codope commented, Nov 29, 2022

@navbalaraman The issue due to the partition field was fixed recently by https://github.com/apache/hudi/pull/7132. Please open another issue if it still persists.
