[SUPPORT] Need info on the versions of Hudi dependent JARs that can be used with Glue 3.0


Describe the problem you faced

We need a higher version of the Spark libraries to support casting array<string> to array<null>. Because we don't know which combination of hudi-spark-bundle and spark-avro JARs would work, we are stuck with Glue 2.0 and Spark 2.4. The JARs currently used to create Hudi tables in the Glue Catalog are as follows.

Setup/Env config: AWS Glue 2.0, Python 3, Spark 2. External dependent JARs for connecting AWS Glue and Hudi:

  1. httpclient-4.5.9.jar
  2. hudi-spark-bundle_2.11-0.8.0.jar
  3. spark-avro_2.11-2.4.4.jar

We have a use case where we need to update the schema of received records so that the few columns whose value is an empty array get the type array<null>.
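As a stdlib-only sketch of how the detection step might be automated (an assumption, not something described in the issue), the function below finds the columns that only ever hold empty arrays in a batch of records; those are the candidates for the array<null> type. Records are modeled as plain Python dicts, and `empty_array_columns` is a hypothetical helper. The cast itself would still be done in Spark.

```python
from typing import Any, Dict, Iterable, List


def empty_array_columns(records: Iterable[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: return the columns whose non-null values are all empty lists."""
    seen_empty = set()     # columns that held at least one empty list
    seen_nonempty = set()  # columns that held at least one other non-null value
    for record in records:
        for column, value in record.items():
            if value is None:
                continue  # nulls say nothing about the element type
            if isinstance(value, list) and not value:
                seen_empty.add(column)
            else:
                seen_nonempty.add(column)
    # Only columns that never held anything but empty lists qualify.
    return sorted(seen_empty - seen_nonempty)


records = [
    {"id": 1, "tags": [], "notes": []},
    {"id": 2, "tags": [], "notes": ["a"]},
]
print(empty_array_columns(records))  # ['tags']
```

Here `notes` is excluded because one record carries a real element, so only `tags` would be retyped.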

For reference, see this Stack Overflow question: https://stackoverflow.com/questions/72294587/how-to-automate-casting-of-empty-arraystring-elements-to-arraystruct-eleme

Ultimately, we want to know which versions of the hudi-spark-bundle and spark-avro JARs to use so that we can switch to Glue 3.0, which runs on Spark 3.1.
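One way to sanity-check a JAR combination before switching Glue versions is to compare each JAR's Scala suffix and, for spark-avro, its version against the target runtime. This sketch assumes simple `artifact[_scala]-version.jar` file names and a rule of thumb that spark-avro should share Spark's major.minor version; `parse_jar` and `check_jars` are hypothetical helpers. The Spark versions come from this issue (Spark 2.4 on Glue 2.0, Spark 3.1 on Glue 3.0); the Scala versions 2.11 and 2.12 are assumptions that match the `_2.11` and `_2.12` suffixes of the JARs reported in this thread.

```python
from typing import List, NamedTuple, Optional


class Jar(NamedTuple):
    artifact: str
    scala: Optional[str]  # Scala binary version suffix such as "2.12", if present
    version: str


def parse_jar(filename: str) -> Jar:
    """Split 'artifact[_scala]-version.jar'; handles simple Maven-style names only."""
    stem = filename[: -len(".jar")] if filename.endswith(".jar") else filename
    name, version = stem.rsplit("-", 1)
    artifact, sep, suffix = name.rpartition("_")
    if sep and suffix.replace(".", "").isdigit():
        return Jar(artifact, suffix, version)
    return Jar(name, None, version)


def check_jars(jars: List[str], spark: str, scala: str) -> List[str]:
    """Report JARs that do not fit a target Spark 'major.minor' and Scala version."""
    problems = []
    for filename in jars:
        jar = parse_jar(filename)
        if jar.scala is not None and jar.scala != scala:
            problems.append(f"{filename}: built for Scala {jar.scala}, runtime uses {scala}")
        # Rule of thumb (assumption): spark-avro should match Spark's major.minor.
        if jar.artifact == "spark-avro" and not jar.version.startswith(spark + "."):
            problems.append(f"{filename}: spark-avro {jar.version} does not match Spark {spark}")
    return problems


glue2_jars = [
    "httpclient-4.5.9.jar",
    "hudi-spark-bundle_2.11-0.8.0.jar",
    "spark-avro_2.11-2.4.4.jar",
]
print(check_jars(glue2_jars, spark="2.4", scala="2.11"))  # []
for problem in check_jars(glue2_jars, spark="3.1", scala="2.12"):
    print(problem)
```

Against the Glue 2.0 list it reports nothing for Spark 2.4 with Scala 2.11 and three problems for Spark 3.1 with Scala 2.12, which suggests the Hudi and spark-avro JARs cannot simply be carried over to Glue 3.0.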

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 17 (3 by maintainers)

Top GitHub Comments

1 reaction
xushiyan commented, Aug 7, 2022

The utilities bundle implicitly includes Spark 3.1 support. If you also add a Spark bundle, we recommend switching to the utilities-slim bundle. Check out the release notes: https://hudi.apache.org/releases/release-0.11.0/#slim-utilities-bundle
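The maintainer's advice can be expressed as a simple classpath check. This is a hypothetical helper, not part of Hudi; it only pattern-matches file names, and the 0.11.0-style names used in the example (`hudi-utilities-slim-bundle_2.12-0.11.0.jar`, `hudi-spark3.1-bundle_2.12-0.11.0.jar`) are illustrative.

```python
from typing import List


def bundle_warnings(jars: List[str]) -> List[str]:
    """Hypothetical lint: flag the full utilities bundle next to a separate Hudi Spark bundle."""
    # "hudi-utilities-slim-bundle..." does not start with this prefix, so it is not flagged.
    has_full_utilities = any(j.startswith("hudi-utilities-bundle") for j in jars)
    has_spark_bundle = any(j.startswith("hudi-spark") and "-bundle" in j for j in jars)
    if has_full_utilities and has_spark_bundle:
        return [
            "use hudi-utilities-slim-bundle instead of hudi-utilities-bundle "
            "when a hudi-spark bundle is also on the classpath"
        ]
    return []


print(bundle_warnings([
    "hudi-utilities-bundle_2.12-0.11.0.jar",
    "hudi-spark3.1-bundle_2.12-0.11.0.jar",
]))
```

Swapping the first entry for the slim bundle makes the check return an empty list.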

1 reaction
tjtoll commented, May 21, 2022

For Glue 3.0 we use:

  1. hudi-spark3-bundle_2.12-0.9.0.jar
  2. spark-avro_2.12-3.1.2.jar
  3. calcite-core-1.16.0.jar

Swap out hudi-spark3-bundle_2.12 for version 0.10 or 0.11 if you want those instead.
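On Glue, custom JARs are typically supplied through the `--extra-jars` job parameter, a comma-separated list of S3 paths. The sketch below builds that parameter for the Glue 3.0 combination from the comment above; the bucket `s3://my-bucket/jars/` and the helper `extra_jars_argument` are hypothetical, and the JARs would need to be uploaded there first.

```python
from typing import Dict, List


def extra_jars_argument(s3_prefix: str, jars: List[str]) -> Dict[str, str]:
    """Build the Glue job parameter that adds custom JARs to the classpath."""
    prefix = s3_prefix.rstrip("/")  # avoid a double slash when joining paths
    return {"--extra-jars": ",".join(f"{prefix}/{jar}" for jar in jars)}


glue3_jars = [
    "hudi-spark3-bundle_2.12-0.9.0.jar",
    "spark-avro_2.12-3.1.2.jar",
    "calcite-core-1.16.0.jar",
]
# "s3://my-bucket/jars/" is a hypothetical location used only for illustration.
args = extra_jars_argument("s3://my-bucket/jars/", glue3_jars)
print(args["--extra-jars"])
```

The resulting dictionary has the shape accepted by the `DefaultArguments` parameter of boto3's Glue `create_job` (together with `GlueVersion="3.0"`); in the Glue console the same comma-separated value goes into the job's Dependent JARs path.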

