[SUPPORT] Need info on the versions of Hudi dependent jars which can be used with Glue 3.0
Describe the problem you faced
We need a higher version of the Spark libraries to support casting array<string> to array<null>. Because we don't know which combination of hudi-spark-bundle and spark-avro jars would work, I'm stuck with Glue 2.0 and Spark 2.4. The jars currently used to create Hudi tables in the Glue catalog are listed below.
Setup/Env config: AWS Glue 2.0, Python 3, Spark 2.4
External dependent jars for connecting AWS Glue and Hudi:
- httpclient-4.5.9.jar
- hudi-spark-bundle_2.11-0.8.0.jar
- spark-avro_2.11-2.4.4.jar
We have a use case in which we need to update the schema of received records so that the few columns holding an empty array as their value are cast to the array<null> type.
Link for reference of the issue: https://stackoverflow.com/questions/72294587/how-to-automate-casting-of-empty-arraystring-elements-to-arraystruct-eleme
Ultimately, we want to know which versions of hudi-spark-bundle.jar and spark-avro.jar to use so that we can switch to Glue 3.0, which internally runs on Spark 3.1.
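One way to see why the Glue 2.0 jars cannot simply be reused: the `_2.11` / `_2.12` suffix in a jar name is the Scala binary version it was built for, and Spark 3.1 is built against Scala 2.12. The sketch below is only an illustration; the helper names `scala_version` and `mismatched_jars` are hypothetical, and the check looks at file names only, so it says nothing about whether a given Hudi release actually supports a given Spark release.

```python
import re

# Matches names shaped like "<artifact>_<scala>-<version>.jar",
# e.g. "spark-avro_2.11-2.4.4.jar". Jars without a Scala suffix do not match.
_JAR_RE = re.compile(
    r"^(?P<artifact>.+)_(?P<scala>\d+\.\d+)-(?P<version>\d+(?:\.\d+)*)\.jar$"
)


def scala_version(jar_name):
    """Return the Scala binary version encoded in a jar name, or None."""
    match = _JAR_RE.match(jar_name)
    return match.group("scala") if match else None


def mismatched_jars(jar_names, expected_scala):
    """List jars whose Scala suffix differs from the expected one.

    Jars with no Scala suffix (plain Java libraries) are not flagged.
    """
    return [
        name
        for name in jar_names
        if scala_version(name) not in (None, expected_scala)
    ]


# Jar list from the Glue 2.0 setup described in this issue.
glue2_jars = [
    "httpclient-4.5.9.jar",
    "hudi-spark-bundle_2.11-0.8.0.jar",
    "spark-avro_2.11-2.4.4.jar",
]

# Assumption: Glue 3.0 (Spark 3.1) expects Scala 2.12 artifacts.
print(mismatched_jars(glue2_jars, "2.12"))
```

Both Scala 2.11 jars are flagged, while `httpclient-4.5.9.jar` is skipped because it carries no Scala suffix.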
Issue Analytics
- Created a year ago
- Comments:17 (3 by maintainers)
Top GitHub Comments
The utilities bundle implicitly contains Spark 3.1. We recommend switching to the utilities-slim bundle when you also include the Spark bundle alongside it. Check out the release notes: https://hudi.apache.org/releases/release-0.11.0/#slim-utilities-bundle
For Glue 3.0 we use:
- hudi-spark3-bundle_2.12-0.9.0.jar
- spark-avro_2.12-3.1.2.jar
- calcite-core-1.16.0.jar
Swap the hudi-spark3-bundle_2.12 jar for a 0.10 or 0.11 release if you want those instead.
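For anyone wiring these jars into a job, here is a minimal sketch of building the Glue job parameters. It assumes the jars have been uploaded to an S3 prefix of your own (the `s3://my-bucket/jars/` path is a placeholder), and the function name `glue_hudi_job_args` is hypothetical. `--extra-jars` is the Glue job parameter that takes a comma-separated list of jar paths; the Kryo serializer setting is what Hudi normally asks for, but verify both against your own setup.

```python
def glue_hudi_job_args(jar_dir, hudi_version="0.9.0"):
    """Build Glue job parameters for the Glue 3.0 jar set listed above.

    jar_dir is the S3 prefix holding the jars (placeholder, supply your own).
    hudi_version is the full release string of the hudi-spark3-bundle jar.
    """
    base = jar_dir.rstrip("/")
    jars = [
        f"hudi-spark3-bundle_2.12-{hudi_version}.jar",
        "spark-avro_2.12-3.1.2.jar",
        "calcite-core-1.16.0.jar",
    ]
    return {
        # Comma-separated S3 paths added to the job's classpath.
        "--extra-jars": ",".join(f"{base}/{jar}" for jar in jars),
        # Assumption: Hudi is run with the Kryo serializer, as its docs suggest.
        "--conf": "spark.serializer=org.apache.spark.serializer.KryoSerializer",
    }


args = glue_hudi_job_args("s3://my-bucket/jars/")
for key, value in args.items():
    print(key, value)
```

The resulting dictionary can be pasted into the job parameters in the Glue console or passed as default arguments when creating the job programmatically. To try a newer bundle, pass a different `hudi_version`, provided that jar actually exists in your S3 prefix.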