[SUPPORT] How to use hudi-defaults.conf with Glue

Describe the problem you faced

I tried to use Hudi's hudi-defaults.conf with Glue. I set the path of the file through both the Spark config and a Python environment variable, and neither works. I checked https://github.com/apache/hudi/pull/4167 but I can't find a clear explanation of how to use it.

Spark Config: pyspark

from pyspark.sql import SparkSession
spark = (SparkSession.builder
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
    .config('spark.sql.hive.convertMetastoreParquet', 'false')
    .config('spark.yarn.appMasterEnv.HUDI_CONF_DIR', args['HUDI_CONF_DIR'])
    .config('spark.executorEnv.HUDI_CONF_DIR', args['HUDI_CONF_DIR'])
    .getOrCreate())

Env Config:

HUDI_CONF_DIR='s3://glue-development-bucket/scripts/hudi-conf/hudi-default.conf'
os.environ['HUDI_CONF_DIR'] = args['HUDI_CONF_DIR']

I get the same error every time. I am not sure whether there is a clear example of how to use this feature with Spark or Glue.

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

  • Hudi version : 0.10.1

  • Spark version : 3.1.1

  • Hive version : 2.3.7

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : no

Add the stacktrace of the error.

2022-04-19 00:34:37,012 WARN [Thread-10] config.DFSPropertiesConfiguration (DFSPropertiesConfiguration.java:getConfPathFromEnv(188)): Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
--
2022-04-19 00:34:37,085 WARN [Thread-10] config.DFSPropertiesConfiguration (DFSPropertiesConfiguration.java:addPropsFromFile(131)): Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file

The test example calls DFSPropertiesConfiguration.refreshGlobalProps(); to refresh the properties, but I am not sure how to do the same from a PySpark config.
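One detail worth checking: the first warning in the stack trace asks for HUDI_CONF_DIR to be "the dir of hudi-defaults.conf", while the value shown under Env Config points at a file (and the file is named hudi-default.conf, without the "s"). The following is a minimal sketch, not a confirmed fix, of deriving a directory value from such a path. The helper names `hudi_conf_dir` and `hudi_env_settings` are hypothetical, and whether Glue actually passes these settings through to the JVM where Hudi reads the variable is not established by this thread.

```python
import posixpath


def hudi_conf_dir(path: str) -> str:
    """Return a directory path suitable for HUDI_CONF_DIR.

    If `path` ends in ".conf" it is treated as a file and its parent
    directory is returned; otherwise it is returned without a trailing slash.
    (Hypothetical helper, not part of Hudi or Glue.)
    """
    stripped = path.rstrip("/")
    if stripped.endswith(".conf"):
        return posixpath.dirname(stripped)
    return stripped


def hudi_env_settings(path: str) -> dict:
    """Build the two Spark settings used in the issue, pointing at the directory."""
    conf_dir = hudi_conf_dir(path)
    return {
        "spark.yarn.appMasterEnv.HUDI_CONF_DIR": conf_dir,
        "spark.executorEnv.HUDI_CONF_DIR": conf_dir,
    }


if __name__ == "__main__":
    reported = "s3://glue-development-bucket/scripts/hudi-conf/hudi-default.conf"
    for key, value in hudi_env_settings(reported).items():
        print(key, "=", value)
```

Each key/value pair could then be passed to `SparkSession.builder.config(key, value)` in place of the raw `args['HUDI_CONF_DIR']` value. The file in that directory would also need to be named hudi-defaults.conf, as the second warning suggests.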

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 17 (10 by maintainers)

Top GitHub Comments

1 reaction
moustafaalaa commented, Apr 20, 2022

The warning disappeared. I will verify it is working fine and share the output. It is still there. I added the details in the next comment.

0 reactions
nsivabalan commented, Aug 28, 2022

Closing this out since the linked PR has landed. Thanks!
