Hive database not auto-created when syncing
Describe the problem you faced
There is a new flag in Hudi 0.7.0 and later, hoodie.datasource.hive_sync.auto_create_database. According to the documentation it defaults to true, which would be consistent with the previous behaviour of creating Hive databases if they don't exist.
However, in 0.7.0 and 0.8.0 it effectively defaults to false when a DataFrame is written as Hudi without specifying the flag at all, because of how this code is written: https://github.com/apache/hudi/blob/release-0.7.0/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L377
As a result, we get the following error instead:
org.apache.hudi.hive.HoodieHiveSyncException: Failed to check if database exists your_db_name
To Reproduce
Steps to reproduce the behavior:
- Write a DataFrame to Hudi without specifying the new hoodie.datasource.hive_sync.auto_create_database flag, but with the other Hive sync settings configured. The data is not synced to Hive and a HoodieHiveSyncException is thrown instead.
Expected behavior
On 0.6.0 this creates the Hive database; on 0.7.0 it no longer does. When the flag is explicitly provided and set to "true", it works fine, so perhaps the docs could be updated to reflect this? Even so, this would be surprising behaviour, since the flag wasn't required before.
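A minimal sketch of the workaround for PySpark users: pass the flag explicitly alongside the other Hive sync options. The table, database, and field names below are placeholders, and a real setup may need further options (for example partition fields or the Hive JDBC URL).

```python
# Placeholder names; adjust to your own table and database.
hudi_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "your_db_name",
    "hoodie.datasource.hive_sync.table": "my_table",
    # Set explicitly: on 0.7.0/0.8.0, omitting it behaves like "false".
    "hoodie.datasource.hive_sync.auto_create_database": "true",
}

# Assuming an existing DataFrame `df` and a target path `base_path`:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```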
Environment Description
- Hudi version: 0.7.0 and 0.8.0
Additional context
DeltaStreamer works fine with 0.7.0.
Top GitHub Comments
Closing this ticket since the issue is resolved. Thanks @veenaypatil!
@n3nash yes, I will pick it up and add a note in the documentation.