Hive database not auto created when syncing

Describe the problem you faced

Hudi 0.7.0 and later have a new flag, hoodie.datasource.hive_sync.auto_create_database. According to the documentation it defaults to true, which would be consistent with the previous behaviour of creating Hive databases that don't exist.

In practice, 0.7.0 and 0.8.0 seem to default to false when a dataframe is written as Hudi and the flag is not specified at all, because of how the code is written here: https://github.com/apache/hudi/blob/release-0.7.0/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L377

As a result, we end up with the following error instead:

org.apache.hudi.hive.HoodieHiveSyncException: Failed to check if database exists your_db_name
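One way a documented default of true can turn into an effective default of false is a lookup that only checks whether the key is present and truthy, without falling back to the documented default. The sketch below illustrates that general pattern in Python. It is an assumption about the kind of logic involved, not a copy of the linked Scala code, and both function names are hypothetical.

```python
AUTO_CREATE_KEY = "hoodie.datasource.hive_sync.auto_create_database"


def documented_auto_create(params: dict) -> bool:
    """Fall back to the documented default ("true") when the key is absent."""
    return params.get(AUTO_CREATE_KEY, "true").lower() == "true"


def effective_auto_create(params: dict) -> bool:
    """Hypothetical lookup without a fallback: a missing key is treated as false."""
    value = params.get(AUTO_CREATE_KEY)
    return value is not None and value.lower() == "true"
```

With an empty options dict the two functions disagree, which matches the symptom described: the database is only created when the flag is passed explicitly.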

To Reproduce

Steps to reproduce the behavior:

Write a dataframe to Hudi without specifying the new hoodie.datasource.hive_sync.auto_create_database flag, but with the other Hive sync settings in place.
The write does not sync to Hive and throws a HoodieHiveSyncException instead.

Expected behavior

On 0.6.0 this creates a Hive database; on 0.7.0 it no longer does. When the flag is explicitly provided and set to "true" it works fine, so maybe the docs could be updated to reflect that? It would still be surprising behaviour, since the flag wasn't required before.
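The workaround of setting the flag explicitly could look like the following PySpark sketch. The helper names, database name, and table name are hypothetical, and a real write would also need the usual Hudi write options (record key, precombine field, and so on) plus whatever Hive connection settings the environment requires, which are left out here.

```python
AUTO_CREATE_KEY = "hoodie.datasource.hive_sync.auto_create_database"


def hive_sync_options(database: str, table: str) -> dict:
    """Build Hive sync options with the auto-create flag set explicitly."""
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
        # Set explicitly instead of relying on the documented default.
        AUTO_CREATE_KEY: "true",
    }


def write_hudi(df, base_path: str, options: dict) -> None:
    """Write a Spark DataFrame as a Hudi table using the given options."""
    df.write.format("hudi").options(**options).mode("append").save(base_path)
```

Calling write_hudi(df, path, hive_sync_options("your_db_name", "your_table")) passes the flag on every write, so the behaviour does not depend on how the writer resolves a missing key.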

Environment Description

Hudi version : 0.7.0 and 0.8.0

Additional context

Delta-streamer works fine with 0.7.0.

Top GitHub Comments

n3nash commented, Jun 16, 2021

Closing this ticket since the issue is resolved. Thanks @veenaypatil!

veenaypatil commented, Jun 11, 2021

@n3nash Yes, I will pick it up and add a note to the documentation.
