[SUPPORT] Can not create a Path from an empty string on unpartitioned table
See original GitHub issue

Describe the problem you faced

- Creating unpartitioned tables in the Hive metastore (AWS Glue Data Catalog) via Hudi fails (tested on 0.6.0, 0.7.0 and 0.8.0).
- Running Hudi on AWS EMR, with PySpark.
- The earlier fix for this problem is included in the newer versions, but the Hive sync still fails.
- Hudi config for unpartitioned tables:
hudiConfig = {
    "hoodie.datasource.write.precombine.field": <column>,
    "hoodie.datasource.write.recordkey.field": _PRIMARY_KEY_COLUMN,
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.NonPartitionedExtractor",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "className": "org.apache.hudi",
    "hoodie.datasource.hive_sync.use_jdbc": "false",
    "hoodie.consistency.check.enabled": "true",
    "hoodie.datasource.hive_sync.database": DB_NAME,
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.support_timestamp": "true",
}
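For context, a minimal PySpark write using a config like this might look like the sketch below. The table name, database, record-key/precombine columns, and S3 path are hypothetical placeholders, and the job assumes the Hudi Spark bundle is on the classpath (as it is on EMR):

# Minimal sketch of an unpartitioned Hudi write with Hive sync enabled.
# All names (table, database, columns, S3 path) are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-unpartitioned-write").getOrCreate()
df = spark.createDataFrame([(1, "a", "2021-04-07 18:16:06")], ["id", "value", "ts"])

hudiConfig = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.NonPartitionedExtractor",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.use_jdbc": "false",
    "hoodie.datasource.hive_sync.database": "my_db",
    "hoodie.datasource.hive_sync.table": "my_table",
}

df.write.format("org.apache.hudi") \
    .options(**hudiConfig) \
    .mode("append") \
    .save("s3://my-bucket/my_db/my_table/")

With a NonpartitionedKeyGenerator and NonPartitionedExtractor, no partition columns are configured; the Hive sync step is where the reported exception is raised.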
To Reproduce

Steps to reproduce the behavior:
- Run Hudi with Hive integration
- Try to create an unpartitioned table with the config previously specified
Expected behavior

The table should be created without throwing the exception, and without any partition or default partitionpath.
Environment Description

- Hudi version : 0.6.0, 0.7.0 and 0.8.0
- Spark version : 2.4.7
- Hive version : AWS Glue Data Catalog integration on EMR
- Hadoop version : Amazon Hadoop distribution
- Storage (HDFS/S3/GCS…) : S3
- Running on Docker? (yes/no) : no
Stacktrace
org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20210407181606
at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:496)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:150)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:355)
at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:403)
at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:399)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:217)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:168)
at org.apache.hadoop.fs.Path.<init>(Path.java:180)
at org.apache.hadoop.hive.metastore.Warehouse.getDatabasePath(Warehouse.java:172)
at org.apache.hadoop.hive.metastore.Warehouse.getTablePath(Warehouse.java:184)
at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:520)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateUnpartitionedTableStatsFast(MetaStoreUtils.java:180)
at com.amazonaws.glue.shims.AwsGlueSparkHiveShims.updateTableStatsFast(AwsGlueSparkHiveShims.java:62)
at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.alterTable(GlueMetastoreClientDelegate.java:552)
at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:400)
at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:385)
at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:494)
... 46 more
Top Results From Across the Web

'Can not create a Path from an empty string' Error for 'CREATE ...
The issue happens when a database is created without a specified location: CREATE DATABASE db_name;. To fix the issue, specify a location when ...
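That hint matches the Caused by frames above: Warehouse.getDatabasePath throws when the Glue database record carries an empty location. A plausible workaround, assuming a hypothetical database name and bucket, is to (re)create the database with an explicit S3 location before the Hudi write triggers the Hive sync:

# Sketch of the workaround suggested above; db name and bucket are hypothetical.
spark.sql("CREATE DATABASE IF NOT EXISTS my_db LOCATION 's3://my-bucket/my_db/'")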
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments

I am facing the same issue. Can you please share the fix? I am using Hudi version 0.8.

Issue https://github.com/apache/hudi/issues/2801 might be a duplicate. However, while creating an unpartitioned table, my dataframe.write succeeds, but I am not able to query the data via Hive, although Spark reads work fine for me. (Testing via spark-shell; I am using JDBC to connect to Hive.)
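For reference, the Spark read path the commenter says works is just the plain datasource load; the S3 path below is hypothetical:

# Spark read of the Hudi table that reportedly works; the path is hypothetical.
# Older Hudi releases may need a trailing /* glob on the base path.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.format("org.apache.hudi").load("s3://my-bucket/my_db/my_table/")
df.show()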