Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[SUPPORT] Can not create a Path from an empty string on unpartitioned table

See original GitHub issue

Describe the problem you faced

  • Issue when trying to create unpartitioned tables in the Hive metastore (AWS Glue Data Catalog) using Hudi (tested on 0.6.0, 0.7.0, and 0.8.0)

  • Using Hudi on AWS EMR with PySpark

  • A previous fix for this is included in the newer versions, but the failure persists

  • Hudi config for unpartitioned tables (a sketch of the write call that uses it follows the block):

hudiConfig = {
    # <column> is the reporter's placeholder for the precombine column.
    "hoodie.datasource.write.precombine.field": <column>,
    "hoodie.datasource.write.recordkey.field": _PRIMARY_KEY_COLUMN,
    # Key generator and partition extractor for unpartitioned tables:
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.NonPartitionedExtractor",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "className": "org.apache.hudi",
    "hoodie.datasource.hive_sync.use_jdbc": "false",
    "hoodie.consistency.check.enabled": "true",
    "hoodie.datasource.hive_sync.database": DB_NAME,
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.support_timestamp": "true",
}
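
For context, here is a minimal sketch (not from the original report) of how a config like this is typically passed to a PySpark Hudi write; the DataFrame df, the table name, and the S3 path are hypothetical placeholders:

# Sketch: writing a DataFrame with the config above.
# df, the table name, and the S3 path are hypothetical placeholders.
df.write.format("org.apache.hudi") \
    .options(**hudiConfig) \
    .option("hoodie.table.name", "my_table") \
    .mode("append") \
    .save("s3://my-bucket/hudi/my_table/")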

To Reproduce

Steps to reproduce the behavior:

  1. Run Hudi with Hive integration
  2. Try to create an unpartitioned table with the config specified above

Expected behavior

The table would be created without throwing the exception, and without any partition or default partition path.

Environment Description

  • Hudi version : 0.6.0, 0.7.0 and 0.8.0

  • Spark version : 2.4.7

  • Hive version : AWS Glue Data Catalog integration on EMR

  • Hadoop version : Amazon Hadoop distribution

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : no

Stacktrace

org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20210407181606
   at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:496)
   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:150)
   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
   at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:355)
   at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:403)
   at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:399)
   at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
   at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
   at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:217)
   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
   at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
   at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
   at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   at py4j.Gateway.invoke(Gateway.java:282)
   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   at py4j.commands.CallCommand.execute(CallCommand.java:79)
   at py4j.GatewayConnection.run(GatewayConnection.java:238)
   at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:168)
   at org.apache.hadoop.fs.Path.<init>(Path.java:180)
   at org.apache.hadoop.hive.metastore.Warehouse.getDatabasePath(Warehouse.java:172)
   at org.apache.hadoop.hive.metastore.Warehouse.getTablePath(Warehouse.java:184)
   at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:520)
   at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateUnpartitionedTableStatsFast(MetaStoreUtils.java:180)
   at com.amazonaws.glue.shims.AwsGlueSparkHiveShims.updateTableStatsFast(AwsGlueSparkHiveShims.java:62)
   at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.alterTable(GlueMetastoreClientDelegate.java:552)
   at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:400)
   at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:385)
   at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:494)
   ... 46 more
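
The nested exception pinpoints the failure: during Hive sync, the Glue shim calls updateTableStatsFast, which has Warehouse.getDatabasePath build a Path from the database's location URI, and that URI is empty when the Glue database was created without a location (see the first web result below). A hedged workaround is therefore to create the database with an explicit location before writing; a minimal Spark SQL sketch, where the database name and S3 URI are placeholders:

# Sketch: create the Glue database with an explicit location so that
# Warehouse.getDatabasePath never sees an empty string. Names are placeholders.
spark.sql(
    "CREATE DATABASE IF NOT EXISTS db_name "
    "LOCATION 's3://my-bucket/warehouse/db_name.db/'"
)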

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
pranotishanbhag commented, Jun 10, 2021

I am facing the same issue. Could you please share the fix? I am using Hudi version 0.8.

1 reaction
aditiwari01 commented, Apr 10, 2021

Issue (https://github.com/apache/hudi/issues/2801) might be a duplicate.

However, while creating an unpartitioned table, my dataframe.write succeeds, but I am not able to query the data via Hive. Spark reads are working fine for me, though. (I am testing via spark-shell and using JDBC to connect to Hive.)
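
For reference, a direct Spark read of the Hudi table of the kind described here, which bypasses the Hive metastore entirely, would look roughly like this (a sketch; the S3 path is a hypothetical placeholder):

# Sketch: reading the Hudi table directly from storage, bypassing Hive sync.
# The S3 path is a hypothetical placeholder.
df = spark.read.format("org.apache.hudi").load("s3://my-bucket/hudi/my_table/")
df.show()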

Read more comments on GitHub >

Top Results From Across the Web

'Can not create a Path from an empty string' Error for 'CREATE ...
The issue happens when a database is created without specified location: CREATE DATABASE db_name;. To fix the issue, specify location when ...
Read more >
[GitHub] [hudi] n3nash commented on issue #2797: [SUPPORT] Can ...
[GitHub] [hudi] n3nash commented on issue #2797: [SUPPORT] Can not create a Path from an empty string on unpartitioned table · 2021-06-04 Thread...
Read more >
AWS Glue – Can not create a Path from an empty string
Here are the steps I took to solve the Can not create a Path from an empty string error in my Glue job:...
Read more >
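
Since the metastore here is the Glue Data Catalog, the database location can also be set through the Glue API itself; a hedged boto3 sketch, with the database name and S3 URI as placeholders:

# Sketch: creating the Glue database with an explicit LocationUri via boto3.
# Database name and S3 URI are hypothetical placeholders.
import boto3

glue = boto3.client("glue")
glue.create_database(
    DatabaseInput={
        "Name": "db_name",
        "LocationUri": "s3://my-bucket/warehouse/db_name.db/",
    }
)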
