[SUPPORT] Reading a Hive table fails when HoodieCatalog is used
See original GitHub issue. Environment: Hudi 0.11.0, Spark 3.2.1.

When hive_sync is enabled and the synced table is then read with spark.read.table("table_name"), the read fails with:

pyspark.sql.utils.AnalysisException: Table does not support reads

The error does not occur when --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' is not set.
The session was started with:

```shell
pyspark --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
```
```python
sc.setLogLevel("WARN")

dataGen = sc._jvm.org.apache.hudi.QuickstartUtils.DataGenerator()
inserts = sc._jvm.org.apache.hudi.QuickstartUtils.convertToStringList(
    dataGen.generateInserts(10)
)

from pyspark.sql.functions import expr

df = spark.read.json(spark.sparkContext.parallelize(inserts, 10)).withColumn(
    "part", expr("'foo'")
)

tableName = "test_hudi_pyspark"
basePath = f"/tmp/{tableName}"
hudi_options = {
    "hoodie.table.name": tableName,
    "hoodie.datasource.write.recordkey.field": "uuid",
    "hoodie.datasource.write.partitionpath.field": "part",
    "hoodie.datasource.write.table.name": tableName,
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.upsert.shuffle.parallelism": 2,
    "hoodie.insert.shuffle.parallelism": 2,
    "hoodie.datasource.hive_sync.database": "default",
    "hoodie.datasource.hive_sync.table": tableName,
    "hoodie.datasource.hive_sync.mode": "hms",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.partition_fields": "part",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
}

df.write.format("hudi").options(**hudi_options).mode("overwrite").save(basePath)
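Note that if one of the hive_sync options above is missing or mistyped, the write still succeeds but the table never appears in the metastore, which is easy to confuse with this bug. A tiny check (purely illustrative, not a Hudi API; the required set is an assumption based on the options used in this report) can validate the options dict before writing:

```python
def missing_hive_sync_keys(opts):
    """Return the hive_sync keys absent from a Hudi options dict.

    The required set is an assumption drawn from this report,
    not an official Hudi contract.
    """
    required = {
        "hoodie.datasource.hive_sync.enable",
        "hoodie.datasource.hive_sync.mode",
        "hoodie.datasource.hive_sync.database",
        "hoodie.datasource.hive_sync.table",
    }
    return sorted(required - opts.keys())

# Only "enable" set: the other three sync keys are reported missing.
print(missing_hive_sync_keys({"hoodie.datasource.hive_sync.enable": "true"}))
```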
spark.read.format("hudi").load(basePath).count()  # WORKS
spark.table("default.test_hudi_pyspark").count()  # RAISES AN ERROR
```

The failing call raises:

```
pyspark.sql.utils.AnalysisException: Table does not support reads: default.test_hudi_pyspark
```
I debugged this a bit: HoodieCatalog.loadTable delegates to super.loadTable, and for this table the match falls through to the default case, so the result is never wrapped as a Hudi table. Is super.loadTable not aware of Hudi here?
```scala
override def loadTable(ident: Identifier): Table = {
  try {
    super.loadTable(ident) match {
      case v1: V1Table if sparkAdapter.isHoodieTable(v1.catalogTable) =>
        HoodieInternalV2Table(
          spark,
          v1.catalogTable.location.toString,
          catalogTable = Some(v1.catalogTable),
          tableIdentifier = Some(ident.toString))
      case o => o // this case is used
    }
  } catch {
    case e: Exception => throw e
  }
}
```
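To make the fallthrough concrete, here is a small Python model of that match (the class names are hypothetical stand-ins, not Hudi or Spark code): when super.loadTable hands back anything other than a V1Table, the guarded branch never fires, the table is returned unwrapped, and Spark ends up with a table that does not support reads.

```python
from dataclasses import dataclass

@dataclass
class V1Table:                 # stand-in for Spark's V1Table
    is_hoodie: bool
    location: str

@dataclass
class OtherTable:              # stand-in for whatever super.loadTable returns here
    name: str

@dataclass
class HoodieInternalV2Table:   # stand-in for Hudi's readable wrapper
    location: str

def load_table(table):
    # Mirrors: case v1: V1Table if isHoodieTable(v1) => HoodieInternalV2Table(...)
    if isinstance(table, V1Table) and table.is_hoodie:
        return HoodieInternalV2Table(table.location)
    return table  # mirrors: case o => o -- the branch taken in this report

# A V1 Hudi table would be wrapped into the readable form:
print(type(load_table(V1Table(True, "/tmp/test_hudi_pyspark"))).__name__)
# -> HoodieInternalV2Table

# But here the catalog returns a non-V1 table, which falls through unwrapped:
print(type(load_table(OtherTable("default.test_hudi_pyspark"))).__name__)
# -> OtherTable
```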
Issue Analytics
- State: closed
- Created a year ago
- Comments: 12 (12 by maintainers)
Top GitHub Comments
hmmm, interesting. @XuQianJin-Stars : Can you assist here please.
Closing the issue, @parisni please reopen if you have new problems.