Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[SUPPORT] Read hive Table fail when HoodieCatalog used

See original GitHub issue

Hudi 0.11.0, Spark 3.2.1

When hive sync is enabled, spark.read.table("table_name") raises pyspark.sql.utils.AnalysisException: Table does not support reads. The error does not occur when --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' is not set.

pyspark \
  --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'

from pyspark.sql.functions import expr

sc.setLogLevel("WARN")

# Generate sample records with the Hudi quickstart data generator.
dataGen = sc._jvm.org.apache.hudi.QuickstartUtils.DataGenerator()
inserts = sc._jvm.org.apache.hudi.QuickstartUtils.convertToStringList(
    dataGen.generateInserts(10)
)

# Load the generated JSON records and add a static partition column.
df = spark.read.json(spark.sparkContext.parallelize(inserts, 10)).withColumn(
    "part", expr("'foo'")
)
tableName = "test_hudi_pyspark"
basePath = f"/tmp/{tableName}"

hudi_options = {
    "hoodie.table.name": tableName,
    "hoodie.datasource.write.recordkey.field": "uuid",
    "hoodie.datasource.write.partitionpath.field": "part",
    "hoodie.datasource.write.table.name": tableName,
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.upsert.shuffle.parallelism": 2,
    "hoodie.insert.shuffle.parallelism": 2,
    "hoodie.datasource.hive_sync.database": "default",
    "hoodie.datasource.hive_sync.table": tableName,
    "hoodie.datasource.hive_sync.mode": "hms",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.partition_fields": "part",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
}
df.write.format("hudi").options(**hudi_options).mode("overwrite").save(basePath)
spark.read.format("hudi").load(basePath).count()   # works: path-based read
spark.table("default.test_hudi_pyspark").count()   # raises AnalysisException: catalog-based read

ERROR: pyspark.sql.utils.AnalysisException: Table does not support reads: default.test_hudi_pyspark
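A quick way to check what hive sync actually registered (my own check, not part of the original report) is to dump the metastore entry; the provider and storage details recorded there appear to be what the catalog's Hudi detection keys off:

# Inspect how hive sync registered the table in the metastore. The
# provider/serde rows hint at whether HoodieCatalog.loadTable will
# recognize this as a Hudi table.
spark.sql("DESCRIBE TABLE EXTENDED default.test_hudi_pyspark").show(100, truncate=False)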

I debugged it a bit: when loading the table, HoodieCatalog delegates to super.loadTable, which does not seem to be aware of Hudi, so the case o => o branch below is the one that runs (a workaround sketch follows the snippet).

  override def loadTable(ident: Identifier): Table = {
    try {
      super.loadTable(ident) match {
        case v1: V1Table if sparkAdapter.isHoodieTable(v1.catalogTable) =>
          HoodieInternalV2Table(
            spark,
            v1.catalogTable.location.toString,
            catalogTable = Some(v1.catalogTable),
            tableIdentifier = Some(ident.toString))
        case o => o // this case is used
      }
    } catch {
      case e: Exception =>
        throw e
    }
  }
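Since the path-based read above succeeds, one stopgap, purely a sketch of my own and not from the issue, is to resolve the table's storage location from the metastore and read by path, bypassing HoodieCatalog.loadTable entirely:

# Hypothetical workaround: pull the Location row out of DESCRIBE TABLE
# EXTENDED and feed it to the path-based reader that is known to work.
location = (
    spark.sql("DESCRIBE TABLE EXTENDED default.test_hudi_pyspark")
    .where("col_name = 'Location'")
    .select("data_type")
    .first()[0]
)
spark.read.format("hudi").load(location).count()  # matches the working path-based read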

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction
nsivabalan commented, May 12, 2022

hmmm, interesting. @XuQianJin-Stars: Can you assist here, please?

0 reactions
leesf commented, Jun 3, 2022

Closing the issue. @parisni, please reopen if you run into new problems.


Top Results From Across the Web

Error while running query on HIVE - Cloudera Community
Solved: HI All, I am unable to run the simple query on HIVE i.e. describe ... Can you please check database name, which...

spark throws error when reading hive transaction table
The issue you are trying to reading Transactional table (transactional = true) into Spark. Officially Spark not yet supported for Hive-ACID ...

Hive Tables - Spark 3.3.1 Documentation
Specifying storage format for Hive tables; Interacting with Different Versions of Hive Metastore. Spark SQL also supports reading and writing data stored in ...

Why Cannot I Query Newly Inserted Data in a Parquet Hive ...
Why cannot I query newly inserted data in a parquet Hive table using SparkSQL? This problem occurs in the following scenarios: For partitioned tables...

Integrating Apache Hive Metastores with Snowflake
Supported Hive Operations and Table Types. Hive and Snowflake Data Types. Supported File Formats and Options. Unsupported Hive Commands, Features, and Use ...
