
[SUPPORT] How to create a hudi table without suffix in snapshot read mode using SparkSQL


@YannByron Hello,

The fix for https://issues.apache.org/jira/browse/HUDI-4487 is to create the rt/ro tables manually, but it also forbids using OPTIONS (hoodie.query.as.ro.table = 'false') in Spark SQL when creating a table.

So how can I now create a Hudi table without a suffix in snapshot read mode using Spark SQL? We want to use a Hudi table like an RDBMS table: querying the table should return all the data, without needing a suffixed table name (which increases users' learning cost and makes the SQL confusing).

```sql
CREATE TABLE IF NOT EXISTS `default`.`hudi_test_snapshot_mode` (
     `id` INT
    ,`name` STRING
    ,`age` INT
    ,`sync_time` TIMESTAMP
) USING HUDI
OPTIONS (
  `hoodie.query.as.ro.table` = 'false'
)
TBLPROPERTIES (
     type = 'mor'
    ,primaryKey = 'id'
    ,preCombineField = 'sync_time'
    ,`hoodie.compaction.payload.class` = 'org.apache.hudi.common.model.OverwriteWithLatestAvroPayload'
    ,`hoodie.datasource.write.hive_style_partitioning` = 'false'
    ,`hoodie.table.keygenerator.class` = 'org.apache.hudi.keygen.NonpartitionedKeyGenerator'
    ,`hoodie.index.type` = 'GLOBAL_BLOOM'
)
```
  • Hudi version : 0.12.1

  • Spark version : 3.1.3

  • Hive version : 3.1.0

  • Hadoop version : 3.1.1

  • Storage (HDFS/S3/GCS…) : HDFS

  • Running on Docker? (yes/no) : no


Stacktrace

```
Exception in thread "main" org.apache.spark.sql.AnalysisException: Creating ro/rt table need the existence of the base table.
	at org.apache.spark.sql.hudi.command.CreateHoodieTableCommand.run(CreateHoodieTableCommand.scala:74)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
	at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3700)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3698)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
```

Issue Analytics

  • State: open
  • Created: 10 months ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
JoshuaZhuCN commented, Nov 24, 2022

I found another way to solve my problem: create the table according to the 0.12 rules, and then use `alter table set tblproperties` to enable the RT table behavior.
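The commenter doesn't show the exact statements, but the workaround might look roughly like this. Only the `hoodie.query.as.ro.table` property name is confirmed by this thread; everything else (the trimmed property list, the assumption that the property is honored when set after creation) is a sketch, not a verified recipe:

```sql
-- Step 1 (sketch): create the base table the 0.12 way, with no ro/rt
-- option in OPTIONS, so CreateHoodieTableCommand does not reject it.
CREATE TABLE IF NOT EXISTS `default`.`hudi_test_snapshot_mode` (
     `id` INT
    ,`name` STRING
    ,`age` INT
    ,`sync_time` TIMESTAMP
) USING HUDI
TBLPROPERTIES (
     type = 'mor'
    ,primaryKey = 'id'
    ,preCombineField = 'sync_time'
);

-- Step 2 (sketch): flip the query mode afterwards via table properties,
-- which the CREATE TABLE validation no longer sees.
ALTER TABLE `default`.`hudi_test_snapshot_mode`
SET TBLPROPERTIES ('hoodie.query.as.ro.table' = 'false');
```

These statements need a live Spark session with the Hudi extensions loaded, so they can't be verified standalone.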

0 reactions
JoshuaZhuCN commented, Nov 29, 2022

@JoshuaZhuCN

> use alter table set tblproperties to realize RT table functions

It's a tricky way, not a suitable one. IMO, RO/RT is just a query mode on a base table. I would worry that it gets confusing if there is an RT table without the _rt suffix and an RO table with the _ro suffix, or vice versa.

@YannByron I found a problem with this approach: we must execute "refresh table xxx" after each data change before we can query the latest data. Otherwise, we can only query the data from before the change (not the data in the base file), which is very confusing.
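As a sketch of the behavior described above (the table name is just the one from this thread, and the exact staleness behavior is the commenter's observation, not something verified here):

```sql
-- Write new data through the suffix-less table created by the workaround.
INSERT INTO `default`.`hudi_test_snapshot_mode`
VALUES (1, 'alice', 20, current_timestamp());

-- May still return the pre-change snapshot, per the comment above.
SELECT * FROM `default`.`hudi_test_snapshot_mode`;

-- Invalidate Spark's cached metadata/snapshot for the table...
REFRESH TABLE `default`.`hudi_test_snapshot_mode`;

-- ...after which the latest data is visible.
SELECT * FROM `default`.`hudi_test_snapshot_mode`;
```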


https://github.com/apache/hudi/issues/7322

