
[SUPPORT] How to create a hudi table without suffix in snapshot read mode using SparkSQL


@YannByron Hello,

The fix for https://issues.apache.org/jira/browse/HUDI-4487 is to create the rt/ro tables manually, but it also forbids using OPTIONS (hoodie.query.as.ro.table = 'false') in Spark SQL when creating a table.

So how can I now create a Hudi table without a suffix in snapshot read mode using Spark SQL? We want to use a Hudi table like an RDBMS table: querying the table should return all the data, without needing a suffixed table name (which increases users' learning cost and makes the SQL confusing).

```sql
CREATE TABLE IF NOT EXISTS `default`.`hudi_test_snapshot_mode` (
     `id` INT
    ,`name` STRING
    ,`age` INT
    ,`sync_time` TIMESTAMP
) USING HUDI
OPTIONS (
  `hoodie.query.as.ro.table` = 'false'
)
TBLPROPERTIES (
     type = 'mor'
    ,primaryKey = 'id'
    ,preCombineField = 'sync_time'
    ,`hoodie.compaction.payload.class` = 'org.apache.hudi.common.model.OverwriteWithLatestAvroPayload'
    ,`hoodie.datasource.write.hive_style_partitioning` = 'false'
    ,`hoodie.table.keygenerator.class` = 'org.apache.hudi.keygen.NonpartitionedKeyGenerator'
    ,`hoodie.index.type` = 'GLOBAL_BLOOM'
)
```
  • Hudi version : 0.12.1

  • Spark version : 3.1.3

  • Hive version : 3.1.0

  • Hadoop version : 3.1.1

  • Storage (HDFS/S3/GCS…) : HDFS

  • Running on Docker? (yes/no) : no


Stacktrace

```
Exception in thread "main" org.apache.spark.sql.AnalysisException: Creating ro/rt table need the existence of the base table.
	at org.apache.spark.sql.hudi.command.CreateHoodieTableCommand.run(CreateHoodieTableCommand.scala:74)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
	at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3700)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3698)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
```

Issue Analytics

  • State: open
  • Created: 10 months ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
JoshuaZhuCN commented, Nov 24, 2022

I found another way to solve my problem: create the table according to the 0.12 rules, and then use `alter table set tblproperties` to enable the RT table behavior.
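The commenter doesn't show the exact statements, but the workaround might look roughly like this. Only the `hoodie.query.as.ro.table` property name is confirmed by this thread; everything else (the trimmed property list, the assumption that the property is honored when set after creation) is a sketch, not a verified recipe:

```sql
-- Step 1 (sketch): create the base table the 0.12 way, with no ro/rt
-- option in OPTIONS, so CreateHoodieTableCommand does not reject it.
CREATE TABLE IF NOT EXISTS `default`.`hudi_test_snapshot_mode` (
     `id` INT
    ,`name` STRING
    ,`age` INT
    ,`sync_time` TIMESTAMP
) USING HUDI
TBLPROPERTIES (
     type = 'mor'
    ,primaryKey = 'id'
    ,preCombineField = 'sync_time'
);

-- Step 2 (sketch): flip the query mode afterwards via table properties,
-- which the CREATE TABLE validation no longer sees.
ALTER TABLE `default`.`hudi_test_snapshot_mode`
SET TBLPROPERTIES ('hoodie.query.as.ro.table' = 'false');
```

These statements need a live Spark session with the Hudi extensions loaded, so they can't be verified standalone.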

0 reactions
JoshuaZhuCN commented, Nov 29, 2022

@JoshuaZhuCN

> use alter table set tblproperties to realize RT table functions

It's a tricky way, not a suitable one. IMO, RO/RT is just a query mode on a base table. I would worry that it gets confusing if there is an RT table without the _rt suffix and an RO table with the _ro suffix, or vice versa.

@YannByron I found a problem with this approach: we must execute "refresh table xxx" after each data change before we can query the latest data. Otherwise, we can only query the data from before the change (not the data in the base file), which is very confusing.
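As a sketch of the behavior described above (the table name is just the one from this thread, and the exact staleness behavior is the commenter's observation, not something verified here):

```sql
-- Write new data through the suffix-less table created by the workaround.
INSERT INTO `default`.`hudi_test_snapshot_mode`
VALUES (1, 'alice', 20, current_timestamp());

-- May still return the pre-change snapshot, per the comment above.
SELECT * FROM `default`.`hudi_test_snapshot_mode`;

-- Invalidate Spark's cached metadata/snapshot for the table...
REFRESH TABLE `default`.`hudi_test_snapshot_mode`;

-- ...after which the latest data is visible.
SELECT * FROM `default`.`hudi_test_snapshot_mode`;
```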


https://github.com/apache/hudi/issues/7322

