
[SUPPORT] UPDATE command does not work on Spark SQL


I’ve tried to use Spark SQL to update rows in my table, but I’m getting the error below:

183073 [Thread-3] WARN  org.apache.hadoop.hive.conf.HiveConf  - HiveConf of name hive.stats.jdbc.timeout does not exist
183075 [Thread-3] WARN  org.apache.hadoop.hive.conf.HiveConf  - HiveConf of name hive.stats.retries.wait does not exist
184478 [Thread-3] WARN  org.apache.hadoop.hive.metastore.ObjectStore  - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
184478 [Thread-3] WARN  org.apache.hadoop.hive.metastore.ObjectStore  - setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore UNKNOWN@172.17.0.2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/sql/session.py", line 723, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/opt/spark/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.sql.
: java.lang.UnsupportedOperationException: UPDATE TABLE is not supported temporarily.
	at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:716)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
	at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:67)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)

To Reproduce

I saved a DataFrame in Hudi format and loaded it into a Hudi table:

spark.sql('create table events using hudi options (primaryKey = "id", preCombinedField = "updated_at", type ="cow") location "/tmp/data/delta/events"')

Then I tried to update a row:

spark.sql('update events set name = "eita" where id = 244603')
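
Note: the UnsupportedOperationException above is raised by Spark’s own planner (SparkStrategies.scala), which indicates Hudi’s SQL extensions never intercepted the UPDATE statement. Below is a minimal sketch of creating the session with the Hudi extension and Kryo serializer enabled, as the Hudi 0.9 docs describe; the package coordinates are an assumption and must match your Spark/Scala build:

from pyspark.sql import SparkSession

# Sketch only: SQL DML (UPDATE/DELETE/MERGE) on Hudi tables requires the
# Hudi session extension; without it, Spark's built-in planner rejects
# UPDATE with exactly the error shown above.
spark = (
    SparkSession.builder
    .appName("hudi-update")  # hypothetical app name
    # assumed coordinates; match your Spark/Scala versions
    .config("spark.jars.packages",
            "org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.extensions",
            "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .getOrCreate()
)

spark.sql('update events set name = "eita" where id = 244603')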

Environment Description

  • Hudi version: 0.9.0
  • Spark version: 3.1.2
  • Storage (HDFS/S3/GCS…): Local
  • Running on Docker? (yes/no): yes

My setup: https://github.com/jasondavindev/delta-lake-dms-cdc/blob/main/apps/hudi.py

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

xushiyan commented, Nov 28, 2021 (1 reaction)

@jasondavindev you’d need to use Java 8 for Hudi. Please follow the README for build instructions.

jasondavindev commented, Nov 29, 2021

@xushiyan Thanks! I built the image, but when I try to write a DataFrame, I receive this error:

>>> df.write.format('hudi').options(**hudi_options).save('/tmp/data/sample')
37491 [Thread-3] WARN  org.apache.hudi.common.config.DFSPropertiesConfiguration  - Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
37500 [Thread-3] ERROR org.apache.hudi.common.config.DFSPropertiesConfiguration  - Error reading in properties from dfs
37500 [Thread-3] WARN  org.apache.hudi.common.config.DFSPropertiesConfiguration  - Didn't find config file under default conf file dir: file:/etc/hudi/conf
38382 [Thread-3] WARN  org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata table was not found at path /tmp/data/sample/.hoodie/metadata
38400 [Thread-3] WARN  org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata table was not found at path /tmp/data/sample/.hoodie/metadata
41212 [Thread-3] WARN  org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata table was not found at path /tmp/data/sample/.hoodie/metadata
41217 [Thread-3] WARN  org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata table was not found at path /tmp/data/sample/.hoodie/metadata
41972 [Executor task launch worker for task 0.0 in stage 49.0 (TID 44)] ERROR org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor  - Error upserting bucketType UPDATE for partition :0
java.lang.ExceptionInInitializerError
	at org.apache.hadoop.hbase.io.hfile.LruBlockCache.<clinit>(LruBlockCache.java:935)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.getL1(CacheConfig.java:553)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:660)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.<init>(CacheConfig.java:246)
	at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:100)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:120)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:164)
	at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:375)
	at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:353)
	at org.apache.hudi.table.action.deltacommit.AbstractSparkDeltaCommitActionExecutor.handleUpdate(AbstractSparkDeltaCommitActionExecutor.java:84)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:313)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:172)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: Unexpected version format: 11.0.13
	at org.apache.hadoop.hbase.util.ClassSize.<clinit>(ClassSize.java:119)
	... 39 more

I found an issue related to this error, but it was a compatibility issue (version 0.4.x). You can see my application here: https://github.com/jasondavindev/delta-lake-dms-cdc/blob/main/apps/hudi_update.py

With version 0.9.0 the write succeeded, but the update did not.
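
For context, the Caused by clause above (Unexpected version format: 11.0.13) is thrown from HBase’s ClassSize static initializer, which Hudi 0.9 pulls in for the HFile path and which only understands 1.8.x-style java.version strings; that is why the maintainer points at Java 8. A quick sketch for confirming which JVM the driver is actually on, assuming spark is the existing SparkSession:

# Sketch: read the JVM's java.version property through the Py4J gateway.
print(spark.sparkContext._jvm.java.lang.System.getProperty("java.version"))
# Expect something like "1.8.0_292"; "11.0.13" reproduces the failure above.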
