Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[SUPPORT] Updating a Hudi table (using Spark SQL) fails when the column contains `null` values in other records

See original GitHub issue

Describe the problem you faced

Updating a Hudi table via Spark SQL fails when the updated column contains a null value in other records, as shown below:

[screenshot of the failing UPDATE and its error output]

To Reproduce

Steps to reproduce the behavior:

  1. initialize the table and data
    -- create table like this
    create table hudi.update_null_test_cow (
        id bigint,
        name string,
        note string,
        dt string,
        ts bigint
    ) using hudi
    tblproperties(
        type = 'cow',
        primaryKey = 'id',
        preCombineField = 'ts'
    ) partitioned by (dt)
    location 's3://my-test-bucket/hudi/update_null_test_cow';
    
    -- insert a record
    insert into hudi.update_null_test_cow partition(dt = '2022-06-08') select 1 as id, 'john doe' as name, '' as note,  1000 as ts;
    
  2. update the record; this succeeds
    update hudi.update_null_test_cow set note = 'some note' where id = 1;
    
  3. insert a new record that contains a null value for note
    insert into hudi.update_null_test_cow partition(dt = '2022-06-08') select 2 as id, 'foobar' as name, null as note,  1000 as ts;
    
  4. update the record id:1 again; this now fails (see the sketch after these steps)
    update hudi.update_null_test_cow set note = 'some other note' where id = 1;
    

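The failure at step 4 is not specific to this table: it is parquet-avro refusing a null value in a field whose Avro schema has no null branch. The following standalone Scala sketch (class name and output path are illustrative, not from the Hudi codebase) triggers the same "Null-value for required field" error seen in the stacktrace below:

    // Standalone sketch of the underlying failure (illustrative names, independent of Hudi):
    // parquet-avro rejects a null value in a required (non-union) Avro field.
    import org.apache.avro.SchemaBuilder
    import org.apache.avro.generic.{GenericData, GenericRecord}
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.avro.AvroParquetWriter

    object RequiredFieldDemo {
      def main(args: Array[String]): Unit = {
        // `note` is a plain required "string", mirroring the writerSchema in the stacktrace;
        // every other column there is the nullable union ["null", ...].
        val schema = SchemaBuilder.record("demo").fields()
          .requiredLong("id")
          .requiredString("note")
          .endRecord()

        // GenericData.Record does not validate on put, so the null is accepted here...
        val record = new GenericData.Record(schema)
        record.put("id", 2L)
        record.put("note", null)

        val writer = AvroParquetWriter
          .builder[GenericRecord](new Path("/tmp/required_field_demo.parquet"))
          .withSchema(schema)
          .build()
        // ...but the write fails: java.lang.RuntimeException: Null-value for required field: note
        try writer.write(record) finally writer.close()
      }
    }

In the Hudi merge this is exactly what happens when the untouched record id:2, whose note is null, is rewritten through the writer schema that the UPDATE inferred.
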
Expected behavior

The update query should succeed.

Environment Description

  • Hudi version : 0.10.1

  • Spark version : 3.2.0

  • Hive version : 3.1.2

  • Hadoop version : 3.2.0

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : no

Additional context

The whole environment is bundled in Amazon EMR 6.6.0.

Stacktrace

The error occurs when running the following SQL:
update hudi.update_null_test_cow set note = 'some other note' where id = 1
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 171.0 failed 4 times, most recent failure: Lost task 0.3 in stage 171.0 (TID 8477) (executor 11): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:174)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:133)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema {
  "type" : "record",
  "name" : "update_null_test_cow_record",
  "namespace" : "hoodie.update_null_test_cow",
  "fields" : [ {
    "name" : "_hoodie_commit_time",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_commit_seqno",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_record_key",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_partition_path",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_file_name",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "name",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "note",
    "type" : "string"
  }, {
    "name" : "ts",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "dt",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:351)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:342)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:315)
	... 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema {
  "type" : "record",
  "name" : "update_null_test_cow_record",
  "namespace" : "hoodie.update_null_test_cow",
  "fields" : [ {
    "name" : "_hoodie_commit_time",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_commit_seqno",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_record_key",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_partition_path",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_file_name",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "name",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "note",
    "type" : "string"
  }, {
    "name" : "ts",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "dt",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147)
	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
	... 31 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema {
  "type" : "record",
  "name" : "update_null_test_cow_record",
  "namespace" : "hoodie.update_null_test_cow",
  "fields" : [ {
    "name" : "_hoodie_commit_time",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_commit_seqno",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_record_key",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_partition_path",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_file_name",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "name",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "note",
    "type" : "string"
  }, {
    "name" : "ts",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "dt",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
	... 32 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema {
  "type" : "record",
  "name" : "update_null_test_cow_record",
  "namespace" : "hoodie.update_null_test_cow",
  "fields" : [ {
    "name" : "_hoodie_commit_time",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_commit_seqno",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_record_key",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_partition_path",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_file_name",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "name",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "note",
    "type" : "string"
  }, {
    "name" : "ts",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "dt",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
	at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:356)
	at org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122)
	at org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112)
	at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	... 3 more
Caused by: java.lang.RuntimeException: Null-value for required field: note
	at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:200)
	at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:171)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
	at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
	at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:95)
	at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:351)
	... 8 more

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2559)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2508)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2507)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2507)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1149)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1149)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1149)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2747)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2689)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2678)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2215)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2255)
	at org.apache.spark.rdd.RDD.$anonfun$take$1(RDD.scala:1449)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
	at org.apache.spark.rdd.RDD.take(RDD.scala:1422)
	at org.apache.spark.rdd.RDD.$anonfun$isEmpty$1(RDD.scala:1557)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
	at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1557)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:657)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:287)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:169)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:115)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:112)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:108)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:519)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:519)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:495)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:108)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:95)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:93)
	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:136)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:848)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:382)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:355)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
	at org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.run(UpdateHoodieTableCommand.scala:79)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:115)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:112)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:108)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:519)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:519)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:495)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:108)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:95)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:93)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:221)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.zeppelin.spark.SparkSqlInterpreter.internalInterpret(SparkSqlInterpreter.java:106)
	at org.apache.zeppelin.interpreter.AbstractInterpreter.interpret(AbstractInterpreter.java:55)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:110)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:849)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:741)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
	at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)
	at org.apache.zeppelin.scheduler.FIFOScheduler.lambda$runJobInScheduler$0(FIFOScheduler.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:174)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:133)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	... 3 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema {
  "type" : "record",
  "name" : "update_null_test_cow_record",
  "namespace" : "hoodie.update_null_test_cow",
  "fields" : [ {
    "name" : "_hoodie_commit_time",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_commit_seqno",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_record_key",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_partition_path",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_file_name",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "name",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "note",
    "type" : "string"
  }, {
    "name" : "ts",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "dt",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:351)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:342)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:315)
	... 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema {
  "type" : "record",
  "name" : "update_null_test_cow_record",
  "namespace" : "hoodie.update_null_test_cow",
  "fields" : [ {
    "name" : "_hoodie_commit_time",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_commit_seqno",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_record_key",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_partition_path",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_file_name",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "name",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "note",
    "type" : "string"
  }, {
    "name" : "ts",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "dt",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147)
	at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
	... 31 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema {
  "type" : "record",
  "name" : "update_null_test_cow_record",
  "namespace" : "hoodie.update_null_test_cow",
  "fields" : [ {
    "name" : "_hoodie_commit_time",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_commit_seqno",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_record_key",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_partition_path",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_file_name",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "name",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "note",
    "type" : "string"
  }, {
    "name" : "ts",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "dt",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
	... 32 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema {
  "type" : "record",
  "name" : "update_null_test_cow_record",
  "namespace" : "hoodie.update_null_test_cow",
  "fields" : [ {
    "name" : "_hoodie_commit_time",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_commit_seqno",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_record_key",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_partition_path",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_file_name",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "name",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "note",
    "type" : "string"
  }, {
    "name" : "ts",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "dt",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
	at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:356)
	at org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122)
	at org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112)
	at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	... 3 more
Caused by: java.lang.RuntimeException: Null-value for required field: note
	at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:200)
	at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:171)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
	at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
	at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:95)
	at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:351)
	... 8 more
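
Reading the trace bottom-up: the root cause is `Null-value for required field: note`, and the writerSchema printed above shows why. Every column except `note` is the nullable union `["null", ...]` with a null default, while `note` was inferred as a required `"string"`, plausibly because the UPDATE's SET expression is a non-nullable string literal. From a spark-shell session you can confirm that the table schema itself still treats the column as nullable (sketch, path as in the report):

    // Sketch: the Spark-side table schema should still report `note` as nullable;
    // the required "string" appears only in the writer schema built for the UPDATE.
    val df = spark.read.format("hudi").load("s3://my-test-bucket/hudi/update_null_test_cow")
    println(df.schema("note").nullable) // expected: true

If that holds, one untested workaround is to make the SET expression explicitly nullable, so that the inferred writer schema keeps the null branch:

    // Untested workaround sketch: if() with a null branch is a nullable expression,
    // unlike a bare literal, so the inferred writer schema should keep ["null","string"].
    spark.sql(
      """update hudi.update_null_test_cow
        |set note = if(id is not null, 'some other note', null)
        |where id = 1""".stripMargin)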

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 11 (8 by maintainers)

Top GitHub Comments

2 reactions
nsivabalan commented on Aug 28, 2022

@XuQianJin-Stars @minihippo: can you assist here, please?

1 reaction
nsivabalan commented on Jun 13, 2022

@XuQianJin-Stars @minihippo: can you assist here with setting null defaults from spark-sql?
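
For readers wondering what "null defaults" means here: the fix would presumably be for the Spark SQL path to declare `note` the way the other columns already are, as a nullable union with a null default. In Avro's SchemaBuilder API that declaration looks like this (sketch, field and record names taken from the report):

    // Sketch: how a nullable, null-defaulted `note` field is declared with Avro's SchemaBuilder.
    // (`type` is backticked because it is a Scala keyword.)
    import org.apache.avro.SchemaBuilder
    val fixedSchema = SchemaBuilder.record("update_null_test_cow_record").fields()
      .name("note").`type`().unionOf().nullType().and().stringType().endUnion().nullDefault()
      .endRecord()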

Read more comments on GitHub >

Top Results From Across the Web

Spark Guide - Apache Hudi
This guide provides a quick peek at Hudi's capabilities using spark-shell. Using Spark datasources, we will walk through...

Amazon EMR 6.8 supports Apache Hudi 0.11.1 and ... - AWS
Apache Hudi 0.11.1 on Amazon EMR 6.8 includes support for Spark 3.3.0, ... adding support for update or delete records in Hudi tables...

Apache Spark job fails with Parquet column cannot be ...
You are reading data in Parquet format and writing to a Delta table when you get a Parquet column cannot be converted error...

HoodieKeyException Is Reported When Data Is ... - 华为云
Is it possible to use a nullable field that contains null records as a primary key when creating a Hudi table? No. HoodieKeyException will...

Writing spark DataFrame In Apache Hudi Table - Stack Overflow
Here is a working sample for your question in pyspark: from pyspark.sql import SparkSession from pyspark.sql.functions import lit spark ...
