
[Bug] [connector] Use Hudi source Serializable Exception


Search before asking

  • I had searched in the issues and found no similar issues.

What happened

Using the Hudi source throws a serialization exception.

SeaTunnel Version

2.0.5-SNAPSHOT

SeaTunnel Config

env {
  spark.app.name = "SeaTunnel"
  spark.master = local
}

source {
  hudi {
      hoodie.datasource.read.paths = "path"
      result_table_name="view_20220215"
  }
}

transform {
}

sink {
  Console {}
}

Running Command

Run locally using the org.apache.seatunnel.example.spark.LocalSparkExample class.

Error Exception

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Failed to serialize task 1, not attempting to retry it. Exception during serialization: java.io.NotSerializableException: org.apache.hadoop.fs.Path
Serialization stack:
	- object not serializable (class: org.apache.hadoop.fs.Path, value: file:/G:/tmp/2022)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 1)
	- field (class: scala.collection.mutable.WrappedArray$ofRef, name: array, type: class [Ljava.lang.Object;)
	- object (class scala.collection.mutable.WrappedArray$ofRef, WrappedArray(file:/G:/tmp/2022))
	- writeObject data (class: org.apache.spark.rdd.ParallelCollectionPartition)
	- object (class org.apache.spark.rdd.ParallelCollectionPartition, org.apache.spark.rdd.ParallelCollectionPartition@6e3)
	- field (class: org.apache.spark.scheduler.ResultTask, name: partition, type: interface org.apache.spark.Partition)
	- object (class org.apache.spark.scheduler.ResultTask, ResultTask(1, 0))
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
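
For context, this failure is reproducible outside SeaTunnel whenever an org.apache.hadoop.fs.Path instance is shipped as RDD partition data: on Hadoop 2.x, Path does not implement java.io.Serializable, so task serialization aborts exactly as in the trace above. A minimal sketch (the object name and path are illustrative, not taken from the SeaTunnel code base):

import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

object PathSerializationRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("repro").getOrCreate()
    val sc = spark.sparkContext

    // The Path objects end up inside ParallelCollectionPartition, which is
    // serialized when the task is shipped to an executor; on Hadoop 2.x this
    // fails with NotSerializableException: org.apache.hadoop.fs.Path.
    val paths = Seq(new Path("file:/G:/tmp/2022"))
    sc.parallelize(paths).count()

    spark.stop()
  }
}

Running this locally should abort the job with the same "Failed to serialize task" message shown above.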

Flink or Spark Version

2.0.5-SNAPSHOT, unmodified version

Java or Scala Version

2.0.5-SNAPSHOT, unmodified version

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
liujinhui1994 commented, Feb 21, 2022

I already know what the problem is; it's a small issue in SeaTunnel. I will file a PR to fix it.
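
The PR itself is not linked here, but the usual way to avoid this class of error is to keep plain String paths across the driver/executor boundary and build org.apache.hadoop.fs.Path only inside the task. A hedged sketch of that pattern, assuming a SparkContext sc as in the reproduction above (this is not necessarily the fix that was merged):

import org.apache.hadoop.fs.Path

// Strings are Serializable, so the partition data ships cleanly;
// the non-serializable Path is constructed on the executor side.
val pathStrings = Seq("file:/G:/tmp/2022")
sc.parallelize(pathStrings)
  .map(p => new Path(p).getName)
  .collect()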

0 reactions
bravekong commented, Nov 1, 2022

@liujinhui1994 Hello, may I ask you a question? I get an error when using Hudi as the source:

22/11/01 15:59:56 INFO TableSchemaResolver: Reading schema from hdfs://cluster:8020/hudi_data/person_infos/default/b7e36c6b-e264-4741-acad-9cde64a35aba-0_0-23-23_20221101145532.parquet
22/11/01 15:59:56 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchFieldError: NULL_VALUE
java.lang.NoSuchFieldError: NULL_VALUE