[Bug] [connector] Serialization exception when using the Hudi source
Search before asking
- I had searched in the issues and found no similar issues.
What happened
A serialization exception is thrown when using Hudi as a source.
SeaTunnel Version
2.0.5-SNAPSHOT
SeaTunnel Config
env {
  spark.app.name = "SeaTunnel"
  spark.master = local
}

source {
  hudi {
    hoodie.datasource.read.paths = "path"
    result_table_name = "view_20220215"
  }
}

transform {
}

sink {
  Console {}
}
Running Command
Run locally via org.apache.seatunnel.example.spark.LocalSparkExample.
Error Exception
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Failed to serialize task 1, not attempting to retry it. Exception during serialization: java.io.NotSerializableException: org.apache.hadoop.fs.Path
Serialization stack:
- object not serializable (class: org.apache.hadoop.fs.Path, value: file:/G:/tmp/2022)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 1)
- field (class: scala.collection.mutable.WrappedArray$ofRef, name: array, type: class [Ljava.lang.Object;)
- object (class scala.collection.mutable.WrappedArray$ofRef, WrappedArray(file:/G:/tmp/2022))
- writeObject data (class: org.apache.spark.rdd.ParallelCollectionPartition)
- object (class org.apache.spark.rdd.ParallelCollectionPartition, org.apache.spark.rdd.ParallelCollectionPartition@6e3)
- field (class: org.apache.spark.scheduler.ResultTask, name: partition, type: interface org.apache.spark.Partition)
- object (class org.apache.spark.scheduler.ResultTask, ResultTask(1, 0))
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
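The stack trace shows that a task partition captures an org.apache.hadoop.fs.Path instance, which does not implement java.io.Serializable, so Spark fails while serializing the task. A common workaround for this class of problem (a sketch of the general pattern, not the actual SeaTunnel fix) is to carry the path as a String across serialization and rebuild the richer object lazily on the executor side. The PathHolder class below is hypothetical and uses java.net.URI as a stand-in for org.apache.hadoop.fs.Path:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.URI;

// Hypothetical holder illustrating the fix: only a String is serialized,
// and the non-serializable object (here a URI standing in for
// org.apache.hadoop.fs.Path) is rebuilt on demand after deserialization.
public class PathHolder implements Serializable {
    private final String rawPath;    // serializable representation
    private transient URI resolved;  // rebuilt lazily, never serialized

    public PathHolder(String rawPath) {
        this.rawPath = rawPath;
    }

    public URI toUri() {
        if (resolved == null) {
            resolved = URI.create(rawPath);  // reconstructed on the executor
        }
        return resolved;
    }

    // Round-trip through Java serialization, as Spark does for task state.
    static PathHolder roundTrip(PathHolder in) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(in);  // would throw NotSerializableException for a raw Path field
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (PathHolder) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PathHolder copy = roundTrip(new PathHolder("file:/G:/tmp/2022"));
        System.out.println(copy.toUri());
    }
}
```

Marking the field transient and rebuilding it lazily is what makes the round trip succeed; keeping the Path itself as a field is what triggers the NotSerializableException above.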
Flink or Spark Version
2.0.5-SNAPSHOT (unmodified)
Java or Scala Version
2.0.5-SNAPSHOT (unmodified)
Screenshots
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Issue Analytics
- Created 2 years ago
- Comments: 9 (9 by maintainers)

I already know what the problem is; it is a small issue in SeaTunnel. I will file a PR to fix it.
@liujinhui1994 Hello, may I ask you a question? I also hit this error when using Hudi as the source.