
[Bug] [connector] Use Hudi source Serializable Exception


Search before asking

  • I had searched in the issues and found no similar issues.

What happened

Using the Hudi source throws a serialization exception.

SeaTunnel Version

2.0.5-SNAPSHOT

SeaTunnel Config

env {
  spark.app.name = "SeaTunnel"
  spark.master = local
}

source {
  hudi {
      hoodie.datasource.read.paths = "path"
      result_table_name="view_20220215"
  }
}

transform {
}

sink {
  Console {}
}

Running Command

Run locally using the org.apache.seatunnel.example.spark.LocalSparkExample class.

Error Exception

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Failed to serialize task 1, not attempting to retry it. Exception during serialization: java.io.NotSerializableException: org.apache.hadoop.fs.Path
Serialization stack:
	- object not serializable (class: org.apache.hadoop.fs.Path, value: file:/G:/tmp/2022)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 1)
	- field (class: scala.collection.mutable.WrappedArray$ofRef, name: array, type: class [Ljava.lang.Object;)
	- object (class scala.collection.mutable.WrappedArray$ofRef, WrappedArray(file:/G:/tmp/2022))
	- writeObject data (class: org.apache.spark.rdd.ParallelCollectionPartition)
	- object (class org.apache.spark.rdd.ParallelCollectionPartition, org.apache.spark.rdd.ParallelCollectionPartition@6e3)
	- field (class: org.apache.spark.scheduler.ResultTask, name: partition, type: interface org.apache.spark.Partition)
	- object (class org.apache.spark.scheduler.ResultTask, ResultTask(1, 0))
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
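
For context, this failure is reproducible outside SeaTunnel whenever an org.apache.hadoop.fs.Path instance is shipped as RDD partition data: on Hadoop 2.x, Path does not implement java.io.Serializable, so task serialization aborts exactly as in the trace above. A minimal sketch (the object name and path are illustrative, not taken from the SeaTunnel code base):

import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

object PathSerializationRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("repro").getOrCreate()
    val sc = spark.sparkContext

    // The Path objects end up inside ParallelCollectionPartition, which is
    // serialized when the task is shipped to an executor; on Hadoop 2.x this
    // fails with NotSerializableException: org.apache.hadoop.fs.Path.
    val paths = Seq(new Path("file:/G:/tmp/2022"))
    sc.parallelize(paths).count()

    spark.stop()
  }
}

Running this locally should abort the job with the same "Failed to serialize task" message shown above.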

Flink or Spark Version

2.0.5-SNAPSHOT, unmodified version

Java or Scala Version

2.0.5-SNAPSHOT, unmodified version

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
liujinhui1994 commented, Feb 21, 2022

I already know what the problem is; it's a small issue in SeaTunnel. I will file a PR to fix it.
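
The PR itself is not linked here, but the usual way to avoid this class of error is to keep plain String paths across the driver/executor boundary and build org.apache.hadoop.fs.Path only inside the task. A hedged sketch of that pattern, assuming a SparkContext sc as in the reproduction above (this is not necessarily the fix that was merged):

import org.apache.hadoop.fs.Path

// Strings are Serializable, so the partition data ships cleanly;
// the non-serializable Path is constructed on the executor side.
val pathStrings = Seq("file:/G:/tmp/2022")
sc.parallelize(pathStrings)
  .map(p => new Path(p).getName)
  .collect()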

0 reactions
bravekong commented, Nov 1, 2022

@liujinhui1994 Hello, may I ask you a question? I get an error when using Hudi as the source:

22/11/01 15:59:56 INFO TableSchemaResolver: Reading schema from hdfs://cluster:8020/hudi_data/person_infos/default/b7e36c6b-e264-4741-acad-9cde64a35aba-0_0-23-23_20221101145532.parquet
22/11/01 15:59:56 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchFieldError: NULL_VALUE
java.lang.NoSuchFieldError: NULL_VALUE