
[SUPPORT] KryoException when bulk insert into hudi with flink

See original GitHub issue

When bulk inserting into Hudi with Flink, the Flink job fails with com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException.

-- hudi table DDL
CREATE TEMPORARY TABLE table_one (
  imp_date string,
  id bigint,
  name string,
  ts timestamp(3)
) PARTITIONED BY (imp_date) WITH (
  'connector' = 'hudi',
  'path' = ${hdfs_path},
  'write.operation' = 'bulk_insert',
  'table.type' = 'MERGE_ON_READ',
  'hoodie.table.keygenerator.class' = 'org.apache.hudi.keygen.SimpleKeyGenerator',
  'hoodie.datasource.write.recordkey.field' = 'id',
  'write.precombine.field' = 'ts',
  'hive_sync.enable' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://…',
  'hive_sync.db' = 'hive_db',
  'hive_sync.table' = 'table_one',
  'hive_sync.partition_fields' = 'imp_date',
  'hive_sync.partition_extractor_class' = 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
  'hoodie.datasource.write.hive_style_partitioning' = 'true',
  'hoodie.metadata.enable' = 'true'
);

-- insert SQL
INSERT INTO table_one
SELECT
  DATE_FORMAT(ts, 'yyyyMMdd') || CAST(HOUR(ts) AS STRING) AS dt,
  id,
  name,
  ts
FROM source_table;
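For completeness, a hypothetical driver that wires the two statements together in a Flink 1.13 job (a sketch: the datagen source is a stand-in for the reporter's unspecified source_table, and the Hudi DDL is elided with a comment referring to the full statement above):

import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment

// Hypothetical repro driver; only table_one and source_table come from the issue.
object BulkInsertRepro {
  def main(args: Array[String]): Unit = {
    val env  = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = StreamTableEnvironment.create(env)

    // Stand-in source; the original issue does not show how source_table is defined.
    tEnv.executeSql(
      """CREATE TEMPORARY TABLE source_table (
        |  id   BIGINT,
        |  name STRING,
        |  ts   TIMESTAMP(3)
        |) WITH ('connector' = 'datagen', 'rows-per-second' = '100')""".stripMargin)

    // Then execute the Hudi DDL shown above (elided here for brevity):
    // tEnv.executeSql("CREATE TEMPORARY TABLE table_one ( ... )")

    tEnv.executeSql(
      """INSERT INTO table_one
        |SELECT DATE_FORMAT(ts, 'yyyyMMdd') || CAST(HOUR(ts) AS STRING) AS dt, id, name, ts
        |FROM source_table""".stripMargin
    ).await() // blocks until the job finishes; the KryoException surfaces here
  }
}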

Environment Description

  • Hudi version: 0.11 & 0.12

  • Flink version: 1.13

  • Storage (HDFS/S3/GCS…): HDFS

Stacktrace

com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
cleaner (org.apache.flink.core.memory.MemorySegment)
segments (org.apache.flink.table.data.binary.BinaryRowData)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:82)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
    at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:577)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:320)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:289)
    at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:577)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:68)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:505)
    at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:266)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:69)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
    at org.apache.flink.table.runtime.util.StreamRecordCollector.collect(StreamRecordCollector.java:44)
    at org.apache.hudi.sink.bulk.sort.SortOperator.endInput(SortOperator.java:113)
    at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.endOperatorInput(StreamOperatorWrapper.java:91)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain.endInput(OperatorChain.java:441)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:427)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:688)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:643)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:654)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:627)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:782)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:80)
    at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:488)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:57)
    … 28 more
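Reading the trace from the bottom of the chain upward: once the bulk-insert sort finishes, SortOperator.endInput emits the sorted rows, and the presence of CopyingChainingOutput shows that each BinaryRowData is deep-copied between chained operators via Flink's Kryo fallback serializer. Kryo's FieldSerializer then walks segments → MemorySegment → cleaner (a Java lambda), where the registration lookup fails. A simplified sketch of that copy step, assuming the rough shape of Flink 1.13's CopyingChainingOutput.pushToOperator (not verbatim source):

import org.apache.flink.api.common.typeutils.TypeSerializer
import org.apache.flink.streaming.api.operators.Input
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord

// Simplified from Flink's CopyingChainingOutput.pushToOperator (not verbatim):
// every record handed to the next chained operator is first deep-copied.
// BinaryRowData lands on Kryo's generic FieldSerializer path here, which is
// what drags MemorySegment.cleaner (a lambda) into serialization.
object CopyStepSketch {
  def pushToOperator[T](record: StreamRecord[T],
                        serializer: TypeSerializer[T],
                        input: Input[T]): Unit = {
    val copied = serializer.copy(record.getValue) // KryoSerializer.copy -> kryo.writeObject -> NPE
    input.processElement(record.copy(copied))
  }
}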

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
hbgstc123 commented, Aug 30, 2022

override def getRegistration(klass: Class[_]) =
  if (isJavaLambda(klass)) {
    getClassResolver.getRegistration(classOf[ClosureSerializer.Closure])
  } else super.getRegistration(klass)

This may be a Flink problem: in com.twitter.chill.KryoBase, when the code above takes the first branch, it looks up the registration in a map inside the classResolver without checking whether the result is null, which can lead to an NPE.
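To make the hazard concrete, here is a minimal stand-in sketch (illustrative types only; the real classes are Kryo's DefaultClassResolver and the Registration it hands back): an unguarded map lookup returns null for a class that was never registered, and the caller dereferences it.

// Stand-in types: in the real stack these are Kryo's Registration and
// DefaultClassResolver, reached via chill's lambda redirect shown above.
final case class Registration(id: Int)

class ClassResolver {
  private val registrations = new java.util.HashMap[Class[_], Registration]()
  // Returns null when the class was never registered, which is the case
  // hbgstc123 describes for ClosureSerializer.Closure.
  def getRegistration(klass: Class[_]): Registration = registrations.get(klass)
}

object NullRegistrationSketch {
  def main(args: Array[String]): Unit = {
    val resolver = new ClassResolver
    val reg = resolver.getRegistration(classOf[Runnable]) // null: nothing registered
    println(reg.id) // NullPointerException, mirroring DefaultClassResolver.writeClass:80
  }
}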

0 reactions
danny0405 commented, Sep 7, 2022

Thanks, the problem is expected to be fixed by #6571; feel free to reopen it if the problem still exists.
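For anyone stuck on an affected version before that fix lands, one possible stopgap suggested by the trace itself (my assumption; it is not proposed in this thread and has its own caveats): the failing copy only runs because object reuse is off, which makes Flink chain operators through CopyingChainingOutput. Enabling object reuse swaps in ChainingOutput and skips the Kryo copy entirely.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

object ObjectReuseWorkaround {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // With object reuse enabled, OperatorChain wires ChainingOutput instead of
    // CopyingChainingOutput, so KryoSerializer.copy is never invoked.
    // Caveat: only safe when no operator in the chain buffers input records.
    env.getConfig.enableObjectReuse()
    // Equivalently, for SQL jobs: SET 'pipeline.object-reuse' = 'true'
  }
}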

Top Results From Across the Web

  • Flink Guide - Apache Hudi
    This guide helps you quickly start using Flink on Hudi, ... Creates a Flink Hudi table first and insert data into the Hudi...

  • Use Flink Hudi to Build a Streaming Data Lake Platform
    Users can efficiently import batch data into the lake format at one time and use the writing program connected to the stream to...

  • [jira] [Commented] (HUDI-2209) Bulk insert for flink writer
    ... hudi-flink/src/main/java/org/apache/hudi/sink/bulk/BulkInsertWriterHelper.java ...

  • class "org.apache.flink.streaming.api.operators ...
    The hudi-flink-bundle jar is archived with scala 2.11, so it's recommended to use flink 1.13.x bundled with scala 2.11.

  • Create a low-latency source-to-data lake pipeline using ...
    Copy-On-Write (COW): These tables are common for batch processing. ... Configure Flink with Kafka and Hudi table connectors.
