
[SUPPORT] Read Hudi data with flink-1.13.6 and report java.lang.NoSuchMethodError

See original GitHub issue

Describe the problem you faced

Writing data with Flink works as expected, and the written data can be read with Hive, but reading it back with Flink itself fails with the following exception:

(screenshot of the exception; the full stack trace is reproduced below)

Environment Description

  • Hudi version : 0.11.0

  • Flink version : 1.13.6

  • Hive version : 2.1.1-cdh6.2.0

  • Hadoop version : 3.0.0-cdh6.2.0

  • Storage (HDFS/S3/GCS…) : HDFS

  • Running on Docker? (yes/no) : no

Additional context

Hudi CREATE TABLE statement:

CREATE TABLE t2(
    uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
    name VARCHAR(10),
    age INT,
    ts timestamp(3),
    part VARCHAR(20)
)
WITH ( 
    'connector' = 'hudi',
    'path' = 'hdfs:///user/hive/warehouse/hudi.db/t2',
    'table.type' = 'MERGE_ON_READ',
    'hoodie.datasource.write.recordkey.field'= 'uuid',
    'write.precombine.field'= 'ts',
    'write.tasks' = '1',
    'write.rate.limit' = '2000',
    'compaction.tasks' = '1',
    'compaction.async.enabled' = 'true',
    'compaction.trigger.strategy' = 'num_commits',
    'compaction.delta_commits' = '1',
    'changelog.enabled' = 'true',
    'read.streaming.enabled'= 'true',
    'read.streaming.check-interval'= '3',
    'hive_sync.enable' = 'true',     -- Required. To enable hive synchronization
    'hive_sync.mode' = 'hms',        -- Required. Setting hive sync mode to hms, default jdbc
    'hive_sync.metastore.uris' = 'thrift://xxx:9083', -- Required. The port need set on hive-site.xml
    'hive_sync.jdbc_url' = 'jdbc:hive2://xxx:10000',
    'hive_sync.table'='t2',                  -- required, hive table name
    'hive_sync.db'='hudi',
    'hive_sync.username' = '',
    'hive_sync.password' = '',
    'hive_sync.support_timestamp' = 'true'
);

Query the t2 table with the Flink SQL client:

select * from t2;

Stacktrace

More error information:

2022-05-10 15:41:49,278 INFO  org.apache.hadoop.io.compress.CodecPool                      [] - Got brand-new decompressor [.gz]
2022-05-10 15:41:49,463 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - split_reader -> NotNullEnforcer(fields=[uuid]) (3/4)#0 (de4312b557275e636b33cacdeca84148) switched from RUNNING to FAILED with failure cause: java.lang.NoSuchMethodError: org.apache.parquet.bytes.BytesInput.toInputStream()Lorg/apache/parquet/bytes/ByteBufferInputStream;
	at org.apache.flink.formats.parquet.vector.reader.AbstractColumnReader.readPageV1(AbstractColumnReader.java:211)
	at org.apache.flink.formats.parquet.vector.reader.AbstractColumnReader.readToVector(AbstractColumnReader.java:156)
	at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.nextBatch(ParquetColumnarRowSplitReader.java:311)
	at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.ensureBatch(ParquetColumnarRowSplitReader.java:287)
	at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.reachedEnd(ParquetColumnarRowSplitReader.java:266)
	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat$BaseFileOnlyFilteringIterator.reachedEnd(MergeOnReadInputFormat.java:509)
	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.reachedEnd(MergeOnReadInputFormat.java:245)
	at org.apache.hudi.source.StreamReadOperator.consumeAsMiniBatch(StreamReadOperator.java:186)
	at org.apache.hudi.source.StreamReadOperator.processSplits(StreamReadOperator.java:166)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
	at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:359)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:323)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:202)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
	at java.lang.Thread.run(Thread.java:748)
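A `NoSuchMethodError` like this one usually means two different Parquet versions ended up on the classpath and the older one (without `BytesInput.toInputStream()`) wins. As a quick diagnostic sketch (the scanned directory is an assumption; point it at your actual Flink `lib/` and any bundle jars you deploy), you can scan jars for the conflicting class and see how many copies are present:

```python
# Hypothetical diagnostic: list every jar in a directory that bundles the
# Parquet class from the stack trace. More than one hit (or a hit in an
# unexpectedly old jar) usually indicates a classpath conflict.
import sys
import zipfile
from pathlib import Path

CONFLICT_CLASS = "org/apache/parquet/bytes/BytesInput.class"

def jars_containing(lib_dir, entry=CONFLICT_CLASS):
    """Return the jars under lib_dir that contain the given class entry."""
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            # Skip corrupt or non-jar files rather than failing the scan.
            pass
    return hits

if __name__ == "__main__":
    # "/opt/flink/lib" is an example path; adjust to your install.
    for hit in jars_containing(sys.argv[1] if len(sys.argv) > 1 else "/opt/flink/lib"):
        print(hit)
```

If the scan turns up both a Flink-bundled Parquet and an older copy pulled in by another connector, that matches the error above.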

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction

punish-yh commented, Oct 14, 2022

I ran into the same situation. Here is my configuration:

  • Hudi version : 0.11.0

  • Flink version : 1.13.6

  • Hive version : 2.1.1-cdh6.3.2

  • Hadoop version : 3.0.0-cdh6.3.2

  • Storage (HDFS/S3/GCS…) : HDFS

Adding the flink-parquet dependency fixed it for me. If you can't track down the class conflict, you could try the same approach:

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-parquet_2.11</artifactId>
        <version>1.13.6</version>
    </dependency>
0 reactions

danny0405 commented, Nov 8, 2022

We removed the parquet shade pattern from hudi-flink-bundle in 0.11.0; maybe we should add it back 😃
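The "shade pattern" mentioned here refers to relocating the Parquet classes inside the bundle jar so they cannot clash with Flink's own copy. A minimal sketch of such a relocation with the maven-shade-plugin (the `shadedPattern` prefix is illustrative, not necessarily the exact pattern hudi-flink-bundle used):

```xml
<!-- Illustrative maven-shade-plugin relocation; the shadedPattern prefix
     is an assumption, not the exact pattern used by hudi-flink-bundle. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>org.apache.parquet</pattern>
        <shadedPattern>org.apache.hudi.org.apache.parquet</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

With the classes relocated, the bundle's Parquet and Flink's Parquet can coexist on the same classpath without one shadowing the other.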


Top Results From Across the Web

java.lang.NoSuchMethodError in Flink - Stack Overflow
There is a conflict with dependencies. Apache Flink loads many classes by default into its classpath.

Flink Guide - Apache Hudi
This guide helps you quickly start using Flink on Hudi, and learn different modes for reading/writing Hudi with Flink.

FlinkCDC-Hudi: Real-time ingestion of MySQL data into the lake, part 1: first attempt
Background: Flink CDC is a change data capture component built on Flink; it currently supports syncing databases such as MySQL, PostgreSQL, MongoDB, TiDB, and Oracle.

Create a low-latency source-to-data lake pipeline using Amazon MSK Connect, Apache Flink, and Apache Hudi
by Ali Alemi | on 01 MAR 2022

Technical questions and Answers
Everyone, Flink CDC flink1.13.6 connects to oracle only supports SID mode, … I use to read Oracle11c data has a bug when submitting…
