
[SUPPORT] Custom HoodieRecordPayload for use in flink sql

See original GitHub issue
  1. I am trying to use Apache Hudi with Flink SQL by following Hudi’s Flink guide.
  2. The basics are working, but now I need to provide a custom implementation of HoodieRecordPayload, as suggested in this FAQ.
  3. But when I pass this config as shown in the following listing, it doesn’t work: my custom class (MyHudiPoc.Poc) doesn’t get picked up, and the default behaviour continues.

CREATE TABLE t1(
  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = '/tmp/hudi',
  'hoodie.compaction.payload.class' = 'MyHudiPoc.Poc', -- My custom class
  'hoodie.datasource.write.payload.class' = 'MyHudiPoc.Poc',  -- My custom class
  'write.payload.class' = 'MyHudiPoc.Poc',  -- My custom class
  'table.type' = 'MERGE_ON_READ'
);

INSERT INTO t1 VALUES
  ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
  ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
  ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
  ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
  ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
  ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
  ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
  ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');


INSERT INTO t1 VALUES
  ('id1','Danny1',27,TIMESTAMP '1970-01-01 00:00:01','par1');

  4. I even tried passing it through /etc/hudi/conf/hudi-default.conf:
---
"hoodie.compaction.payload.class": MyHudiPoc.Poc
"hoodie.datasource.write.payload.class": MyHudiPoc.Poc
"write.payload.class": MyHudiPoc.Poc

I am also passing my custom jar while starting the Flink SQL client.

/bin/sql-client.sh embedded \
    -j ../jars/hudi-flink1.15-bundle-0.12.1.jar \
    -j ./plugins/flink-s3-fs-hadoop-1.15.1.jar \
    -j ./plugins/parquet-hive-bundle-1.8.1.jar \
    -j ./plugins/flink-sql-connector-kafka-1.15.1.jar \
    -j my-hudi-poc-1.0-SNAPSHOT.jar \
    shell
  5. I am able to pass my custom class in the Spark example, but not in Flink.
  6. I tried with both COW and MOR table types.
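
One way to narrow this down is to first confirm that the jar actually puts the class on the session classpath, independent of any Hudi configuration. The sketch below is plain JDK code (not Hudi-specific); `MyHudiPoc.Poc` is the class name from the question, and `Class.forName` throws `ClassNotFoundException` when a class is not visible to the running JVM.

```java
// Generic classpath probe: Class.forName fails fast if the named class is
// not visible to this JVM, which separates "jar not on the classpath"
// problems from "class loaded but never invoked" problems.
public class ClasspathCheck {
    static boolean classExists(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "MyHudiPoc.Poc" is the custom class name from the question above.
        String name = args.length > 0 ? args[0] : "MyHudiPoc.Poc";
        System.out.println(name + (classExists(name) ? " is" : " is NOT") + " on the classpath");
    }
}
```

Running it with the custom jar on the classpath (e.g. `java -cp my-hudi-poc-1.0-SNAPSHOT.jar:. ClasspathCheck`) tells you whether the class is at least loadable before you debug the Hudi config keys.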

Any idea what I am doing wrong?

See listing in the question.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
yabha-isomap commented, Nov 1, 2022

Thanks @complone. I tried with that also, but no luck.

CREATE TABLE t1(
  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = '/tmp/hudi',
  'hoodie.compaction.payload.class' = 'gsHudiPoc.Poc', -- My custom class
  'write.payload.class' = 'gsHudiPoc.Poc',  -- My custom class
  'payload.class' = 'gsHudiPoc.Poc',  -- My custom class
  'hoodie.datasource.write.payload.class' = 'gsHudiPoc.Poc',  -- My custom class
  'table.type' = 'COPY_ON_WRITE'
);

Let me try looking into the code of FlinkOptions.java.

0 reactions
yabha-isomap commented, Dec 5, 2022

Thanks. I was able to get it to work with the DataStream API. One tip for anyone facing this issue: put a debugging message in the constructor of the class (not in any method) to verify whether your class is being picked up at all. In my case, the class was being picked up, but the method was not being called because of a code issue.
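
The constructor-logging tip above can be sketched in plain Java. Note this is a stand-in class for illustration, not Hudi's actual HoodieRecordPayload API: the point is only that a message printed from the constructor (or a static initializer) proves the class is being loaded and instantiated, even if its merge-style methods are never invoked.

```java
// Stand-in sketch of the debugging tip: log from the static initializer and
// the constructor so you can tell "class never loaded" apart from
// "class loaded but method never called".
public class PayloadProbe {
    static { System.err.println("[probe] class loaded"); }

    public PayloadProbe() {
        System.err.println("[probe] instance constructed");
    }

    // Stand-in for a payload's merge method (NOT Hudi's real signature):
    // "latest record wins", in the spirit of OverwriteWithLatestAvroPayload.
    public String combine(String older, String newer) {
        return newer;
    }

    public static void main(String[] args) {
        PayloadProbe p = new PayloadProbe();
        System.out.println(p.combine("old", "new"));
    }
}
```

If the two `[probe]` lines show up in the task manager logs but your merge method's output never does, the problem is in the method logic or its wiring, not in class loading.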


Top Results From Across the Web

  • All Configurations | Apache Hudi — Flink Sql Configs: These configs control the Hudi Flink SQL source/sink … This can be overridden to a custom class extending HoodieRecordPayload class, …
  • Configurations - Apache Hudi — Flink Sql Configs: These configs control the Hudi Flink SQL source/sink … This can be overridden to a custom class extending HoodieRecordPayload class, …
  • SQL Client | Apache Flink — SQL Client allows users to submit jobs either within the interactive command line or using the -f option to execute a sql file. In both …
  • Basic Configurations - Apache Hudi — Flink Sql Configs: These configs control the Hudi Flink SQL source/sink … This can be overridden to a custom class extending HoodieRecordPayload class, …
  • Writing Data | Apache Hudi — PARTITIONPATH_FIELD_OPT_KEY (Required): Columns to be used for … you are using the default payload of OverwriteWithLatestAvroPayload for HoodieRecordPayload …
