Help with Reading Kafka topic written using Debezium Connector - Deltastreamer
Hi Team,
I'm facing a use case where I need to ingest data with DeltaStreamer from a Kafka topic that is populated by a Debezium connector. The topic's schema therefore contains the Debezium envelope fields such as before, after, ts_ms, op, source, etc. I'm providing the record key as after.id
and the precombine key as after.timestamp,
but the entire Debezium envelope is still being ingested instead of just the row data.
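For context, a Debezium change event wraps the row in an envelope (field names below follow the standard Debezium envelope; the column values are made up for illustration). This sketch shows why pointing the record key at a nested after.* field alone does not flatten the payload:

```python
# A minimal sketch of a Debezium change-event envelope (illustrative values only).
# Without a transformer, DeltaStreamer writes this whole structure, not just "after".
event = {
    "before": None,                     # row state before the change (None for inserts)
    "after": {"inc_id": 101, "date": "2020-10-01"},  # new row state
    "source": {"connector": "postgresql", "table": "motor_crash_violation_incidents"},
    "op": "c",                          # c = create, u = update, d = delete
    "ts_ms": 1601510400000,             # time the connector processed the event
}

# The configured record key points inside the nested "after" struct:
record_key = event["after"]["inc_id"]
print(record_key)  # 101
```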
Please find my properties below:
hoodie.upsert.shuffle.parallelism=2
hoodie.insert.shuffle.parallelism=2
hoodie.delete.shuffle.parallelism=2
hoodie.bulkinsert.shuffle.parallelism=2
hoodie.embed.timeline.server=true
hoodie.filesystem.view.type=EMBEDDED_KV_STORE
hoodie.compact.inline=false
# Key fields, for kafka example
hoodie.datasource.write.recordkey.field=after.inc_id
hoodie.datasource.write.partitionpath.field=date
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
# Schema provider props (change to absolute path based on your installation)
#hoodie.deltastreamer.schemaprovider.source.schema.file=/var/demo/config/schema.avsc
#hoodie.deltastreamer.schemaprovider.target.schema.file=/var/demo/config/schema.avsc
# Kafka Source
hoodie.deltastreamer.source.kafka.topic=airflow.public.motor_crash_violation_incidents
#Kafka props
bootstrap.servers=http://xxxxx:29092
auto.offset.reset=earliest
hoodie.deltastreamer.schemaprovider.registry.url=http://xxxxx:8081/subjects/airflow.public.motor_crash_violation_incidents-value/versions/latest
#hoodie.deltastreamer.schemaprovider.registry.targetUrl=http://xxxxx:8081/subjects/airflow.public.motor_crash_violation_incidents-value/versions/latest
schema.registry.url=http://xxxxx:8081
validate.non.null=false
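One common workaround (a sketch, assuming Hudi's bundled SqlQueryBasedTransformer is on your utilities classpath) is to pass `--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer` to DeltaStreamer and project only the after.* columns with a property like:

```
# Hypothetical addition: project the row payload out of the Debezium envelope.
# <SRC> is the placeholder table name that SqlQueryBasedTransformer substitutes.
hoodie.deltastreamer.transformer.sql=SELECT after.*, op, ts_ms FROM <SRC>
```

Note this alone does not handle deletes (op = d events carry the row in `before`); that is what the record-payload mentioned in the comments below is for.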
Issue Analytics
- State:
- Created 3 years ago
- Comments: 50 (21 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@ashishmgofficial: You need to plug in a transformer class to select only the columns you need, and a record payload to handle deletions. We are currently in the process of adding the transformer to OSS Hudi, but broadly here is how it will look (thanks to @joshk-kang).
gist:
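The idea from the comment above, sketched in Python over plain dicts (the real Hudi transformer is Java/Scala and the gist is not reproduced here; function and field names below are illustrative, except `_hoodie_is_deleted`, which is Hudi's conventional delete-marker column):

```python
# Illustrative sketch of what the transformer plus delete-handling payload do:
# keep only the row columns and mark delete events so the payload layer drops them.

def flatten_debezium(event):
    """Project the row payload out of a Debezium envelope (illustrative)."""
    if event["op"] == "d":
        # Deletes carry the final row state in "before"; tag it for deletion.
        row = dict(event["before"])
        row["_hoodie_is_deleted"] = True
    else:
        row = dict(event["after"])
        row["_hoodie_is_deleted"] = False
    row["ts_ms"] = event["ts_ms"]  # keep the ordering field for precombine
    return row

upsert = flatten_debezium({"op": "u", "before": {"inc_id": 1, "v": 0},
                           "after": {"inc_id": 1, "v": 2}, "ts_ms": 2})
delete = flatten_debezium({"op": "d", "before": {"inc_id": 1, "v": 2},
                           "after": None, "ts_ms": 3})
print(upsert["_hoodie_is_deleted"], delete["_hoodie_is_deleted"])  # False True
```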
@vinothchandar Sorry I took so long to respond. It had worked and compiled successfully. I probably had missed something at the time.
Thanks for your response at the time.