Persist raw data from a Kafka topic as is

Feature request: Save raw data from a topic in a Pinot table.

Use case: We have many complex schemas, and we use Pinot to save and retrieve topic data along with a timestamp and some other fields. We do not want to map every nested column of a complex schema into a Pinot schema and apply many transformation functions. In some places we want the raw data stored as is in the Pinot table.

Sample data:

{
  "header": { "tid": "12wee", "rid": 1, "timestamp": 1647347092337 },
  "status": "200_SUCCESS",
  "jasData": {
    "sdata": -22.89122,
    "cnn": 0.823469,
    "kli": 2.238848,
    "olp": [ { "ovPerc": 0.032486767, "hg": 30.0, "abshi": 6.661863 } ],
    "terrkl": { "ovPerc": 0.9675132, "dist": [ -25.17232, -25.17232, -25.130081 ] },
    "bcut": 2.77
  },
  "rgData": { "pre": 102033.33, "pv": 0.16, "t": 287.36, "timestamp": 1647347069000 },
  "timestamp": 1647347092337
}
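
To make the request concrete, here is a minimal Python sketch of the row shape the use case seems to ask for: a few mapped fields plus the untouched message. It is not Pinot code. The column name `rawData`, the helper `to_row`, and the shortened message are assumptions made for illustration only.

```python
import json

# Hypothetical column name; the issue does not define a target schema.
RAW_COLUMN = "rawData"


def to_row(message: str) -> dict:
    """Map a few top-level fields and keep the original message verbatim."""
    record = json.loads(message)
    return {
        "tid": record["header"]["tid"],
        "status": record["status"],
        "timestamp": record["timestamp"],
        # The whole message is stored unchanged, so nested fields
        # need no per-column mapping or transformation function.
        RAW_COLUMN: message,
    }


# Shortened version of the sample data, used only for this sketch.
message = (
    '{"header": {"tid": "12wee", "rid": 1, "timestamp": 1647347092337}, '
    '"status": "200_SUCCESS", '
    '"rgData": {"pre": 102033.33, "pv": 0.16}, '
    '"timestamp": 1647347092337}'
)
row = to_row(message)
print(row["tid"], row["status"], row["timestamp"])
```

Nested values such as `rgData.pv` remain recoverable later by parsing the stored string, which is the trade-off the request accepts in exchange for a small schema.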

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 9 (7 by maintainers)

Top GitHub Comments

1 reaction
saumya2700 commented, Apr 28, 2022

"@Jackie-Jiang This sounds good to me. @saumya2700 If you’re not working on this, can I pick this up?"

Yes, please go ahead.

1 reaction
Jackie-Jiang commented, Mar 29, 2022

We may consider adding a new config in the IngestionConfig to store the JSON string of the record in a field. The logic needs to be implemented in the RecordExtractor.
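
The proposal above can be modeled roughly as follows. This is a hedged Python sketch of the idea only, not Pinot's actual IngestionConfig or RecordExtractor API: the class `IngestionConfigSketch`, the option `raw_record_field`, and the function `extract` are all hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class IngestionConfigSketch:
    # Hypothetical option: the column that receives the whole record
    # as a JSON string. None means the feature is switched off.
    raw_record_field: Optional[str] = None


def extract(
    record: Dict[str, Any],
    fields: Iterable[str],
    config: IngestionConfigSketch,
) -> Dict[str, Any]:
    """Copy the requested fields and, if configured, add the full record as JSON."""
    row = {name: record.get(name) for name in fields}
    if config.raw_record_field is not None:
        # Serialize the decoded record back to a compact JSON string.
        row[config.raw_record_field] = json.dumps(record, separators=(",", ":"))
    return row


record = {
    "status": "200_SUCCESS",
    "timestamp": 1647347092337,
    "rgData": {"pv": 0.16, "t": 287.36},
}
with_raw = extract(record, ["status", "timestamp"], IngestionConfigSketch("rawJson"))
without_raw = extract(record, ["status", "timestamp"], IngestionConfigSketch())
print(sorted(with_raw), sorted(without_raw))
```

Note that this sketch re-serializes the decoded record, so the stored string is equivalent to the original message but not necessarily byte-for-byte identical to it.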

Top Results From Across the Web

How persistence works in an Apache Kafka deployment
Data retention can be controlled by the Kafka server and by per-topic configuration parameters. The retention of the data can be controlled by ...

It's Okay To Store Data In Kafka - Confluent
Data in Kafka is persisted to disk, checksummed, and replicated for fault tolerance. Accumulating more stored data doesn't make it slower.

Using Kafka as a Temporary Data Store and Data-loss ...
The period during which the data is stored by Kafka is called retention. Theoretically, you can set this period to forever. Kafka also...

Read data from Kafka topic and write into local persistent in NiFi
This recipe helps you read data from Kafka topic store and write into local persistent storage in NiFi.

Ingesting Raw Data with Kafka-connect and Spark Datasets
Then, since we have Kafka in place, using Kafka-connect allows us to perform this raw data layer ETL without writing a single line...
