
[question] How can consumption continue even after I recreate the Kafka topic or modify the topic properties?

See original GitHub issue

Hello! I have been using Pinot happily. Thank you for making a great product. 😃 I have a question because something did not work as I expected while using it.

How can consumption continue even after I recreate the Kafka topic or modify the topic properties?

I am storing data via stream ingestion. For operational reasons I sometimes need to delete the Kafka topic and recreate it. When I do, the consuming segment appears to stop; that is, no more data is stored. I have tested the same scenario several times, and each time the consuming segment stops and no data is saved.

I tried several approaches to work around this. Among them, the following sequence sometimes recovers consumption (a rough sketch of it appears after this list):

  1. Disable the table.
  2. Delete and recreate the Kafka topic.
  3. Re-enable the table.

However, it does not always work.
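For reference, this sequence can be scripted against the Pinot controller REST API and a Kafka admin client. The sketch below only illustrates the steps and is not a tested recipe: the controller address, the exact table-state endpoint (its shape differs across Pinot versions), the broker address, and the partition count are all assumptions, and kafka-python is just one possible client.

import time

import requests
from kafka.admin import KafkaAdminClient, NewTopic

CONTROLLER = "http://localhost:9000"   # assumed controller address
TABLE = "systemMetricLong"
TOPIC = "system-metric-long"

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")  # assumed broker

# 1. Disable the realtime table (endpoint shape varies across Pinot versions).
requests.put(f"{CONTROLLER}/tables/{TABLE}/state",
             params={"state": "disable", "type": "realtime"}).raise_for_status()

# 2. Delete and recreate the Kafka topic.
admin.delete_topics([TOPIC])
time.sleep(10)  # give the brokers time to fully remove the topic
admin.create_topics([NewTopic(TOPIC, num_partitions=3, replication_factor=1)])

# 3. Re-enable the table so new consuming segments are created.
requests.put(f"{CONTROLLER}/tables/{TABLE}/state",
             params={"state": "enable", "type": "realtime"}).raise_for_status()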

I also tried reloading the segments after the consuming segment stopped, but it still did not work. I restarted the Docker cluster several times as well, but the consuming segment still did not resume. I tried various other approaches beyond these, but could not find a solution.

My guess is that when the Kafka topic is recreated, the partition offsets change completely, and this is what breaks consumption. In my opinion, if the offsets were reset properly inside the Pinot consumer, the consuming segment would keep accumulating data even after the topic is recreated. So: how can consumption continue even if I recreate the Kafka topic or modify its properties? I do not know the internal logic well, but I looked at the org.apache.pinot.core.realtime.impl.kafka2.KafkaConsumerFactory, KafkaPartitionLevelConsumer, and KafkaStreamLevelConsumer classes and could not find the problem.
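The offset hypothesis can be checked from the Kafka side. Below is a minimal sketch using kafka-python (the library choice and broker address are assumptions; any Kafka client works): after the topic is deleted and recreated, the partition end offsets restart from 0, so an offset that Pinot checkpointed from the old topic can point far past the end of the new one.

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")  # assumed broker
tp = TopicPartition("system-metric-long", 0)
consumer.assign([tp])

# For a freshly recreated (empty) topic this prints 0; compare it with the
# offset Pinot last committed for the old topic to see the mismatch.
print("partition 0 end offset:", consumer.end_offsets([tp])[tp])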

If the consuming segment stops, recreating the table may be one way out, but the previously stored data would be lost. I am looking for a way to keep the consuming segment operating normally, without losing data and without recreating the table.

Note that the Docker image versions currently in use are as follows:

{
  "pinot-protobuf": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-kafka-2.0": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-avro": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-distribution": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-csv": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-s3": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-yammer": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-segment-uploader-default": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-batch-ingestion-standalone": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-confluent-avro": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-thrift": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-orc": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-batch-ingestion-spark": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-azure": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-gcs": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-batch-ingestion-hadoop": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-hdfs": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-adls": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-kinesis": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-json": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-minion-builtin-tasks": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-parquet": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb",
  "pinot-segment-writer-file-based": "0.8.0-SNAPSHOT-46009e152b8f56c244e415beefa81dbc626de7cb"
}

Table config:

{
  "REALTIME": {
    "tableName": "systemMetricLong_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "timeType": "MILLISECONDS",
      "schemaName": "systemMetricLong",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "2",
      "timeColumnName": "timestampInEpoch",
      "replicasPerPartition": "1"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "loadMode": "MMAP",
      "sortedColumn": [
        "applicationName"
      ],
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "lowlevel",
        "stream.kafka.topic.name": "system-metric-long",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.broker.list": XXXXXXX
        "realtime.segment.flush.threshold.raws": "0",
        "realtime.segment.flush.threshold.time": "24h",
        "realtime.segment.flush.threshold.segment.size": "50M",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
      },
      "invertedIndexColumns": [
        "tags"
      ],
      "rangeIndexColumns": [
        "timestampInEpoch"
      ],
      "aggregateMetrics": false,
      "nullHandlingEnabled": true,
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false
    },
    "metadata": {
      "customConfigs": {}
    },
    "isDimTable": false
  }
}

I know you are busy developing, but I hope you can help. I’ve been looking for a solution for a week, but I can’t find a way.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

sajjad-moradi commented, Aug 30, 2022 (4 reactions)

There are two types of properties when it comes to changing the stream configs:

  1. Changes that modify the underlying stream, such as a topic name or cluster name change.
  2. Stream-compatible changes that do not modify the underlying stream, such as the segment.threshold parameters.

The problem with the first type is that the offsets of the partitions change completely when the underlying stream changes. The pause/resume feature, recently merged into master (#8986 and #9289), can help here. For these incompatible parameter changes, the resume request has an option to handle a completely new set of offsets. Operators can now follow a three-step process: first, issue a pause request; second, change the consumption parameters; finally, issue the resume request with the appropriate option. These steps preserve the old data and allow the new data to be consumed immediately, and queries continue to be served throughout the operation.
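In controller API terms, the three steps might look roughly like the sketch below. The pauseConsumption and resumeConsumption paths come from the PRs mentioned above, but the controller address and the exact parameter name are assumptions; check the Swagger UI of your Pinot build.

import requests

CONTROLLER = "http://localhost:9000"  # assumed controller address
TABLE = "systemMetricLong"

# 1. Pause: current consuming segments are committed and consumption stops.
requests.post(f"{CONTROLLER}/tables/{TABLE}/pauseConsumption").raise_for_status()

# 2. Recreate the topic / apply the stream-incompatible config change here.

# 3. Resume, telling Pinot to discard the old checkpointed offsets and start
#    from the smallest (or largest) offset of the new stream.
requests.post(f"{CONTROLLER}/tables/{TABLE}/resumeConsumption",
              params={"consumeFrom": "smallest"}).raise_for_status()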

For the second type, the force-commit endpoint (#9197) can be used. The current consuming segments, which hold the previous stream-config values, are immediately completed, and new consuming segments are spun off; these pick up the new values in the stream config.
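As a sketch, the force-commit call might look like this (endpoint path taken from #9197; the controller address and table name are assumptions):

import requests

# Completes the current consuming segments immediately; the new consuming
# segments that replace them pick up the updated stream config.
requests.post(
    "http://localhost:9000/tables/systemMetricLong/forceCommit"
).raise_for_status()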

minwoo-jung commented, Jul 5, 2021 (0 reactions)

Hi @mcvsubbu,

I am writing to follow up on the questions above. I know you are busy, but could you answer when you have time?

I am a developer working on the open-source Pinpoint APM. We are building a feature that collects system metric data and stores and analyzes the raw data in Pinot, in order to show meaningful insights to users. After reviewing Pinot for a long time, we concluded it could be a good datastore, and we are now developing the metric collection function.

If you can answer the questions above, I think we can build good features with Pinot. Have a nice day, and thank you. :)
