Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

consumption halted on realtime table when accessing an offset that has been already deleted from Kafka

See original GitHub issue

Realtime consumption of records stops when a realtime table needs to access an offset that has already been deleted by Kafka. This can happen when the retention period on the Kafka side is shorter than the interval at which Pinot commits segments. When it happens, there is no easy way for us to recover and restart consumption on the realtime table; the Pinot server log shows the following:

Fetch position FetchPosition{offset=xxxx, ...} is out of range for partition xxxx, resetting offset

consumer plugin: org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory

logs:

Consumed 0 events from (rate:0.0/s), currentOffset=5011308264, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
[Consumer clientId=consumer-null-12, groupId=null] Seeking to offset 5011308265 for partition xx-xx-xx-xx-3
Consumed 0 events from (rate:0.0/s), currentOffset=5110008164, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
[Consumer clientId=consumer-null-1, groupId=null] Seeking to offset 5110008165 for partition xx-xx-xx-xx-1
[Consumer clientId=consumer-null-12, groupId=null] Fetch position FetchPosition{offset=5011308265, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[xxx (id: 0 rack: us)], epoch=6}} is out of range for partition xx-xx-xx-xx-3, resetting offset
[Consumer clientId=consumer-null-31, groupId=null] Fetch position FetchPosition{offset=4849504882, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[xxx (id: 2 rack: us)], epoch=5}} is out of range for partition xx-xx-xx-xx-5, resetting offset
[Consumer clientId=consumer-null-12, groupId=null] Resetting offset for partition xx-xx-xx-xx-3 to position FetchPosition{offset=546724258, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[xxx (id: 0 rack: us)], epoch=6}}.
[Consumer clientId=consumer-null-31, groupId=null] Resetting offset for partition xx-xx-xx-xx-5 to position FetchPosition{offset=530075133, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[xxx (id: 2 rack: us)], epoch=5}}.
Consumed 0 events from (rate:0.0/s), currentOffset=5063423275, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
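The resets visible in the log above are the Kafka consumer's standard out-of-range handling: when the requested offset no longer exists on the broker, the consumer falls back to its reset policy (the equivalent of the auto.offset.reset setting) and snaps to the earliest or latest available offset. A minimal Python sketch of that decision, using a hypothetical helper (not Pinot or Kafka client code):

```python
def reset_offset(requested, earliest, latest, policy="earliest"):
    """Mirror the Kafka consumer's out-of-range handling.

    If the requested offset is still within the broker's available range,
    keep it; otherwise snap to the earliest or latest offset depending on
    the reset policy.
    """
    if earliest <= requested <= latest:
        return requested  # offset still exists, no reset needed
    if policy == "earliest":
        return earliest
    if policy == "latest":
        return latest
    # With policy "none", a real consumer raises OffsetOutOfRangeException.
    raise ValueError(f"offset {requested} out of range [{earliest}, {latest}]")
```

This mirrors what the log shows: the fetch for offset 5011308265 was out of range, so the consumer reset to the earliest available offset, 546724258. Pinot's consuming-segment metadata, however, still points at the old offset, which is why ingestion stalls.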

There should be a way to recover from this scenario without recreating the table or manually updating the ZooKeeper metadata for the consuming segment, which is hard to locate on the ZooKeeper side.
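Until such a recovery path exists, one stopgap is to watch the Pinot server logs for the out-of-range pattern shown above, so an operator can intervene before the table silently stops ingesting. A small hypothetical log-scanning sketch (the regex is derived from the log lines in this issue, not from any Pinot tooling):

```python
import re

# Matches the Kafka consumer's out-of-range reset message seen in the logs.
OUT_OF_RANGE = re.compile(
    r"Fetch position FetchPosition\{offset=(\d+).*?"
    r"is out of range for partition (\S+), resetting offset"
)

def find_stalled_partitions(log_lines):
    """Return (partition, stale_offset) pairs for out-of-range fetches."""
    hits = []
    for line in log_lines:
        m = OUT_OF_RANGE.search(line)
        if m:
            hits.append((m.group(2), int(m.group(1))))
    return hits
```

Feeding the server log through this scanner would surface each stuck partition and the stale offset Pinot is still trying to fetch, which is the information needed before any manual fix.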

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 13 (13 by maintainers)

Top GitHub Comments

1 reaction
lfernandez93 commented, Feb 17, 2022

@richardstartin we have been running the latest build in our dev environment

1 reaction
npawar commented, Feb 17, 2022

@mcvsubbu the validation manager doesn't fix this. I see the validation manager checking for this condition only in the if (isAllInstancesInState(instanceStateMap, SegmentStateModel.OFFLINE)) branch. @richardstartin I think this is a separate issue. There's no filtering happening; Pinot just sees it as no data coming in, because the consumer is actually unable to fetch any data. But it might be worth trying with the fix; I haven't verified for sure.

Read more comments on GitHub >

Top Results From Across the Web

[question] Even if I recreate the kafka topic or modify ... - GitHub
for some reason I need to delete the kafka topic and recreate it. The consuming segment seems to have stopped. That is, no...
Chapter 4. Kafka Consumers: Reading Data from Kafka
This property controls the behavior of the consumer when it starts reading a partition for which it doesn't have a committed offset or...
Data Reprocessing with the Streams API in Kafka - Confluent
The quick answer is you can do this either manually (cumbersome and error-prone) or you can use the new application reset tool for...
Documentation - Apache Kafka
The Kafka cluster durably persists all published records—whether or not they have been consumed—using a configurable retention period. For example, if the ...
Offset Management For Apache Kafka With Apache Spark ...
Case 2: Long running streaming job had been stopped and new partitions are added to a kafka topic. Function queries the zookeeper to...
