consumption halted on realtime table when accessing an offset that has already been deleted from Kafka
The realtime consumption of records on realtime tables stops when a realtime table needs to access an offset that has already been wiped by Kafka. This can happen because the retention period on the Kafka side is shorter than the interval at which we commit segments in Pinot. When this happens, there is no easy way for us to recover and restart consumption on the realtime table. The log messages in the pinot-servers show the following:
Fetch position FetchPosition{offset=xxxx, ...} is out of range for partition xx-xx-xx-xx-x, resetting offset
consumer plugin: org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory
logs:
Consumed 0 events from (rate:0.0/s), currentOffset=5011308264, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
[Consumer clientId=consumer-null-12, groupId=null] Seeking to offset 5011308265 for partition xx-xx-xx-xx-3
Consumed 0 events from (rate:0.0/s), currentOffset=5110008164, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
[Consumer clientId=consumer-null-1, groupId=null] Seeking to offset 5110008165 for partition xx-xx-xx-xx-1
[Consumer clientId=consumer-null-12, groupId=null] Fetch position FetchPosition{offset=5011308265, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[xxx (id: 0 rack: us)], epoch=6}} is out of range for partition xx-xx-xx-xx-3, resetting offset
[Consumer clientId=consumer-null-31, groupId=null] Fetch position FetchPosition{offset=4849504882, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[xxx (id: 2 rack: us)], epoch=5}} is out of range for partition xx-xx-xx-xx-5, resetting offset
[Consumer clientId=consumer-null-12, groupId=null] Resetting offset for partition xx-xx-xx-xx-3 to position FetchPosition{offset=546724258, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[xxx (id: 0 rack: us)], epoch=6}}.
[Consumer clientId=consumer-null-31, groupId=null] Resetting offset for partition xx-xx-xx-xx-5 to position FetchPosition{offset=530075133, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[xxx (id: 2 rack: us)], epoch=5}}.
Consumed 0 events from (rate:0.0/s), currentOffset=5063423275, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
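To make the failure mode concrete, here is a minimal sketch of the loop the logs above suggest, written against plain kafka-clients with a hypothetical broker and topic (the offset is taken from the logs above; this is not Pinot's actual consumer code):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class StuckFetchLoopSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
    // "earliest" makes the client silently reset an out-of-range position
    // instead of surfacing an error to the caller.
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    TopicPartition tp = new TopicPartition("my-topic", 3); // hypothetical topic
    long checkpointedOffset = 5011308265L;                 // checkpoint persisted in ZK (from the logs above)

    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
      consumer.assign(Collections.singleton(tp));
      while (true) {
        // Every fetch starts by seeking to the checkpoint ("Seeking to offset ...").
        consumer.seek(tp, checkpointedOffset);
        // The broker no longer has that offset, so the fetch is rejected
        // ("Fetch position ... is out of range") and the position is reset to
        // the log start. Depending on timing the poll returns nothing, or
        // records from far below the requested offset; either way the next
        // iteration seeks straight back to the checkpoint, so the reset never
        // sticks and the consumer makes no progress.
        ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(5));
        System.out.printf("Consumed %d events, currentOffset=%d%n", records.count(), checkpointedOffset);
      }
    }
  }
}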
There should be a way to recover from this scenario without having to recreate the table or manually update the ZK metadata for the consuming segment, which is hard to find on the ZK side.
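One possible shape of such a recovery path is sketched below with plain kafka-clients (an assumption about how it could work, not an existing Pinot feature): configure auto.offset.reset=none so the out-of-range position surfaces as an OffsetOutOfRangeException instead of a silent reset, then fall forward to the earliest offset Kafka still has and persist it as the new checkpoint.

import java.time.Duration;
import java.util.Collections;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetOutOfRangeException;
import org.apache.kafka.common.TopicPartition;

public class OffsetRecoverySketch {

  // Fetches one batch starting at the checkpointed offset and returns the next
  // offset to checkpoint. Requires auto.offset.reset=none so that an
  // out-of-range position throws instead of silently resetting.
  static long fetchFrom(KafkaConsumer<byte[], byte[]> consumer, TopicPartition tp, long checkpoint) {
    consumer.seek(tp, checkpoint);
    try {
      ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(5));
      // ... hand records to the indexer ...
      return checkpoint + records.count(); // simplification: assumes contiguous offsets
    } catch (OffsetOutOfRangeException e) {
      // The checkpointed offset was deleted by retention. Fall forward to the
      // earliest offset Kafka still has, accept the gap as data loss, and let
      // the caller persist the new checkpoint (in Pinot's case, the consuming
      // segment's ZK metadata) so consumption can resume automatically.
      long earliest = consumer.beginningOffsets(Collections.singleton(tp)).get(tp);
      consumer.seek(tp, earliest);
      return earliest;
    }
  }
}

Falling forward trades a bounded, observable data loss for automatic recovery, which seems preferable to a permanently stalled consumer.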
Top GitHub Comments
@richardstartin we have been running the latest build in our dev environment
@mcvsubbu the validation manager doesn't fix this. I see the validation manager checking for this only in the scenario of
if (isAllInstancesInState(instanceStateMap, SegmentStateModel.OFFLINE)) {
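For context, the guard presumably looks something like the sketch below (a hypothetical reconstruction, not the exact Pinot source): it only fires when every replica of the segment has gone OFFLINE, whereas in this scenario the replicas stay in CONSUMING while making no progress, so the repair path never runs.

import java.util.Map;

// Hypothetical reconstruction of the guard, for illustration only.
private static boolean isAllInstancesInState(Map<String, String> instanceStateMap, String state) {
  // instanceStateMap: instance name -> Helix segment state (ONLINE/CONSUMING/OFFLINE/ERROR).
  // True only when every replica reports the given state, so a segment whose
  // replicas are all stuck in CONSUMING never triggers the OFFLINE repair path.
  return instanceStateMap.values().stream().allMatch(state::equals);
}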
@richardstartin I think this is a separate issue. There's no filtering happening; Pinot just sees it as no data coming in, because the consumer is actually unable to get any data. But it might be worth trying out with the fix, I haven't verified for sure.