Parallel Consumer keeps committing the old offset after OFFSET_OUT_OF_RANGE in auto.offset.reset = latest
Issue Description
Hi @astubbs, thanks for this great library. Recently, I have consistently been seeing consumer lag on some partitions of the topic when the offset becomes out of range for that partition. This happens even with the latest PC version, 0.5.2.2. I’m using KEY ordering.
This might be related to #352. Sample logs below.
- The last committed offset is 86321, and it is no longer available on the broker due to the retention policy.
- The fetch position therefore gets reset to the latest offset, 88617:
12-09-2022 11:48:42.904 TraceId/SpanId: / [pc-broker-poll] INFO org.apache.kafka.clients.consumer.internals.Fetcher.handleOffsetOutOfRange - [Consumer clientId=my-topic-consumer-client, groupId=my-topic-consumer] Fetch position FetchPosition{offset=86321, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:6668 (id: 1 rack: null)], epoch=596}} is out of range for partition my-topic-4, resetting offset
12-09-2022 11:48:43.148 TraceId/SpanId: / [pc-broker-poll] INFO org.apache.kafka.clients.consumer.internals.SubscriptionState.maybeSeekUnvalidated - [Consumer clientId=my-topic-consumer-client, groupId=my-topic-consumer] Resetting offset for partition my-topic-4 to position FetchPosition{offset=88617, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:6668 (id: 1 rack: null)], epoch=596}}.
...
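For context, the reset in the `Fetcher` log above is the plain Kafka consumer's `auto.offset.reset` behaviour. A minimal sketch of the relevant configuration (the property names are the standard Kafka ones; the broker address and ids are placeholders taken from the logs):

```java
import java.util.Properties;

public class ConsumerConfigSketch {
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:6668");    // placeholder from the logs
        props.put("group.id", "my-topic-consumer");
        // When the committed offset has been deleted by retention, "latest"
        // makes the consumer jump to the log end (86321 -> 88617 above).
        props.put("auto.offset.reset", "latest");
        // PC manages commits itself, so Kafka auto-commit stays off.
        props.put("enable.auto.commit", "false");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("auto.offset.reset"));
    }
}
```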
New messages at offsets 88617–88619 then get consumed by the consumer:
12-09-2022 11:49:32.550 TraceId/SpanId: c65c59c9b8c75d88/b26a1c4eea95894d [pc-pool-3-thread-1] INFO my.test.kafka.consumer.MyConsumer.consume - event=PROCESSING_COMPLETED, Event details: [id=cfd1d7e0-324d-11ed-8e3e-99871a3f782d, name=Event started, partition=4, offset=88618]
12-09-2022 11:49:32.550 TraceId/SpanId: a49e9995c6147711/d466506b2f29b224 [pc-pool-3-thread-1] INFO my.test.kafka.consumer.MyConsumer.consume - event=PROCESSING_COMPLETED, Event details: [id=cbabd4e0-324d-11ed-8e3e-99871a3f782d, name=Event completed, partition=4, offset=88617]
12-09-2022 11:49:32.781 TraceId/SpanId: c2a3cfd96ea76036/37104cfb0345e5b1 [pc-pool-3-thread-3] INFO my.test.kafka.consumer.MyConsumer.consume - event=PROCESSING_COMPLETED, Event details: [id=cff676e0-324d-11ed-8e3e-99871a3f782d, name=Other event created, partition=4, offset=88619]
...
However, the old offset 86321 keeps being committed on every commit cycle, so consumer lag appears and grows continuously:
12-09-2022 11:49:35.552 TraceId/SpanId: / [pc-broker-poll] DEBUG io.confluent.parallelconsumer.internal.AbstractOffsetCommitter.retrieveOffsetsAndCommit - Commit starting - find completed work to commit offsets
12-09-2022 11:49:35.552 TraceId/SpanId: / [pc-broker-poll] DEBUG io.confluent.parallelconsumer.internal.AbstractOffsetCommitter.retrieveOffsetsAndCommit - Will commit offsets for 5 partition(s): {my-topic-4=OffsetAndMetadata{offset=86321, leaderEpoch=null, metadata='bgAKAAMACgACAAsBDgACB8Y='}, my-topic-3=OffsetAndMetadata{offset=89148, leaderEpoch=null, metadata='bAAJfgEA'}, my-topic-2=OffsetAndMetadata{offset=88725, leaderEpoch=null, metadata='bAATPIAGAA=='}, my-topic-1=OffsetAndMetadata{offset=107886, leaderEpoch=null, metadata='bAAGIAA='}, my-topic-0=OffsetAndMetadata{offset=89200, leaderEpoch=null, metadata='bgALAAE='}}
12-09-2022 11:49:35.552 TraceId/SpanId: / [pc-broker-poll] DEBUG io.confluent.parallelconsumer.internal.AbstractOffsetCommitter.retrieveOffsetsAndCommit - Begin commit
...
- Is it possible that PC is not aware of the `auto.offset.reset=latest` event, so the highest committable offset somehow remains 86321 and gets committed on every commit loop?
- Could PC recover from this situation the way a plain consumer does, i.e. honour `auto.offset.reset=latest` and commit the offsets of new messages as it consumes them, avoiding the consumer lag?
- Under normal circumstances, are there exceptional scenarios where we can run into consumer lag that PC cannot recover from, so the lag keeps growing? For example, if a commit somehow fails, how can PC recover without restarting the consumer?
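The first question above can be illustrated with a hypothetical sketch (this is not PC's actual implementation, just the general "highest contiguous completed offset" commit scheme): if the committable offset only advances past offsets whose work has completed, then a range of offsets deleted by retention never completes, the gap never closes after the reset, and the stale offset is committed on every loop.

```java
import java.util.TreeSet;

// Hypothetical tracker: commits the lowest offset whose work (and that of
// every offset before it) has completed.
public class CommittableOffsetTracker {
    private long committable;                            // next offset eligible to commit
    private final TreeSet<Long> done = new TreeSet<>();  // completed but not yet contiguous

    public CommittableOffsetTracker(long lastCommitted) {
        this.committable = lastCommitted;
    }

    public void complete(long offset) {
        done.add(offset);
        // Advance only while the completed sequence stays contiguous.
        while (done.contains(committable)) {
            done.remove(committable);
            committable++;
        }
    }

    public long committableOffset() { return committable; }

    public static void main(String[] args) {
        CommittableOffsetTracker t = new CommittableOffsetTracker(86321);
        // Broker resets the fetch position to 88617; work for 88617+ completes,
        // but 86321..88616 were deleted by retention and can never complete.
        t.complete(88617);
        t.complete(88618);
        t.complete(88619);
        // The committable offset is still the stale one, so 86321 is
        // re-committed on every commit loop, exactly as in the logs above.
        System.out.println(t.committableOffset()); // prints 86321
    }
}
```

If this is roughly what happens inside PC, recovering would require the tracker to be told about the reset so it can drop the unreachable range and jump its base offset forward.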
Issue Analytics
- State:
- Created 5 months ago
- Comments: 22 (13 by maintainers)
Hi @astubbs, thanks for your responses and working on the fix.
Yes, I will open a new issue.
Currently, I’m a bit busy with BAU work. I will test it in a few days and update.
Please ignore. I just saw you have already released the new version 4 days ago. Will try that out. Thanks!