
Parallel Consumer keeps committing the old offset after OFFSET_OUT_OF_RANGE in auto.offset.reset = latest

See original GitHub issue

Hi @astubbs, thanks for this great library. Recently, I have consistently been seeing consumer lag on some partitions of the topic when the offset becomes out of range for a partition. This happens even with the latest PC version, 0.5.2.2. I’m using KEY ordering.
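
For context, a minimal sketch of the kind of setup described, roughly following the parallel-consumer README; the broker address, topic name, deserializers and concurrency are assumptions for illustration, not taken from the issue:

import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder;
import io.confluent.parallelconsumer.ParallelStreamProcessor;

public class KeyOrderedConsumerSketch {

    public static void main(String[] args) {
        // Plain Kafka consumer handed to PC; auto.offset.reset=latest as in the issue.
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:6668");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-topic-consumer");
        props.put(ConsumerConfig.CLIENT_ID_CONFIG, "my-topic-consumer-client");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // PC manages committing itself
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // KEY ordering as described in the issue; the concurrency value is made up.
        ParallelConsumerOptions<String, String> options = ParallelConsumerOptions.<String, String>builder()
                .ordering(ProcessingOrder.KEY)
                .consumer(consumer)
                .maxConcurrency(16)
                .build();

        ParallelStreamProcessor<String, String> pc =
                ParallelStreamProcessor.createEosStreamProcessor(options);
        pc.subscribe(List.of("my-topic"));

        // Records with the same key are processed in order; offsets are committed
        // periodically by PC's broker-poll thread, as seen in the logs below.
        pc.poll(context -> {
            var record = context.getSingleConsumerRecord();
            System.out.printf("partition=%d offset=%d%n", record.partition(), record.offset());
        });
    }
}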

This might be related to #352. Sample logs below.

  • The last committed offset is 86321, and it’s no longer available on the broker due to the retention policy.
  • The position then gets reset to the latest offset, 88617.

12-09-2022 11:48:42.904 TraceId/SpanId: / [pc-broker-poll] INFO org.apache.kafka.clients.consumer.internals.Fetcher.handleOffsetOutOfRange - [Consumer clientId=my-topic-consumer-client, groupId=my-topic-consumer] Fetch position FetchPosition{offset=86321, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:6668 (id: 1 rack: null)], epoch=596}} is out of range for partition my-topic-4, resetting offset
12-09-2022 11:48:43.148 TraceId/SpanId: / [pc-broker-poll] INFO org.apache.kafka.clients.consumer.internals.SubscriptionState.maybeSeekUnvalidated - [Consumer clientId=my-topic-consumer-client, groupId=my-topic-consumer] Resetting offset for partition my-topic-4 to position FetchPosition{offset=88617, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:6668 (id: 1 rack: null)], epoch=596}}.
..

New messages at the reset position (offset 88617 onwards) get consumed by the consumer:

12-09-2022 11:49:32.550 TraceId/SpanId: c65c59c9b8c75d88/b26a1c4eea95894d [pc-pool-3-thread-1] INFO my.test.kafka.consumer.MyConsumer.consume - event=PROCESSING_COMPLETED, Event details: [id=cfd1d7e0-324d-11ed-8e3e-99871a3f782d, name=Event started, partition=4, offset=88618]
12-09-2022 11:49:32.550 TraceId/SpanId: a49e9995c6147711/d466506b2f29b224 [pc-pool-3-thread-1] INFO my.test.kafka.consumer.MyConsumer.consume - event=PROCESSING_COMPLETED, Event details: [id=cbabd4e0-324d-11ed-8e3e-99871a3f782d, name=Event completed, partition=4, offset=88617]
12-09-2022 11:49:32.781 TraceId/SpanId: c2a3cfd96ea76036/37104cfb0345e5b1 [pc-pool-3-thread-3] INFO my.test.kafka.consumer.MyConsumer.consume - event=PROCESSING_COMPLETED, Event details: [id=cff676e0-324d-11ed-8e3e-99871a3f782d, name=Other event created, partition=4, offset=88619]
..

But the old offset, 86321, keeps being committed for that partition, so consumer lag appears and grows continuously.

12-09-2022 11:49:35.552 TraceId/SpanId: / [pc-broker-poll] DEBUG io.confluent.parallelconsumer.internal.AbstractOffsetCommitter.retrieveOffsetsAndCommit - Commit starting - find completed work to commit offsets
12-09-2022 11:49:35.552 TraceId/SpanId: / [pc-broker-poll] DEBUG io.confluent.parallelconsumer.internal.AbstractOffsetCommitter.retrieveOffsetsAndCommit - Will commit offsets for 5 partition(s): {my-topic-4=OffsetAndMetadata{offset=86321, leaderEpoch=null, metadata='bgAKAAMACgACAAsBDgACB8Y='}, my-topic-3=OffsetAndMetadata{offset=89148, leaderEpoch=null, metadata='bAAJfgEA'}, my-topic-2=OffsetAndMetadata{offset=88725, leaderEpoch=null, metadata='bAATPIAGAA=='}, my-topic-1=OffsetAndMetadata{offset=107886, leaderEpoch=null, metadata='bAAGIAA='}, my-topic-0=OffsetAndMetadata{offset=89200, leaderEpoch=null, metadata='bgALAAE='}}
12-09-2022 11:49:35.552 TraceId/SpanId: / [pc-broker-poll] DEBUG io.confluent.parallelconsumer.internal.AbstractOffsetCommitter.retrieveOffsetsAndCommit - Begin commit
..
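
For completeness, the growing lag can be confirmed independently of PC by comparing the group’s committed offsets with each partition’s log-end offset via the Kafka admin client. This is only a sketch; the bootstrap address and group id are taken from the logs above, everything else is illustrative:

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagCheck {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:6668");

        try (Admin admin = Admin.create(props)) {
            // Offsets last committed by the group (this is where 86321 keeps reappearing for my-topic-4).
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("my-topic-consumer")
                    .partitionsToOffsetAndMetadata()
                    .get();

            // Latest (log-end) offset of each of those partitions.
            Map<TopicPartition, OffsetSpec> latestSpec = committed.keySet().stream()
                    .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResultInfo> latest =
                    admin.listOffsets(latestSpec).all().get();

            // Lag = log-end offset minus committed offset; for my-topic-4 this keeps growing.
            committed.forEach((tp, om) ->
                    System.out.printf("%s committed=%d latest=%d lag=%d%n",
                            tp, om.offset(), latest.get(tp).offset(),
                            latest.get(tp).offset() - om.offset()));
        }
    }
}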

  • Is it possible that PC is not aware of the “auto.offset.reset=latest” reset event, so its highest committable offset somehow remains 86321, and that is why it gets committed on every commit loop?
  • Is it possible for PC to recover from this situation just like a normal consumer does, i.e. respect “auto.offset.reset=latest” and commit the offsets of new messages as it consumes them, avoiding consumer lag? (See the sketch after this list for the plain-consumer behaviour.)
  • Under normal circumstances, are there any exceptional scenarios where we can run into consumer lag that PC cannot recover from, so the lag keeps growing? E.g. if it somehow fails to commit an offset, how can PC recover without restarting the consumer?
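
For comparison, this is roughly how a plain KafkaConsumer behaves in the same situation: after the fetcher resets the position to latest, the next commit records the new position, so the committed offset moves forward and the lag does not keep growing. This is only a sketch with placeholder topic, group and processing logic:

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PlainConsumerReset {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:6668");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "plain-consumer"); // placeholder group id
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));
            while (true) {
                // If the committed offset has been removed by retention, the fetcher hits
                // OFFSET_OUT_OF_RANGE internally and reseeks to the latest offset (auto.offset.reset=latest).
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // placeholder for real processing
                }
                // commitSync() commits the consumer's current positions, i.e. the post-reset
                // offsets, so the committed offset advances and the lag does not accumulate.
                consumer.commitSync();
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d%n", record.partition(), record.offset());
    }
}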

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 22 (13 by maintainers)

Top GitHub Comments

1 reaction
leonoah86 commented, Oct 13, 2022

Hi @astubbs, thanks for your responses and for working on the fix.

please open a new issue and we can discuss it there

Yes, I will open a new issue.

Can you please test the snapshot version?

Currently, I’m a bit busy with BAU work. I will test it in a few days and update.

0 reactions
leonoah86 commented, Nov 7, 2022

Please ignore. I just saw that you already released the new version 4 days ago. I will try that out. Thanks!


Top Results From Across the Web

Consumer Auto Offsets Reset Behavior | Learn Kafka with ...
Kafka consumers have a configuration for how to behave when they don't have a previously committed offset. This can happen if the consumer...
Read more >
KAFKA - Effect of auto.offset.reset when same consumer ...
I have auto.offset.reset=earliest and this means that when no valid offset details can be retrieved for a consumer group,...
Read more >
kafka.consumer package — kafka-python 1.1.0 documentation
Reset partition offsets upon OffsetOutOfRangeError. Valid values are largest and smallest. Otherwise, do not reset the offsets and raise OffsetOutOfRangeError.
Read more >
kafka - Go Documentation Server
NOTE: The consumer will keep trying to fetch new messages for the partition. * `OffsetsCommitted` - Offset commit results (when `enable.auto.commit` is ...
Read more >
Kafka client terminated with OffsetOutOfRangeException
apache.kafka.clients.consumer.OffsetOutOfRangeException error message. Cause. Your Spark application is trying to fetch expired data offsets ...
Read more >
