
Subscribe :: Consume exception: Local: Maximum application poll interval (max.poll.interval.ms) exceeded

See original GitHub issue

Description

We’ve been getting this error since upgrading to beta3:

Subscribe :: Consume exception: Local: Maximum application poll interval (max.poll.interval.ms) exceeded - Local: Maximum application poll interval (max.poll.interval.ms) exceeded (ConsumeException) - Confluent.Kafka.ConsumeException: Local: Maximum application poll interval (max.poll.interval.ms) exceeded
   at Confluent.Kafka.Consumer`2.ConsumeImpl[K,V](Int32 millisecondsTimeout, IDeserializer`1 keyDeserializer, IDeserializer`1 valueDeserializer)
   at Confluent.Kafka.Consumer`2.Consume(CancellationToken cancellationToken)
   at Wayfair.Common.MessageQueue.Kafka.KafkaSubscriberChannel`2.Subscribe(IEnumerable`1 targetQueues, Action`2 messageHandler, ISubscriberEventHandler eventHandler)

We kept increasing the max.poll.interval.ms value to rule out a process that takes too long. The value is currently set to 30 minutes, which seems far longer than any process we would have running.

We are mostly out of ideas and wanted to see if there is a recommended way to solve this issue.

How to reproduce

Use the config specified below. After a fairly long period of time, the error will occur. I’m not sure exactly how long; in the past I’ve seen it within an hour, other times only after a few hours.

Checklist

Please provide the following information:

  • Confluent.Kafka nuget version: v1.0-beta3
  • Apache Kafka version:
  • Client configuration:
new ConsumerConfig {
  EnableAutoCommit = true,
  EnableAutoOffsetStore = false,
  HeartbeatIntervalMs = 3000,
  AutoCommitIntervalMs = 5000,
  AutoOffsetReset = AutoOffsetReset.Earliest,
  MaxPollIntervalMs = (int?) TimeSpan.FromMinutes(30).TotalMilliseconds
}

And…

new ConsumerBuilder<TKey, TMessage>(config)
    .SetKeyDeserializer(keyDeserializer)
    .SetValueDeserializer(valueDeserializer)
    .SetErrorHandler(eventHandler.OnError)
    .SetLogHandler(eventHandler.OnLog)
    .SetOffsetsCommittedHandler(eventHandler.OnOffsetsCommitted)
    .SetStatisticsHandler(eventHandler.OnStatistics)
    .SetRebalanceHandler((c, e) =>
    {
        if (e.IsAssignment)
        {
            c.Assign(e.Partitions);
            eventHandler.OnPartitionsAssigned(c, e);
        }
        else
        {
            c.Unassign();
            eventHandler.OnPartitionsRevoked(c, e);
        }
    })
    .Build();
  • Operating system: Windows and Linux
  • Provide logs (with “debug” : “…” as necessary in configuration)
  • Provide broker log excerpts
  • Critical issue

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 14 (8 by maintainers)

Top GitHub Comments

4 reactions
edenhill commented, Apr 7, 2020

The point of max.poll.interval.ms is to provide a liveness check between the application and the consumer: if the application has not called poll/consume within this interval, it is deemed dead, stalled, stuck, or malfunctioning, and the consumer will leave the group so the assigned partitions can be reassigned to a live application instance.

max.poll.interval.ms should thus be set to the maximum (plus some) theoretical processing time.
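In practice this means the consume loop must call Consume at least once per interval, and per-message work must finish within it. A minimal sketch of such a loop, assuming a config like the one in the issue (ProcessMessage is a hypothetical stand-in for the application's handler, not something from this thread):

```csharp
using System;
using Confluent.Kafka;

class ConsumeLoop
{
    // Sketch only: each call to Consume resets the max.poll.interval.ms
    // timer, so per-message processing must stay well under that bound.
    public static void Run(IConsumer<string, string> consumer,
                           Action<string> processMessage)
    {
        while (true)
        {
            var result = consumer.Consume(TimeSpan.FromSeconds(1));
            if (result == null) continue;  // timeout: loop again, timer was still reset

            processMessage(result.Message.Value);  // must finish within the interval
            consumer.StoreOffset(result);          // pairs with EnableAutoOffsetStore = false
        }
    }
}
```

With the issue's config (EnableAutoCommit = true, EnableAutoOffsetStore = false), StoreOffset marks the message as processed and the auto-commit timer flushes it, so a crash mid-processing does not commit an unprocessed offset.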

4 reactions
vinodres commented, Sep 24, 2019

@mhowlett I am running into a similar issue. I have just upgraded the Confluent.Kafka to v 1.1.0. Here is the related log message

Application maximum poll interval (300000ms) exceeded by 375ms (adjust max.poll.interval.ms for long-running message processing): leaving group

My question is, what is the best way to recover from this situation from within the code, without recycling the Windows service in which the consumer is running?

Some messages are going to take longer to process, and instead of adjusting max.poll.interval.ms, is there a way to force the consumer to reconnect when this issue occurs? Is there a way to detect this and then recover from it?
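One commonly used pattern for occasional long-running messages (a sketch under assumptions, not an official answer from this thread): pause the assigned partitions, run the slow work on another thread, and keep calling Consume so the poll-interval timer stays alive; paused partitions return no messages, so the loop is cheap. HandleSlowMessage is a hypothetical handler name.

```csharp
using System;
using System.Threading.Tasks;
using Confluent.Kafka;

class LongRunningHandler
{
    // Sketch only: while the slow task runs, the paused consumer still calls
    // Consume, which resets max.poll.interval.ms without delivering messages
    // from the paused partitions.
    public static void Handle(IConsumer<string, string> consumer,
                              ConsumeResult<string, string> result,
                              Action<string> handleSlowMessage)
    {
        consumer.Pause(consumer.Assignment);
        var work = Task.Run(() => handleSlowMessage(result.Message.Value));

        while (!work.IsCompleted)
        {
            consumer.Consume(TimeSpan.FromSeconds(1));  // keeps group membership alive
        }

        consumer.Resume(consumer.Assignment);
        consumer.StoreOffset(result);
    }
}
```

The trade-off is that the partition makes no progress while paused, but the consumer never leaves the group, so no rebalance is triggered.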


Top Results From Across the Web

Kafka consumer gets stuck after exceeding max.poll. ...
The consumer process hangs and does not consume any more messages. The following error message gets logged. MAXPOLL|rdkafka#consumer-1| [thrd: ...

[Python] How to capture Application maximum poll interval ...
Hello, My microservice uses confluent-kafka-python. Once in a while it fails with this error %4|1654121013.314|MAXPOLL|rdkafka#consumer-1| ...

Long-Running Jobs - Karafka framework documentation
Long-Running Jobs. When working with Kafka, there is a setting called max.poll.interval.ms. It is the maximum delay between invocations of poll() commands...

Kafka Consumer configuration reference
max.poll.interval.ms: The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time ...

Recommended configurations for Apache Kafka clients
Increase poll processing timeout (max.poll.interval.ms); decrease message batch size to speed up processing; improve processing ...
