KafkaProducer produces corrupt "double-compressed" messages on retry when compression is enabled. KafkaConsumer gets "stuck" consuming them

This is an interesting one.

Every day, a handful of partitions across all of our topics get “stuck”. Reading of the partition stops at a given message, and kafka-python reports that there are no more messages in the partition (as if it had consumed everything), even though unconsumed messages remain. The only way to get the consumers moving again is to manually seek the offset forward, stepping over the “stuck” messages. Consumption then works for a few million records before getting stuck again at some later offset.

I have multiple consumers consuming from the same topic, and they all get stuck at the same messages in the same topics. A random number of partitions is affected from day to day.

We are using Kafka broker version 0.9.0.1, kafka-python 1.2.1 (had the same issue with 1.1.1).

The consumer code is very simple (the code below tries to read only partition #1, which is currently “stuck”):

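from kafka import KafkaConsumer, TopicPartition

# kafka_group_id, kafka_servers and topic are assumed to be defined elsewhere.
kafka_consumer = KafkaConsumer(
    group_id=kafka_group_id,
    bootstrap_servers=kafka_servers,
    enable_auto_commit=False,
    consumer_timeout_ms=10000,  # stop iterating after 10 s without messages
    fetch_max_wait_ms=10*1000,
    request_timeout_ms=10*1000
)
# Manually assign only partition 1, the one that is currently stuck.
topics = [TopicPartition(topic, 1)]
kafka_consumer.assign(topics)

for message in kafka_consumer:
    print(message)

# Reached once the iterator times out without receiving a message.
print("Completed")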

The above code prints “Completed” but none of the messages, even though partition 1 has an offset lag of 5M, so there are plenty of messages to read. After seeking the consumer offset forward, the code works again until it gets “stuck” again.
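The manual workaround described above (seeking forward past the stuck offset) could look roughly like the sketch below. The helper name skip_past_stuck_offset and its skip parameter are hypothetical; the only kafka-python calls it relies on are KafkaConsumer.position() and KafkaConsumer.seek(), and it assumes the partition has already been assigned to the consumer. How many offsets need to be skipped is not stated in the report, so that is left to the caller.

```python
def skip_past_stuck_offset(consumer, partition, skip=1):
    """Advance the consumer's position on `partition` by `skip` offsets.

    Hypothetical helper: `consumer` is expected to behave like a
    kafka-python KafkaConsumer that already has `partition` assigned.
    Returns the (old, new) positions so the skipped range can be logged.
    """
    if skip < 1:
        raise ValueError("skip must be a positive number of offsets")
    current = consumer.position(partition)  # offset of the next record to fetch
    target = current + skip
    consumer.seek(partition, target)        # next fetch starts at `target`
    return current, target
```

With the consumer from the report, a call might look like skip_past_stuck_offset(kafka_consumer, TopicPartition(topic, 1)). Any records in the skipped range are not consumed, so this only unblocks the consumer and does not recover the affected messages.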

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 41 (24 by maintainers)

Top GitHub Comments

dpkp commented, Jul 16, 2016 (4 reactions)

Thanks again to everyone for all the hard work tracking this one down!!

dpkp commented, Jul 15, 2016 (1 reaction)

Absolutely! I’ve already landed a fix for the producer on master.
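For readers wondering what “double-compressed” in the title means in practice, here is a toy model using only Python's gzip module. It is not kafka-python's code: the ToyBatch class and both method names are hypothetical, and the excerpt above does not describe the actual fix. The sketch only shows that compressing an already compressed buffer on a retry leaves a payload that still looks compressed after one round of decompression, and that an idempotent close is one way to avoid that.

```python
import gzip


class ToyBatch:
    """Toy stand-in for a producer batch buffer (hypothetical, not kafka-python)."""

    def __init__(self, payload: bytes):
        self.data = payload
        self.closed = False

    def close_buggy(self):
        # Compresses on every call, so closing again on a retry compresses twice.
        self.data = gzip.compress(self.data)

    def close_idempotent(self):
        # Compresses only on the first call; later calls are no-ops.
        if not self.closed:
            self.data = gzip.compress(self.data)
            self.closed = True


payload = b"record-1\nrecord-2\n"

buggy = ToyBatch(payload)
buggy.close_buggy()  # first send attempt
buggy.close_buggy()  # retry closes the same buffer again
once = gzip.decompress(buggy.data)
# After one decompression the result is still gzip data (magic bytes 1f 8b),
# not the original records.
double_compressed = once != payload and once[:2] == b"\x1f\x8b"

fixed = ToyBatch(payload)
fixed.close_idempotent()
fixed.close_idempotent()
round_trips = gzip.decompress(fixed.data) == payload
```

A consumer that decompresses only once would see gzip bytes where it expects records, which is consistent with the “stuck” behaviour reported here, although the exact failure path inside the consumer is not covered by this excerpt.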
