KafkaProducer produces corrupt "double-compressed" messages on retry when compression is enabled. KafkaConsumer gets "stuck" consuming them
This is an interesting one.
In all of our topics, a handful of partitions get “stuck” every day. Reading of a partition stops at a given message, and kafka-python reports that there are no more messages in that partition (just as if all messages had been consumed), even though unconsumed messages remain. The only way to get the consumers moving again is to manually seek the offset forward, stepping over the “stuck” messages; the consumer then works again for a few million records before getting stuck at some later offset.
I have multiple consumers consuming from the same topic, and they all get stuck at the same messages of the same topics. A random set of partitions is affected from day to day.
We are using Kafka broker version 0.9.0.1 and kafka-python 1.2.1 (we had the same issue with 1.1.1).
The consumer code is very simple (the code below reads only partition 1, which is currently “stuck”):
```python
from kafka import KafkaConsumer, TopicPartition

kafka_consumer = KafkaConsumer(
    group_id=kafka_group_id,
    bootstrap_servers=kafka_servers,
    enable_auto_commit=False,
    consumer_timeout_ms=10000,    # stop iterating after 10s with no messages
    fetch_max_wait_ms=10 * 1000,
    request_timeout_ms=10 * 1000,
)

# Manually assign partition 1 of the topic (the partition that is "stuck").
kafka_consumer.assign([TopicPartition(topic, 1)])

for message in kafka_consumer:
    print(message)
print("Completed")
```
The above code prints “Completed” but none of the messages, even though partition 1 has a 5M offset lag, so there should be plenty of messages to read. After seeking the consumer offset forward, the code works again until it gets “stuck” at some later offset.
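For reference, here is a minimal sketch of the seek-forward workaround described above, assuming `topic`, `kafka_group_id`, and `kafka_servers` are defined as in the snippet above, and that a single corrupt message is being stepped over (both are illustrative assumptions):

```python
from kafka import KafkaConsumer, TopicPartition

tp = TopicPartition(topic, 1)
consumer = KafkaConsumer(
    group_id=kafka_group_id,
    bootstrap_servers=kafka_servers,
    enable_auto_commit=False,
)
consumer.assign([tp])

# position() returns the offset of the next record the consumer would fetch,
# i.e. the offset of the "stuck" message. Seek one past it to step over it.
stuck_offset = consumer.position(tp)
consumer.seek(tp, stuck_offset + 1)

# Commit the new position so the group does not fall back to the stuck offset.
consumer.commit()
```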
Top GitHub Comments
Thanks again to everyone for all the hard work tracking this one down!!
Absolutely! I’ve already landed a fix for the producer on master.
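Until a release containing that fix is available, one possible mitigation (my suggestion, not from the thread) is to avoid the retry-plus-compression combination on the producer side, since per the issue title the corruption occurs when a compressed batch is retried. A sketch, with illustrative parameter values and a hypothetical topic name:

```python
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=kafka_servers,
    # Either disable compression entirely...
    compression_type=None,
    # ...or keep compression but disable retries, so a failed batch is
    # never re-sent (and thus never re-compressed).
    retries=0,
)
producer.send('my-topic', b'payload')  # 'my-topic' is a placeholder
producer.flush()
```

Note that `retries=0` trades the double-compression risk for potential message loss on transient broker errors, so disabling compression is the safer of the two options until the fix ships.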