Consumer offset migration from 0.9.5 KafkaConsumer to 1.x KafkaConsumer
When using KafkaConsumer from kafka-python 0.9.5, the consumer offsets are stored in ZooKeeper.
In kafka-python 1.x the offsets are stored in Kafka, and the new consumer does not pick up the existing offsets from ZooKeeper; it starts at the end of the topic instead. This may cause messages to be skipped if the offsets are not migrated. I have tested this with broker version 0.9.
According to the Kafka FAQ and documentation, the migration is handled by temporarily setting dual.commit.enabled=true in the Java consumer, which forces the offsets to be copied from ZooKeeper to Kafka, and then reconfiguring the consumers to commit only to Kafka. As far as I can tell, kafka-python does not have such an option.
What seems to work for us is a simple script that reads the offsets from ZooKeeper and commits them manually to Kafka using the new client, but it has to be run while all consumers in the consumer group are down at the same time.
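A minimal sketch of such a script, not the exact one we run. It assumes the kazoo library for ZooKeeper access, the standard /consumers/<group>/offsets/<topic>/<partition> node layout used by ZooKeeper-based offset storage, and that the stored value is the next offset to consume (verify this against your own data before committing). The function names read_zk_offsets and migrate_group are made up for illustration.

```python
def read_zk_offsets(zk, group, topic):
    """Read {partition: offset} for one group/topic from ZooKeeper.

    `zk` is any client exposing kazoo-style get_children(path) and get(path).
    Assumes the /consumers/<group>/offsets/<topic>/<partition> layout.
    """
    base = "/consumers/{}/offsets/{}".format(group, topic)
    offsets = {}
    for partition in zk.get_children(base):
        data, _stat = zk.get("{}/{}".format(base, partition))
        offsets[int(partition)] = int(data.decode("utf-8"))
    return offsets


def migrate_group(zk_hosts, bootstrap_servers, group, topic):
    """Copy one group's offsets for `topic` from ZooKeeper to Kafka.

    Run this only while every consumer in `group` is stopped.
    """
    # Imported here so read_zk_offsets stays usable without these packages.
    from kazoo.client import KazooClient
    from kafka import KafkaConsumer
    from kafka.structs import TopicPartition, OffsetAndMetadata

    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    try:
        zk_offsets = read_zk_offsets(zk, group, topic)
    finally:
        zk.stop()

    consumer = KafkaConsumer(
        bootstrap_servers=bootstrap_servers,
        group_id=group,
        enable_auto_commit=False,
    )
    try:
        # The two-field OffsetAndMetadata(offset, metadata) form is an
        # assumption based on kafka-python 1.x; check the installed version.
        consumer.commit({
            TopicPartition(topic, partition): OffsetAndMetadata(offset, "")
            for partition, offset in zk_offsets.items()
        })
    finally:
        consumer.close()
    return zk_offsets
```

Calling migrate_group("zk1:2181", "kafka1:9092", "my-group", "my-topic") (placeholder hosts and names) would then copy the offsets for one topic; a group that consumes several topics needs one call per topic.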
Other options include adding dual commit support to kafka-python or creating new topics for the new consumers (the latter also requires reconfiguring the producers at the time of migration).
The documentation lacks a migration guide, so I’d like to contribute one. What are the recommended steps to upgrade consumers to use the new KafkaConsumer implementation?
Issue Analytics
- Created 7 years ago
- Comments:7 (4 by maintainers)
Top GitHub Comments
I vote to close this issue as unlikely to be worth fixing at this point so no use leaving it dangling.
It’s most useful to folks who have complex systems and can’t shut down their consumers. Those folks are likely to have a lot of edge cases where simply adding dual.commit won’t solve what they need; for example, we want to rename our consumer groups as part of our migration. And most simple implementations have either already migrated or can simply stop their consumers long enough to run a migration script.
Personally, if someone does want to add this, I’d also rather see support only added in the SimpleConsumer, not in KafkaConsumer, as it adds complexity for a rarely used feature.
I was planning to put together a PR for this, but turns out we have multiple consumers internally (custom built, old kafka-python, pykafka, etc), and we’re migrating all of them to using kafka offsets rather than ZK offsets. So a custom script is the better solution for us as well.
I extended another script to support renaming the consumer group as part of the migration, which was another useful feature: https://github.com/apache/kafka/pull/2615