Consumer offset migration from 0.9.5 KafkaConsumer to 1.x KafkaConsumer

With KafkaConsumer from kafka-python 0.9.5, consumer offsets are stored in ZooKeeper. In kafka-python 1.x, offsets are stored in Kafka instead, and the new consumer does not read the existing offsets from ZooKeeper; it starts at the end of the topic. If the offsets are not migrated, messages may be skipped. I have tested this with broker version 0.9.
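Why messages get skipped can be shown with a toy model. This is not kafka-python code. It assumes that a consumer with no committed offset in Kafka falls back to its `auto_offset_reset` policy (kafka-python's `KafkaConsumer` defaults to `'latest'`). The helper names and offsets are made up for illustration.

```python
from typing import Optional


def start_position(committed: Optional[int], reset_policy: str,
                   earliest: int, latest: int) -> int:
    """Return where a consumer starts reading one partition (toy model).

    A committed offset wins; otherwise the reset policy picks the log
    start ('earliest') or the log end ('latest').
    """
    if committed is not None:
        return committed
    if reset_policy == "earliest":
        return earliest
    if reset_policy == "latest":
        return latest
    raise ValueError(f"unknown reset policy: {reset_policy!r}")


def skipped_messages(zk_offset: int, start: int) -> int:
    """Count messages between the old ZooKeeper offset and the new start."""
    return max(0, start - zk_offset)


# Hypothetical partition: the 0.9.5 consumer committed offset 1200 to
# ZooKeeper, the log spans offsets 0..1500, and Kafka holds no commit.
start = start_position(committed=None, reset_policy="latest",
                       earliest=0, latest=1500)
print(skipped_messages(zk_offset=1200, start=start))  # 300
```

With `reset_policy="earliest"`, the same partition skips nothing but re-reads the 1,200 already-processed messages. So neither reset policy replaces migrating the offsets.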

According to the Kafka FAQ and documentation, the migration is handled by temporarily setting dual.commit.enabled=true in the Java consumer, which forces the offsets to be copied from ZooKeeper to Kafka, and then reconfiguring the consumers to commit only to Kafka. As far as I can tell, kafka-python has no such option.
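The dual-commit idea fits in a few lines: every commit goes to both stores, and when the two disagree on read, the larger offset wins. The sketch makes some assumptions. The two stores are hypothetical objects with `commit`/`fetch` methods, here in-memory dicts. The max-on-read rule is an assumption about how the old Java consumer reconciles ZooKeeper and Kafka during migration; check it before relying on it. `DualCommitOffsets` is a hypothetical name, not a kafka-python class.

```python
from typing import Dict, Optional, Tuple

Partition = Tuple[str, int]  # (topic, partition)


class InMemoryOffsetStore:
    """Hypothetical stand-in for ZooKeeper or Kafka offset storage."""

    def __init__(self) -> None:
        self._offsets: Dict[Partition, int] = {}

    def commit(self, offsets: Dict[Partition, int]) -> None:
        self._offsets.update(offsets)

    def fetch(self, partition: Partition) -> Optional[int]:
        return self._offsets.get(partition)


class DualCommitOffsets:
    """Commit to both stores; on read, prefer the larger committed offset."""

    def __init__(self, zk_store, kafka_store) -> None:
        self.zk_store = zk_store
        self.kafka_store = kafka_store

    def commit(self, offsets: Dict[Partition, int]) -> None:
        # Write to both stores so old (ZooKeeper) and new (Kafka) readers agree.
        self.kafka_store.commit(offsets)
        self.zk_store.commit(offsets)

    def fetch(self, partition: Partition) -> Optional[int]:
        found = [o for o in (self.zk_store.fetch(partition),
                             self.kafka_store.fetch(partition)) if o is not None]
        return max(found) if found else None


zk, kafka = InMemoryOffsetStore(), InMemoryOffsetStore()
zk.commit({("events", 0): 1200})      # left behind by the 0.9.5 consumer
offsets = DualCommitOffsets(zk, kafka)
print(offsets.fetch(("events", 0)))   # 1200, found only in ZooKeeper
offsets.commit({("events", 0): 1250})
print(kafka.fetch(("events", 0)))     # 1250, now also stored in Kafka
```

Once every instance in the group commits through both paths, the Kafka copy is current and the ZooKeeper write can be dropped. That is the "commit only to Kafka" step described above.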

What seems to work for us is a simple script that reads the offsets from ZooKeeper and commits them manually to Kafka using the new client, but this requires all consumers in the consumer group to be down at the same time.
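The read side of such a script might look like the sketch below. It assumes the 0.9.5 offsets live at ZooKeeper's conventional `/consumers/<group>/offsets/<topic>/<partition>` path, with each offset stored as a decimal string. It accepts any client with kazoo-style `get_children(path)` and `get(path)` methods; kazoo's `KazooClient` should fit. `read_zk_offsets` and `FakeZk` are hypothetical names, and the commit to Kafka is left to the caller.

```python
from typing import Dict, List, Tuple

Partition = Tuple[str, int]  # (topic, partition)


def read_zk_offsets(zk, group: str) -> Dict[Partition, int]:
    """Collect committed offsets for `group` from the ZooKeeper offsets tree.

    `zk` needs get_children(path) -> [str] and get(path) -> (bytes, stat).
    The path layout is an assumption about where 0.9.5 stored offsets.
    """
    base = f"/consumers/{group}/offsets"
    offsets: Dict[Partition, int] = {}
    for topic in zk.get_children(base):
        for partition in zk.get_children(f"{base}/{topic}"):
            data, _stat = zk.get(f"{base}/{topic}/{partition}")
            offsets[(topic, int(partition))] = int(data.decode("ascii"))
    return offsets


class FakeZk:
    """In-memory stand-in for a ZooKeeper client, keyed by full znode path."""

    def __init__(self, nodes: Dict[str, bytes]) -> None:
        self.nodes = nodes

    def get_children(self, path: str) -> List[str]:
        prefix = path.rstrip("/") + "/"
        return sorted({p[len(prefix):].split("/")[0]
                       for p in self.nodes if p.startswith(prefix)})

    def get(self, path: str):
        return self.nodes[path], None


zk = FakeZk({
    "/consumers/my-group/offsets/events/0": b"1200",
    "/consumers/my-group/offsets/events/1": b"980",
})
print(read_zk_offsets(zk, "my-group"))
# {('events', 0): 1200, ('events', 1): 980}
```

With kafka-python 1.x, the result would then be converted to `{TopicPartition(topic, partition): OffsetAndMetadata(offset, "")}` and passed to `KafkaConsumer.commit(offsets)` on a consumer created with the same `group_id`. Check the `OffsetAndMetadata` fields for your kafka-python version, since later releases may have extended it. As the issue notes, the whole group must be stopped while this runs; otherwise the ZooKeeper offsets keep moving after they have been copied.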

Other options include adding dual-commit support to kafka-python, or creating new topics for the new consumers (the latter also requires reconfiguring the producers at migration time).

There is no migration guide in the documentation, so I'd like to contribute one. What are the recommended steps for upgrading consumers to the new KafkaConsumer implementation?

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

jeffwidman commented, Mar 9, 2017 (1 reaction)

I vote to close this issue as unlikely to be worth fixing at this point, so there's no use leaving it dangling.

Dual-commit support would be most useful to folks who have complex systems and can't shut down their consumers. Those folks are likely to have a lot of edge cases where simply adding dual.commit won't solve what they need; for example, we want to rename our consumer groups as part of our migration. Most simpler setups have either already migrated or can stop their consumers long enough to run a migration script.

Personally, if someone does want to add this, I'd rather see support added only to SimpleConsumer, not to KafkaConsumer, since it adds complexity for a rarely used feature.

jeffwidman commented, Mar 3, 2017 (1 reaction)

I was planning to put together a PR for this, but it turns out we have multiple consumers internally (custom-built, old kafka-python, pykafka, etc.), and we're migrating all of them to Kafka offsets rather than ZooKeeper offsets. So a custom script is the better solution for us as well.

I extended another script to support renaming the consumer group as part of the migration, which was another useful feature: https://github.com/apache/kafka/pull/2615
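Renaming a group during migration means reading offsets under the old group name and committing them under the new one. The sketch below is not the linked script. It only builds a rename plan. It assumes offsets have already been read per old group, for example with a ZooKeeper reader, and takes a hypothetical old-to-new name mapping. `plan_group_migration` and the group names are illustrative.

```python
from typing import Dict, Tuple

Partition = Tuple[str, int]  # (topic, partition)


def plan_group_migration(
    old_offsets: Dict[str, Dict[Partition, int]],
    renames: Dict[str, str],
) -> Dict[str, Dict[Partition, int]]:
    """Map each old group's offsets onto its target (possibly renamed) group.

    Groups missing from `renames` keep their name. Two old groups that map
    to the same target name are rejected rather than silently merged.
    """
    plan: Dict[str, Dict[Partition, int]] = {}
    for old_group, offsets in old_offsets.items():
        new_group = renames.get(old_group, old_group)
        if new_group in plan:
            raise ValueError(f"groups collide on target name {new_group!r}")
        plan[new_group] = dict(offsets)
    return plan


old = {
    "billing-legacy": {("invoices", 0): 42},
    "audit": {("events", 0): 7},
}
print(plan_group_migration(old, {"billing-legacy": "billing"}))
# {'billing': {('invoices', 0): 42}, 'audit': {('events', 0): 7}}
```

Each entry in the plan would then be committed by a client configured with that target group, such as one kafka-python consumer per `group_id`. Rejecting collisions keeps a typo in the rename map from merging two groups' positions.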

Top Results From Across the Web

KafkaConsumer (kafka 2.2.0 API)
Offsets and Consumer Position. Kafka maintains a numerical offset for each record in a partition. This offset acts as a unique identifier of...

kafka-python Documentation - Read the Docs
KafkaConsumer is a high-level message consumer, ... join a consumer group for dynamic partition assignment and offset commits.

Changelog — kafka-python 2.0.2-dev documentation
The KafkaConsumer iterator implementation has been greatly simplified so that it just wraps consumer.poll(). The prior implementation will remain available ...

Migrating from Chill 0.6.0 (Kryo 2.21) to 0.9.5 (Kryo 4.0.2) and ...
It turned out, that the message packages were renamed and therefore Kryo was unable to find correct classes.

Kafka Consumer | Confluent Documentation
By default, the consumer is configured to auto-commit offsets. Using auto-commit gives you “at least once” delivery: Kafka guarantees that no messages will...
