What's the problem with consumer groups?


I believe I am using the consumer group correctly, but it does not behave the way I expect. Here is my code:

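#!/usr/bin/env python
# Written against the kafka-python API available in 2014
# (KafkaClient, SimpleProducer, SimpleConsumer).

import sys

from kafka.client import KafkaClient
from kafka.consumer import SimpleConsumer
from kafka.producer import SimpleProducer

def main():
    # Expect exactly one argument: "put" or "get".
    if len(sys.argv) != 2 or sys.argv[1] not in ("put", "get"):
        sys.exit("usage: %s put|get" % sys.argv[0])

    kafka = KafkaClient("localhost:9092")
    if sys.argv[1] == "put":
        # Publish a single message to the topic.
        producer = SimpleProducer(kafka)
        resp = producer.send_messages("my-topic", b"some message")
        print(resp)
    else:
        # Read from the topic as a member of "my-foo-group".
        consumer = SimpleConsumer(kafka, "my-foo-group", "my-topic")
        for message in consumer:
            print(message)

if __name__ == "__main__":
    main()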

What I want is this: if I send a message to “my-topic”, only one consumer in the group (“my-foo-group”) should receive it. However, no matter how many consumer processes I start, every one of them ends up receiving the message. Am I doing something wrong, or is this a problem with the Kafka Python client?

Issue Analytics

  • State: closed
  • Created 9 years ago
  • Comments: 9

Top GitHub Comments

4 reactions
mumrah commented, Aug 22, 2014

Currently, the “high-level” JVM consumers use ZK to coordinate which partitions are read by which threads. Each consuming thread in the JVM consumer will be reading from at least one partition, and these consumer threads can exist across multiple JVMs. This means you can create one logical “consumer group” that consists of several threads across several JVMs, e.g. a topic with 32 partitions could be read by 4 JVMs with 8 threads each and the data would be evenly distributed among the consumers.

The reason we haven’t added this feature is that there is a complex algorithm involving ZooKeeper to make sure a thread is consuming the correct partition at the correct offset. There are plans to redesign this “coordinated consumption” in Kafka so that it does not depend on ZooKeeper. This will make it easier for clients like kafka-python to do this kind of thing.

So, in other words, we’ll have it eventually.

HTH

2 reactions
wizzat commented, Aug 21, 2014

To be clear: Kafka-Python supports offset management and resumption. It does not support having C consumers and P partitions and automatically distributing the load so that no message has duplicate readers. If you need help getting resumption from an offset to work, we’d be glad to help you out.
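
Until coordinated consumption is available, one possible workaround is to split the partitions between consumer processes by hand. The sketch below is only an illustration and is not part of kafka-python: the helper name partitions_for_worker and the idea of starting each process with a worker index and a worker count are assumptions. It deals partition ids out round-robin so that each partition has exactly one owner. Because a message is written to a single partition, only the process that owns that partition would read it.

```python
def partitions_for_worker(worker_index, worker_count, partition_count):
    """Return the partition ids that one worker should read.

    Hypothetical helper: partitions are dealt out round-robin, so every
    partition has exactly one owner and the number of partitions per
    worker differs by at most one.
    """
    if worker_count <= 0:
        raise ValueError("worker_count must be positive")
    if not 0 <= worker_index < worker_count:
        raise ValueError("worker_index must be in [0, worker_count)")
    return list(range(worker_index, partition_count, worker_count))


if __name__ == "__main__":
    # Assumed deployment: 3 consumer processes and a topic with 8 partitions.
    for index in range(3):
        print(index, partitions_for_worker(index, 3, 8))
```

Each process would then need to be restricted to its own list of partitions. Whether and how the consumer can be limited that way depends on the client version, so check the consumer's constructor arguments in your release. The scheme is static: if a process dies, its partitions go unread until it is restarted, which is the gap that ZooKeeper-based coordination closes in the JVM consumer.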


