
What's the problem with consumer groups?


I believe I am using the consumer group correctly, but it does not behave the way I want. Here is my code:

#!/usr/bin/env python

import sys

from kafka.client import KafkaClient
from kafka.consumer import SimpleConsumer
from kafka.producer import SimpleProducer, KeyedProducer

def main():
    if len(sys.argv) != 2:
        sys.exit(0)

    kafka = KafkaClient("localhost:9092")
    if sys.argv[1] == "put":
        producer = SimpleProducer(kafka)
        resp = producer.send_messages("my-topic", "some message")
        print resp
    elif sys.argv[1] == "get":
        consumer = SimpleConsumer(kafka, "my-foo-group", "my-topic")
        for message in consumer:
            print message

if __name__ == "__main__":
    main()

What I want is this: if I send a message to “my-topic”, only one consumer in the group (“my-foo-group”) should receive it. However, no matter how many consumer processes I start, all of them end up receiving the message. Am I doing something wrong, or is this a problem with the Kafka Python client?

Issue Analytics

Created 9 years ago
Comments: 9

Top GitHub Comments

4 reactions

mumrah commented, Aug 22, 2014

Currently, the “high-level” JVM consumers use ZK to coordinate which partitions are read by which threads. Each consuming thread in the JVM consumer will be reading from at least one partition, and these consumer threads can exist across multiple JVMs. This means you can create one logical “consumer group” that consists of several threads across several JVMs, e.g. a topic with 32 partitions could be read by 4 JVMs with 8 threads each and the data would be evenly distributed among the consumers.

The reason we haven’t added this feature is that there is a complex algorithm involving ZooKeeper to make sure a thread is consuming the correct partition at the correct offset. There are plans to redesign this “coordinated consumption” in Kafka so that it does not depend on ZooKeeper. This will make it easier for clients like kafka-python to do this kind of thing.

So, in other words, we’ll have it eventually.

HTH
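
The even split described in the comment above (32 partitions over 4 JVMs with 8 threads each) can be illustrated with a small, self-contained sketch. This is not the JVM consumer's actual algorithm and involves no ZooKeeper; the function name `assign_partitions` and the thread labels are hypothetical, chosen only for illustration. It shows the end result a coordinator aims for: every partition is owned by exactly one consumer in the group.

```python
def assign_partitions(num_partitions, consumer_ids):
    """Split partitions 0..num_partitions-1 into contiguous ranges,
    one range per consumer. Hypothetical helper, not a kafka-python API."""
    consumers = sorted(consumer_ids)
    # Every consumer gets `base` partitions; the first `extra` get one more.
    base, extra = divmod(num_partitions, len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


# The example from the comment: 4 JVMs with 8 threads each, 32 partitions.
threads = ["jvm{}-thread{}".format(j, t) for j in range(4) for t in range(8)]
assignment = assign_partitions(32, threads)

# Each thread ends up with exactly one partition, and no partition is shared.
for thread in sorted(assignment):
    print("{} -> {}".format(thread, assignment[thread]))
```

The hard part that the comment refers to is not this arithmetic but agreeing on it across processes and redoing it safely when consumers join or leave.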

2 reactions

wizzat commented, Aug 21, 2014

To be clear: Kafka-Python supports offset management and resumption. It does not support having C consumers and P partitions and automatically distributing the load so that no message has duplicate readers. If you need help getting resumption from an offset working, we’d be glad to help you out.
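
Until automatic distribution exists, one possible workaround is to split the partitions by hand: start each consumer process with its own index and the total number of processes, and let each process read only its own share. The sketch below is an assumption-laden illustration, not a feature of the library. The helper `partitions_for_worker` is hypothetical, and the idea of passing its result to `SimpleConsumer` through a `partitions` argument should be checked against the version of kafka-python you have installed. It also assumes the topic has more than one partition; with a single partition only one worker gets any data.

```python
def partitions_for_worker(worker_index, worker_count, num_partitions):
    """Return the partitions a given worker should read.

    Hypothetical helper: worker i takes partitions i, i + worker_count,
    i + 2 * worker_count, and so on, so no two workers overlap.
    """
    if worker_count <= 0:
        raise ValueError("worker_count must be positive")
    if not 0 <= worker_index < worker_count:
        raise ValueError("worker_index must be in [0, worker_count)")
    return list(range(worker_index, num_partitions, worker_count))


# Three consumer processes sharing a topic with 8 partitions.
for index in range(3):
    print("worker {} reads {}".format(index, partitions_for_worker(index, 3, 8)))

# Assumption: the resulting list would then be handed to the consumer, e.g.
#   SimpleConsumer(kafka, "my-foo-group", "my-topic", partitions=my_partitions)
# Verify that your kafka-python version accepts a `partitions` argument.
```

Unlike a coordinated consumer group, this static split does not rebalance: if one process dies, its partitions go unread until it is restarted.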
