Kafka consumer processes the same message several times when scaled
Hi all. Here is my code. Consumer settings:
from kafka import KafkaConsumer
from kafka.coordinator.assignors.roundrobin import RoundRobinPartitionAssignor
consumer = KafkaConsumer(
    bootstrap_servers=kafka_host + ":" + kafka_port,
    security_protocol='PLAINTEXT',
    group_id='messages',
    partition_assignment_strategy=[RoundRobinPartitionAssignor],
    auto_offset_reset='latest',
    session_timeout_ms=60000,
    enable_auto_commit=True,
    max_poll_records=100
)
Consumer execution:
import json

for msg in consumer:
    message = json.loads(msg.value.decode("utf-8"))
    if message['msg'] == config['some']['message']:
        pass  # do something (the long-running processing)
# nothing else in the code touches the Kafka consumer
Non-default settings in the Kafka broker:
group.max.session.timeout.ms = 70000
Problem:
I ran
docker-compose up -d --build --scale my-service=3
In other words, we have 3 consumers using the same group_id.
At first everything looks fine and messages are shared between the consumers:
my-service_1 | 10:07:13.902 INFO __main__:33 --- Message from kafka - some my message
my-service_2 | 10:07:13.902 INFO __main__:33 --- Message from kafka - some my message2
my-service_3 | 10:07:13.902 INFO __main__:33 --- Message from kafka - some my message3
There are roughly 100 messages. The logic triggered by a message takes anywhere from 30 seconds to 5-6 minutes to run. After some time of execution I can see:
my-service_1 | 11:09:18.469 INFO __main__:33 --- Message from kafka - some my message
my-service_2 | 11:45:39.202 INFO __main__:33 --- Message from kafka - some my message2
my-service_3 | 10:37:13.902 INFO __main__:33 --- Message from kafka - some my message3
That is, the same messages reappear in the same scaled services. I’ve checked the Kafka broker topics for duplicates: all messages are unique. In other words, it looks like the messages are never marked as processed, so the services can run indefinitely and keep processing the same messages over and over.
Note: if I add consumer.commit() at the end of the loop, I get this error:
CommitFailedError: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max_poll_interval_ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the rebalance timeout with max_poll_interval_ms, or by reducing the maximum size of batches returned in poll() with max_poll_records.
Maybe I’ve configured something wrong? Looking for some fresh ideas. Thanks a lot for your attention to this issue.
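For context, the error names two remedies: raise max_poll_interval_ms or lower max_poll_records. Because offsets are only (auto-)committed as part of polling, max_poll_interval_ms has to cover the processing of everything returned by a single poll, i.e. roughly max_poll_records times the worst-case per-message time. Below is a minimal sketch of such a configuration, keeping the iterator loop and auto-commit. The values are inferred from the reply that follows (batches of 10 at up to ~6 minutes each, hence 3600000 ms); the topic name, host/port and handle() are placeholders, not from the original post.

from kafka import KafkaConsumer
from kafka.coordinator.assignors.roundrobin import RoundRobinPartitionAssignor

kafka_host, kafka_port = "localhost", "9092"    # placeholders

consumer = KafkaConsumer(
    'my-topic',                                 # placeholder topic name
    bootstrap_servers=kafka_host + ":" + kafka_port,
    security_protocol='PLAINTEXT',
    group_id='messages',
    partition_assignment_strategy=[RoundRobinPartitionAssignor],
    auto_offset_reset='latest',
    session_timeout_ms=60000,
    enable_auto_commit=True,
    max_poll_records=10,                        # smaller batches per poll
    max_poll_interval_ms=3600000,               # 1 h covers 10 records at up to ~6 min each
)

def handle(msg):                                # placeholder for the existing logic
    pass

for msg in consumer:
    handle(msg)

The trade-off of such a large poll interval is exactly what the reply below points out.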
That is correct: you are telling the driver you will spend at most 3600000 ms to process 10 messages. I would recommend just doing max_poll_records=1 and a lower max_poll_interval_ms, though, as max_poll_interval_ms also determines how long the group will wait on rebalances, meaning that if one of your services dies, the others would wait for 1 hour for it to come back up…

@tvoinarovskyi Thanks a lot for the explanations. The issue is 100% resolved.
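For reference, a minimal sketch of the setup recommended above: one record per poll and a poll interval sized for a single message. The explicit commit is an optional addition on top of that recommendation (it also avoids the CommitFailedError once the interval is large enough); the topic name, host/port, interval value and process() are placeholders, not from the original thread.

from kafka import KafkaConsumer

kafka_host, kafka_port = "localhost", "9092"    # placeholders

consumer = KafkaConsumer(
    'my-topic',                                 # placeholder topic name
    bootstrap_servers=kafka_host + ":" + kafka_port,
    group_id='messages',
    auto_offset_reset='latest',
    enable_auto_commit=False,                   # commit explicitly after each message
    max_poll_records=1,                         # one record per poll
    max_poll_interval_ms=600000,                # ~10 min covers the 5-6 min worst case
    session_timeout_ms=60000,
)

def process(record):                            # placeholder for the long-running logic
    pass

while True:
    batch = consumer.poll(timeout_ms=1000)      # {TopicPartition: [records]}
    for tp, records in batch.items():
        for record in records:                  # at most one record per poll
            process(record)
            consumer.commit()                   # offset advances only after success

With max_poll_records=1 the interval only needs to cover one message, so a crashed consumer is detected and its partitions are reassigned within minutes rather than an hour.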