Kafka consumer processes the same message several times when scaled
Hi all. Here is my code. Consumer settings:
from kafka import KafkaConsumer
from kafka.coordinator.assignors.roundrobin import RoundRobinPartitionAssignor
consumer = KafkaConsumer(
    bootstrap_servers=kafka_host + ":" + kafka_port,
    security_protocol='PLAINTEXT',
    group_id='messages',
    partition_assignment_strategy=[RoundRobinPartitionAssignor],
    auto_offset_reset='latest',
    session_timeout_ms=60000,
    enable_auto_commit=True,
    max_poll_records=100
)
Consumer execution:
import json

for msg in consumer:
    message = json.loads(msg.value.decode("utf-8"))
    if message['msg'] == config['some']['message']:
        pass  # do something (the long-running processing)
# nothing else in the code touches the Kafka consumer
Non-default settings in the Kafka broker:
group.max.session.timeout.ms = 70000
Problem:
I ran
docker-compose up -d --build --scale my-service=3
In other words, we have 3 consumers using the same group_id.
At first everything looks fine and messages are shared between the consumers:
my-service_1 | 10:07:13.902 INFO __main__:33 --- Message from kafka - some my message
my-service_2 | 10:07:13.902 INFO __main__:33 --- Message from kafka - some my message2
my-service_3 | 10:07:13.902 INFO __main__:33 --- Message from kafka - some my message3
There are roughly 100 messages. The logic triggered by a message takes anywhere from 30 seconds to 5-6 minutes to run. After some time of execution I can see:
my-service_1 | 11:09:18.469 INFO __main__:33 --- Message from kafka - some my message
my-service_2 | 11:45:39.202 INFO __main__:33 --- Message from kafka - some my message2
my-service_3 | 10:37:13.902 INFO __main__:33 --- Message from kafka - some my message3
That is, the same messages reappear in the same scaled services. I’ve checked the Kafka broker topics for duplicates: all messages are unique. In other words, it looks like the messages are never marked as processed, so the services can run indefinitely and keep processing the same messages over and over.
Note: if I add consumer.commit() at the end of the loop, I get this error:
CommitFailedError: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max_poll_interval_ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the rebalance timeout with max_poll_interval_ms, or by reducing the maximum size of batches returned in poll() with max_poll_records.
Maybe I’ve configured something wrong? Looking for some fresh ideas. Thanks a lot for your attention to this issue.
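For context, the error names two remedies: raise max_poll_interval_ms or lower max_poll_records. Because offsets are only (auto-)committed as part of polling, max_poll_interval_ms has to cover the processing of everything returned by a single poll, i.e. roughly max_poll_records times the worst-case per-message time. Below is a minimal sketch of such a configuration, keeping the iterator loop and auto-commit. The values are inferred from the reply that follows (batches of 10 at up to ~6 minutes each, hence 3600000 ms); the topic name, host/port and handle() are placeholders, not from the original post.

from kafka import KafkaConsumer
from kafka.coordinator.assignors.roundrobin import RoundRobinPartitionAssignor

kafka_host, kafka_port = "localhost", "9092"    # placeholders

consumer = KafkaConsumer(
    'my-topic',                                 # placeholder topic name
    bootstrap_servers=kafka_host + ":" + kafka_port,
    security_protocol='PLAINTEXT',
    group_id='messages',
    partition_assignment_strategy=[RoundRobinPartitionAssignor],
    auto_offset_reset='latest',
    session_timeout_ms=60000,
    enable_auto_commit=True,
    max_poll_records=10,                        # smaller batches per poll
    max_poll_interval_ms=3600000,               # 1 h covers 10 records at up to ~6 min each
)

def handle(msg):                                # placeholder for the existing logic
    pass

for msg in consumer:
    handle(msg)

The trade-off of such a large poll interval is exactly what the reply below points out.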
That is correct: you are telling the driver you will spend at most 3600000 ms to process 10 messages. I would recommend just doing max_poll_records=1 and a lower max_poll_interval_ms, though, as max_poll_interval_ms also determines how long the group will wait on rebalances, meaning that if one of your services dies, the others would wait for 1 hour for it to come back up…

@tvoinarovskyi Thanks a lot for the explanations. The issue is 100% resolved.
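For reference, a minimal sketch of the setup recommended above: one record per poll and a poll interval sized for a single message. The explicit commit is an optional addition on top of that recommendation (it also avoids the CommitFailedError once the interval is large enough); the topic name, host/port, interval value and process() are placeholders, not from the original thread.

from kafka import KafkaConsumer

kafka_host, kafka_port = "localhost", "9092"    # placeholders

consumer = KafkaConsumer(
    'my-topic',                                 # placeholder topic name
    bootstrap_servers=kafka_host + ":" + kafka_port,
    group_id='messages',
    auto_offset_reset='latest',
    enable_auto_commit=False,                   # commit explicitly after each message
    max_poll_records=1,                         # one record per poll
    max_poll_interval_ms=600000,                # ~10 min covers the 5-6 min worst case
    session_timeout_ms=60000,
)

def process(record):                            # placeholder for the long-running logic
    pass

while True:
    batch = consumer.poll(timeout_ms=1000)      # {TopicPartition: [records]}
    for tp, records in batch.items():
        for record in records:                  # at most one record per poll
            process(record)
            consumer.commit()                   # offset advances only after success

With max_poll_records=1 the interval only needs to cover one message, so a crashed consumer is detected and its partitions are reassigned within minutes rather than an hour.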