Cannot start more than 5 consumers sequentially
Given the following code:
    var Kafka = require('node-rdkafka');
    var _ = require('underscore');
    var P = require('bluebird');

    P.each(_.range(10), idx => {
      console.log(`Connecting consumer ${idx}`);
      var consumer = new Kafka.KafkaConsumer({
        'metadata.broker.list': 'localhost:9092',
        'group.id': `test-group-${idx}`
      }, {});
      return new P((resolve, reject) => {
        consumer
          .on('ready', () => {
            console.log(`Consumer ${idx} ready`);
            consumer.subscribe(['TEST']);
            consumer.consume();
            resolve(consumer);
          })
          .on('error', err => {
            console.error('Consumer error: ' + err);
          });
        consumer.connect();
      });
    });
I get the following output:
Connecting consumer 0
Consumer 0 ready
Connecting consumer 1
Consumer 1 ready
Connecting consumer 2
Consumer 2 ready
Connecting consumer 3
Consumer 3 ready
Connecting consumer 4
(then it hangs).
If the connection is done in parallel (by removing the return statement before new P), it works fine:
Connecting consumer 0
Connecting consumer 1
Connecting consumer 2
Connecting consumer 3
Connecting consumer 4
Connecting consumer 5
Connecting consumer 6
Connecting consumer 7
Connecting consumer 8
Connecting consumer 9
Consumer 3 ready
Consumer 2 ready
Consumer 0 ready
Consumer 1 ready
Consumer 4 ready
Consumer 5 ready
Consumer 6 ready
Consumer 7 ready
Consumer 8 ready
Consumer 9 ready
If it’s not a bug, then what am I doing wrong?
Kafka version 1.0.0.
Issue Analytics
- Created 6 years ago
- Reactions: 1
- Comments: 8 (2 by maintainers)
I think I’ve run into the same problem, tried the same change to the variable, and saw the same “doesn’t change things”.
From debugging, however, I could see that we clearly fetched all the messages; they just took ages to arrive at my consumer. Profiles showed the time was spent “somewhere” in “(idle)”/“syscalls”, so the theory above really made sense, and I checked: it turns out the variable name @webmakersteve suggested was almost right 😃
Per http://docs.libuv.org/en/latest/threadpool.html, setting UV_THREADPOOL_SIZE to 8 vastly improved the performance for me.

The reason this is happening is that consumers using the consume loop, i.e. calling .consume() with no parameters, each need to hold onto a thread in the libuv threadpool. If you want to do this, you need to increase the libuv threadpool size by setting process.env.UV_THREADPOOL (correctly, UV_THREADPOOL_SIZE, as noted above) to a number greater than the default of 4.