Cannot start more than 5 consumers sequentially


Given the following code:

var Kafka = require('node-rdkafka');
var _ = require('underscore');
var P = require('bluebird');

P.each(_.range(10), idx => {
  console.log(`Connecting consumer ${idx}`);

  var consumer = new Kafka.KafkaConsumer({
    'metadata.broker.list': 'localhost:9092',
    'group.id': `test-group-${idx}`
  }, {});

  return new P((resolve, reject) => {
    consumer
      .on('ready', () => {
        console.log(`Consumer ${idx} ready`);

        consumer.subscribe(['TEST']);
        consumer.consume();

        resolve(consumer);
      })
      .on('error', err => {
        console.error('Consumer error: ' + err);
      });

    consumer.connect();
  });
});

I get the following output:

Connecting consumer 0
Consumer 0 ready
Connecting consumer 1
Consumer 1 ready
Connecting consumer 2
Consumer 2 ready
Connecting consumer 3
Consumer 3 ready
Connecting consumer 4

(then it hangs).

If the connections are made in parallel (by removing the return statement before new P), it works fine:

Connecting consumer 0
Connecting consumer 1
Connecting consumer 2
Connecting consumer 3
Connecting consumer 4
Connecting consumer 5
Connecting consumer 6
Connecting consumer 7
Connecting consumer 8
Connecting consumer 9
Consumer 3 ready
Consumer 2 ready
Consumer 0 ready
Consumer 1 ready
Consumer 4 ready
Consumer 5 ready
Consumer 6 ready
Consumer 7 ready
Consumer 8 ready
Consumer 9 ready

If it’s not a bug, then what am I doing wrong?

Kafka version 1.0.0.


Top GitHub Comments

3 reactions
ankon commented, Mar 21, 2018

I think I’ve run into the same problem. I tried the same variable change and got the same result: it didn’t change anything.

From debugging, however, I could see that we clearly fetched all the messages; they just took ages to arrive at my consumer. Profiles showed that the time was spent “somewhere” in “(idle)”/“syscalls”. So the thread pool theory really made sense, and I checked: it turns out the variable name @webmakersteve suggested was almost right 😃

http://docs.libuv.org/en/latest/threadpool.html:

Its default size is 4, but it can be changed at startup time by setting the UV_THREADPOOL_SIZE environment variable to any value (the absolute maximum is 128).

Setting UV_THREADPOOL_SIZE to 8 vastly improved the performance for me.
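
As a rough sketch of how the pool size might be chosen and applied: the helper name threadPoolSizeFor and its spare parameter are hypothetical, and the limits of 4 and 128 are taken from the libuv documentation quoted above. Assigning process.env.UV_THREADPOOL_SIZE from inside the script is assumed to take effect only if it happens before anything has used the thread pool; setting the variable in the shell when launching node avoids that ordering concern.

```javascript
// Hypothetical helper (not part of node-rdkafka or libuv): choose a pool size
// that leaves `spare` threads free after every flowing consumer has taken one.
function threadPoolSizeFor(flowingConsumers, spare = 4) {
  const LIBUV_DEFAULT = 4; // default size, per the libuv docs quoted above
  const LIBUV_MAX = 128;   // maximum, per the libuv docs quoted above
  return Math.min(LIBUV_MAX, Math.max(LIBUV_DEFAULT, flowingConsumers + spare));
}

// Assumption: this runs at the very top of the entry script, before any
// module has used the thread pool. Launching with the variable already set,
// e.g. `UV_THREADPOOL_SIZE=14 node app.js`, sidesteps the ordering question.
process.env.UV_THREADPOOL_SIZE = String(threadPoolSizeFor(10));

console.log(`UV_THREADPOOL_SIZE=${process.env.UV_THREADPOOL_SIZE}`);
```

For the ten consumers in the original report, this picks 14: one thread per consumer plus four left over for other thread pool work such as file system or DNS calls.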

3 reactions
webmakersteve commented, Mar 10, 2018

This happens because consumers using the consume loop, i.e. calling .consume() with no parameters, need to hold onto a thread from the libuv thread pool. If you want to do this, you need to increase the libuv thread pool size by setting process.env.UV_THREADPOOL to a number greater than 4. [The actual variable name is UV_THREADPOOL_SIZE, as the comment above points out.]
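
The explanation above can be illustrated with a toy model. This is only a sketch: it counts threads and does not talk to libuv or Kafka, the function name simulateSequentialStart is made up for illustration, and it assumes that connect() also needs a free pool thread to finish, which would explain why the fifth sequential consumer never reports ready.

```javascript
// Toy model only: returns the indices of consumers that would become ready
// when started one after another against a pool of `poolSize` threads.
function simulateSequentialStart(poolSize, consumerCount) {
  let freeThreads = poolSize;
  const ready = [];
  for (let idx = 0; idx < consumerCount; idx++) {
    // Assumption: connect() needs a free pool thread to complete.
    if (freeThreads === 0) break; // no thread left, so 'ready' never fires
    ready.push(idx);
    // consume() with no arguments keeps one thread for the consumer's lifetime.
    freeThreads -= 1;
  }
  return ready;
}

console.log(simulateSequentialStart(4, 10));         // [ 0, 1, 2, 3 ]
console.log(simulateSequentialStart(8, 10).length);  // 8
console.log(simulateSequentialStart(16, 10).length); // 10
```

With the default of 4 threads, the model reproduces the reported output: consumers 0 to 3 become ready and consumer 4 hangs. It also suggests that a pool of 8 would still stall ten sequential consumers, so the pool needs at least as many threads as there are flowing consumers, plus headroom for other work.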

