Consumer 'stream API' and sharing consumers
Environment Information
- OS [e.g. Mac, Arch, Windows 10]: Linux x86-64
- Node Version [e.g. 8.2.1]: 8.11.3
- NPM Version [e.g. 5.4.2]: 5.6.0
- C++ Toolchain [e.g. Visual Studio, llvm, g++]: g++
- node-rdkafka version [e.g. 2.3.3]: 2.3.4
Steps to Reproduce
- Follow the documentation, and decide to use the consumer ‘stream API’ in a project that consumes many topics from the same brokers.
- Observe that one needs to bump `UV_THREADPOOL_SIZE` so that node-rdkafka can handle consuming all these topics (see #363, maybe also #332)
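The workaround discussed in those issues amounts to raising libuv's worker threadpool size before the process starts, since each stream-API consumer instance occupies threads of its own. A minimal sketch (the value `16` and the entry-point name `app.js` are illustrative, not from the issue):

```shell
# libuv defaults to a threadpool of 4; with one Consumer instance per
# stream, more than a handful of topic streams can exhaust it.
# The variable must be set before Node starts -- it cannot be changed
# at runtime.
UV_THREADPOOL_SIZE=16 node app.js
```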
What eventually "dawned" on me was that I should look a bit further into what node-rdkafka/rdkafka/kafka are actually doing, and I found that the way we use streams is problematic: each stream requires a new Consumer instance, which creates its own threads and other resources. I've since changed the implementation to use the 'standard API', with some custom batching/multiplexing logic between a single node-rdkafka Consumer and our own per-topic abstractions.
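The single-consumer multiplexing approach described above can be sketched roughly as follows. The routing logic is a plain dispatch table; the Kafka wiring is shown commented out for shape only, since it needs a running broker (topic and group names are made up for illustration):

```javascript
// Pure routing logic: one 'data' callback fans messages out to
// per-topic handlers, so a single KafkaConsumer (and its threads)
// serves all topics instead of one Consumer per stream.
function makeDispatcher(handlers) {
  return (message) => {
    const handler = handlers[message.topic];
    if (handler) handler(message);
  };
}

// Kafka wiring (requires a broker; names here are illustrative):
// const Kafka = require('node-rdkafka');
// const consumer = new Kafka.KafkaConsumer({
//   'group.id': 'my-group',
//   'metadata.broker.list': 'localhost:9092',
// }, {});
// consumer.connect();
// consumer.on('ready', () => {
//   consumer.subscribe(['topic-a', 'topic-b']);
//   consumer.consume(); // flowing mode: emits a 'data' event per message
// });
// consumer.on('data', makeDispatcher({
//   'topic-a': (msg) => { /* per-topic abstraction for topic-a */ },
//   'topic-b': (msg) => { /* per-topic abstraction for topic-b */ },
// }));
```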
I think the documentation for the stream API should mention explicitly that it carries considerable overhead when consuming more than a few topics, rather than just saying that the standard API is better "for performance": we actually don't process that many messages per second, so initially I didn't connect the dots here.
EDIT: Fixed a mistaken 'producer' that should have been 'Consumer'.
Issue Analytics
- State:
- Created 5 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
Creating a topic per tenant or customer is generally discouraged with Kafka because it leads you to have an ever increasing number of topics on the broker, which will consume lots of resources regardless of the traffic in those topics. You probably want to build multi-tenancy into your application logic, by multiplexing the messages from fewer topic(s) (per message type, domain, priority or whatever else you want to split by) instead of multiplexing from X tenant topics with varying messages. This way you will keep your resource usage reasonable on consumers and most importantly on brokers. You can then scale with partition count.
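The comment's suggestion can be illustrated with key-based partitioning: instead of one topic per tenant, put the tenant id in the message key so that all of a tenant's messages land in the same partition of one shared topic, and scale by raising the partition count. A minimal sketch (Kafka's real default partitioner hashes the key bytes with murmur2; the simple hash here only demonstrates the principle):

```javascript
// Deterministically map a tenant id to a partition of a shared topic.
// Same tenant -> same partition, so per-tenant ordering is preserved
// without a per-tenant topic. Illustrative hash, not Kafka's murmur2.
function partitionForTenant(tenantId, partitionCount) {
  let hash = 0;
  for (const ch of tenantId) {
    hash = (hash * 31 + ch.charCodeAt(0)) | 0; // keep in 32-bit range
  }
  return Math.abs(hash) % partitionCount;
}
```

In practice you would not compute this yourself: producing with `key = tenantId` lets the broker-side default partitioner do the equivalent, and consumers scale by partition count rather than topic count.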
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.