
Consumer 'stream API' and sharing consumers

See original GitHub issue

Environment Information

  • OS: Linux x86-64
  • Node Version: 8.11.3
  • NPM Version: 5.6.0
  • C++ Toolchain: g++
  • node-rdkafka version: 2.3.4

Steps to Reproduce

  1. Follow the documentation, and decide to use the consumer ‘stream API’ in a project that consumes many topics from the same brokers.
  2. Observe that one needs to bump UV_THREADPOOL_SIZE so that node-rdkafka can handle consuming all these topics (see #363, maybe also #332).
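For reference, the workaround in step 2 is just an environment variable read by libuv at startup; it delays the resource exhaustion rather than fixing it (the entry point `consumer.js` is a placeholder name):

```shell
# libuv defaults to 4 threadpool workers; each node-rdkafka client
# ties up workers, so many stream consumers exhaust the pool.
export UV_THREADPOOL_SIZE=16
echo "UV_THREADPOOL_SIZE=$UV_THREADPOOL_SIZE"   # then e.g.: node consumer.js
```

Note that UV_THREADPOOL_SIZE must be set before the Node process starts; changing it from inside a running process has no effect on the already-created pool.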

What eventually dawned on me was to look a bit further into what node-rdkafka/rdkafka/Kafka are actually doing, and I found that the way we were using streams is problematic: each stream requires its own Consumer instance, which creates its own threads and other resources. I have since changed the implementation to use the ‘standard API’, with some custom batching/multiplexing logic between a single node-rdkafka Consumer and our own per-topic abstractions.
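A minimal sketch of the kind of demultiplexing described above, assuming messages arrive with the `{ topic, value }` shape that node-rdkafka's standard API delivers; the helper and handler names here are hypothetical, not part of node-rdkafka:

```javascript
// One consumer loop feeds many per-topic handlers, instead of one
// Consumer instance (with its own threads) per topic.
function createDemultiplexer() {
  const handlers = new Map(); // topic -> array of handler callbacks

  return {
    // Register a per-topic handler (our "per-topic abstraction").
    on(topic, handler) {
      if (!handlers.has(topic)) handlers.set(topic, []);
      handlers.get(topic).push(handler);
    },
    // Feed a batch from the single consumer; route each message
    // to the handlers registered for its topic.
    dispatch(messages) {
      for (const msg of messages) {
        for (const h of handlers.get(msg.topic) || []) h(msg);
      }
    },
  };
}

// Usage: one batch from a single consumer, routed by topic.
const demux = createDemultiplexer();
const seen = [];
demux.on('events', (m) => seen.push(`events:${m.value}`));
demux.on('errors', (m) => seen.push(`errors:${m.value}`));
demux.dispatch([
  { topic: 'events', value: 'a' },
  { topic: 'errors', value: 'b' },
  { topic: 'commands', value: 'c' }, // no handler registered: skipped
]);
console.log(seen.join(','));
```

The batching side (how many messages to pull per `consume()` call, and how often) is where the real tuning lives, but the routing itself is this simple.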

I think it would be nice if the stream API documentation explicitly mentioned that it carries considerable overhead when consuming more than a few topics, rather than just saying the standard API is better “for performance”: we don’t actually process that many messages per second, so initially I didn’t connect the dots here.

EDIT: Fixed a mistaken ‘producer’ that should have been ‘Consumer’.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments:7 (5 by maintainers)

Top GitHub Comments

1 reaction
Tapppi commented, Aug 9, 2018

Essentially we have a multi-tenant application, and decided[*] to have one topic each for events, commands, errors, and “query responses” per tenant. Opening a stream for each quickly got us into resource-consumption issues, which we initially counteracted by bumping UV_THREADPOOL_SIZE. But ultimately that just doesn’t scale.

Creating a topic per tenant or customer is generally discouraged with Kafka because it leads to an ever-increasing number of topics on the broker, which consume significant resources regardless of the traffic in those topics. You probably want to build multi-tenancy into your application logic by multiplexing messages from fewer topics (split per message type, domain, priority, or whatever else makes sense) instead of multiplexing from X tenant topics with varying messages. This way you keep resource usage reasonable on consumers and, most importantly, on brokers. You can then scale with partition count.
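The suggestion above can be sketched as follows. The tenant identifier goes into the message key on a shared per-type topic, so one tenant's messages always land on the same partition and capacity scales by adding partitions, not topics. The toy hash below is only an illustration; Kafka's actual default partitioner uses murmur2 on the key:

```javascript
// Multi-tenancy via message keys on a shared 'events' topic, instead
// of one topic per tenant. Same key -> same partition, so per-tenant
// ordering is preserved within the shared topic.
function toyPartition(key, numPartitions) {
  // Simple polynomial string hash, standing in for Kafka's murmur2.
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % numPartitions;
}

const numPartitions = 6;
const records = [
  { topic: 'events', key: 'tenant-a', value: 'signup' },
  { topic: 'events', key: 'tenant-b', value: 'login' },
  { topic: 'events', key: 'tenant-a', value: 'purchase' },
];

for (const r of records) {
  console.log(`${r.key} -> partition ${toyPartition(r.key, numPartitions)}`);
}
```

Both `tenant-a` records map to the same partition, so a consumer of that partition sees one tenant's events in order; adding tenants adds keys, not topics.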

0 reactions
stale[bot] commented, Dec 10, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.


