Consumer 'stream API' and sharing consumers
Environment Information
- OS [e.g. Mac, Arch, Windows 10]: Linux x86-64
- Node Version [e.g. 8.2.1]: 8.11.3
- NPM Version [e.g. 5.4.2]: 5.6.0
- C++ Toolchain [e.g. Visual Studio, llvm, g++]: g++
- node-rdkafka version [e.g. 2.3.3]: 2.3.4
Steps to Reproduce
- Follow the documentation, and decide to use the consumer ‘stream API’ in a project that consumes many topics from the same brokers.
- Observe that one needs to bump `UV_THREADPOOL_SIZE` so that node-rdkafka can handle consuming all these topics (see #363, maybe also #332)
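The workaround discussed in those issues amounts to raising libuv's worker threadpool size before the process starts, since each stream-API consumer instance occupies threads of its own. A minimal sketch (the value `16` and the entry-point name `app.js` are illustrative, not from the issue):

```shell
# libuv defaults to a threadpool of 4; with one Consumer instance per
# stream, more than a handful of topic streams can exhaust it.
# The variable must be set before Node starts -- it cannot be changed
# at runtime.
UV_THREADPOOL_SIZE=16 node app.js
```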
What eventually "dawned" on me was that I should look a bit further into what node-rdkafka/rdkafka/kafka are actually doing, and I found that the way we use streams is problematic: each stream requires a new Consumer instance, which creates its own threads and other resources. I've since changed the implementation to use the 'standard API', with some custom batching/multiplexing logic between a single node-rdkafka Consumer and our own per-topic abstractions.
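The single-consumer multiplexing approach described above can be sketched roughly as follows. The routing logic is a plain dispatch table; the Kafka wiring is shown commented out for shape only, since it needs a running broker (topic and group names are made up for illustration):

```javascript
// Pure routing logic: one 'data' callback fans messages out to
// per-topic handlers, so a single KafkaConsumer (and its threads)
// serves all topics instead of one Consumer per stream.
function makeDispatcher(handlers) {
  return (message) => {
    const handler = handlers[message.topic];
    if (handler) handler(message);
  };
}

// Kafka wiring (requires a broker; names here are illustrative):
// const Kafka = require('node-rdkafka');
// const consumer = new Kafka.KafkaConsumer({
//   'group.id': 'my-group',
//   'metadata.broker.list': 'localhost:9092',
// }, {});
// consumer.connect();
// consumer.on('ready', () => {
//   consumer.subscribe(['topic-a', 'topic-b']);
//   consumer.consume(); // flowing mode: emits a 'data' event per message
// });
// consumer.on('data', makeDispatcher({
//   'topic-a': (msg) => { /* per-topic abstraction for topic-a */ },
//   'topic-b': (msg) => { /* per-topic abstraction for topic-b */ },
// }));
```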
I think the documentation for the stream API should mention explicitly that it carries considerable overhead when consuming more than a few topics, rather than just saying that the standard API is better "for performance": we actually don't process that many messages per second, so initially I didn't connect the dots here.
EDIT: Fixed a mistaken 'producer' that should have been 'Consumer'.
Issue Analytics
- State:
- Created 5 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
Creating a topic per tenant or customer is generally discouraged with Kafka because it leads you to have an ever increasing number of topics on the broker, which will consume lots of resources regardless of the traffic in those topics. You probably want to build multi-tenancy into your application logic, by multiplexing the messages from fewer topic(s) (per message type, domain, priority or whatever else you want to split by) instead of multiplexing from X tenant topics with varying messages. This way you will keep your resource usage reasonable on consumers and most importantly on brokers. You can then scale with partition count.
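The comment's suggestion can be illustrated with key-based partitioning: instead of one topic per tenant, put the tenant id in the message key so that all of a tenant's messages land in the same partition of one shared topic, and scale by raising the partition count. A minimal sketch (Kafka's real default partitioner hashes the key bytes with murmur2; the simple hash here only demonstrates the principle):

```javascript
// Deterministically map a tenant id to a partition of a shared topic.
// Same tenant -> same partition, so per-tenant ordering is preserved
// without a per-tenant topic. Illustrative hash, not Kafka's murmur2.
function partitionForTenant(tenantId, partitionCount) {
  let hash = 0;
  for (const ch of tenantId) {
    hash = (hash * 31 + ch.charCodeAt(0)) | 0; // keep in 32-bit range
  }
  return Math.abs(hash) % partitionCount;
}
```

In practice you would not compute this yourself: producing with `key = tenantId` lets the broker-side default partitioner do the equivalent, and consumers scale by partition count rather than topic count.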
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.