Pulsar Sink Connector Consume Rate
Is your enhancement request related to a problem? Please describe.
I have some JDBC sinks that I recently reconfigured with a longer timeout and a larger batch size. This helped: the topic had a steady publish rate of about 100 msg/s, so the new settings keep the sink from spamming inserts to the database.
In some scenarios, I’d want to start these sinks on data from a day or two ago and let them catch up to live data. Even with a batch size of 100,000 and a timeout of 1 minute, the sink consumes messages from the topic very quickly. It receives 100,000 messages in much less than 1 minute, so it still spams inserts to the database. I don’t want to make the batch size much larger, because I suspect there would still be issues: the sink would be waiting on the database to finish inserting while more large batches become ready almost immediately.
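To make the problem concrete, here is a rough sketch of the arithmetic. It assumes the sink flushes a batch on whichever comes first, the batch filling up or the timeout expiring. The function names and the 50,000 msg/s catch-up rate are hypothetical and chosen only for illustration; they are not measurements.

```python
def flush_interval_seconds(batch_size: int, timeout_s: float, consume_rate: float) -> float:
    """Seconds between flushes, assuming a flush happens when either
    batch_size records are buffered or timeout_s has elapsed."""
    time_to_fill = batch_size / consume_rate
    return min(time_to_fill, timeout_s)


def rows_per_flush(batch_size: int, timeout_s: float, consume_rate: float) -> int:
    """Number of records written per flush under the same assumption."""
    return min(batch_size, int(consume_rate * timeout_s))


# Live traffic at about 100 msg/s: the timeout fires long before the batch fills.
live_interval = flush_interval_seconds(100_000, 60.0, 100.0)
live_rows = rows_per_flush(100_000, 60.0, 100.0)

# Catch-up at a hypothetical 50,000 msg/s: the batch fills long before the timeout.
backlog_interval = flush_interval_seconds(100_000, 60.0, 50_000.0)
backlog_rows = rows_per_flush(100_000, 60.0, 50_000.0)
```

Under these assumptions the live case writes one modest batch per minute, while the catch-up case writes a full 100,000-row batch every couple of seconds, which is the insert spam described above.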
Describe the solution you’d like
An option for sink connectors, similar to --rate of the pulsar-client CLI. This would let the user specify a throughput rate for the sink consumer. This can be used to set the consumption rate only slightly higher than the expected live data rate, so the sink can still catch up but not spam inserts as often.
I suppose this works best when there is a steady, expected level of throughput on the topic. I’m not sure if this behaviour can be made more dynamic in any way.
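As a minimal sketch of the behaviour being requested, the snippet below paces consumption to a fixed rate and estimates how long catching up would take. It is not Pulsar code: `PacedThrottle` and `catch_up_seconds` are hypothetical names, and the assumption is that something like `acquire()` would be called before each record is handed to the sink.

```python
import time
from typing import Callable


class PacedThrottle:
    """Spaces out acquire() calls so permits are granted at most `rate` per second."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate   # minimum spacing between permits
        self._clock = clock
        self._sleep = sleep
        self._next_free = clock()     # earliest time the next permit may be granted

    def acquire(self) -> float:
        """Block until the next record may be consumed; return the seconds waited."""
        now = self._clock()
        wait = max(0.0, self._next_free - now)
        if wait > 0:
            self._sleep(wait)
        # The following permit becomes available one interval after this one.
        self._next_free = max(now, self._next_free) + self._interval
        return wait


def catch_up_seconds(backlog: int, live_rate: float, consume_rate: float) -> float:
    """Time to drain `backlog` messages while new ones keep arriving at live_rate."""
    if consume_rate <= live_rate:
        raise ValueError("consume_rate must exceed live_rate to ever catch up")
    return backlog / (consume_rate - live_rate)
```

For example, one day of backlog at 100 msg/s is 8,640,000 messages; `catch_up_seconds(8_640_000, 100.0, 200.0)` gives the time to catch up when consuming at a capped 200 msg/s. The closer the cap is to the live rate, the gentler the load on the database and the longer the catch-up, so the setting is a trade-off rather than a single right value.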
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
I can work on it.
Interesting, is there any plan to support this for individual subscriptions, or is there something holding it back?