Pulsar Sink Connector Consume Rate

Is your enhancement request related to a problem? Please describe.

I have some JDBC sinks that I recently reconfigured with a longer timeout and a larger batch size. This helped because the topic has a steady publish rate of about 100 msg/s, so these settings keep the sinks from spamming inserts to the database.

In some scenarios, I’d want to start these sinks on data from a day or two ago and let them catch up to live data. Even with a batch size of 100,000 and a timeout of 1 minute, a sink consumes messages from the topic very quickly. It receives 100,000 messages in much less than 1 minute, so it still spams inserts to the database. I don’t want to make the batch size much larger, because I suspect there would still be issues: the database would still be finishing one insert while more large batches are already ready to go.

Describe the solution you’d like

An option for sink connectors, similar to --rate of the pulsar-client CLI. This would let the user specify a throughput rate for the sink consumer. This can be used to set the consumption rate only slightly higher than the expected live data rate, so the sink can still catch up but not spam inserts as often.
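To make the request concrete, here is a minimal sketch of the kind of throttle such an option could apply, using a token bucket. It is not Pulsar's API and not how the pulsar-client CLI implements --rate; the `TokenBucket` class, its parameters, and the 120 msg/s figure are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical throttle: allow `rate` messages per second on average,
    with bursts of at most `capacity` messages."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock          # injectable so the limiter can be tested
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self, n: int = 1) -> bool:
        """Take n tokens if available; return False without blocking otherwise."""
        self._refill()
        if self._tokens >= n:
            self._tokens -= n
            return True
        return False

    def wait_time(self, n: int = 1) -> float:
        """Seconds until n tokens would be available (0.0 if available now)."""
        self._refill()
        return max(0.0, (n - self._tokens) / self.rate)


# Assumed setting: consume at 120 msg/s, slightly above a 100 msg/s live rate.
limiter = TokenBucket(rate=120, capacity=120)
```

A consumer loop would call `try_acquire()` before taking each message and sleep for `wait_time()` when it returns False, so a backlog is drained at the configured rate instead of as fast as the broker can deliver.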

I suppose this works best when there is a steady, expected level of throughput on the topic. I’m not sure if this behaviour can be made more dynamic in any way.
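As a rough illustration of the trade-off under a steady publish rate, the time a throttled sink needs to catch up can be estimated from the backlog and the gap between the consume rate and the live rate. The function below is a back-of-the-envelope sketch; `catch_up_seconds` is a hypothetical helper, and the 120 and 200 msg/s consume rates are assumed values. Only the 100 msg/s live rate comes from the description above.

```python
def catch_up_seconds(backlog_seconds: float, live_rate: float,
                     consume_rate: float) -> float:
    """Estimate how long a sink needs to reach live data.

    backlog_seconds: how far behind the sink starts, in seconds of data
    live_rate:       steady publish rate on the topic (msg/s)
    consume_rate:    throttled consume rate of the sink (msg/s)
    """
    if consume_rate <= live_rate:
        raise ValueError("consume_rate must exceed live_rate to ever catch up")
    backlog_messages = backlog_seconds * live_rate
    # The backlog shrinks only by the surplus of consumption over publishing.
    return backlog_messages / (consume_rate - live_rate)


one_day = 24 * 60 * 60  # 86,400 seconds

# One day of backlog at 100 msg/s, consumed at an assumed 120 msg/s.
print(catch_up_seconds(one_day, 100, 120) / one_day)  # 5.0 (days)

# The same backlog consumed at an assumed 200 msg/s.
print(catch_up_seconds(one_day, 100, 200) / one_day)  # 1.0 (days)
```

A rate only slightly above the live rate keeps inserts infrequent but stretches the catch-up period, which is why the choice depends on how predictable the topic's throughput is.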

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
complone commented, Nov 11, 2021

I can work on it.

0 reactions
Alxander64 commented, Feb 2, 2021

Interesting, is there any plan to support this for individual subscriptions, or is there something holding it back?
