[Broker] Add a prometheus metric to indicate if a Bookie has been Quarantined
Is your feature request related to a problem? Please describe. Recently two of our nine bookies were quarantined, and we did not know until we saw these warnings in the Broker logs:
platform-pulsar-broker-1 platform-pulsar-broker 16:53:05.284 [BookKeeperClientScheduler-OrderedScheduler-0-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl -
Bookie platform-pulsar-bookkeeper-1.platform-pulsar-bookkeeper.cogito-load.svc.cluster.local:3181 has been quarantined because of read/write errors.
platform-pulsar-broker-1 platform-pulsar-broker 16:53:05.284 [BookKeeperClientScheduler-OrderedScheduler-0-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl -
Bookie platform-pulsar-bookkeeper-8.platform-pulsar-bookkeeper.cogito-load.svc.cluster.local:3181 has been quarantined because of read/write errors.
When these bookies are quarantined the overall throughput scalability of our system is reduced. We would like some scheme to monitor and alert if a bookie is quarantined.
Describe the solution you’d like We’d like a Prometheus metric that indicates the number of Bookies that are currently quarantined. Once that metric is added, we can use it in Prometheus alerting to notify us of the problem.
Describe alternatives you’ve considered Can’t think of any
Issue Analytics
- Created 2 years ago
- Comments: 9 (4 by maintainers)
Top GitHub Comments
Hi @frankjkelly, we have already added the bookie quarantine metric, but it is not exposed by default because it is a bookie client metric. The metric name is
pulsar_managedLedger_client_bookkeeper_client_BOOKIE_QUARANTINE
and the type is Counter.
You can query this metric with PromQL. If you want to expose this metric, you should turn it on in broker.conf.
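For reference, the Pulsar metrics documentation describes the switch as setting bookkeeperClientExposeStatsToPrometheus to true in broker.conf. Once the metric is exposed, a check along the following lines could confirm that it appears in the broker's metrics output and whether it has grown between two scrapes. This is only a rough sketch: the helper names quarantine_total and quarantine_increase are hypothetical, the sample text is made up for illustration, and the exact exported name and labels may differ by Pulsar version. Fetching the text from the broker is left out.

```python
import re

# Metric name as given in the comment above; the exported name may vary by version.
METRIC = "pulsar_managedLedger_client_bookkeeper_client_BOOKIE_QUARANTINE"

# Matches a Prometheus text-format sample line: name, optional {labels}, value.
SAMPLE_LINE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)")


def quarantine_total(metrics_text: str, metric: str = METRIC) -> float:
    """Sum every sample of `metric` across all label sets in the given text."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match and match.group(1) == metric:
            total += float(match.group(3))
    return total


def quarantine_increase(previous: float, current: float) -> float:
    """Growth of the counter between two scrapes.

    A counter only goes up, so a lower value is treated as a reset
    (for example after a broker restart) and the new value is the increase.
    """
    return current if current < previous else current - previous


if __name__ == "__main__":
    # Hypothetical scrape output, for illustration only.
    sample = f"""
# TYPE {METRIC} counter
{METRIC}{{cluster="demo"}} 2
some_other_metric 7
"""
    print(quarantine_total(sample))
```

Because the metric is a Counter, it records quarantine events rather than the number of bookies quarantined right now, so an alert would normally watch for an increase over a time window rather than for a fixed value.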
I think I can help with it ~