[Broker] Add a prometheus metric to indicate if a Bookie has been Quarantined

Is your feature request related to a problem? Please describe.
Recently two of our nine bookies were quarantined, and we did not know until we saw these logs in the Broker:

platform-pulsar-broker-1 platform-pulsar-broker 16:53:05.284 [BookKeeperClientScheduler-OrderedScheduler-0-0] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - Bookie platform-pulsar-bookkeeper-1.platform-pulsar-bookkeeper.cogito-load.svc.cluster.local:3181 has been quarantined because of read/write errors.
platform-pulsar-broker-1 platform-pulsar-broker 16:53:05.284 [BookKeeperClientScheduler-OrderedScheduler-0-0] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - Bookie platform-pulsar-bookkeeper-8.platform-pulsar-bookkeeper.cogito-load.svc.cluster.local:3181 has been quarantined because of read/write errors.

When bookies are quarantined, the overall throughput and scalability of our system are reduced. We would like a way to monitor for quarantined bookies and alert when one is quarantined.

Describe the solution you’d like
We’d like to see a Prometheus metric added that indicates the number of Bookies that are currently quarantined. Once that metric is added, we can use it in Prometheus alerting to notify us of the problem.

Describe alternatives you’ve considered
Can’t think of any.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

2 reactions
hangc0276 commented, Jan 27, 2022

Hi @frankjkelly, we have already added the bookie quarantine metric, but it is not exposed by default because it is a bookie client metric. The metric name is pulsar_managedLedger_client_bookkeeper_client_BOOKIE_QUARANTINE and its type is Counter. You can use the following PromQL to query it:

sum(irate(pulsar_managedLedger_client_bookkeeper_client_BOOKIE_QUARANTINE{job="$job_name", instance=~"$instance_name"}[1m])) by (instance)

If you want to expose this metric, turn it on in broker.conf:

bookkeeperClientExposeStatsToPrometheus=true
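
For the alerting use case raised in the issue, here is a minimal Python sketch of a script that polls Prometheus for this metric once it is exposed. It is an illustration under assumptions, not part of Pulsar: the function names, the 10m window, and the use of increase() instead of the irate() shown above are all choices made for this example, and the Prometheus base URL is whatever your deployment uses. Because the metric is a Counter, the sketch reports brokers that recorded quarantine events within the window, which is not necessarily the number of bookies quarantined right now.

```python
import json
import urllib.parse
import urllib.request

# Metric name as given in the comment above.
METRIC = "pulsar_managedLedger_client_bookkeeper_client_BOOKIE_QUARANTINE"


def build_query(window: str = "10m") -> str:
    """PromQL for the per-broker counter increase over `window` (assumed window)."""
    return f"sum(increase({METRIC}[{window}])) by (instance)"


def build_url(prometheus_base: str, query: str) -> str:
    """URL for Prometheus' instant-query endpoint, /api/v1/query."""
    params = urllib.parse.urlencode({"query": query})
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{params}"


def brokers_reporting_quarantine(response: dict) -> dict:
    """Map broker instance -> value for every series whose value is above zero."""
    if response.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {response}")
    hits = {}
    for series in response["data"]["result"]:
        # Instant-vector samples are [timestamp, "value-as-string"].
        value = float(series["value"][1])
        if value > 0:
            hits[series["metric"].get("instance", "unknown")] = value
    return hits


def check(prometheus_base: str, window: str = "10m") -> dict:
    """Query Prometheus and return the brokers that saw quarantine events."""
    url = build_url(prometheus_base, build_query(window))
    with urllib.request.urlopen(url, timeout=10) as resp:
        return brokers_reporting_quarantine(json.load(resp))
```

Calling check() with the base URL of your Prometheus server returns a dictionary keyed by broker instance. An empty dictionary means no broker recorded a quarantine event in the window. The same increase() expression could also serve as the condition of a Prometheus alerting rule.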

1 reaction
mattisonchao commented, Jan 26, 2022

I think I can help with this.
