[Doc] Prometheus alert example does not work

Suggestion / Problem

In the examples section you suggest a few Prometheus alerts, but I don’t see how the alert named ZookeeperContainerRestartedInTheLast5Minutes could ever satisfy its condition expression.

The expression is as follows:

count(count_over_time(container_last_seen{container="zookeeper"}[5m])) > 2 * count(container_last_seen{container="zookeeper",pod=~".+-zookeeper-[0-9]+"})

I have watched the two sides of this comparison and they always seem to move in lock-step (except that one has double the value of the other, of course). I don’t see how the left side could ever be bigger than the right side.
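The lock-step behaviour can be reproduced with a toy model. The sketch below is not Prometheus code. It assumes a 30-second scrape interval, two hypothetical series names, the same selector on both sides (the pod filter is ignored), and that the instant selector keeps returning a vanished series for Prometheus's default 5-minute lookback because no staleness marker was written for it. Under those assumptions both sides count the same series at every evaluation, so the left side can never exceed the doubled right side.

```python
WINDOW = 300    # seconds, the [5m] range passed to count_over_time
LOOKBACK = 300  # seconds, ASSUMED instant-selector lookback (Prometheus default)


def seen_in_window(samples, t, window):
    """True if the series has at least one sample in (t - window, t]."""
    return any(t - window < ts <= t for ts in samples)


def left_side(series, t):
    # Models count(count_over_time(...[5m])): series with a sample in the window.
    return sum(seen_in_window(s, t, WINDOW) for s in series.values())


def right_side(series, t):
    # Models 2 * count(...): series still returned by the instant selector.
    return 2 * sum(seen_in_window(s, t, LOOKBACK) for s in series.values())


# Hypothetical data: the old container is last scraped at t=600,
# its replacement is first scraped at t=630.
series = {
    "zookeeper-old": list(range(0, 601, 30)),
    "zookeeper-new": list(range(630, 1501, 30)),
}

# One (time, left, right) tuple per evaluation, every 30 seconds.
timeline = [
    (t, left_side(series, t), right_side(series, t))
    for t in range(0, 1501, 30)
]
condition_ever_true = any(left > right for _, left, right in timeline)
print(condition_ever_true)
```

If the old series were dropped from the instant selector as soon as it disappears, the two sides would no longer share a window and could diverge for a few minutes after a restart. That would be consistent with the report further down that adjusting the time intervals made the condition observable.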

I don’t have a suggestion for a fix since I don’t really understand the idea of this alert in the first place.

Documentation Link

https://github.com/strimzi/strimzi-kafka-operator/blob/master/examples/metrics/prometheus-install/prometheus-rules.yaml

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

sknot-rh commented, Oct 20, 2020 (1 reaction)

I also tried to deploy a metrics cluster on OCP 4.x and hit the same issue you had. Some fiddling with the time intervals fixed it for me as well.

pantaoran commented, Oct 20, 2020 (0 reactions)

One more thought: while I was able to see the condition fulfilled in Prometheus, I still couldn’t get the alert to trigger. I noticed that, according to https://github.com/strimzi/strimzi-kafka-operator/blob/master/examples/metrics/prometheus-install/prometheus-rules.yaml#L120, the condition must hold for at least 5 minutes before the alert triggers. Can that actually happen when you’re only looking back 5 minutes? My guess would be no, in which case the alert would still never fire.
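That guess can be checked with a small simulation of the pending-to-firing step. This is a simplified model, not Prometheus's implementation. It assumes a 30-second evaluation interval and hypothetical timestamps, and it assumes the condition turns true when the replacement container is first scraped and turns false once the old container's last sample leaves the 5-minute window.

```python
def first_firing_time(condition_by_time, hold_seconds):
    """Return the first evaluation time at which an alert with the given
    hold duration would fire, or None if it never fires.

    Simplified model: the alert becomes pending at the first evaluation
    where the condition is true and fires once it has stayed true for at
    least hold_seconds. Any false evaluation resets the pending state.
    """
    active_since = None
    for t in sorted(condition_by_time):
        if not condition_by_time[t]:
            active_since = None
            continue
        if active_since is None:
            active_since = t
        if t - active_since >= hold_seconds:
            return t
    return None


EVAL_INTERVAL = 30      # seconds, assumed
WINDOW = 300            # seconds, the [5m] range in the expression
LAST_OLD_SAMPLE = 600   # hypothetical: last scrape of the old container
FIRST_NEW_SAMPLE = 630  # hypothetical: first scrape of the new container

# Assumed shape of the condition: true only while both the old and the
# new series have a sample inside the 5-minute window.
condition = {
    t: FIRST_NEW_SAMPLE <= t < LAST_OLD_SAMPLE + WINDOW
    for t in range(0, 1501, EVAL_INTERVAL)
}

fires_with_5m = first_firing_time(condition, 300)
fires_with_2m = first_firing_time(condition, 120)
print(fires_with_5m, fires_with_2m)
```

In this model the condition stays true for only 240 seconds, so a 5-minute hold is never satisfied, while a 2-minute hold fires at t=750. Under these assumptions, a rule that looks back 5 minutes and also requires 5 minutes of continuous truth cannot fire after a single restart.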
