
kubernetes.pods.running reporting incorrectly

See original GitHub issue

Output of the info page


==============
Agent (v6.4.2)
==============

  Status date: 2018-08-24 00:05:55.602398 UTC
  Pid: 352
  Python Version: 2.7.15
  Logs:
  Check Runners: 2
  Log Level: WARNING

    kubernetes_apiserver
    --------------------
      Total Runs: 53293
      Metric Samples: 0, Total: 0
      Events: 0, Total: 0
      Service Checks: 0, Total: 0
      Average Execution Time : 4ms


(a ton of unrelated and possibly sensitive stuff removed)

Additional environment details (Operating System, Cloud provider, etc): GKE - kubernetes 1.10

Steps to reproduce the issue: Have a Kubernetes cluster monitored by Datadog where at least one pod is in the Failed phase (or any phase other than Running).
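For anyone reproducing this, the quickest way to get a pod stuck in the Failed phase is a container that exits non-zero with restartPolicy set to Never. Below is a minimal sketch using the official Kubernetes Python client; the pod name, image, and namespace are placeholders, not anything from the issue:

    from kubernetes import client, config

    # Assumes a local kubeconfig; use config.load_incluster_config() inside a cluster.
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # A single container that exits with a non-zero code and is never restarted
    # leaves the pod with status.phase == "Failed".
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="always-fails"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[client.V1Container(
                name="fail",
                image="busybox",
                command=["sh", "-c", "exit 1"],
            )],
        ),
    )
    v1.create_namespaced_pod(namespace="default", body=pod)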

Describe the results you received: The metric appears to count pods in all phases, including Failed.

Describe the results you expected: Simple fix: the metric should be filtered to count only pods where status.phase == Running.
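To make the expected behaviour concrete, here is a small sketch of that filtering done directly against the Kubernetes API (not the Datadog agent’s actual check code); the status.phase=Running field selector asks the API server to return only running pods:

    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Only pods whose status.phase is Running should contribute to a
    # "pods running" count; the field selector filters server-side.
    running = v1.list_pod_for_all_namespaces(field_selector="status.phase=Running")
    print("running pods:", len(running.items))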

Enhancement: replace the metric with kubernetes.pods.count, with status.phase added as a tag, allowing accurate reporting of pods in, e.g., the Failed phase. This would enable more useful metrics and reporting.
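A rough sketch of what the proposed kubernetes.pods.count could look like if computed outside the agent: count pods per phase and emit one tagged gauge per phase via DogStatsD. The metric name follows the proposal above, while the tag format and the DogStatsD address are assumptions:

    from collections import Counter
    from datadog import initialize, statsd
    from kubernetes import client, config

    config.load_kube_config()
    initialize(statsd_host="127.0.0.1", statsd_port=8125)  # assumed local DogStatsD

    v1 = client.CoreV1Api()
    phases = Counter(p.status.phase for p in v1.list_pod_for_all_namespaces().items)

    # e.g. kubernetes.pods.count with phase:running, phase:failed, phase:pending, ...
    for phase, count in phases.items():
        statsd.gauge("kubernetes.pods.count", count, tags=["phase:" + phase.lower()])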

Note that a similar metric is exposed by the kubernetes_state check when it's configured, but that shouldn't excuse the inaccuracy of kubernetes.pods.running.

Additional information you deem important (e.g. issue happens only occasionally):

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 10
  • Comments: 14 (1 by maintainers)

Top GitHub Comments

2 reactions
cpoole commented, Jun 13, 2020

I’d also like to add that we consistently see inaccurate measurements for the pods running metric.

The numbers are off by 100% during scaling periods and can take up to 10 minutes to stabilize. Turning off interpolation in the metric graphs shows a sawtooth pattern.

1 reaction
alexeyschepin commented, Nov 11, 2020

We face this issue too. kubernetes.pods.running shows only a single pod most of the time. Sometimes it changes to a floating-point number (up to 1.4), even when there are definitely several pods running.


Top Results From Across the Web

Determine the Reason for Pod Failure - Kubernetes
In the YAML file, in the command and args fields, you can see that the container sleeps for 10 seconds and then writes...

How to Debug Kubernetes Pending Pods and Scheduling ...
Learn how to debug Pending pods that fail to get scheduled due to resource constraints, taints, affinity rules, and other reasons.

How to Troubleshoot an Application in Kubernetes
Look for any warning or error level logs. These are logs from the application running inside the Pod. But if the Pod is...

Kubernetes troubleshooting: 6 ways to find and fix issues
OOMKilled means that the pod reached its memory limit, so it restarts. You can see the restart count when you run the describe...

Troubleshoot Kubernetes Deployments
Check if the service name you are using is correct. Run these commands to check if the service is registered and the pods...
