kubernetes.pods.running reporting incorrectly
See original GitHub issue

Output of the info page
==============
Agent (v6.4.2)
==============
Status date: 2018-08-24 00:05:55.602398 UTC
Pid: 352
Python Version: 2.7.15
Logs:
Check Runners: 2
Log Level: WARNING
kubernetes_apiserver
--------------------
Total Runs: 53293
Metric Samples: 0, Total: 0
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 4ms
(a ton of unrelated and possibly sensitive stuff removed)
Additional environment details (Operating System, Cloud provider, etc): GKE - kubernetes 1.10
Steps to reproduce the issue:
Have a k8s cluster monitored by Datadog where at least one pod is in a failed state (or any state that is not Running).
Describe the results you received: The metric appears to count pods in all phases, including Failed.
Describe the results you expected:
Simple fix: the metric is correctly filtered to only pods where status.phase == Running.
Enhancement: the metric is replaced by kubernetes.pods.count with status.phase added as a tag, allowing accurate reporting of pods in e.g. the Failed state. This would enable more useful metrics and reporting.
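The two suggestions above can be sketched as follows. This is a hypothetical illustration, not the agent's actual code: the pod dicts and function names are made up, standing in for the pod objects returned by the Kubernetes API.

```python
# Sketch of the proposed fix (hypothetical helper names; pods are
# plain dicts mimicking the shape of Kubernetes pod API objects).
from collections import Counter

def count_running_pods(pods):
    """Simple fix: count only pods whose status.phase is "Running"."""
    return sum(1 for pod in pods if pod["status"]["phase"] == "Running")

def count_pods_by_phase(pods):
    """Enhancement: one count per phase, suitable for emitting a
    kubernetes.pods.count metric tagged with status.phase."""
    return Counter(pod["status"]["phase"] for pod in pods)

pods = [
    {"status": {"phase": "Running"}},
    {"status": {"phase": "Failed"}},
    {"status": {"phase": "Running"}},
]
print(count_running_pods(pods))   # 2
print(count_pods_by_phase(pods))  # Counter({'Running': 2, 'Failed': 1})
```

With per-phase counts, a Failed pod would show up as its own tagged series instead of silently inflating the "running" number.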
Note that a similar metric is exposed in kubernetes_state when it's configured, but that shouldn't excuse the inaccuracy of the other one.
Additional information you deem important (e.g. issue happens only occasionally):
Issue Analytics
- Created 5 years ago
- Reactions: 10
- Comments: 14 (1 by maintainers)
Top GitHub Comments
I’d also like to add that we consistently see inaccurate measurements for the pods-running metric: the numbers are off by 100% during scaling periods and can take up to 10 minutes to stabilize. Turning off interpolation in the metric graphs shows a sawtooth pattern.
We face this issue too. kubernetes.pods.running shows only a single pod most of the time. Sometimes it changes to a floating-point number (up to 1.4) even when there are definitely several pods running.