Agent v6: kube-dns metrics are constantly increasing despite constant traffic

I recently deployed v6.1.0 of the Datadog Agent to one of my Kubernetes clusters. I added the annotations to my instances of kube-dns so that the agent would collect information from them, and I've let that run for the better part of two days.

When I look at the metrics for kubedns.request_count, it appears that the number of DNS requests has been increasing steadily over the past 2 days, even though I know for a fact that the traffic and activity on the cluster have been steady. (It's the weekend, so not much is going on.)

[Screenshot: Datadog Metric Explorer]

I know that kube-dns exports its metrics in the Prometheus format, in which counter metrics only ever increase. (See metrics formats.) But it's supposed to be the job of the scraper to take the difference between the values at T(n) and T(n-1) to calculate the count of events in that interval.

My assumption is that the Prometheus scraper in Datadog is supposed to do that subtraction. So I'd expect the graph above to be a flat line with a slope of 0, rather than a steadily increasing line.
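
To make the subtraction concrete, here is a minimal sketch of what taking the difference between T(n) and T(n-1) looks like. It is not the agent's code: the function name `counter_deltas` and the sample values are hypothetical, and treating a decrease as a counter restart from zero is an assumption of this sketch.

```python
from typing import Iterable, List


def counter_deltas(samples: Iterable[float]) -> List[float]:
    """Turn cumulative counter readings into per-interval event counts."""
    deltas: List[float] = []
    previous = None
    for value in samples:
        if previous is not None:
            if value >= previous:
                # Normal case: events in the interval = T(n) - T(n-1).
                deltas.append(value - previous)
            else:
                # Assumption: a lower value means the process restarted and
                # the counter began again from zero, so count the new value.
                deltas.append(value)
        previous = value
    return deltas


# Hypothetical scrapes of a request counter under steady traffic.
scrapes = [1000, 1150, 1300, 1450, 1600]
print(counter_deltas(scrapes))
```

With steady traffic the raw readings climb in a straight line, while the per-interval deltas stay constant, which is the flat graph I was expecting.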

Datadog Agent Version: 6.1.0
KubeDNS based on: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5
Kubernetes cluster version: 1.5.7 (though I don't think that should affect the Prometheus scraping)

Issue Analytics

Created 5 years ago
Comments: 13 (7 by maintainers)

Top GitHub Comments

kivagant-ba commented, May 27, 2019

Hello, @antoinepouille

It looks like kubedns.requests_duration.seconds.count still returns accumulated values, not the rate. The same happens for kubedns.requests_duration.seconds.sum, although the values are different. How do I correctly set up a timeseries visualization for requests_duration?
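
For context, a common way to read a cumulative sum/count pair such as these is to take the change in each over an interval and divide one by the other, which gives the mean request duration for that interval. The sketch below only illustrates that arithmetic and is not a Datadog dashboard recipe; the function name `mean_duration_per_interval` and the sample readings are hypothetical.

```python
from typing import List, Optional, Sequence


def mean_duration_per_interval(
    sums: Sequence[float], counts: Sequence[float]
) -> List[Optional[float]]:
    """Mean duration per interval from cumulative sum and count readings."""
    if len(sums) != len(counts):
        raise ValueError("sums and counts must have the same length")
    means: List[Optional[float]] = []
    for i in range(1, len(counts)):
        d_count = counts[i] - counts[i - 1]
        d_sum = sums[i] - sums[i - 1]
        if d_count <= 0 or d_sum < 0:
            # No new requests, or a counter reset: no meaningful mean.
            means.append(None)
        else:
            means.append(d_sum / d_count)
    return means


# Hypothetical cumulative readings taken at three consecutive scrapes.
duration_sums = [10.0, 12.0, 15.0]   # total seconds spent on requests
request_counts = [100, 120, 150]     # total number of requests
print(mean_duration_per_interval(duration_sums, request_counts))
```

Graphing either accumulated series on its own shows an ever-increasing line; the ratio of the two deltas is what stays meaningful over time.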

antoinepouille commented, Mar 26, 2018

@jonmoter Thanks for pulling this out from the code. I was able to sync up with an engineer from the metrics team to figure this out. This is causing real issues for people using the metrics, so I think we can create a second metric that uses the monotonic_count function to forward the counts directly instead of the current raw value. I will forward this to the team so that we schedule work to tackle it. Stay tuned.