Agent v6: kube-dns metrics are constantly increasing despite constant traffic


I recently deployed v6.1.0 of the datadog agent to one of my Kubernetes clusters. I added the annotations to my instances of kube-dns so that the agent would collect information from them. I’ve let that run for the better part of two days.

When I look at the metrics for kubedns.request_count, the number of DNS requests appears to have increased steadily over the past 2 days, even though I know for a fact that the traffic and activity on the cluster have been steady. (It’s the weekend, so not much is going on.)

[Screenshot: Datadog Metric Explorer graph of kubedns.request_count]

I know that kube-dns exports its metrics in the Prometheus format, so its counter metrics only ever increase. (See metrics formats.) But it’s supposed to be the job of the scraper to take the difference between the values at T(n) and T(n-1) to calculate the count of events in that interval.

My assumption is that the prometheus scraper in datadog is supposed to do that subtraction. So I’d expect the graph above to be a flat line with a slope of 0, rather than a steadily increasing line.
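As a rough illustration of the subtraction described above, here is a minimal Python sketch of turning cumulative counter samples into per-interval counts. It is not the Datadog Agent's actual code: the function name `interval_counts` and the sample values are hypothetical, and treating a drop in the value as a counter reset is an assumption about how a scraper would typically handle a restarted process.

```python
from typing import List, Optional


def interval_counts(samples: List[float]) -> List[float]:
    """Convert cumulative counter samples into per-interval counts.

    Each output value is sample[n] - sample[n-1]. If the counter goes
    backwards, we assume the exporting process restarted and the counter
    began again from zero, so the new value itself is the interval count.
    """
    counts: List[float] = []
    previous: Optional[float] = None
    for value in samples:
        if previous is not None:
            if value >= previous:
                counts.append(value - previous)
            else:
                # Assumed counter reset: count from zero.
                counts.append(value)
        previous = value
    return counts


# Hypothetical raw values scraped at a fixed interval under steady traffic.
raw = [100, 150, 200, 250, 300]
print(interval_counts(raw))  # [50, 50, 50, 50]
```

With steady traffic the raw series climbs in a straight line, while the per-interval series is flat, which is the shape expected in the graph.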

  • Datadog Agent Version: 6.1.0
  • KubeDNS based on: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5
  • Kubernetes cluster version: 1.5.7 (though I don’t think that should affect the Prometheus scraping)

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

1 reaction
kivagant-ba commented, May 27, 2019

Hello, @antoinepouille

It looks like kubedns.requests_duration.seconds.count still returns accumulated values, not the rate. The same happens for kubedns.requests_duration.seconds.sum, but the values are different. How do I correctly set up a timeseries visualization for requests_duration?
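One common way to read a cumulative sum/count pair like this is to divide the change in the sum by the change in the count over each interval, which gives the mean request duration for that interval. The Python sketch below only illustrates that arithmetic. It is not a Datadog query or API: the function name `mean_duration_per_interval` and the sample values are hypothetical, and it assumes both series are sampled at the same timestamps with no counter resets.

```python
from typing import List, Optional


def mean_duration_per_interval(
    sums: List[float], counts: List[float]
) -> List[Optional[float]]:
    """Mean request duration (seconds) for each interval between samples.

    sums   -- cumulative total of request durations, in seconds
    counts -- cumulative number of requests
    Returns None for an interval in which no requests were observed.
    """
    means: List[Optional[float]] = []
    for i in range(1, min(len(sums), len(counts))):
        delta_sum = sums[i] - sums[i - 1]
        delta_count = counts[i] - counts[i - 1]
        means.append(delta_sum / delta_count if delta_count > 0 else None)
    return means


# Hypothetical accumulated values from three consecutive scrapes.
duration_sum = [1.0, 2.0, 4.0]
duration_count = [10, 20, 30]
print(mean_duration_per_interval(duration_sum, duration_count))  # [0.1, 0.2]
```

Graphing the raw sum or count directly shows an ever-growing line. The per-interval ratio is the quantity that rises and falls with actual latency.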


1 reaction
antoinepouille commented, Mar 26, 2018

@jonmoter Thanks for pulling this out from the code. I was able to sync up with an engineer from the metrics team to figure this out. This is actually causing some issues when using the metrics, so I think we can create a second metric that uses the monotonic_count function to forward the counts directly instead of the current raw value. I will forward this to the team so that we schedule some work to tackle this. Stay tuned.
