Unable to detect the kubelet URL automatically / cannot validate certificate

Output of the info page

Getting the status from the agent.

==============
Agent (v6.6.0)
==============

  Status date: 2018-11-13 23:10:34.603102 UTC
  Pid: 342
  Python Version: 2.7.15
  Logs:
  Check Runners: 4
  Log Level: info

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: 1.461ms
    System UTC time: 2018-11-13 23:10:34.603102 UTC

  Host Info
  =========
    bootTime: 2018-11-08 08:50:28.000000 UTC
    kernelVersion: 4.9.0-7-amd64
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: buster/sid
    procs: 70
    uptime: 133h51m42s
    virtualizationRole: host
    virtualizationSystem: kvm

  Hostnames
  =========
    hostname: reverent-kapitsa-1us
    socket-fqdn: datadog-agent-pxkhm
    socket-hostname: datadog-agent-pxkhm
    hostname provider: container
    unused hostname providers:
      aws: not retrieving hostname from AWS: the host is not an ECS instance, and other providers already retrieve non-default hostnames
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname

=========
Collector
=========

  Running Checks
  ==============

    cpu
    ---
        Instance ID: cpu [OK]
        Total Runs: 114
        Metric Samples: 6, Total: 678
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s


    disk (1.4.0)
    ------------
        Instance ID: disk:e5dffb8bef24336f [OK]
        Total Runs: 114
        Metric Samples: 190, Total: 21,660
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 197ms


    docker
    ------
        Instance ID: docker [OK]
        Total Runs: 113
        Metric Samples: 216, Total: 23,850
        Events: 0, Total: 6
        Service Checks: 1, Total: 113
        Average Execution Time : 203ms


    file_handle
    -----------
        Instance ID: file_handle [OK]
        Total Runs: 114
        Metric Samples: 5, Total: 570
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s


    io
    --
        Instance ID: io [OK]
        Total Runs: 113
        Metric Samples: 39, Total: 4,380
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s


    kubelet (2.2.0)
    ---------------
        Instance ID: kubelet:d884b5186b651429 [ERROR]
        Total Runs: 114
        Metric Samples: 0, Total: 0
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 8ms
        Error: Unable to detect the kubelet URL automatically.
        Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/base/checks/base.py", line 366, in run
          self.check(copy.deepcopy(self.instances[0]))
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/kubelet/kubelet.py", line 113, in check
          raise CheckException("Unable to detect the kubelet URL automatically.")
      CheckException: Unable to detect the kubelet URL automatically.

    kubernetes_apiserver
    --------------------
        Instance ID: kubernetes_apiserver [OK]
        Total Runs: 113
        Metric Samples: 0, Total: 0
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 11ms


    load
    ----
        Instance ID: load [OK]
        Total Runs: 114
        Metric Samples: 6, Total: 684
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 2ms


    memory
    ------
        Instance ID: memory [OK]
        Total Runs: 113
        Metric Samples: 17, Total: 1,921
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s


    network (1.7.0)
    ---------------
        Instance ID: network:2a218184ebe03606 [OK]
        Total Runs: 114
        Metric Samples: 74, Total: 8,754
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 9ms


    ntp
    ---
        Instance ID: ntp:b4579e02d1981c12 [OK]
        Total Runs: 113
        Metric Samples: 1, Total: 113
        Events: 0, Total: 0
        Service Checks: 1, Total: 113
        Average Execution Time : 2ms


    uptime
    ------
        Instance ID: uptime [OK]
        Total Runs: 114
        Metric Samples: 1, Total: 114
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 2ms

========
JMXFetch
========

  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  CheckRunsV1: 113
  Dropped: 0
  DroppedOnInput: 0
  Events: 0
  HostMetadata: 0
  IntakeV1: 11
  Metadata: 0
  Requeued: 0
  Retried: 0
  RetryQueueSize: 0
  Series: 0
  ServiceChecks: 0
  SketchSeries: 0
  Success: 237
  TimeseriesV1: 113

  API Keys status
  ===============
    API key ending with 1ed66 on endpoint https://app.datadoghq.com: API Key valid

==========
Logs Agent
==========

  container_collect_all
  ---------------------
    Type: docker
    Status: Pending

=========
DogStatsD
=========

  Checks Metric Sample: 65,227
  Event: 7
  Events Flushed: 7
  Number Of Flushes: 113
  Series Flushed: 53,494
  Service Check: 1,478
  Service Checks Flushed: 1,578
  Dogstatsd Metric Sample: 11,877

Additional environment details (Operating System, Cloud provider, etc):

Kubernetes 1.12 cluster on DigitalOcean.

Steps to reproduce the issue:

  1. Deploy the Datadog agent using the provided Kubernetes resources.
  2. View logs

Describe the results you received:

[ AGENT ] 2018-11-13 22:42:31 UTC | ERROR | (kubeutil.go:50 in GetKubeletConnectionInfo) | connection to kubelet failed: temporary failure in kubeutil, will retry later: try delay not elapsed yet
[ AGENT ] 2018-11-13 22:42:31 UTC | ERROR | (runner.go:289 in work) | Error running check kubelet: [{"message": "Unable to detect the kubelet URL automatically.", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/base/checks/base.py\", line 366, in run\n    self.check(copy.deepcopy(self.instances[0]))\n  File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/kubelet/kubelet.py\", line 113, in check\n    raise CheckException(\"Unable to detect the kubelet URL automatically.\")\nCheckException: Unable to detect the kubelet URL automatically.\n"}]
[...]
[ AGENT ] 2018-11-13 22:42:39 UTC | ERROR | (autoconfig.go:608 in collect) | Unable to collect configurations from provider Kubernetes: temporary failure in kubeutil, will retry later: cannot connect: https: "Get https://10.133.78.180:10250/pods: x509: cannot validate certificate for 10.133.78.180 because it doesn't contain any IP SANs", http: "Get http://10.133.78.180:10255/pods: dial tcp 10.133.78.180:10255: connect: connection refused"
[ AGENT ] 2018-11-13 22:42:39 UTC | INFO | (autoconfig.go:362 in initListenerCandidates) | kubelet listener cannot start, will retry: temporary failure in kubeutil, will retry later: cannot connect: https: "Get https://10.133.78.180:10250/pods: x509: cannot validate certificate for 10.133.78.180 because it doesn't contain any IP SANs", http: "Get http://10.133.78.180:10255/pods: dial tcp 10.133.78.180:10255: connect: connection refused"

Many dashboard entries remain empty.

Describe the results you expected:

No errors, access to kubelet, functional Kubernetes dashboard.

Additional information you deem important (e.g. issue happens only occasionally):

This seems to be the same problem as #1829, but that issue is closed. As far as I know, hosted Kubernetes services like DigitalOcean do not allow editing the kubelet configuration.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 32
  • Comments: 73 (10 by maintainers)

Top GitHub Comments

jcassee commented, Dec 25, 2018 (15 reactions)

@mjhuber I opened a ticket on the Datadog issue tracker. The advice was to set DD_KUBELET_TLS_VERIFY=false for now. Hopefully DO will start using real certificates for the Kubelet API.
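
As a rough sketch of that workaround, assuming the agent runs as a plain Kubernetes DaemonSet, the variable can be added to the agent container's env list. The fragment below is not a complete manifest, and the surrounding layout is an assumption rather than something taken from the issue. Setting it disables verification of the kubelet's certificate, so it trades security for connectivity.

```yaml
# Sketch only: fragment of the agent container spec in a DaemonSet.
# Only the variable name and value come from the comment above.
env:
  - name: DD_KUBELET_TLS_VERIFY
    value: "false"   # quoted, because Kubernetes env values must be strings
```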

jonhoare commented, Jul 10, 2020 (13 reactions)

For anyone having this issue, I have been working through it with Datadog's support team.

It appears that AKS has changed the location of the Kubelet Client CA Cert, at least between AKS 1.16.7 and 1.16.9.

The certificate used by AKS is now located on the node at /etc/kubernetes/certs/kubeletserver.crt.

If you use the Helm charts, you can set the following values and the new certificates should get loaded correctly.

agents:
  volumes:
    - name: k8s-certs
      hostPath:
        path: /etc/kubernetes/certs
        type: ''
  volumeMounts:
    - name: k8s-certs
      readOnly: true
      mountPath: /etc/kubernetes/certs
datadog:
  env:
    - name: DD_KUBELET_CLIENT_CA
      value: /etc/kubernetes/certs/kubeletserver.crt

After adding and deploying this config, my Datadog agents (Helm 2.3.18 and Docker image 7.20.2) on AKS 1.16.9 are now working correctly. The agent still logs warnings that the certificate has no subjectAltName, but metrics are sent to Datadog successfully.

Hopefully a more permanent fix will follow, but this is a good enough fix for now to work again, without having to set DD_KUBELET_TLS_VERIFY=false.
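
For deployments that do not use the Helm chart, the same idea might look roughly like this in a raw DaemonSet manifest. This is a sketch under assumptions: the container name `agent` is hypothetical, and only the certificate path, the volume name, and the DD_KUBELET_CLIENT_CA variable come from the values above.

```yaml
# Sketch only: fragment of a DaemonSet pod template, not a complete manifest.
spec:
  template:
    spec:
      containers:
        - name: agent            # assumed container name
          env:
            - name: DD_KUBELET_CLIENT_CA
              value: /etc/kubernetes/certs/kubeletserver.crt
          volumeMounts:
            - name: k8s-certs
              mountPath: /etc/kubernetes/certs
              readOnly: true
      volumes:
        - name: k8s-certs
          hostPath:
            path: /etc/kubernetes/certs   # node directory holding the kubelet certificate
```

The hostPath mount exposes the node's certificate directory inside the agent pod, so the path given in DD_KUBELET_CLIENT_CA resolves to the file on the node.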
