[cilium] Cilium Operator metrics are not collected by default
And configuring the integration throws errors.
Output of the info page
root@datadog-c7rmp:/# agent status
Getting the status from the agent.
2021-09-16 14:33:33 UTC | CORE | WARN | (pkg/util/log/log.go:611 in func1) | Deactivating Autoconfig will disable most components. It's recommended to use autoconfig_exclude_features and autoconfig_include_features to activate/deactivate features selectively
===============
Agent (v7.31.0)
===============
Status date: 2021-09-16 14:33:33.296 UTC (1631802813296)
Agent start: 2021-09-16 14:32:48.846 UTC (1631802768846)
Pid: 1
Go Version: go1.15.13
Python Version: 3.8.11
Build arch: amd64
Agent flavor: agent
Check Runners: 4
Log Level: INFO
Paths
=====
Config File: /etc/datadog-agent/datadog.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
Clocks
======
NTP offset: 26µs
System time: 2021-09-16 14:33:33.296 UTC (1631802813296)
Host Info
=========
bootTime: 2021-09-16 11:03:22 UTC (1631790202000)
kernelArch: x86_64
kernelVersion: 5.4.117-58.216.amzn2.x86_64
os: linux
platform: ubuntu
platformFamily: debian
platformVersion: 21.04
procs: 146
uptime: 3h29m37s
Hostnames
=========
ec2-hostname: ip-10-0-67-197.eu-west-2.compute.internal
host_aliases: [ip-10-0-67-197.eu-west-2.compute.internal-sandbox-infra-2036]
hostname: i-07f24cfb6cb2e6b70
instance-id: i-07f24cfb6cb2e6b70
socket-fqdn: datadog-c7rmp
socket-hostname: datadog-c7rmp
host tags:
cluster_name:sandbox-infra-2036
env:sandbox-infra-2036
kube_cluster_name:sandbox-infra-2036
project:sandbox-infra-2036
sla_agreement:false
stack_name:sandbox-infra-2036
stack_type:sandbox
hostname provider: aws
unused hostname providers:
azure: azure_hostname_style is set to 'os'
configuration/environment: hostname is empty
gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname
Metadata
========
cloud_provider: AWS
hostname_source: aws
=========
Collector
=========
Running Checks
==============
cilium (1.7.2)
--------------
Instance ID: cilium:8f735fdfa8bcd6cb [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/cilium.d/auto_conf.yaml
Total Runs: 3
Metric Samples: Last Run: 969, Total: 2,907
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 3
Average Execution Time : 261ms
Last Execution Date : 2021-09-16 14:33:29 UTC (1631802809000)
Last Successful Execution Date : 2021-09-16 14:33:29 UTC (1631802809000)
metadata:
version.major: 1
version.minor: 10
version.patch: 4
version.raw: 1.10.4
version.scheme: semver
cpu
---
Instance ID: cpu [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
Total Runs: 3
Metric Samples: Last Run: 9, Total: 20
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
Last Execution Date : 2021-09-16 14:33:26 UTC (1631802806000)
Last Successful Execution Date : 2021-09-16 14:33:26 UTC (1631802806000)
disk (4.4.0)
------------
Instance ID: disk:e5dffb8bef24336f [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
Total Runs: 2
Metric Samples: Last Run: 204, Total: 408
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 23ms
Last Execution Date : 2021-09-16 14:33:18 UTC (1631802798000)
Last Successful Execution Date : 2021-09-16 14:33:18 UTC (1631802798000)
docker
------
Instance ID: docker [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/docker.d/conf.yaml.default
Total Runs: 2
Metric Samples: Last Run: 248, Total: 496
Events: Last Run: 2, Total: 2
Service Checks: Last Run: 1, Total: 2
Average Execution Time : 66ms
Last Execution Date : 2021-09-16 14:33:25 UTC (1631802805000)
Last Successful Execution Date : 2021-09-16 14:33:25 UTC (1631802805000)
file_handle
-----------
Instance ID: file_handle [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
Total Runs: 3
Metric Samples: Last Run: 5, Total: 15
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
Last Execution Date : 2021-09-16 14:33:32 UTC (1631802812000)
Last Successful Execution Date : 2021-09-16 14:33:32 UTC (1631802812000)
io
--
Instance ID: io [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
Total Runs: 2
Metric Samples: Last Run: 39, Total: 51
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
Last Execution Date : 2021-09-16 14:33:24 UTC (1631802804000)
Last Successful Execution Date : 2021-09-16 14:33:24 UTC (1631802804000)
kubelet (7.0.0)
---------------
Instance ID: kubelet:5bbc63f3938c02f4 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
Total Runs: 2
Metric Samples: Last Run: 958, Total: 1,901
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 4, Total: 8
Average Execution Time : 463ms
Last Execution Date : 2021-09-16 14:33:15 UTC (1631802795000)
Last Successful Execution Date : 2021-09-16 14:33:15 UTC (1631802795000)
load
----
Instance ID: load [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
Total Runs: 3
Metric Samples: Last Run: 6, Total: 18
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
Last Execution Date : 2021-09-16 14:33:31 UTC (1631802811000)
Last Successful Execution Date : 2021-09-16 14:33:31 UTC (1631802811000)
memory
------
Instance ID: memory [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
Total Runs: 2
Metric Samples: Last Run: 18, Total: 36
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
Last Execution Date : 2021-09-16 14:33:23 UTC (1631802803000)
Last Successful Execution Date : 2021-09-16 14:33:23 UTC (1631802803000)
network (2.3.0)
---------------
Instance ID: network:d884b5186b651429 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
Total Runs: 3
Metric Samples: Last Run: 73, Total: 219
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 10ms
Last Execution Date : 2021-09-16 14:33:30 UTC (1631802810000)
Last Successful Execution Date : 2021-09-16 14:33:30 UTC (1631802810000)
ntp
---
Instance ID: ntp:d884b5186b651429 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
Total Runs: 1
Metric Samples: Last Run: 1, Total: 1
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 1
Average Execution Time : 0s
Last Execution Date : 2021-09-16 14:32:56 UTC (1631802776000)
Last Successful Execution Date : 2021-09-16 14:32:56 UTC (1631802776000)
uptime
------
Instance ID: uptime [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
Total Runs: 2
Metric Samples: Last Run: 1, Total: 2
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
Last Execution Date : 2021-09-16 14:33:22 UTC (1631802802000)
Last Successful Execution Date : 2021-09-16 14:33:22 UTC (1631802802000)
========
JMXFetch
========
Information
==================
Initialized checks
==================
no checks
Failed checks
=============
no checks
=========
Forwarder
=========
Transactions
============
Cluster: 0
CronJob: 0
DaemonSet: 0
Deployment: 0
Dropped: 0
DroppedOnInput: 0
Job: 0
Node: 0
PersistentVolume: 0
PersistentVolumeClaim: 0
Pod: 0
ReplicaSet: 0
Requeued: 0
Retried: 0
RetryQueueSize: 0
Service: 0
StatefulSet: 0
Transaction Successes
=====================
Total number: 7
Successes By Endpoint:
check_run_v1: 2
intake: 3
series_v1: 2
API Keys status
===============
API key ending with 330c5: API Key valid
==========
Endpoints
==========
https://app.datadoghq.com - API Key ending with:
- 330c5
==========
Logs Agent
==========
Logs Agent is not running
=========
APM Agent
=========
Status: Running
Pid: 1
Uptime: 44 seconds
Mem alloc: 10,581,840 bytes
Hostname: i-07f24cfb6cb2e6b70
Receiver: 0.0.0.0:8126
Endpoints:
https://trace.agent.datadoghq.com
Receiver (previous minute)
==========================
No traces received in the previous minute.
Default priority sampling rate: 100.0%
Writer (previous minute)
========================
Traces: 0 payloads, 0 traces, 0 events, 0 bytes
Stats: 0 payloads, 0 stats buckets, 0 bytes
=========
Aggregator
=========
Checks Metric Sample: 6,130
Dogstatsd Metric Sample: 276
Event: 3
Events Flushed: 3
Number Of Flushes: 2
Series Flushed: 4,125
Service Check: 42
Service Checks Flushed: 33
=========
DogStatsD
=========
Event Packets: 0
Event Parse Errors: 0
Metric Packets: 275
Metric Parse Errors: 0
Service Check Packets: 0
Service Check Parse Errors: 0
Udp Bytes: 21,072
Udp Packet Reading Errors: 0
Udp Packets: 215
Uds Bytes: 0
Uds Origin Detection Errors: 0
Uds Packet Reading Errors: 0
Uds Packets: 1
Unterminated Metric Errors: 0
=====================
Datadog Cluster Agent
=====================
- Datadog Cluster Agent endpoint detected: https://172.20.5.161:5005
Successfully connected to the Datadog Cluster Agent.
- Running: 1.15.0+commit.6781e85
Additional environment details (Operating System, Cloud provider, etc):
- Datadog Helm chart 2.22.2
- EKS 1.21
- Cilium 1.10.4
Steps to reproduce the issue:
- Create an EKS cluster
- Deploy Datadog with the network policies flavour set to cilium and a custom agent.confd to collect Cilium operator metrics:
    confd:
      cilium.yaml: |-
        instances:
          - agent_endpoint: http://localhost:9090/metrics
            tags:
              - cilium-pod:%%host%%
          - operator_endpoint: http://localhost:6942/metrics
            tags:
              - cilium-pod:%%host%%
- Replace AWS CNI with Cilium
Describe the results you received:
$ agent check cilium
...
=== Service Checks ===
[
{
"check": "cilium.prometheus.health",
"host_name": "i-07f24cfb6cb2e6b70",
"timestamp": 1631803840,
"status": 0,
"message": "",
"tags": [
"cilium-pod:10.0.67.197",
"docker_image:quay.io/cilium/cilium",
"endpoint:http://10.0.67.197:9090/metrics",
"image_name:quay.io/cilium/cilium",
"image_tag:v1.10.4",
"kube_container_name:cilium-agent",
"kube_daemon_set:cilium",
"kube_namespace:kube-system",
"kube_ownerref_kind:daemonset",
"pod_phase:running",
"short_image:cilium"
]
}
]
2021-09-16 14:50:44 UTC | CORE | INFO | (pkg/util/kubernetes/clustername/clustername.go:98 in getClusterName) | Using cluster name sandbox-infra-2036 auto discovered from the ec2 API
2021-09-16 14:50:44 UTC | CORE | INFO | (pkg/metadata/host/host_tags.go:88 in GetHostTags) | Adding both tags cluster_name and kube_cluster_name. You can use 'disable_cluster_name_tag_key' in the Agent config to keep the kube_cluster_name tag only
2021-09-16 14:50:44 UTC | CORE | WARN | (pkg/util/gce/gce_tags.go:49 in getCachedTags) | unable to get tags from gce and cache is empty: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/?recursive=true
2021-09-16 14:50:44 UTC | CORE | INFO | (pkg/metadata/host/host.go:170 in getPublicIPv4) | No public IPv4 address found
=========
Collector
=========
Running Checks
==============
cilium (1.7.2)
--------------
Instance ID: cilium:50ebf894b0a553d7 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/cilium.d/auto_conf.yaml
Total Runs: 1
Metric Samples: Last Run: 971, Total: 971
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 1
Average Execution Time : 295ms
Last Execution Date : 2021-09-16 14:50:41 UTC (1631803841000)
Last Successful Execution Date : 2021-09-16 14:50:41 UTC (1631803841000)
metadata:
version.major: 1
version.minor: 10
version.patch: 4
version.raw: 1.10.4
version.scheme: semver
=== Service Checks ===
[
{
"check": "cilium.prometheus.health",
"host_name": "i-07f24cfb6cb2e6b70",
"timestamp": 1631803844,
"status": 2,
"message": "",
"tags": [
"cilium-pod:%%host%%",
"endpoint:http://localhost:9090/metrics"
]
}
]
=========
Collector
=========
Running Checks
==============
cilium (1.7.2)
--------------
Instance ID: cilium:8eb256ce5f51f2ec [ERROR]
Configuration Source: file:/etc/datadog-agent/conf.d/cilium.yaml
Total Runs: 1
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 1
Average Execution Time : 8ms
Last Execution Date : 2021-09-16 14:50:44 UTC (1631803844000)
Last Successful Execution Date : Never
Error: HTTPConnectionPool(host='localhost', port=9090): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f785920e910>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 169, in _new_conn
conn = connection.create_connection(
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py", line 96, in create_connection
raise err
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py", line 86, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 234, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1256, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1302, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1251, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1011, in _send_output
self.send(msg)
File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 951, in send
self.connect()
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 200, in connect
conn = self._new_conn()
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 181, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f785920e910>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/retry.py", line 574, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=9090): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f785920e910>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 1006, in run
self.check(instance)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/base_check.py", line 135, in check
self.process(scraper_config)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 533, in process
for metric in self.scrape_metrics(scraper_config):
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 470, in scrape_metrics
response = self.poll(scraper_config)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 780, in poll
response = self.send_request(endpoint, scraper_config, headers)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 806, in send_request
return http_handler.get(endpoint, stream=True, **kwargs)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 304, in get
return self._request('get', url, options)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 368, in _request
response = self.make_request_aia_chasing(request_method, method, url, new_options, persist)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 373, in make_request_aia_chasing
response = request_method(url, **new_options)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=9090): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f785920e910>: Failed to establish a new connection: [Errno 111] Connection refused'))
=== Service Checks ===
[
{
"check": "cilium.prometheus.health",
"host_name": "i-07f24cfb6cb2e6b70",
"timestamp": 1631803844,
"status": 2,
"message": "",
"tags": [
"cilium-pod:%%host%%",
"endpoint:http://localhost:6942/metrics"
]
}
]
=========
Collector
=========
Running Checks
==============
cilium (1.7.2)
--------------
Instance ID: cilium:10d6f36b45683928 [ERROR]
Configuration Source: file:/etc/datadog-agent/conf.d/cilium.yaml
Total Runs: 1
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 1
Average Execution Time : 3ms
Last Execution Date : 2021-09-16 14:50:44 UTC (1631803844000)
Last Successful Execution Date : Never
Error: HTTPConnectionPool(host='localhost', port=6942): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f785899a6d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 169, in _new_conn
conn = connection.create_connection(
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py", line 96, in create_connection
raise err
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py", line 86, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 234, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1256, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1302, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1251, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1011, in _send_output
self.send(msg)
File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 951, in send
self.connect()
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 200, in connect
conn = self._new_conn()
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 181, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f785899a6d0>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/retry.py", line 574, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=6942): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f785899a6d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 1006, in run
self.check(instance)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/base_check.py", line 135, in check
self.process(scraper_config)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 533, in process
for metric in self.scrape_metrics(scraper_config):
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 470, in scrape_metrics
response = self.poll(scraper_config)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 780, in poll
response = self.send_request(endpoint, scraper_config, headers)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 806, in send_request
return http_handler.get(endpoint, stream=True, **kwargs)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 304, in get
return self._request('get', url, options)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 368, in _request
response = self.make_request_aia_chasing(request_method, method, url, new_options, persist)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 373, in make_request_aia_chasing
response = request_method(url, **new_options)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=6942): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f785899a6d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Check has run only once, if some metrics are missing you can try again with --check-rate to see any other metric if available.
Describe the results you expected:
The Datadog Agent should detect that the operator pods do not run on every node and, on the nodes where they do run, collect the metrics from localhost, because the Cilium operator container runs with hostNetwork: true.
Additional information you deem important (e.g. issue happens only occasionally):
Issue Analytics
- State:
- Created 2 years ago
- Comments:10 (1 by maintainers)
Top GitHub Comments
@ChristineTChen you are absolutely right!
All the changes distracted me from changing the address. Now that we are using the Kubernetes Endpoints provided by the headless Kubernetes service, the autodiscovery configuration should use %%host%% instead of localhost.
I’ll test the changes next week and confirm this.
Thanks!
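As a rough illustration of the substitution discussed in this comment: %%host%% is an Autodiscovery template variable that is replaced with the discovered container's IP, which is why the auto_conf.yaml service check above shows cilium-pod:10.0.67.197 while the static cilium.yaml instances still carry the literal cilium-pod:%%host%%. The sketch below is not the Agent's implementation; render_template is a hypothetical helper, and the IP is simply the one that appears in the output above.

```python
def render_template(value, host):
    """Recursively replace the %%host%% placeholder in strings, lists and dicts.

    Hypothetical helper for illustration only; the Datadog Agent resolves
    template variables internally and supports more of them than %%host%%.
    """
    if isinstance(value, str):
        return value.replace("%%host%%", host)
    if isinstance(value, list):
        return [render_template(item, host) for item in value]
    if isinstance(value, dict):
        return {key: render_template(item, host) for key, item in value.items()}
    return value


# Instance as it might look in an Autodiscovery template (assumed shape).
instance = {
    "operator_endpoint": "http://%%host%%:6942/metrics",
    "tags": ["cilium-pod:%%host%%"],
}

rendered = render_template(instance, "10.0.67.197")
print(rendered["operator_endpoint"])  # http://10.0.67.197:6942/metrics
print(rendered["tags"])               # ['cilium-pod:10.0.67.197']
```

With localhost hard-coded in the endpoint, nothing is substituted and the check connects to the Agent container's own loopback interface, which matches the "Connection refused" errors in the report.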
Hey @carlosjgp ,
I noticed that you port-forwarded the port in your earlier comment. The curl to localhost only works here because you forwarded the port of the cilium-operator.
I don’t think localhost:6942/metrics is going to yield the same output in the Datadog Agent container. You can test this out by running the curl from an Agent pod.
In your annotations, can you try replacing localhost with %%host%%?
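If curl is not at hand inside the Agent container, the same reachability test can be sketched with the Python standard library. This is a minimal sketch, not part of the Datadog tooling: probe_metrics is a hypothetical helper name, and the URLs you pass to it (for example the localhost and node-IP variants of the operator endpoint from this issue) are whatever you want to compare.

```python
import urllib.error
import urllib.request


def probe_metrics(url, timeout=2.0):
    """Return (reachable, detail) for a Prometheus-style metrics URL.

    Hypothetical helper: it only checks that the endpoint answers over HTTP
    and reports the first line of the response body.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            first_line = resp.readline().decode("utf-8", "replace").strip()
            return True, f"HTTP {resp.status}: {first_line}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, connection refused and timeouts.
        return False, str(exc)
```

Calling probe_metrics("http://localhost:6942/metrics") from an Agent pod and then again with the node IP in place of localhost should show whether only the address differs, without involving the check itself.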