Latest network integration stopped reporting connection state when using the hostNetwork
Output of the info page

```
====================
Collector (v 5.22.0)
====================
Status date: 2018-02-16 21:59:59 (9s ago)
Pid: 20
Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
Python Version: 2.7.14, 64bit
Logs: <stderr>, /var/log/datadog/collector.log
Clocks
======
NTP offset: -0.0087 s
System UTC time: 2018-02-16 22:00:08.728334
Paths
=====
conf.d: /etc/dd-agent/conf.d
checks.d: Not found
Hostnames
=========
[elided]
Checks
======
etcd (1.3.0)
------------
- instance #0 [OK]
- Collected 18 metrics, 0 events & 2 service checks
network (1.4.0)
---------------
- instance #0 [WARNING]
Warning: Cannot collect connection state: currently with a custom /proc path: /host/proc/1
- Collected 32 metrics, 0 events & 0 service checks
elastic (1.5.0)
---------------
- instance #0 [OK]
- Collected 175 metrics, 0 events & 2 service checks
ntp (1.0.0)
-----------
- instance #0 [OK]
- Collected 1 metric, 0 events & 1 service check
disk (1.1.0)
------------
- instance #0 [OK]
- Collected 34 metrics, 0 events & 0 service checks
docker_daemon (1.8.0)
---------------------
- instance #0 [OK]
- Collected 290 metrics, 0 events & 1 service check
Emitters
========
- http_emitter [OK]
====================
Dogstatsd (v 5.22.0)
====================
Status date: 2018-02-16 22:00:04 (4s ago)
Pid: 17
Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
Python Version: 2.7.14, 64bit
Logs: <stderr>, /var/log/datadog/dogstatsd.log
Flush count: 636
Packet Count: 3500
Packets per second: 0.2
Metric count: 20
Event count: 0
Service check count: 0
====================
Forwarder (v 5.22.0)
====================
Status date: 2018-02-16 22:00:07 (1s ago)
Pid: 16
Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
Python Version: 2.7.14, 64bit
Logs: <stderr>, /var/log/datadog/forwarder.log
Queue Size: 0 bytes
Queue Length: 0
Flush Count: 2058
Transactions received: 1547
Transactions flushed: 1547
Transactions rejected: 0
API Key Status: API Key is valid
======================
Trace Agent (v 5.22.0)
======================
Pid: 15
Uptime: 6384 seconds
Mem alloc: 990096 bytes
Hostname: [elided]
Receiver: 0.0.0.0:8126
API Endpoint: https://trace.agent.datadoghq.com
--- Receiver stats (1 min) ---
--- Writer stats (1 min) ---
Traces: 0 payloads, 0 traces, 0 bytes
Stats: 0 payloads, 0 stats buckets, 0 bytes
Services: 0 payloads, 0 services, 0 bytes
```
We run our Datadog agent in Kubernetes with `hostNetwork: true`. After redeploying recently, the network check no longer reports connection state information, despite zero configuration changes on our side.
As far as I can tell, the change comes from this merged PR: https://github.com/DataDog/integrations-core/pull/994. That PR was intended to make connection-state data collectable when *not* running with `hostNetwork: true`, but it has also affected our setup, where hostNetwork is used.
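Based on the warning in the info output above (note the `/host/proc/1` path), the check now appears to resolve network stats through PID 1's procfs entry and to skip connection-state collection entirely whenever a custom procfs path is configured. A minimal sketch of that suspected guard, with assumed names (this is not the actual integrations-core source):

```python
# Sketch of the suspected guard; names are assumptions, not the real
# integrations-core code.
import os

def check_network(proc_location="/proc", collect_connection_state=True):
    # Containerized agents appear to resolve stats via PID 1's procfs entry.
    if proc_location == "/proc":
        net_proc_base = "/proc"
    else:
        net_proc_base = os.path.join(proc_location, "1")

    if collect_connection_state:
        if proc_location != "/proc":
            # This skip fires for any custom procfs path -- even with
            # hostNetwork: true, where /host/proc/1/net is the host's
            # own network namespace and would be safe to read.
            print("Warning: Cannot collect connection state: currently "
                  "with a custom /proc path: {}".format(net_proc_base))
        else:
            return read_tcp_states(os.path.join(net_proc_base, "net", "tcp"))

def read_tcp_states(path):
    # Parse the hex state column of /proc/net/tcp to count e.g. LISTEN sockets.
    with open(path) as f:
        return [line.split()[3] for line in f.readlines()[1:]]
```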
Additional environment details (Operating System, Cloud provider, etc):
Running Kubernetes 1.8.x on AWS.
Steps to reproduce the issue:
- Build a Datadog agent container `FROM datadog/docker-dd-agent` that adds a `network.yaml` with the following configuration (a Dockerfile sketch follows this list):

```yaml
init_config:

instances:
  # Network check only supports one configured instance
  - collect_connection_state: true
    excluded_interfaces:
      - lo
      - lo0
    excluded_interface_re: veth.*
```
- Additionally include the following config for `docker_daemon.yaml`:

```yaml
init_config:
  docker_root: /host

instances:
  - url: "unix://var/run/docker.sock"
```
- Run the Datadog agent container in Kubernetes as a DaemonSet with `hostNetwork: true` (see the manifest sketch after this list).
- Run `/etc/init.d/datadog-agent info` in the container (e.g. via kubectl exec; one-liner below) and observe the status of the network check. Also observe the metrics for `system.net.tcp{4,6}.listening` in Datadog.
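For the first step, a minimal Dockerfile sketch (the conf.d destination matches the Paths section of the info output above; the local file names are assumptions):

```dockerfile
# Minimal sketch: extend the upstream image and add the two configs.
# Local file names are assumptions; /etc/dd-agent/conf.d matches the
# Paths section of the info output above.
FROM datadog/docker-dd-agent
COPY network.yaml /etc/dd-agent/conf.d/network.yaml
COPY docker_daemon.yaml /etc/dd-agent/conf.d/docker_daemon.yaml
```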
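For the DaemonSet step, a trimmed manifest sketch showing only the parts relevant here (names, the API-key secret, and the exact volume list are assumptions, not our actual manifest):

```yaml
# Trimmed sketch, not our actual manifest; names and secrets are assumptions.
apiVersion: extensions/v1beta1      # DaemonSet API group as of Kubernetes 1.8
kind: DaemonSet
metadata:
  name: datadog-agent
spec:
  template:
    metadata:
      labels:
        app: datadog-agent
    spec:
      hostNetwork: true             # the setting this issue hinges on
      containers:
        - name: datadog-agent
          image: example.registry/datadog-agent:latest   # image built above
          env:
            - name: API_KEY         # docker-dd-agent reads API_KEY
              valueFrom:
                secretKeyRef:
                  name: datadog
                  key: api-key
          volumeMounts:
            - name: proc
              mountPath: /host/proc # why the agent sees a custom /proc path
              readOnly: true
            - name: dockersock
              mountPath: /var/run/docker.sock
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
```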
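And the last step can be run from outside the cluster (the pod name is a placeholder):

```console
kubectl exec datadog-agent-xxxxx -- /etc/init.d/datadog-agent info
```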
Describe the results you received:
- No metrics found in the Datadog web UI
- `datadog-agent info` shows a warning for the network check:

```
network (1.4.0)
---------------
- instance #0 [WARNING]
Warning: Cannot collect connection state: currently with a custom /proc path: /host/proc/1
- Collected 32 metrics, 0 events & 0 service checks
```
Describe the results you expected:
- Metrics available in the Datadog web UI for `system.net.tcp{4,6}.listening` and related metrics
- Network check shows healthy
Additional information you deem important (e.g. issue happens only occasionally):
The network connection state reporting worked in a previous build of the container, which pulled the upstream container on Feb 12 at ~2 PM PST. We rebuilt and redeployed the container (including pulling the upstream datadog/docker-dd-agent container) on Feb 15 at ~5 PM PST, at which point we lost the metrics. No changes were made to the Datadog configuration, nor to how the container is deployed in Kubernetes - it was a manually triggered rebuild and redeploy to test unrelated processes.
Top GitHub Comments

+1 Would be really nice to get this resolved.

Going through old issues: this was resolved in https://github.com/DataDog/integrations-core/pull/4150