
Latest network integration stopped reporting connection state when using the hostNetwork

See original GitHub issue

Output of the info page

====================
Collector (v 5.22.0)
====================

  Status date: 2018-02-16 21:59:59 (9s ago)
  Pid: 20
  Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
  Python Version: 2.7.14, 64bit
  Logs: <stderr>, /var/log/datadog/collector.log

  Clocks
  ======
  
    NTP offset: -0.0087 s
    System UTC time: 2018-02-16 22:00:08.728334
  
  Paths
  =====
  
    conf.d: /etc/dd-agent/conf.d
    checks.d: Not found
  
  Hostnames
  =========
  
[elided]
  
  Checks
  ======
  
    etcd (1.3.0)
    ------------
      - instance #0 [OK]
      - Collected 18 metrics, 0 events & 2 service checks
  
    network (1.4.0)
    ---------------
      - instance #0 [WARNING]
          Warning: Cannot collect connection state: currently with a custom /proc path: /host/proc/1
      - Collected 32 metrics, 0 events & 0 service checks
  
    elastic (1.5.0)
    ---------------
      - instance #0 [OK]
      - Collected 175 metrics, 0 events & 2 service checks
  
    ntp (1.0.0)
    -----------
      - instance #0 [OK]
      - Collected 1 metric, 0 events & 1 service check
  
    disk (1.1.0)
    ------------
      - instance #0 [OK]
      - Collected 34 metrics, 0 events & 0 service checks
  
    docker_daemon (1.8.0)
    ---------------------
      - instance #0 [OK]
      - Collected 290 metrics, 0 events & 1 service check
  
  
  Emitters
  ========
  
    - http_emitter [OK]

====================
Dogstatsd (v 5.22.0)
====================

  Status date: 2018-02-16 22:00:04 (4s ago)
  Pid: 17
  Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
  Python Version: 2.7.14, 64bit
  Logs: <stderr>, /var/log/datadog/dogstatsd.log

  Flush count: 636
  Packet Count: 3500
  Packets per second: 0.2
  Metric count: 20
  Event count: 0
  Service check count: 0

====================
Forwarder (v 5.22.0)
====================

  Status date: 2018-02-16 22:00:07 (1s ago)
  Pid: 16
  Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
  Python Version: 2.7.14, 64bit
  Logs: <stderr>, /var/log/datadog/forwarder.log

  Queue Size: 0 bytes
  Queue Length: 0
  Flush Count: 2058
  Transactions received: 1547
  Transactions flushed: 1547
  Transactions rejected: 0
  API Key Status: API Key is valid
  

======================
Trace Agent (v 5.22.0)
======================

  Pid: 15
  Uptime: 6384 seconds
  Mem alloc: 990096 bytes

  Hostname: [elided]
  Receiver: 0.0.0.0:8126
  API Endpoint: https://trace.agent.datadoghq.com

  --- Receiver stats (1 min) ---


  --- Writer stats (1 min) ---

  Traces: 0 payloads, 0 traces, 0 bytes
  Stats: 0 payloads, 0 stats buckets, 0 bytes
  Services: 0 payloads, 0 services, 0 bytes

We run our Datadog agent in Kubernetes with hostNetwork: true. After redeploying recently, the network check no longer reports connection state information, despite zero configuration changes on our side.
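For reference, here is a minimal sketch of the DaemonSet pod spec this describes, assuming the usual docker-dd-agent layout: the pod shares the host network namespace and mounts the host's /proc at /host/proc (which is why the check output above shows a custom /proc path). The resource names, image reference, and exact mounts are illustrative, not copied from our manifest:

# Minimal sketch only; names, the image reference, and the omitted API key
# wiring are illustrative, not the exact manifest behind this report.
apiVersion: extensions/v1beta1   # a DaemonSet API version available on Kubernetes 1.8.x
kind: DaemonSet
metadata:
  name: dd-agent
spec:
  template:
    metadata:
      labels:
        app: dd-agent
    spec:
      hostNetwork: true                # share the node's network namespace
      containers:
        - name: dd-agent
          # our image is built FROM datadog/docker-dd-agent (see the repro steps below)
          image: example-registry/dd-agent:latest
          # API key and other environment variables omitted for brevity
          volumeMounts:
            - name: proc               # host /proc, visible to the checks as /host/proc
              mountPath: /host/proc
              readOnly: true
            - name: docker-sock        # for the docker_daemon check
              mountPath: /var/run/docker.sock
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock

The /host/proc mount is what the warning in the check output refers to: with a custom procfs path in play, the check declines to collect connection state.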

As far as I can tell, the change is from this merged PR: https://github.com/DataDog/integrations-core/pull/994

The PR indicates the change was intended to collect connection data when not running with hostNetwork: true, but it also affects our setup, where hostNetwork is used.

Additional environment details (Operating System, Cloud provider, etc):

Running Kubernetes 1.8.x on AWS.

Steps to reproduce the issue:

  1. Build a Datadog agent container FROM datadog/docker-dd-agent that adds a network.yaml with the following configuration:

       init_config:

       instances:
         # Network check only supports one configured instance
         - collect_connection_state: true
           excluded_interfaces:
             - lo
             - lo0
           excluded_interface_re: veth.*

  2. Additionally include the following configuration for docker_daemon.yaml (an equivalent way to supply both files without rebuilding the image is sketched after this list):

       init_config:
         docker_root: /host
       instances:
         - url: "unix://var/run/docker.sock"

  3. Run the Datadog agent container in Kubernetes as a DaemonSet with hostNetwork: true (see the pod-spec sketch above).
  4. Run /etc/init.d/datadog-agent info in the container and observe the status of the network check. Also observe the system.net.tcp{4,6}.listening metrics in Datadog.
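As an aside, the two files from steps 1 and 2 can also be supplied without rebuilding the image, for example by projecting them from a ConfigMap onto the conf.d path shown in the info output (/etc/dd-agent/conf.d). This is only an illustrative alternative with hypothetical names, not what we actually run:

# Hypothetical alternative to baking the files into the image.
apiVersion: v1
kind: ConfigMap
metadata:
  name: dd-agent-confd
data:
  network.yaml: |
    init_config:

    instances:
      - collect_connection_state: true
        excluded_interfaces:
          - lo
          - lo0
        excluded_interface_re: veth.*
  docker_daemon.yaml: |
    init_config:
      docker_root: /host
    instances:
      - url: "unix://var/run/docker.sock"

# In the DaemonSet pod spec, mount each key over its target file via subPath
# so the image's default conf.d contents remain in place:
#   volumeMounts:
#     - name: confd
#       mountPath: /etc/dd-agent/conf.d/network.yaml
#       subPath: network.yaml
#     - name: confd
#       mountPath: /etc/dd-agent/conf.d/docker_daemon.yaml
#       subPath: docker_daemon.yaml
#   volumes:
#     - name: confd
#       configMap:
#         name: dd-agent-confd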

Describe the results you received:

  • No metrics found in the Datadog web UI
  • datadog-agent info shows a warning for the network check:
    network (1.4.0)
    ---------------
      - instance #0 [WARNING]
          Warning: Cannot collect connection state: currently with a custom /proc path: /host/proc/1
      - Collected 32 metrics, 0 events & 0 service checks

Describe the results you expected:

  • The system.net.tcp{4,6}.listening and related metrics appear in the Datadog web UI
  • The network check reports healthy

Additional information you deem important (e.g. issue happens only occasionally):

Connection state reporting worked in a previous build of the container, which pulled the upstream image on Feb 12 around 2 PM PST. We rebuilt and redeployed the container (again pulling the upstream datadog/docker-dd-agent image) on Feb 15 around 5 PM PST, at which point we lost the metrics. No changes were made to the Datadog configuration or to the way the container is deployed in Kubernetes; it was a manually triggered rebuild and redeploy to test unrelated processes.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

2 reactions
SmoshySmosh commented, Feb 28, 2018

+1

Would be really nice to get this resolved.

0 reactions
FlorianVeaux commented, Jan 31, 2020

Going through old issues, this was solved in https://github.com/DataDog/integrations-core/pull/4150


