
Latest network integration stopped reporting connection state when using the hostNetwork

See original GitHub issue

Output of the info page

====================
Collector (v 5.22.0)
====================

  Status date: 2018-02-16 21:59:59 (9s ago)
  Pid: 20
  Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
  Python Version: 2.7.14, 64bit
  Logs: <stderr>, /var/log/datadog/collector.log

  Clocks
  ======
  
    NTP offset: -0.0087 s
    System UTC time: 2018-02-16 22:00:08.728334
  
  Paths
  =====
  
    conf.d: /etc/dd-agent/conf.d
    checks.d: Not found
  
  Hostnames
  =========
  
[elided]
  
  Checks
  ======
  
    etcd (1.3.0)
    ------------
      - instance #0 [OK]
      - Collected 18 metrics, 0 events & 2 service checks
  
    network (1.4.0)
    ---------------
      - instance #0 [WARNING]
          Warning: Cannot collect connection state: currently with a custom /proc path: /host/proc/1
      - Collected 32 metrics, 0 events & 0 service checks
  
    elastic (1.5.0)
    ---------------
      - instance #0 [OK]
      - Collected 175 metrics, 0 events & 2 service checks
  
    ntp (1.0.0)
    -----------
      - instance #0 [OK]
      - Collected 1 metric, 0 events & 1 service check
  
    disk (1.1.0)
    ------------
      - instance #0 [OK]
      - Collected 34 metrics, 0 events & 0 service checks
  
    docker_daemon (1.8.0)
    ---------------------
      - instance #0 [OK]
      - Collected 290 metrics, 0 events & 1 service check
  
  
  Emitters
  ========
  
    - http_emitter [OK]

====================
Dogstatsd (v 5.22.0)
====================

  Status date: 2018-02-16 22:00:04 (4s ago)
  Pid: 17
  Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
  Python Version: 2.7.14, 64bit
  Logs: <stderr>, /var/log/datadog/dogstatsd.log

  Flush count: 636
  Packet Count: 3500
  Packets per second: 0.2
  Metric count: 20
  Event count: 0
  Service check count: 0

====================
Forwarder (v 5.22.0)
====================

  Status date: 2018-02-16 22:00:07 (1s ago)
  Pid: 16
  Platform: Linux-4.4.0-109-generic-x86_64-with-debian-8.10
  Python Version: 2.7.14, 64bit
  Logs: <stderr>, /var/log/datadog/forwarder.log

  Queue Size: 0 bytes
  Queue Length: 0
  Flush Count: 2058
  Transactions received: 1547
  Transactions flushed: 1547
  Transactions rejected: 0
  API Key Status: API Key is valid
  

======================
Trace Agent (v 5.22.0)
======================

  Pid: 15
  Uptime: 6384 seconds
  Mem alloc: 990096 bytes

  Hostname: [elided]
  Receiver: 0.0.0.0:8126
  API Endpoint: https://trace.agent.datadoghq.com

  --- Receiver stats (1 min) ---


  --- Writer stats (1 min) ---

  Traces: 0 payloads, 0 traces, 0 bytes
  Stats: 0 payloads, 0 stats buckets, 0 bytes
  Services: 0 payloads, 0 services, 0 bytes

We run our Datadog agent in Kubernetes with hostNetwork: true. After redeploying recently, the network check no longer reports connection state information, despite zero configuration changes on our side.
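For reference, here is a minimal sketch of the DaemonSet pod spec this describes, assuming the usual docker-dd-agent layout: the pod shares the host network namespace and mounts the host's /proc at /host/proc (which is why the check output above shows a custom /proc path). The resource names, image reference, and exact mounts are illustrative, not copied from our manifest:

# Minimal sketch only; names, the image reference, and the omitted API key
# wiring are illustrative, not the exact manifest behind this report.
apiVersion: extensions/v1beta1   # a DaemonSet API version available on Kubernetes 1.8.x
kind: DaemonSet
metadata:
  name: dd-agent
spec:
  template:
    metadata:
      labels:
        app: dd-agent
    spec:
      hostNetwork: true                # share the node's network namespace
      containers:
        - name: dd-agent
          # our image is built FROM datadog/docker-dd-agent (see the repro steps below)
          image: example-registry/dd-agent:latest
          # API key and other environment variables omitted for brevity
          volumeMounts:
            - name: proc               # host /proc, visible to the checks as /host/proc
              mountPath: /host/proc
              readOnly: true
            - name: docker-sock        # for the docker_daemon check
              mountPath: /var/run/docker.sock
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock

The /host/proc mount is what the warning in the check output refers to: with a custom procfs path in play, the check declines to collect connection state.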

As far as I can tell, the change is from this merged PR: https://github.com/DataDog/integrations-core/pull/994

The PR indicates the change was intended to collect connection data when not running with hostNetwork: true, but it also affects our setup, where hostNetwork is used.

Additional environment details (Operating System, Cloud provider, etc):

Running Kubernetes 1.8.x on AWS.

Steps to reproduce the issue:

  1. Build a Datadog agent container FROM datadog/docker-dd-agent that adds a network.yaml with the following configuration:

       init_config:

       instances:
         # Network check only supports one configured instance
         - collect_connection_state: true
           excluded_interfaces:
             - lo
             - lo0
           excluded_interface_re: veth.*

  2. Additionally include the following configuration for docker_daemon.yaml (an equivalent way to supply both files without rebuilding the image is sketched after this list):

       init_config:
         docker_root: /host
       instances:
         - url: "unix://var/run/docker.sock"

  3. Run the Datadog agent container in Kubernetes as a DaemonSet with hostNetwork: true (see the pod-spec sketch above).
  4. Run /etc/init.d/datadog-agent info in the container and observe the status of the network check. Also observe the system.net.tcp{4,6}.listening metrics in Datadog.
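As an aside, the two files from steps 1 and 2 can also be supplied without rebuilding the image, for example by projecting them from a ConfigMap onto the conf.d path shown in the info output (/etc/dd-agent/conf.d). This is only an illustrative alternative with hypothetical names, not what we actually run:

# Hypothetical alternative to baking the files into the image.
apiVersion: v1
kind: ConfigMap
metadata:
  name: dd-agent-confd
data:
  network.yaml: |
    init_config:

    instances:
      - collect_connection_state: true
        excluded_interfaces:
          - lo
          - lo0
        excluded_interface_re: veth.*
  docker_daemon.yaml: |
    init_config:
      docker_root: /host
    instances:
      - url: "unix://var/run/docker.sock"

# In the DaemonSet pod spec, mount each key over its target file via subPath
# so the image's default conf.d contents remain in place:
#   volumeMounts:
#     - name: confd
#       mountPath: /etc/dd-agent/conf.d/network.yaml
#       subPath: network.yaml
#     - name: confd
#       mountPath: /etc/dd-agent/conf.d/docker_daemon.yaml
#       subPath: docker_daemon.yaml
#   volumes:
#     - name: confd
#       configMap:
#         name: dd-agent-confd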

Describe the results you received:

  • No metrics found in the Datadog web UI
  • datadog-agent info shows a warning for the network check:
    network (1.4.0)
    ---------------
      - instance #0 [WARNING]
          Warning: Cannot collect connection state: currently with a custom /proc path: /host/proc/1
      - Collected 32 metrics, 0 events & 0 service checks

Describe the results you expected:

  • The system.net.tcp{4,6}.listening and related metrics appear in the Datadog web UI
  • The network check reports healthy

Additional information you deem important (e.g. issue happens only occasionally):

Connection state reporting worked in a previous build of the container, which pulled the upstream image on Feb 12 around 2 PM PST. We rebuilt and redeployed the container (again pulling the upstream datadog/docker-dd-agent image) on Feb 15 around 5 PM PST, at which point we lost the metrics. No changes were made to the Datadog configuration or to the way the container is deployed in Kubernetes; it was a manually triggered rebuild and redeploy to test unrelated processes.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

2 reactions
SmoshySmosh commented, Feb 28, 2018

+1

Would be really nice to get this resolved.

0 reactions
FlorianVeaux commented, Jan 31, 2020

Going through old issues, this was solved in https://github.com/DataDog/integrations-core/pull/4150


