
[envoy integration] Metrics missing


Note: If you have a feature request, you should contact support so the request can be properly tracked.

Output of the info page

root@datadog-cluster-agent-69bc84c5c-rrkch:/# datadog-cluster-agent status
Getting the status from the agent.
2022-09-02 07:00:49 UTC | CLUSTER | WARN | (pkg/util/log/log.go:591 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec

===============================
Datadog Cluster Agent (v1.22.0)
===============================

  Status date: 2022-09-02 07:00:49.867 UTC (1662102049867)
  Agent start: 2022-08-30 08:45:54.797 UTC (1661849154797)
  Pid: 1
  Go Version: go1.17.11
  Build arch: amd64
  Agent flavor: cluster_agent
  Check Runners: 4
  Log Level: WARN

  Paths
  =====
    Config File: /etc/datadog-agent/datadog-cluster.yaml
    conf.d: /etc/datadog-agent/conf.d

  Clocks
  ======
    System time: 2022-09-02 07:00:49.867 UTC (1662102049867)

  Hostnames
  =========
    ec2-hostname: ****
    host_aliases: [***]
    hostname: ****
    instance-id: ***
    socket-fqdn: datadog-cluster-agent-69bc84c5c-rrkch
    socket-hostname: datadog-cluster-agent-69bc84c5c-rrkch
    hostname provider: container
    unused hostname providers:
      aws: Unable to determine hostname from EC2: Get "http://169.254.169.254/latest/meta-data/instance-id": dial tcp 169.254.169.254:80: connect: connection refused
      azure: azure_hostname_style is set to 'os'
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: GCE metadata API error: Get "http://169.254.169.254/computeMetadata/v1/instance/hostname": dial tcp 169.254.169.254:80: connect: connection refused

  Metadata
  ========

Leader Election
===============
  Leader Election Status:  Running
  Leader Name is: datadog-cluster-agent-69bc84c5c-r6r98
  Last Acquisition of the lease: Fri, 26 Aug 2022 14:02:50 UTC
  Renewed leadership: Fri, 02 Sep 2022 07:00:41 UTC
  Number of leader transitions: 13 transitions

Custom Metrics Server
=====================

  Data sources
  ------------
  URL: https://api.datadoghq.com

  
  ConfigMap name: default/datadog-custom-metrics
  External Metrics
  ----------------
    Total: 0
    Valid: 0
    

Cluster Checks Dispatching
==========================
  Status: Follower, redirecting to leader at 10.42.224.6

Admission Controller
====================
  
    Webhooks info
    -------------
      MutatingWebhookConfigurations name: datadog-webhook
      Created at: 2022-06-01T07:04:25Z
      ---------
        Name: datadog.webhook.config
        CA bundle digest: 4a037a372da419e0
        Object selector: &LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{LabelSelectorRequirement{Key:admission.datadoghq.com/enabled,Operator:NotIn,Values:[false],},},}
        Rule 1: Operations: [CREATE] - APIGroups: [] - APIVersions: [v1] - Resources: [pods]
        Service: default/datadog-cluster-agent-admission-controller - Port: 443 - Path: /injectconfig
      ---------
        Name: datadog.webhook.tags
        CA bundle digest: 4a037a372da419e0
        Object selector: &LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{LabelSelectorRequirement{Key:admission.datadoghq.com/enabled,Operator:NotIn,Values:[false],},},}
        Rule 1: Operations: [CREATE] - APIGroups: [] - APIVersions: [v1] - Resources: [pods]
        Service: default/datadog-cluster-agent-admission-controller - Port: 443 - Path: /injecttags
  
    Secret info
    -----------
    Secret name: webhook-certificate
    Secret namespace: default
    Created at: 2022-06-01T07:04:25Z
    CA bundle digest: 4a037a372da419e0
    Duration before certificate expiration: 6528h3m34.106622362s

=========
Collector
=========

  Running Checks
  ==============
    
    kubernetes_apiserver
    --------------------
      Instance ID: kubernetes_apiserver [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default
      Total Runs: 16,860
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-09-02 07:00:42 UTC (1662102042000)
      Last Successful Execution Date : 2022-09-02 07:00:42 UTC (1662102042000)
      
    
    orchestrator
    ------------
      Instance ID: orchestrator:*** [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/orchestrator.d/conf.yaml.default
      Total Runs: 25,290
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-09-02 07:00:47 UTC (1662102047000)
      Last Successful Execution Date : 2022-09-02 07:00:47 UTC (1662102047000)
      
=========
Forwarder
=========

  Transactions
  ============
    Cluster: 0
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 0
    DaemonSet: 0
    Deployment: 0
    Dropped: 0
    HighPriorityQueueFull: 0
    Ingress: 0
    Job: 0
    Node: 0
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 0
    Requeued: 300
    Retried: 94
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 0
    ServiceAccount: 0
    StatefulSet: 0

  Transaction Successes
  =====================
    Total number: 33719
    Successes By Endpoint:
      check_run_v1: 16,859
      intake: 1
      series_v1: 16,859

  Transaction Errors
  ==================
    Total number: 11
    Errors By Type:
      DNSErrors: 11

  On-disk storage
  ===============
    On-disk storage is disabled. Configure `forwarder_storage_max_size_in_bytes` to enable it.

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 1f056

=====================
Orchestrator Explorer
=====================
  Collection Status: Clusterchecks are activated but still warming up, the collection could be running on CLC Runners. To verify that we need the clusterchecks to be warmed up.
  Cluster Name: ***
  Cluster ID: ****
  Container scrubbing: enabled

  ======================
  Orchestrator Endpoints
  ======================
    https://orchestrator.datadoghq.com - API Key ending with: *****

  Status: Follower, cluster agent leader is: datadog-cluster-agent-69bc84c5c-r6r98

Additional environment details (Operating System, Cloud provider, etc): There is a support case (901101), but it didn't make much progress.

Steps to reproduce the issue:

  1. I have Istio installed in my cluster and I need some metrics at the Envoy level, so I configured the annotations below on the app pods to scrape the Envoy metrics:
        ad.datadoghq.com/istio-proxy.check_names: '["envoy"]'
        ad.datadoghq.com/istio-proxy.init_configs: '[{}]'
        ad.datadoghq.com/istio-proxy.instances: |
            [
              {
                "openmetrics_endpoint": "http://%%host%%:15090/stats/prometheus",
                "histogram_buckets_as_distributions": "true",
                "log_requests": "true",
                "extra_metrics": [
                  {
                    "envoy_cluster_upstream_rq_time": {
                      "name": "cluster.upstream_rq_time",
                      "type": "histogram"
                    }
                  }
                ]
              }
            ]
  2. Send some traffic from one pod to the other and check the raw metrics at the Envoy metrics endpoint and in Prometheus (a verification sketch follows these steps).
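
Before looking at the Datadog side, it can help to confirm that the raw Envoy metrics are actually exposed on the sidecar and that the check runs at all. The sketch below is a rough verification flow, not an official procedure: the pod, namespace, and Agent pod names are placeholders, and it assumes curl is available in the istio-proxy container and that the node Agent container is named agent.

    # Assumptions: <namespace>, <app-pod>, and <datadog-agent-pod> are placeholders;
    # curl is assumed to be present in the istio-proxy sidecar image.

    # 1) Confirm the raw Prometheus metrics are exposed on the sidecar:
    kubectl exec -n <namespace> <app-pod> -c istio-proxy -- \
      curl -s http://localhost:15090/stats/prometheus | \
      grep -E 'envoy_cluster_upstream_rq_time|envoy_cluster_upstream_cx_rx_bytes_total'

    # 2) Run the envoy check once in the foreground on the node Agent that
    #    schedules this pod, to see what the check collects and submits:
    kubectl exec -n <namespace> <datadog-agent-pod> -c agent -- agent check envoy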

Describe the results you received: I can find these metrics at the raw endpoint, but I could not find them in the Datadog metrics explorer. Except for the first one, the others are included in your metrics dict (a query sketch for checking this via the API follows the list below).

  1. cluster.upstream_rq_time
  2. cluster.upstream_cx_rx_bytes_total
  3. cluster.upstream_cx_tx_bytes_total
  4. listener.downstream_cx_length_ms
  5. cluster.upstream_rq_xx (the raw metrics carry specific status codes; I'm guessing the agent will parse them?)
  6. Some metrics have data, but the values differ from the raw metrics or from the Prometheus scrapes. Does Datadog or the query do some aggregation in the metrics explorer?
  7. Support asked me to add 'status_url', but I guess that won't work for the v2 (openmetrics_endpoint) integration?
  8. Some metrics' types differ from the type exposed by the pod, e.g. a 'counter' is converted to a 'rate'. Is this expected, or is something misconfigured?

Describe the results you expected: the metrics above are scraped and appear in Datadog.
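
One way to check whether the remapped metric ever reaches Datadog, independently of the metrics explorer UI, is the metric search API. This is a minimal sketch, assuming the standard /api/v1/search endpoint on the datadoghq.com site and that the check submits the remapped metric under the envoy. namespace; DD_API_KEY and DD_APP_KEY are assumed to be exported.

    # Assumptions: API/app keys exported, US1 site (api.datadoghq.com), and the
    # remapped metric is submitted as envoy.cluster.upstream_rq_time.
    curl -s -G "https://api.datadoghq.com/api/v1/search" \
      --data-urlencode "q=metrics:envoy.cluster.upstream_rq_time" \
      -H "DD-API-KEY: ${DD_API_KEY}" \
      -H "DD-APPLICATION-KEY: ${DD_APP_KEY}"
    # An empty "metrics" list in the response means the metric was never received,
    # which points at the check or its config rather than at the explorer.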

Additional information you deem important (e.g. issue happens only occasionally):

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
yzhan289 commented, Nov 18, 2022

Hey @burningalchemist, unfortunately we don't have any updates on this.

1 reaction
burningalchemist commented, Nov 3, 2022

@yzhan289 I'm having the same issue; I suspect the extra_metrics field has no effect. I believe envoy_cluster_upstream_rq_time is important to have as part of the integration, to balance the existing envoy.http.downstream_rq_time while staying with openmetrics_endpoint. Would you mind reopening the issue?

@Shuanglu in the meantime, did you find a solution?
