openmetrics - max retries exceeded error ceases metric ingestion until agent is restarted

A "max retries exceeded" error stops metric ingestion until the agent is restarted. It happens intermittently when the instrumented service is redeployed.

UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:68 in Error) | check:openmetrics | Error running check: [{"message": "HTTPConnectionPool(host='1.2.3.4', port=1234): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fa33dba4c70>, 'Connection to 1.2.3.4 timed out. (connect timeout=10.0)'))", "traceback": "Traceback (most recent call last):
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 174, in _new_conn
    conn = connection.create_connection(
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py\", line 95, in create_connection
    raise err
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py\", line 85, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 703, in urlopen
    httplib_response = self._make_request(
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1011, in _send_output
    self.send(msg)
  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 951, in send
    self.connect()
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 205, in connect
    conn = self._new_conn()
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 179, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x7fa33dba4c70>, 'Connection to 1.2.3.4 timed out. (connect timeout=10.0)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py\", line 439, in send
    resp = conn.urlopen(
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 785, in urlopen
    retries = retries.increment(
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/retry.py\", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='1.2.3.4', port=1234): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fa33dba4c70>, 'Connection to 1.2.3.4 timed out. (connect timeout=10.0)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 1033, in run
    self.check(instance)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/base_check.py\", line 142, in check
    self.process(scraper_config)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 560, in process
    for metric in self.scrape_metrics(scraper_config):
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 490, in scrape_metrics
    response = self.poll(scraper_config)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 824, in poll
    response = self.send_request(endpoint, scraper_config, headers)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 850, in send_request
    return http_handler.get(endpoint, stream=True, **kwargs)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 341, in get
    return self._request('get', url, options)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 405, in _request
    response = self.make_request_aia_chasing(request_method, method, url, new_options, persist)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 411, in make_request_aia_chasing
    response = request_method(url, **new_options)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py\", line 76, in get
    return request('get', url, params=params, **kwargs)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py\", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py\", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py\", line 655, in send
    r = adapter.send(request, **kwargs)
  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py\", line 504, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='1.2.3.4', port=1234): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fa33dba4c70>, 'Connection to 1.2.3.4 timed out. (connect timeout=10.0)'))
"}]

Additional environment details (Operating System, Cloud provider, etc): not important
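
When many of these errors need triaging, the `message` field in the log above can be reduced to the few values that matter: target host, port, path, and the root-cause exception class. The following is a minimal sketch, not part of the Datadog Agent. The helper name `parse_max_retry_error` and the regular expression are assumptions written for illustration against the message format shown in this issue.

```python
import re

# Pattern written against the "Max retries exceeded" message shown in this issue.
# It is an illustrative assumption, not an official format specification.
MAX_RETRY_RE = re.compile(
    r"HTTPConnectionPool\(host='(?P<host>[^']+)', port=(?P<port>\d+)\): "
    r"Max retries exceeded with url: (?P<path>\S+) "
    r"\(Caused by (?P<cause>\w+)\("
)


def parse_max_retry_error(message):
    """Return host, port, path and root-cause class name, or None if no match."""
    match = MAX_RETRY_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["port"] = int(fields["port"])
    return fields


# The message text copied from the log in this issue.
sample = (
    "HTTPConnectionPool(host='1.2.3.4', port=1234): Max retries exceeded "
    "with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection."
    "HTTPConnection object at 0x7fa33dba4c70>, 'Connection to 1.2.3.4 timed out. "
    "(connect timeout=10.0)'))"
)

print(parse_max_retry_error(sample))
```

For the message in this issue, the cause is `ConnectTimeoutError`: the TCP connection was never established, so the failure happened before any HTTP request reached the service.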

Steps to reproduce the issue:

No known reproducible case, but when it happens it always coincides with a redeployment of the instrumented service, and it is always fixed by restarting the Datadog agents and cluster agent (it is unclear which of the two, or whether both, need restarting).
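
Because there is no reproducible case, one way to gather evidence the next time this happens is to test, from the same host or pod as the agent, whether the address in the error still accepts TCP connections. The sketch below uses only the Python standard library. The function name `probe_tcp` and its return labels are hypothetical, and the 10-second default simply mirrors the `connect timeout=10.0` in the log.

```python
import socket


def probe_tcp(host, port, timeout=10.0):
    """Attempt a single TCP connection and classify the outcome."""
    try:
        # create_connection resolves the address and connects, or raises.
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.timeout:
        # No answer within the timeout, as in the ConnectTimeoutError above.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Any other network failure (for example, no route to host).
        return "error: {}".format(exc)
```

Calling `probe_tcp("1.2.3.4", 1234)` while the check is failing would show whether the target itself is unreachable. A `timeout` result for the address in the error, combined with a `reachable` result for the redeployed service's current address, would suggest the agent is still scraping an outdated target. That is an assumption to verify, not something the issue confirms.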

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions:1
  • Comments:5 (2 by maintainers)

Top GitHub Comments

1 reaction
ofek commented, Mar 31, 2022
0 reactions
yzhan289 commented, May 4, 2022

Hi @naseemkullah, to help us collect more information on why this is failing, could you open a ticket at https://help.datadoghq.com/hc/en-us?
