openmetrics - max retries exceeded error ceases metric ingestion until agent is restarted
The max retries exceeded error stops metric ingestion until the agent is restarted. It happens intermittently when the instrumented service is redeployed.
UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:68 in Error) | check:openmetrics | Error running check: [{"message": "HTTPConnectionPool(host='1.2.3.4', port=1234): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fa33dba4c70>, 'Connection to 1.2.3.4 timed out. (connect timeout=10.0)'))", "traceback": "Traceback (most recent call last):
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 174, in _new_conn
conn = connection.create_connection(
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py\", line 95, in create_connection
raise err
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py\", line 85, in create_connection
sock.connect(sa)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 703, in urlopen
httplib_response = self._make_request(
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 398, in _make_request
conn.request(method, url, **httplib_request_kw)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 239, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1256, in request
self._send_request(method, url, body, headers, encode_chunked)
File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1302, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1251, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1011, in _send_output
self.send(msg)
File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 951, in send
self.connect()
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 205, in connect
conn = self._new_conn()
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 179, in _new_conn
raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x7fa33dba4c70>, 'Connection to 1.2.3.4 timed out. (connect timeout=10.0)')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py\", line 439, in send
resp = conn.urlopen(
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 785, in urlopen
retries = retries.increment(
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/retry.py\", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='1.2.3.4', port=1234): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fa33dba4c70>, 'Connection to 1.2.3.4 timed out. (connect timeout=10.0)'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 1033, in run
self.check(instance)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/base_check.py\", line 142, in check
self.process(scraper_config)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 560, in process
for metric in self.scrape_metrics(scraper_config):
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 490, in scrape_metrics
response = self.poll(scraper_config)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 824, in poll
response = self.send_request(endpoint, scraper_config, headers)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 850, in send_request
return http_handler.get(endpoint, stream=True, **kwargs)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 341, in get
return self._request('get', url, options)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 405, in _request
response = self.make_request_aia_chasing(request_method, method, url, new_options, persist)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 411, in make_request_aia_chasing
response = request_method(url, **new_options)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py\", line 76, in get
return request('get', url, params=params, **kwargs)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py\", line 61, in request
return session.request(method=method, url=url, **kwargs)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py\", line 542, in request
resp = self.send(prep, **send_kwargs)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py\", line 655, in send
r = adapter.send(request, **kwargs)
File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py\", line 504, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='1.2.3.4', port=1234): Max retries exceeded with url: /metrics (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fa33dba4c70>, 'Connection to 1.2.3.4 timed out. (connect timeout=10.0)'))
"}]
Additional environment details (Operating System, Cloud provider, etc): not important
Steps to reproduce the issue:
No known reproducible case, but when it happens it always coincides with a redeployment of the instrumented service, and it is always fixed by restarting the Datadog agents and the cluster agent (it is unclear which of the two, or whether both, need restarting).
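One way to narrow this down when it next occurs is to check whether the address in the error is still reachable at all. The issue does not confirm why the connection times out, so the following is only a diagnostic sketch: it pulls the host and port out of the error text and attempts a plain TCP connect with a timeout, similar to the 10 second connect timeout shown in the log. The helper names `extract_target` and `can_connect` are hypothetical and are not part of the Datadog Agent.

```python
import re
import socket

# Matches the target that appears in the "Max retries exceeded" message,
# e.g. HTTPConnectionPool(host='1.2.3.4', port=1234).
TARGET_RE = re.compile(r"HTTPConnectionPool\(host='([^']+)', port=(\d+)\)")


def extract_target(log_line):
    """Return (host, port) parsed from an error line, or None if not found."""
    match = TARGET_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def can_connect(host, port, timeout=10.0):
    """Attempt a plain TCP connect; return False on timeout or refusal."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout and ConnectionRefusedError are both OSError subclasses.
        return False
```

Running `extract_target` on the error line and passing the result to `can_connect` from the node where the agent runs shows whether the old address still answers. If it does not, while the redeployed service is reachable at a different address, that would suggest the check is still scraping the previous address, though this remains an assumption to verify.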
Issue Analytics
- Created a year ago
- Reactions:1
- Comments:5 (2 by maintainers)
Top GitHub Comments
Can you try setting https://github.com/DataDog/integrations-core/blob/50681b9c0df71cb8b03ad84c9c4a7be9144f557c/openmetrics/datadog_checks/openmetrics/data/conf.yaml.example#L49-L52 instead of prometheus_url to use the new implementation?

Hi @naseemkullah, to help us collect more information on why this is failing, could you open a ticket at https://help.datadoghq.com/hc/en-us?