Random Read timed out.

It seems that under heavy load my API service is throwing the following errors at random intervals:

Traceback (most recent call last):
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.9/site-packages/elasticapm/transport/http.py", line 86, in send
    response = self.http.urlopen(
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.9/site-packages/urllib3/poolmanager.py", line 417, in urlopen
    return self.urlopen(method, redirect_location, **kw)
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.9/site-packages/urllib3/poolmanager.py", line 375, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.9/site-packages/elasticapm/instrumentation/packages/base.py", line 205, in call_if_sampling
    return wrapped(*args, **kwargs)
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.9/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.9/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.9/site-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.9/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='xxxx', port=443): Read timed out. (read timeout=5)

It's weird because some traces come through successfully before and after I get those errors, without anything changing server side.

Have you seen this before? I am running APM Server on EKS through ECK.
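
For readers hitting the same error: the `read timeout=5` in the traceback appears to correspond to the Python agent's `server_timeout` setting, which defaults to 5 seconds. A minimal sketch of adjusting it through environment variables follows. The values are illustrative assumptions, not recommendations, and whether a longer timeout helps depends on why APM Server is slow to respond under load.

```python
import os

# Illustrative values only; tune them for your own deployment.
# These must be set before the Elastic APM agent is initialized.

# How long the agent waits for APM Server to answer (agent default: 5s).
os.environ.setdefault("ELASTIC_APM_SERVER_TIMEOUT", "10s")

# Fraction of transactions recorded as full traces (agent default: 1.0).
os.environ.setdefault("ELASTIC_APM_TRANSACTION_SAMPLE_RATE", "0.1")
```

Because `setdefault` is used, values already exported in the shell or the pod spec take precedence over the ones in this snippet.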

Top GitHub Comments

basepi commented, Nov 17, 2021

Retry sounds nice until we’re impacting your host adversely due to retries.

The assumption is that users who are running at high load are also probably handling a lot of traffic, and thus tuning their sample rate to something closer to 10% or 1%. At that point, we’re only sampling a subset anyway, which means that losing traces is more acceptable than using more resources, since you’re not seeing all the traces anyway.

Granted, we still collect durations for unsampled transactions, which means that dropped events will result in slight inaccuracies for overall metrics. But we still think this is a valid tradeoff to avoid impacting a system.
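
The tradeoff described above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not the agent's actual code: the `simulate` function and its parameters are hypothetical, every transaction is modeled as one event sent on its own (the real agent batches events), and a failed send is dropped without retry.

```python
import random


def simulate(n_transactions, sample_rate, failure_rate, seed=0):
    """Toy model: each transaction yields one event; failed sends are dropped."""
    rng = random.Random(seed)
    stats = {"delivered": 0, "dropped": 0, "traces_delivered": 0}
    for _ in range(n_transactions):
        # Sampled transactions carry a full trace; unsampled ones only a duration.
        sampled = rng.random() < sample_rate
        if rng.random() < failure_rate:
            # For example a read timeout: the event is lost, no retry is attempted.
            stats["dropped"] += 1
            continue
        stats["delivered"] += 1
        if sampled:
            stats["traces_delivered"] += 1
    return stats


if __name__ == "__main__":
    # 10% sampling with 2% of sends failing (both numbers are made up).
    print(simulate(10_000, sample_rate=0.1, failure_rate=0.02))
```

In this model the dropped events reduce the number of durations that reach the server, which is the source of the slight metric inaccuracy mentioned above, while the host never spends extra resources on retries.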

beniwohli commented, Jan 11, 2022

I’ll close this since, as far as we can tell, the agent works as intended.