ReadTimeoutErrors on 169.254.169.254 /latest/api/token
After updating the agent to version 5.8.0, all my Django manage.py commands hang indefinitely and never finish. The first thing I see is:
WARNING urllib3.connectionpool Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out. (read timeout=3.0)")': /latest/api/token connectionpool.py:749
and then (after some time):
ERROR elasticapm.transport Closing the transport connection timed out. base.py:261
After this last error message the process just hangs indefinitely and has to be killed manually.
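For context: 169.254.169.254/latest/api/token is the AWS IMDSv2 token endpoint, so the hang starts in the agent’s new cloud metadata collection. A minimal sketch of an equivalent request, using only values taken from the log above (the agent’s actual internals may differ):

import requests

# Sketch of the IMDSv2 token request the warning points at. The 3.0 s read
# timeout mirrors the log line; it is not necessarily what the agent uses.
try:
    resp = requests.put(
        "http://169.254.169.254/latest/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
        timeout=3.0,
    )
    print("IMDSv2 token status:", resp.status_code)
except requests.exceptions.RequestException as exc:
    # In this Docker-on-Beanstalk setup the request apparently times out
    # instead of failing fast, which is what produces the retry warnings.
    print("metadata endpoint unreachable:", exc)

Running something like this from inside the container should show whether the endpoint times out or is rejected outright.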
To Reproduce
I’m not able to reproduce this locally; it seems to happen exclusively in my staging/production environment on AWS. There, running any Django management command is enough to make it hang. For example:
$ python manage.py migrate
Operations to perform:
Apply all migrations: admin, auth, contenttypes, sessions
Running migrations:
No migrations to apply.
2020-07-03 13:17:19,256 WARNING urllib3.connectionpool Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out. (read timeout=3.0)")': /latest/api/token connectionpool.py:749
2020-07-03 13:17:27,164 ERROR elasticapm.transport Closing the transport connection timed out. base.py:261
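The timestamps in that output roughly match the retry settings from the warning. A back-of-the-envelope check, using only numbers from the log:

# At the time of the warning, Retry(total=2) means two attempts remain,
# each bounded by the 3.0 s read timeout from the ReadTimeoutError.
remaining_attempts = 2
read_timeout = 3.0  # seconds
print(remaining_attempts * read_timeout)  # roughly 6 s of further waiting

That, plus the transport’s own close timeout, accounts for the ~8 s gap between 13:17:19 and 13:17:27; the puzzle is why the process then hangs indefinitely instead of exiting after this bounded delay.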
Environment
- OS: Linux 6094537dd0b9 4.14.181-108.257.amzn1.x86_64 #1 SMP Wed May 27 02:43:03 UTC 2020 x86_64 GNU/Linux (Docker container, python:3.8 image, Docker running on Amazon Linux/2.15.2)
- Python version: 3.8.3
- Framework and version: Django 3.0.8 (same thing happens with 3.0.7)
- APM Server version: 7.5.2
- Agent version: 5.8.0 (I can confirm this doesn’t happen with v5.7.0)
Additional context
The app runs in an AWS Elastic Beanstalk Docker environment with the python:3.8 image; the setup is pretty straightforward.
- requirements.txt:
appdirs==1.4.4
asgiref==3.2.10; python_version >= '3.5'
boto3==1.14.16
botocore==1.17.16
certifi==2020.6.20
cffi==1.14.0
chardet==3.0.4
cryptography==2.9.2; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
django-migration-linter==2.3.0
django-redis==4.12.1
django-sortedm2m==3.0.0
django-storages==1.9.1
django-widget-tweaks==1.4.8
django==3.0.8
djangorestframework-simplejwt==4.4.0
djangorestframework==3.11.0
docutils==0.15.2; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2, 3.3'
elastic-apm==5.8.0
factory-boy==2.12.0
faker==4.1.1; python_version >= '3.4'
future==0.18.2; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2, 3.3'
googlemaps==4.4.1
gunicorn==20.0.4
idna==2.10; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
jmespath==0.10.0; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2, 3.3'
jwcrypto==0.7
markdown==3.2.2
oauthlib==3.1.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
pillow==7.2.0
psycopg2-binary==2.8.5
pycparser==2.20; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
pyjwt==1.7.1
python-dateutil==2.8.1; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
python-twitter==3.5
pytz==2020.1
pyyaml==5.3.1
redis==3.5.3; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
requests-oauthlib==1.3.0
requests==2.24.0
s3transfer==0.3.3
sentry-sdk==0.16.0
six==1.15.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
sqlparse==0.3.1; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
text-unidecode==1.3
twilio==6.43.0
ua-parser==0.10.0
uritemplate==3.0.1
urllib3==1.25.9; python_version != '3.4'
user-agents==2.1
That’s my best guess. But it still shouldn’t have hung more than about 10-12 seconds, so I’m definitely confused…
In any case, I’m glad the fix is working!
Thanks for the report! This is certainly related to the new cloud metadata we collect, but I’ll need to investigate more to figure out what’s going wrong, especially with CLOUD_PROVIDER=False.
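For reference, a minimal sketch of how that option is typically set in a Django project’s settings; the SERVICE_NAME and SERVER_URL values are placeholders, and the exact accepted value for disabling the cloud provider check may differ between agent versions:

# settings.py (sketch): disabling cloud metadata collection via the
# ELASTIC_APM settings dict. CLOUD_PROVIDER=False mirrors the value quoted
# above; treat the exact value as an assumption, not the documented fix.
ELASTIC_APM = {
    "SERVICE_NAME": "my-service",            # placeholder
    "SERVER_URL": "http://apm-server:8200",  # placeholder
    "CLOUD_PROVIDER": False,
}

The same option can normally also be set via the ELASTIC_APM_CLOUD_PROVIDER environment variable, which is easier to toggle on Elastic Beanstalk; pinning elastic-apm==5.7.0, which the report confirms is unaffected, is another way to sidestep the issue until a fix lands.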