3.1.0 causing intermittent Connection closed by server error
Version: redis-py 3.1.0, redis 3.2.4, django-redis 4.10.0
Platform: Python 2.7 on Alpine-Linux inside Docker
Description: After upgrading from redis-py 3.0.1 to 3.1.0, our service became very unstable when talking to the existing Redis server. It generates around 30 ‘Connection closed by server.’ errors in 10 minutes while the server is under ~20 QPS. The error is intermittent, and I have not been able to reproduce exactly what causes it. I tried restarting the Redis server and rebuilding our Docker images without any cache; neither helped.
After rolling back to redis-py==3.0.1, all errors are gone.
I understand that I haven’t provided enough information to fix the problem, but I hope to at least highlight it so that others can provide more.
Errors
File "lib/last_seen/models.py" in user_seen
96. seen = cache.get(cache_key)
File "/usr/lib/python2.7/site-packages/django_redis/cache.py" in _decorator
39. raise e.parent
Exception Type: ConnectionError at /helper/listing/list_591.5917864/
Exception Value: Error while reading from socket: (u'Connection closed by server.',)
Great, glad things are going well. I’m going to add an EPollSelector today or over the weekend, write a few more tests and then get this merged to master.
Thanks for helping test this stuff!
I’m getting this error as well, also with retry_on_timeout. In my case the server has timeout 300 set in redis.conf. I can get this to happen consistently by setting a really low timeout 1.
I believe what’s happening is that the connection is timed out by the server, but isn’t being removed from the client’s connection pool. A subsequent request that attempts to use that connection triggers ConnectionError:
Error while reading from socket: ('Connection closed by server.',)
Previously, version 3.0.1 would retry and succeed, presumably with another working connection in the pool. In 3.1.0 it fails with an exception. This results in a 500 error with django-redis.
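To make the suspected failure mode concrete, here is a toy model of it. This is not redis-py code: ToyPool, ToyConnection, ConnectionClosed and run_command are hypothetical names invented for illustration, and time is passed in as a plain number. It only assumes what is described above, namely that the server drops idle connections after its timeout and that the pool hands idle connections back out without checking them.

```python
class ConnectionClosed(Exception):
    """Stand-in for a client-side ConnectionError in this toy model."""


class ToyConnection(object):
    def __init__(self, now):
        self.last_used = now

    def send(self, command, now, server_timeout):
        # The server silently drops connections that were idle for longer
        # than its timeout; the client only notices on the next use.
        if now - self.last_used > server_timeout:
            raise ConnectionClosed("Connection closed by server.")
        self.last_used = now
        return "OK:" + command


class ToyPool(object):
    def __init__(self):
        self.idle = []

    def get(self, now):
        # Idle connections are reused without any liveness check.
        return self.idle.pop() if self.idle else ToyConnection(now)

    def release(self, conn):
        self.idle.append(conn)


def run_command(pool, command, now, server_timeout):
    conn = pool.get(now)
    # If send() raises, the dead connection is simply dropped, so the
    # next call gets a fresh one.
    result = conn.send(command, now, server_timeout)
    pool.release(conn)
    return result
```

In this model, a command issued after the idle period fails once, and the very next attempt succeeds because the stale connection is gone from the pool. That matches the observation that a single retry used to hide the problem.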
Issue #306 seems like it could be involved here. If a connection that has timed out on the server is not removed from the pool until it is tried again and fails, we’d get this behavior. The retry_on_timeout behavior in 3.0.1 mitigated this.
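As a possible application-level workaround until this is fixed, a call can be retried once when the connection turns out to be dead. The sketch below is only an illustration of that idea, not how redis-py implements retry_on_timeout. retry_once and FlakyOnce are hypothetical helpers, and IOError stands in for the real exception so the example runs without a Redis server. With redis-py you would presumably pass redis.exceptions.ConnectionError instead.

```python
import functools


def retry_once(exc_types):
    """Retry the wrapped call a single time if it raises one of exc_types."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exc_types:
                # Second attempt; if it fails too, the error propagates.
                return func(*args, **kwargs)
        return wrapper
    return decorator


class FlakyOnce(object):
    """Fake client whose first call fails like a stale pooled connection."""

    def __init__(self):
        self.calls = 0

    def get(self, key):
        self.calls += 1
        if self.calls == 1:
            raise IOError("Connection closed by server.")
        return "value-for-" + key


flaky = FlakyOnce()
safe_get = retry_once((IOError,))(flaky.get)
```

Here safe_get("k") fails internally on the first attempt and returns the value on the second. A blind retry like this is only reasonable for idempotent reads such as a cache get; retrying a write could apply it twice.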