3.1.0 causing intermittent Connection closed by server error

Version: redis-py 3.1.0, redis 3.2.4, django-redis 4.10.0

Platform: Python 2.7 on Alpine-Linux inside Docker

Description: After upgrading from redis-py 3.0.1, our service became very unstable when talking to the existing redis server. It generates around 30 ‘Connection closed by server.’ errors in 10 minutes while the server is under ~20 QPS. The error is intermittent and I have not been able to pin down what exactly causes it. I tried restarting the redis server and rebuilding our Docker images without any cache, and neither helped.

After rolling back to redis-py==3.0.1, all errors are gone.

I understand that I haven’t provided enough information to fix the problem, but I hope to at least highlight it so that others might provide more.

Errors

File "lib/last_seen/models.py" in user_seen
  96.     seen = cache.get(cache_key)

File "/usr/lib/python2.7/site-packages/django_redis/cache.py" in _decorator
  39.             raise e.parent

Exception Type: ConnectionError at /helper/listing/list_591.5917864/
Exception Value: Error while reading from socket: (u'Connection closed by server.',)

Top GitHub Comments

andymccurdy commented, Feb 8, 2019 (6 reactions)

Great, glad things are going well. I’m going to add an EPollSelector today or over the weekend, write a few more tests and then get this merged to master.

Thanks for helping test this stuff!

bartels commented, Jan 30, 2019 (4 reactions)

I’m getting this error as well, also with retry_on_timeout. In my case the server has timeout 300 set in redis.conf. I can get this to happen consistently by setting a really low value such as timeout 1.

I believe what’s happening is the connection is timed out by the server, but isn’t being removed from the client’s connection pool. A subsequent request that attempts to use that connection triggers ConnectionError: Error while reading from socket: ('Connection closed by server.',)

Previously, version 3.0.1 would retry and succeed, presumably with another working connection in the pool. In 3.1.0 it fails with an exception. This results in a 500 error with django-redis.

Issue #306 seems like it could be involved here. If a connection that has timed out on the server is not removed from the pool until it is tried again and fails, we’d get this behavior. The retry_on_timeout behavior in 3.0.1 mitigated this.
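
For anyone who needs a stopgap while staying on 3.1.0, here is a rough sketch of the "retry once on a dead connection" idea described above, done at the application level. This is not how redis-py implements it internally; call_with_retry, StaleConnectionError and FlakyCache are hypothetical names made up for this example, and the fake cache only imitates a connection that the server has already closed.

```python
import time


class StaleConnectionError(Exception):
    """Stand-in for the client's connection error (hypothetical, demo only)."""


def call_with_retry(func, *args, retries=1, delay=0.0,
                    retry_on=(StaleConnectionError,), **kwargs):
    """Call func and retry up to `retries` times if it raises one of `retry_on`.

    Assumption: a failed attempt discards the dead connection, so the next
    attempt gets a fresh (or different) one.
    """
    attempt = 0
    while True:
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt >= retries:
                raise  # out of retries, surface the original error
            attempt += 1
            if delay:
                time.sleep(delay)


class FlakyCache:
    """Fake cache whose first call fails, like a connection timed out by the server."""

    def __init__(self):
        self.calls = 0

    def get(self, key):
        self.calls += 1
        if self.calls == 1:
            raise StaleConnectionError("Connection closed by server.")
        return "value-for-" + key


cache = FlakyCache()
result = call_with_retry(cache.get, "last_seen:42")
```

With a real client you would pass something like retry_on=(redis.exceptions.ConnectionError,) and wrap the actual cache call. Whether a single retry is enough depends on how many stale connections are sitting in the pool, so treat the retries value as something to tune.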