Sporadic "Connection not ready" exceptions with BlockingConnectionPool since 3.2.0


Version: 3.2.0

Platform: Python 3.6.7 | packaged by conda-forge | (default, Nov 21 2018, 02:32:25) [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux CentOS Linux release 7.5.1804 (Core) on docker

The redis server is using the official docker image. Redis server v=4.0.11 sha=00000000:0 malloc=jemalloc-4.0.3 bits=64 build=74253224a862200c

Description: Since upgrading to 3.2.0, we started getting sporadic errors when getting connections.

The code that is running looks like this:

    from redis import BlockingConnectionPool, StrictRedis

    pool = BlockingConnectionPool(max_connections=config.REDIS_CONNECTIONS_PER_WORKER, host=config.REDIS_HOST, port=config.REDIS_PORT, db=0, timeout=config.REDIS_TIMEOUT)
    redis = StrictRedis(connection_pool=pool)
    redis.setex(...) / redis.get(...)

Additional information: This code is running inside an eventlet gunicorn worker. The server is very “network heavy” and opens lots of sockets (for example, for DNS queries).

127.0.0.1:6379> config get maxclients
1) "maxclients"
2) "10000"
127.0.0.1:6379> info clients
# Clients
connected_clients:104
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 2
  • Comments: 22 (9 by maintainers)

Top GitHub Comments

2 reactions
andymccurdy commented, Feb 20, 2019

It’s in line with what I was skimming in the eventlet docs. So that means that in your environment redis-py is likely using select.select to validate the health of a connection.

select.select has a number of issues, most notably only being able to poll file descriptors with file numbers < ~1024. If you’re running over that limit, the current implementation will simply return that the connection isn’t ready, which would explain why you’re seeing the error. See here: https://github.com/andymccurdy/redis-py/blob/master/redis/selector.py#L47
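
To make the difference concrete, here is a minimal sketch of a readiness check that prefers select.poll (which accepts any file descriptor number) and only falls back to select.select when poll is unavailable. This is an illustration and not redis-py's actual selector code; the function name can_read is made up for this example.

```python
import select
import socket


def can_read(sock, timeout=0.0):
    """Return True if `sock` has data waiting within `timeout` seconds.

    Hypothetical helper for illustration; not part of redis-py.
    """
    if hasattr(select, "poll"):
        # poll() is not bound by FD_SETSIZE; its timeout is in milliseconds.
        poller = select.poll()
        poller.register(sock, select.POLLIN)
        return bool(poller.poll(int(timeout * 1000)))
    # Fallback: on POSIX, select() raises ValueError once the descriptor
    # number reaches FD_SETSIZE (typically 1024).
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(can_read(a))        # nothing has been sent yet
    b.send(b"ping")
    print(can_read(a, 1.0))   # data is now waiting
    a.close()
    b.close()
```

Whether select.poll is available depends on the platform and on what has been monkey patched in the process, which is why the sketch checks with hasattr at call time.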

If you can correlate these errors around traffic spikes, that might further suggest that we’re on the right track.

One thing you could try is to reduce the max_connections in the pool. Fewer connections means fewer file descriptors which should reduce the chance of hitting the select.select issue.
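
A rough way to check whether a worker is actually near that limit: on POSIX the kernel hands out the lowest free descriptor number, so the number given to a freshly opened socket shows how far up the process has gone. The sketch below assumes a limit of 1024 (the usual FD_SETSIZE on Linux); the function names are made up for this example.

```python
import socket

# Assumption: FD_SETSIZE is 1024, which is typical on Linux.
SELECT_FD_LIMIT = 1024


def new_socket_fd():
    """Open a throwaway socket and return the descriptor number it received."""
    with socket.socket() as probe:
        return probe.fileno()


def select_limit_reached(limit=SELECT_FD_LIMIT):
    """Return True if a new socket would get a descriptor select() cannot poll."""
    return new_socket_fd() >= limit


if __name__ == "__main__":
    print("next socket fd:", new_socket_fd())
    print("past select() limit:", select_limit_reached())
```

Logging this value next to the "Connection not ready" errors would show whether they line up with the process crossing the limit.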

We could also make the selector more pluggable such that you could inject your own logic or turn off the health checks for your environment.

You could actually do this now with an (albeit ugly) monkey patch like so:

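from redis import connection
from redis.selector import SelectSelector

class MySelector(SelectSelector):
    # Skip the health check: always report the connection as ready.
    def check_is_ready_for_command(self, timeout):
        return True

# Make redis-py use the patched selector for new connections.
connection.DefaultSelector = MySelector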

Another possible solution is a simple flag on the connection pool to turn off the health checks. Enabling such a flag would cause the pool to behave the same as it did in 3.1.0.

1 reaction
andymccurdy commented, Jun 6, 2019

Great, thanks for testing @NirBenor. I’ll get this merged in a few days.


