After a failover, commands such as keys() fail with ConnectionError
Library versions:
- redis (2.10.5)
- redis-py-cluster (1.3.4)
Hi, first, thank you for this great and useful library!
To test Redis Cluster in master/slave mode, I installed six Redis instances (3 masters and 3 slaves) on the same machine, each on a different port. Say we have the following six nodes:
127.0.0.1:37074 master
127.0.0.1:37075 master
127.0.0.1:37076 master
127.0.0.1:37077 slave
127.0.0.1:37078 slave
127.0.0.1:37079 slave
Then I killed one of the master nodes. When I checked with the cluster_nodes() method, the change (failover) had been applied.
import rediscluster

all_startup_nodes = [
    {"host": "localhost", "port": 37074},
    {"host": "localhost", "port": 37075},
    {"host": "localhost", "port": 37076},
    {"host": "localhost", "port": 37077},
    {"host": "localhost", "port": 37078},
    {"host": "localhost", "port": 37079},
]
conn1 = rediscluster.StrictRedisCluster(startup_nodes=all_startup_nodes)
# Print each node's address and flags as reported by the cluster (Python 2 print statement).
for node in conn1.cluster_nodes():
    print node["host"], node["port"], node["flags"]
127.0.0.1 37074 ('myself', 'master')
127.0.0.1 37075 ('master', 'fail')
127.0.0.1 37076 ('slave',)
127.0.0.1 37077 ('slave',)
127.0.0.1 37078 ('master',)
127.0.0.1 37079 ('master',)
But as soon as I ran a command such as keys(), it failed as follows:
conn1.keys()
---------------------------------------------------------------------------
ConnectionError Traceback (most recent call last)
<ipython-input-94-53843ab27ad5> in <module>()
----> 1 conn1.keys()
/opt/anaconda2/lib/python2.7/site-packages/redis/client.pyc in keys(self, pattern)
934 def keys(self, pattern='*'):
935 "Returns a list of keys matching ``pattern``"
--> 936 return self.execute_command('KEYS', pattern)
937
938 def mget(self, keys, *args):
/opt/anaconda2/lib/python2.7/site-packages/rediscluster/utils.pyc in inner(*args, **kwargs)
99 for _ in range(0, 3):
100 try:
--> 101 return func(*args, **kwargs)
102 except ClusterDownError:
103 # Try again with the new cluster setup. All other errors
/opt/anaconda2/lib/python2.7/site-packages/rediscluster/client.pyc in execute_command(self, *args, **kwargs)
317 node = self.determine_node(*args, **kwargs)
318 if node:
--> 319 return self._execute_command_on_nodes(node, *args, **kwargs)
320
321 # If set externally we must update it before calling any commands
/opt/anaconda2/lib/python2.7/site-packages/rediscluster/client.pyc in _execute_command_on_nodes(self, nodes, *args, **kwargs)
408 raise
409
--> 410 connection.send_command(*args)
411 res[node["name"]] = self.parse_response(connection, command, **kwargs)
412 finally:
/opt/anaconda2/lib/python2.7/site-packages/redis/connection.pyc in send_command(self, *args)
561 def send_command(self, *args):
562 "Pack and send a command to the Redis server"
--> 563 self.send_packed_command(self.pack_command(*args))
564
565 def can_read(self, timeout=0):
/opt/anaconda2/lib/python2.7/site-packages/redis/connection.pyc in send_packed_command(self, command)
536 "Send an already packed command to the Redis server"
537 if not self._sock:
--> 538 self.connect()
539 try:
540 if isinstance(command, str):
/opt/anaconda2/lib/python2.7/site-packages/redis/connection.pyc in connect(self)
440 except socket.error:
441 e = sys.exc_info()[1]
--> 442 raise ConnectionError(self._error_message(e))
443
444 self._sock = sock
ConnectionError: Error 111 connecting to 127.0.0.1:37075. Connection refused.
While digging through the source code, I found that these commands use the determine_node("some command") method to decide which Redis servers to contact. However, the result of this method is stale and does not reflect the failover. For example, the following code shows that the killed master node (localhost:37075) is still listed as a master node.
for node in list(conn1.determine_node("KEYS")):
    print node
{'host': u'127.0.0.1', 'server_type': 'master', 'port': 37074, 'name': '127.0.0.1:37074'}
{'host': u'127.0.0.1', 'server_type': 'master', 'port': 37075, 'name': '127.0.0.1:37075'}
{'host': u'127.0.0.1', 'server_type': 'slave', 'port': 37076L, 'name': '127.0.0.1:37076'}
{'host': u'127.0.0.1', 'server_type': 'slave', 'port': 37077L, 'name': '127.0.0.1:37077'}
{'host': u'127.0.0.1', 'server_type': 'slave', 'port': 37078L, 'name': '127.0.0.1:37078'}
{'host': u'127.0.0.1', 'server_type': 'master', 'port': 37079, 'name': '127.0.0.1:37079'}
I wonder whether this is a bug or whether I need to set some config option.
Best wishes, Han-Cheol
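For anyone reproducing this, the mismatch between the two outputs above can be checked mechanically. The following is only a minimal sketch: find_stale_nodes is a hypothetical helper (not part of redis-py-cluster), and it assumes node dicts shaped like the ones printed in this report, with "host", "port" and "flags" on the live side and "name" and "server_type" on the cached side.

```python
def find_stale_nodes(live_nodes, cached_nodes):
    """Return names of cached nodes that disagree with the live cluster view.

    Hypothetical helper for illustration only.
    live_nodes:   dicts with "host", "port", "flags" (shape printed from cluster_nodes()).
    cached_nodes: dicts with "name", "server_type" (shape printed from determine_node()).
    """
    # Build a lookup of "host:port" -> (role, failed) from the live view.
    live_roles = {}
    for node in live_nodes:
        name = "%s:%s" % (node["host"], node["port"])
        flags = node["flags"]
        role = "master" if "master" in flags else "slave"
        live_roles[name] = (role, "fail" in flags)

    stale = []
    for node in cached_nodes:
        # A cached node missing from the live view gets role None and counts as stale.
        role, failed = live_roles.get(node["name"], (None, False))
        if failed or role != node["server_type"]:
            stale.append(node["name"])
    return stale
```

Fed the flags and the determine_node("KEYS") entries shown in this report, it flags 127.0.0.1:37075 (marked fail) and 127.0.0.1:37078 (promoted to master, but still cached as a slave).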
@ofhellsfire Please move the specific error you posted above into a new issue so I can track it separately from this one and include it in the 3.0.0 release instead.
If I read the root cause of the original issue correctly, it should now be solved in the master branch by the same fix I applied to your other issue, @ofhellsfire. Node selection and the execution path are now both covered by try/except handling, so the connection error should no longer occur the way it did during failover.
I will close this issue as resolved. If the problem still remains in 2.1.0, please open a new issue and it can be solved there.
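For readers stuck on an older release, the general idea of wrapping the call in try/except and refreshing the node table before retrying can be approximated on the caller's side. This is a rough sketch of that pattern, not the actual fix that went into the library; call_with_refresh and its parameters are hypothetical names introduced here for illustration.

```python
def call_with_refresh(command, refresh, retry_on, attempts=2):
    """Call command(); if it raises retry_on, call refresh() and try again.

    Hypothetical helper for illustration only.
    command:  zero-argument callable that runs the Redis command.
    refresh:  zero-argument callable that rebuilds the client's node table.
    retry_on: exception class (or tuple of classes) that triggers a retry.
    attempts: total number of tries; must be at least 1.
    """
    for attempt in range(attempts):
        try:
            return command()
        except retry_on:
            # Out of attempts: let the original error propagate to the caller.
            if attempt == attempts - 1:
                raise
            refresh()
```

With the setup from this issue, one would presumably pass conn1.keys as command, redis.exceptions.ConnectionError as retry_on, and something like conn1.connection_pool.nodes.initialize as refresh. That last attribute path is an assumption about the 1.3.x internals and should be checked against the installed version.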