
After failover occurred, commands such as keys() fail due to ConnectionError

See original GitHub issue
  • Library versions:
    • redis (2.10.5)
    • redis-py-cluster (1.3.4)

Hi, First, thank you for this great and useful library!

To test Redis cluster master/slave mode, I installed 6 Redis instances (3 masters and 3 slaves) on the same machine on 6 different ports. Let’s say we have the following six nodes.

127.0.0.1:37074 master
127.0.0.1:37075 master
127.0.0.1:37076 master
127.0.0.1:37077 slave
127.0.0.1:37078 slave
127.0.0.1:37079 slave

Then I killed one of the master nodes. When I checked with the cluster_nodes() method, the change (failover) had been applied.

all_startup_nodes = [
    {"host":"localhost", "port":37074},
    {"host":"localhost", "port":37075},
    {"host":"localhost", "port":37076},
    {"host":"localhost", "port":37077},
    {"host":"localhost", "port":37078},
    {"host":"localhost", "port":37079}]
conn1 = rediscluster.StrictRedisCluster(startup_nodes=all_startup_nodes)

for node in conn1.cluster_nodes():
    print node["host"], node["port"], node["flags"]
127.0.0.1 37074 ('myself', 'master')
127.0.0.1 37075 ('master', 'fail')
127.0.0.1 37076 ('slave',)
127.0.0.1 37077 ('slave',)
127.0.0.1 37078 ('master',)
127.0.0.1 37079 ('master',)

But as soon as I ran a command such as keys(), it failed as follows.

conn1.keys()
---------------------------------------------------------------------------
ConnectionError                           Traceback (most recent call last)
<ipython-input-94-53843ab27ad5> in <module>()
----> 1 conn1.keys()

/opt/anaconda2/lib/python2.7/site-packages/redis/client.pyc in keys(self, pattern)
    934     def keys(self, pattern='*'):
    935         "Returns a list of keys matching ``pattern``"
--> 936         return self.execute_command('KEYS', pattern)
    937
    938     def mget(self, keys, *args):

/opt/anaconda2/lib/python2.7/site-packages/rediscluster/utils.pyc in inner(*args, **kwargs)
     99         for _ in range(0, 3):
    100             try:
--> 101                 return func(*args, **kwargs)
    102             except ClusterDownError:
    103                 # Try again with the new cluster setup. All other errors

/opt/anaconda2/lib/python2.7/site-packages/rediscluster/client.pyc in execute_command(self, *args, **kwargs)
    317         node = self.determine_node(*args, **kwargs)
    318         if node:
--> 319             return self._execute_command_on_nodes(node, *args, **kwargs)
    320
    321         # If set externally we must update it before calling any commands

/opt/anaconda2/lib/python2.7/site-packages/rediscluster/client.pyc in _execute_command_on_nodes(self, nodes, *args, **kwargs)
    408                     raise
    409
--> 410                 connection.send_command(*args)
    411                 res[node["name"]] = self.parse_response(connection, command, **kwargs)
    412             finally:

/opt/anaconda2/lib/python2.7/site-packages/redis/connection.pyc in send_command(self, *args)
    561     def send_command(self, *args):
    562         "Pack and send a command to the Redis server"
--> 563         self.send_packed_command(self.pack_command(*args))
    564
    565     def can_read(self, timeout=0):

/opt/anaconda2/lib/python2.7/site-packages/redis/connection.pyc in send_packed_command(self, command)
    536         "Send an already packed command to the Redis server"
    537         if not self._sock:
--> 538             self.connect()
    539         try:
    540             if isinstance(command, str):

/opt/anaconda2/lib/python2.7/site-packages/redis/connection.pyc in connect(self)
    440         except socket.error:
    441             e = sys.exc_info()[1]
--> 442             raise ConnectionError(self._error_message(e))
    443
    444         self._sock = sock

ConnectionError: Error 111 connecting to 127.0.0.1:37075. Connection refused.

While digging into the source code, I found that these commands use the determine_node() method to figure out which Redis servers to target. However, the result of this method is stale and does not reflect the failover. For example, the following output shows that the killed master node (localhost:37075) is still listed as a master.

for node in list(conn1.determine_node("KEYS")):
    print node
{'host': u'127.0.0.1', 'server_type': 'master', 'port': 37074, 'name': '127.0.0.1:37074'}
{'host': u'127.0.0.1', 'server_type': 'master', 'port': 37075, 'name': '127.0.0.1:37075'}
{'host': u'127.0.0.1', 'server_type': 'slave', 'port': 37076L, 'name': '127.0.0.1:37076'}
{'host': u'127.0.0.1', 'server_type': 'slave', 'port': 37077L, 'name': '127.0.0.1:37077'}
{'host': u'127.0.0.1', 'server_type': 'slave', 'port': 37078L, 'name': '127.0.0.1:37078'}
{'host': u'127.0.0.1', 'server_type': 'master', 'port': 37079, 'name': '127.0.0.1:37079'}
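The failure mode can be reproduced without a running cluster. The sketch below (Python 3, whereas the report above uses Python 2; all names are hypothetical and not redis-py-cluster's internals) mimics a client that keeps the dead node in its cached node table: the TCP connect to the killed port is refused, and the error surfaces as a ConnectionError, just like the traceback above.

```python
import socket

# Hypothetical stand-in for the client's cached node table; in
# redis-py-cluster the real cache lives inside the connection pool's
# node manager. The killed master (37075) is still listed here.
cached_nodes = [
    {"host": "127.0.0.1", "port": 37074, "server_type": "master"},
    {"host": "127.0.0.1", "port": 37075, "server_type": "master"},  # dead
]

def send_command_to(node):
    """Open a TCP connection to the node, roughly as redis-py's connect() would."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(0.5)
    try:
        sock.connect((node["host"], node["port"]))
    except OSError as exc:
        # This is the point where redis-py wraps the socket error
        # in its own ConnectionError (redis/connection.py, connect()).
        raise ConnectionError(
            "Error connecting to %s:%d. %s" % (node["host"], node["port"], exc)
        )
    finally:
        sock.close()
```

As long as nothing is listening on port 37075, `send_command_to(cached_nodes[1])` raises ConnectionError; the real client fails the same way because its node table was built before the failover and is never refreshed on this code path.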

I wonder whether this is a bug or whether I need to set some config option.

Best wishes, Han-Cheol

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 13 (10 by maintainers)

Top GitHub Comments

Grokzen commented on Sep 15, 2020 (1 reaction)

@ofhellsfire Please take your specific error above and open a new issue with it, so I can track it separately from this one and include it in the 3.0.0 release instead.

If I read the root cause of the original issue correctly, it should now be solved in the master branch by the same fix I applied to your other issue, @ofhellsfire. The node selection and execution path is now covered in both scenarios by try/except handling, so the connection error should no longer happen the same way during failover.
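The try/except handling around node selection that the maintainer describes can be sketched as a refresh-and-retry loop. The simulation below is a minimal illustration (hypothetical names, not the library's actual internals): when a command hits a ConnectionError, the client re-reads the cluster topology before retrying, so it stops routing to the node that died during failover.

```python
class FakeCluster:
    """Simulates a cluster whose topology changed after a failover."""
    def __init__(self):
        self.live_masters = {37074, 37078}    # 37075 was killed, 37078 promoted
        self.cached_masters = [37074, 37075]  # stale client-side node cache

    def refresh_topology(self):
        # In the real library this step would re-run CLUSTER NODES
        # and rebuild the node table from the live reply.
        self.cached_masters = sorted(self.live_masters)

    def execute_on_node(self, port):
        if port not in self.live_masters:
            raise ConnectionError("Connection refused: port %d" % port)
        return "OK"

def execute_with_refresh(cluster, retries=3):
    """Run a command on every cached master, refreshing the cache on failure."""
    for _ in range(retries):
        try:
            return [cluster.execute_on_node(p) for p in cluster.cached_masters]
        except ConnectionError:
            # Drop the dead node and pick up the promoted master, then retry.
            cluster.refresh_topology()
    raise ConnectionError("still failing after %d retries" % retries)
```

The first attempt fails on the dead port 37075, the cache is rebuilt from the live topology, and the second attempt succeeds against 37074 and 37078.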

Grokzen commented on Sep 20, 2020 (0 reactions)

Closing this issue as resolved. If the problem still remains in 2.1.0, please open a new issue and it can be solved there.
