system ID mismatch, node belongs to a different cluster - after reboot

Hi,

I ended up with a node out of the cluster after a minor PostgreSQL upgrade and a system reboot. I have done this before, but never with Patroni 1.2.5.

Error:

2017-10-10 11:20:44,915 CRITICAL: system ID mismatch, node sql2 belongs to a different cluster: 6283810682861444259 !=
2017-10-10 11:20:44,931 ERROR:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/patroni/__init__.py", line 134, in patroni_main
    patroni.run()
  File "/usr/local/lib/python3.5/dist-packages/patroni/__init__.py", line 110, in run
    logger.info(self.ha.run_cycle())
  File "/usr/local/lib/python3.5/dist-packages/patroni/ha.py", line 946, in run_cycle
    info = self._run_cycle()
  File "/usr/local/lib/python3.5/dist-packages/patroni/ha.py", line 903, in _run_cycle
    sys.exit(1)
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/patroni/dcs/etcd.py", line 329, in wrapper
    retval = func(self, *args, **kwargs) is not None
  File "/usr/local/lib/python3.5/dist-packages/patroni/dcs/etcd.py", line 511, in delete_leader
    return self._client.delete(self.leader_path, prevValue=self._name)
  File "/usr/local/lib/python3.5/dist-packages/etcd/client.py", line 584, in delete
    self.key_endpoint + key, self._MDELETE, params=kwds)
  File "/usr/local/lib/python3.5/dist-packages/patroni/dcs/etcd.py", line 211, in api_execute
    return self._handle_server_response(response)
  File "/usr/local/lib/python3.5/dist-packages/etcd/client.py", line 928, in _handle_server_response
    etcd.EtcdError.handle(r)
  File "/usr/local/lib/python3.5/dist-packages/etcd/__init__.py", line 304, in handle
    raise exc(msg, payload)
etcd.EtcdCompareFailed: Compare failed : [sql2 != sql1]
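
The second traceback comes from a conditional delete: the call passes prevValue=self._name, so etcd removes the leader key only if it currently holds this node's name. Here the key holds sql1 while the request came from sql2, so the delete is refused. A minimal in-memory sketch of that compare-and-delete behaviour follows; the InMemoryKV and CompareFailed names are hypothetical stand-ins for illustration, not Patroni or python-etcd code.

```python
class CompareFailed(Exception):
    """Hypothetical stand-in for etcd.EtcdCompareFailed."""


class InMemoryKV:
    """Toy key/value store that mimics a delete guarded by the previous value."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key, prev_value):
        # Delete only when the stored value equals what the caller expects.
        current = self._data.get(key)
        if current != prev_value:
            raise CompareFailed(
                "Compare failed : [{} != {}]".format(prev_value, current)
            )
        del self._data[key]


if __name__ == "__main__":
    kv = InMemoryKV()
    kv.set("leader", "sql1")  # sql1 holds the leader key
    try:
        kv.delete("leader", prev_value="sql2")  # sql2 tries to release it
    except CompareFailed as exc:
        print(exc)
```

Read this way, the EtcdCompareFailed error looks like a side effect of the node shutting down without holding the leader key, while the CRITICAL system ID mismatch line is the condition that triggered the exit.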

I found this: https://github.com/zalando/patroni/issues/438. I edited that file and indented the return to its proper position, but the issue persists. Any ideas on how to debug this?

Thank you.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

bradnicholson commented, Dec 8, 2017 (1 reaction)

For our use case, since we run pg_upgrade only on newly provisioned databases, the simplest approach is to upgrade the data before Patroni sees it.

ghost commented, Feb 12, 2020 (0 reactions)

I doubt that you want to delete the key from Consul, as it reports that there is an existing lock owner (odoo-psql). The value of the initialize key is the PostgreSQL database system identifier, which you can see with the pg_controldata command. All instances in your Patroni cluster should have the same identifier, which will be the case if the second node was created as a copy of the first rather than initialized as a separate database.

# /usr/lib/postgresql/12/bin/pg_controldata -D /var/lib/postgresql/12/main | grep "Database system identifier"
Database system identifier:           6792036218013953695

The only reason to delete the key from Consul is if you want to replace the existing database cluster entirely. To do that, you can use the Consul UI, Key/Value tab.
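
To check a node before touching the key, you can compare the identifier reported by pg_controldata with the value stored under the initialize key. The sketch below is a rough illustration only: parse_system_identifier and sysid_matches are hypothetical helper names, and how you fetch the initialize value from your DCS (Consul, etcd) is left out and assumed to be done separately.

```python
import re
from typing import Optional

# Matches the line printed by pg_controldata, as shown in the output above.
SYSID_RE = re.compile(r"^Database system identifier:\s*(\d+)\s*$", re.MULTILINE)


def parse_system_identifier(controldata_output: str) -> Optional[str]:
    """Return the system identifier from pg_controldata output, or None."""
    match = SYSID_RE.search(controldata_output)
    return match.group(1) if match else None


def sysid_matches(initialize_value: Optional[str], local_sysid: Optional[str]) -> bool:
    """True only when both values are present and identical."""
    if not initialize_value or not local_sysid:
        return False
    return initialize_value.strip() == local_sysid.strip()


if __name__ == "__main__":
    # Sample text copied from the pg_controldata output above.
    sample = "Database system identifier:           6792036218013953695\n"
    local = parse_system_identifier(sample)
    # The first argument would come from the initialize key in the DCS.
    print(sysid_matches("6792036218013953695", local))
```

Treating a missing value as a mismatch is a deliberate choice in this sketch. In the log at the top of the thread one side of the comparison is empty, and an empty value would fail this check as well.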
