Can't set up leader after reinstalling cluster
Hello dear community. I am a newbie with Patroni, but I want to use it. I am using it in the following scenario:
- Set up the Consul cluster:
  Server1: 10.0.0.55 - bootstrap in consul
  Server2: 10.0.0.57 - server in consul
  Server3: 10.0.0.77 - server in consul
- Set up the PostgreSQL cluster:
  Server1: 10.0.0.48 - agent in consul, must be the leader in patroni
  Server2: 10.0.0.49 - agent in consul
  Server3: 10.0.0.54 - agent in consul
Here are my configs:
Consul bootstrap:
{
  "disable_remote_exec": true,
  "domain": "consul.",
  "data_dir": "/etc/consul/data",
  "dns_config": {
    "allow_stale": false,
    "max_stale": "5s",
    "node_ttl": "0s",
    "service_ttl": {
      "*": "0s"
    },
    "enable_truncate": false,
    "only_passing": false
  },
  "log_level": "INFO",
  "node_name": "consul01.example.local",
  "server": true,
  "bind_addr": "10.0.0.55",
  "datacenter": "test",
  "bootstrap": true,
  "ports": {
    "dns": 8600,
    "http": 8500,
    "https": -1,
    "rpc": 8400,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "http_api_response_headers": {
    "Access-Control-Allow-Origin": "*"
  },
  "rejoin_after_leave": true,
  "leave_on_terminate": false,
  "disable_update_check": false
}
Consul server:
{
  "start_join": ["10.0.0.55", "10.0.0.77"],
  "disable_remote_exec": true,
  "domain": "consul.",
  "data_dir": "/etc/consul/data",
  "dns_config": {
    "allow_stale": false,
    "max_stale": "5s",
    "node_ttl": "0s",
    "service_ttl": {
      "*": "0s"
    },
    "enable_truncate": false,
    "only_passing": false
  },
  "log_level": "INFO",
  "node_name": "consul02.example.local",
  "server": true,
  "bind_addr": "10.0.0.57",
  "datacenter": "test",
  "ports": {
    "dns": 8600,
    "http": 8500,
    "https": -1,
    "rpc": 8400,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "http_api_response_headers": {
    "Access-Control-Allow-Origin": "*"
  },
  "rejoin_after_leave": true,
  "leave_on_terminate": false,
  "disable_update_check": false
}
Postgres server:
Patroni:
name: pgsql01.example.local
scope: &scope pgsql_cluster
role: master
consul:
  host: 127.0.0.1:8500
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.48:8008
  auth: 'username:flsdjkfasdjhfsd'
bootstrap:
  dcs:
    ttl: &ttl 30
    loop_wait: &loop_wait 10
    maximum_lag_on_failover: 1048576 # 1 megabyte in bytes
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        archive_mode: "on"
        wal_level: hot_standby
        archive_command: mkdir -p ../wal_archive && cp %p ../wal_archive/%f
        max_wal_senders: 10
        wal_keep_segments: 8
        archive_timeout: 1800s
        max_replication_slots: 5
        hot_standby: "on"
        wal_log_hints: "on"
  pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
    - host replication replicator 10.0.0.0/16 md5
    - host all all 0.0.0.0/0 md5
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.48:5432
  data_dir: /var/lib/pgsql/9.6/data
  pg_rewind:
    username: superuser
    password: 123
  pg_hba:
    - host replication replicator 10.0.0.0/16 md5
    - host all all 0.0.0.0/0 md5
  replication:
    username: replicator
    password: 123
    network: 10.0.0.0/16
  superuser:
    username: superuser
    password: flsdjkfasdjhfsd
  admin:
    username: admin
    password: 123
  restore: /opt/patroni/patroni/scripts/restore.py
Consul on postgres server:
{
  "start_join": ["10.0.0.55", "10.0.0.57", "10.0.0.77"],
  "disable_remote_exec": true,
  "domain": "consul.",
  "data_dir": "/etc/consul/data",
  "dns_config": {
    "allow_stale": false,
    "max_stale": "5s",
    "node_ttl": "0s",
    "service_ttl": {
      "*": "0s"
    },
    "enable_truncate": false,
    "only_passing": false
  },
  "log_level": "INFO",
  "node_name": "pgsql01.example.local",
  "server": false,
  "bind_addr": "10.0.0.48",
  "datacenter": "test",
  "ports": {
    "dns": 8600,
    "http": 8500,
    "https": -1,
    "rpc": 8400,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "http_api_response_headers": {
    "Access-Control-Allow-Origin": "*"
  },
  "rejoin_after_leave": true,
  "leave_on_terminate": false,
  "disable_update_check": false
}
Here is the question. If I do a fresh install of the whole Consul cluster and the whole PostgreSQL cluster, everything works as expected. BUT if I then reinstall all the nodes of the PostgreSQL cluster, Patroni can't find the leader anymore. Please help me solve this issue. Thank you.
Top GitHub Comments
This is expected behavior. Consul keeps the information that such a cluster already exists. Since the cluster was already initialized (with initdb), Patroni will not run initdb a second time, because that operation would effectively create a new cluster.
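For illustration: with the Consul DCS, Patroni keeps the cluster state under the service/<scope>/ prefix in the Consul KV store, so the stale state left over from the previous installation can be inspected through the standard Consul HTTP API. With the scope pgsql_cluster from the config above, something like:
# list all keys Patroni created for this cluster scope
curl 'http://127.0.0.1:8500/v1/kv/service/pgsql_cluster/?keys'
# the 'initialize' key marks the cluster as already initialized
curl 'http://127.0.0.1:8500/v1/kv/service/pgsql_cluster/initialize'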
There is a way to clean up this information from Consul without reinstalling it.
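A minimal sketch of such a cleanup (not necessarily the exact command the maintainer had in mind), assuming the Patroni configuration file lives at /etc/patroni.yml (adjust the path to your installation): remove the cluster from the DCS with patronictl, or delete the KV subtree directly through the Consul HTTP API.
# interactively removes all of this cluster's keys from the DCS (asks for confirmation)
patronictl -c /etc/patroni.yml remove pgsql_cluster
# alternative: delete the whole KV subtree for the scope directly in Consul
curl -X DELETE 'http://127.0.0.1:8500/v1/kv/service/pgsql_cluster?recurse='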
P.S. Where did you get this Patroni config file? Our sample files have looked different for more than a year already: https://github.com/zalando/patroni/blob/master/postgres0.yml
@davecramer by and large yes, but the actual permission issue when erasing the cluster with patronictl on Kubernetes is specific to this issue.