
Can't set up leader after reinstalling cluster


Hello, dear community. I am a newbie with Patroni, but I want to use it. My scenario is the following:

  1. Set up the Consul cluster:
     - Server1: 10.0.0.55 - bootstrap in Consul
     - Server2: 10.0.0.57 - server in Consul
     - Server3: 10.0.0.77 - server in Consul

  2. Set up the PostgreSQL cluster:
     - Server1: 10.0.0.48 - agent in Consul, must be leader in Patroni
     - Server2: 10.0.0.49 - agent in Consul
     - Server3: 10.0.0.54 - agent in Consul

Here are my configs:

Consul bootstrap:

{
  "disable_remote_exec": true,
  "domain": "consul.",
  "data_dir": "/etc/consul/data",
  "dns_config": {
    "allow_stale": false,
    "max_stale": "5s",
    "node_ttl": "0s",
    "service_ttl": {
      "*": "0s"
    },
    "enable_truncate": false,    "only_passing": false
  },
  "log_level": "INFO",
  "node_name": "consul01.example.local",
  "server": true,
  "bind_addr": "10.0.0.55",
  "datacenter": "test",
  "bootstrap": true,
  "ports": {
    "dns": 8600,
    "http": 8500,
    "https": -1,
    "rpc": 8400,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "http_api_response_headers": {
        "Access-Control-Allow-Origin": "*"
  },
  "rejoin_after_leave": true,
  "leave_on_terminate": false,
  "disable_update_check": false
}
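
To bring the bootstrap node up with this file, the agent would typically be started as follows (a minimal sketch; the path /etc/consul/bootstrap.json is an assumed location, not taken from the original post):

# Start the Consul agent with the bootstrap config above (path is hypothetical)
consul agent -config-file=/etc/consul/bootstrap.json

# After the other nodes join, membership can be verified from any node
consul members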

Consul server:

{
  "start_join": ["10.0.0.55", "10.0.0.77"],
  "disable_remote_exec": true,
  "domain": "consul.",
  "data_dir": "/etc/consul/data",
  "dns_config": {
    "allow_stale": false,
    "max_stale": "5s",
    "node_ttl": "0s",
    "service_ttl": {
      "*": "0s"
    },
    "enable_truncate": false,    "only_passing": false
  },
  "log_level": "INFO",
  "node_name": "consul02.example.local",
  "server": true,
  "bind_addr": "10.0.0.57",
  "datacenter": "test",
  "ports": {
    "dns": 8600,
    "http": 8500,
    "https": -1,
    "rpc": 8400,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "http_api_response_headers": {
        "Access-Control-Allow-Origin": "*"
  },
  "rejoin_after_leave": true,
  "leave_on_terminate": false,
  "disable_update_check": false
}
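
With all three Consul servers running, a quick sanity check is to confirm that a Consul leader has actually been elected (standard Consul CLI, shown here as a hedged example; run it on any server node):

# List the Raft peers; the output marks which server is the current leader
consul operator raft list-peers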

Postgres server:

Patroni:

name: pgsql01.example.local
scope: &scope pgsql_cluster
role: master

consul:
  host: 127.0.0.1:8500

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.48:8008
  auth: 'username:flsdjkfasdjhfsd'

bootstrap:
  dcs:
    ttl: &ttl 30
    loop_wait: &loop_wait 10
    maximum_lag_on_failover: 1048576 # 1 megabyte in bytes
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        archive_mode: "on"
        wal_level: hot_standby
        archive_command: mkdir -p ../wal_archive && cp %p ../wal_archive/%f
        max_wal_senders: 10
        wal_keep_segments: 8
        archive_timeout: 1800s
        max_replication_slots: 5
        hot_standby: "on"
        wal_log_hints: "on"

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
    - host replication replicator 10.0.0.0/16 md5
    - host all all 0.0.0.0/0 md5

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.48:5432
  data_dir: /var/lib/pgsql/9.6/data
  pg_rewind:
    username: superuser
    password: 123
  pg_hba:
  - host replication replicator 10.0.0.0/16 md5
  - host all all 0.0.0.0/0 md5
  replication:
    username: replicator
    password: 123
    network:  10.0.0.0/16
  superuser:
    username: superuser
    password: flsdjkfasdjhfsd
  admin:
    username: admin
    password: 123
  restore: /opt/patroni/patroni/scripts/restore.py
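
For reference, Patroni would typically be started against this file and the cluster then inspected with patronictl, roughly like this (the config path is an assumption for illustration, not from the original setup):

# Start Patroni with the YAML config above (path is hypothetical)
patroni /etc/patroni/patroni.yml

# Show cluster members and the current leader
patronictl -c /etc/patroni/patroni.yml list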

Consul on postgres server:

{
  "start_join": ["10.0.0.55", "10.0.0.57", "10.0.0.77"],
  "disable_remote_exec": true,
  "domain": "consul.",
  "data_dir": "/etc/consul/data",
  "dns_config": {
    "allow_stale": false,
    "max_stale": "5s",
    "node_ttl": "0s",
    "service_ttl": {
      "*": "0s"
    },
    "enable_truncate": false,    "only_passing": false
  },
  "log_level": "INFO",
  "node_name": "pgsql01.example.local",
  "server": false,
  "bind_addr": 10.0.0.48",
  "datacenter": "test",
  "ports": {
    "dns": 8600,
    "http": 8500,
    "https": -1,
    "rpc": 8400,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "http_api_response_headers": {
        "Access-Control-Allow-Origin": "*"
  },
  "rejoin_after_leave": true,
  "leave_on_terminate": false,
  "disable_update_check": false
}
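
Once this agent has joined, Patroni's state can be inspected directly in Consul; by default Patroni stores its keys under service/<scope> in the Consul KV store, so for this scope the check would look roughly like this (hedged sketch):

# Dump every key Patroni has written for the pgsql_cluster scope
consul kv get -recurse service/pgsql_cluster/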

Here is the question. If I do a fresh install of the whole Consul cluster and the whole PostgreSQL cluster, everything works as expected. BUT if I reinstall all nodes in the PostgreSQL cluster, after that Patroni can't find the leader. Please help me solve this issue. Thank you.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

4 reactions
CyberDem0n commented, Oct 13, 2017

BUT if I reinstall all nodes in the PostgreSQL cluster, after that Patroni can't find the leader.

This is expected behavior. Consul keeps the information that such a cluster already exists. Since the cluster was already initialized (with initdb), Patroni will not run initdb a second time, because that operation would effectively create a new cluster.
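
That recorded state is visible in the Consul KV store; assuming Patroni's default layout under service/<scope>, the initialize key holds the system identifier written when the cluster was first initialized (a hedged example, not from the original thread):

# Show the stored system identifier of the previously initialized cluster
consul kv get service/pgsql_cluster/initialize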

There is a way to clean up this information from Consul without reinstalling it:

$ patronictl -c patronicfg.yaml remove pgsql_cluster
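
This deletes the cluster's entries from the DCS (Consul here) after asking for confirmation. For reference, the same keys could in principle be removed at the Consul level, though patronictl remove is the safer, supported path (a hedged sketch, assuming the default service/ prefix):

# Low-level equivalent: wipe the cluster's keys directly from Consul KV
consul kv delete -recurse service/pgsql_cluster/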

P.S. Where did you get this Patroni config file? Our sample files have looked different for more than a year already: https://github.com/zalando/patroni/blob/master/postgres0.yml

0 reactions
alexeyklyukin commented, Jul 23, 2018

@davecramer by and large yes, but the actual permission issue when erasing the cluster with patronictl on Kubernetes is specific to this issue.
