
Can't set up leader after reinstalling cluster


Hello, dear community. I am a newbie with Patroni, but I want to use it. My scenario is the following:

  1. Set up the Consul cluster:
     - Server1: 10.0.0.55 - bootstrap in Consul
     - Server2: 10.0.0.57 - server in Consul
     - Server3: 10.0.0.77 - server in Consul

  2. Set up the PostgreSQL cluster:
     - Server1: 10.0.0.48 - agent in Consul, must be leader in Patroni
     - Server2: 10.0.0.49 - agent in Consul
     - Server3: 10.0.0.54 - agent in Consul

Here are my configs:

Consul bootstrap:

{
  "disable_remote_exec": true,
  "domain": "consul.",
  "data_dir": "/etc/consul/data",
  "dns_config": {
    "allow_stale": false,
    "max_stale": "5s",
    "node_ttl": "0s",
    "service_ttl": {
      "*": "0s"
    },
    "enable_truncate": false,    "only_passing": false
  },
  "log_level": "INFO",
  "node_name": "consul01.example.local",
  "server": true,
  "bind_addr": "10.0.0.55",
  "datacenter": "test",
  "bootstrap": true,
  "ports": {
    "dns": 8600,
    "http": 8500,
    "https": -1,
    "rpc": 8400,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "http_api_response_headers": {
        "Access-Control-Allow-Origin": "*"
  },
  "rejoin_after_leave": true,
  "leave_on_terminate": false,
  "disable_update_check": false
}
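
To bring the bootstrap node up with this file, the agent would typically be started as follows (a minimal sketch; the path /etc/consul/bootstrap.json is an assumed location, not taken from the original post):

# Start the Consul agent with the bootstrap config above (path is hypothetical)
consul agent -config-file=/etc/consul/bootstrap.json

# After the other nodes join, membership can be verified from any node
consul members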

Consul server:

{
  "start_join": ["10.0.0.55", "10.0.0.77"],
  "disable_remote_exec": true,
  "domain": "consul.",
  "data_dir": "/etc/consul/data",
  "dns_config": {
    "allow_stale": false,
    "max_stale": "5s",
    "node_ttl": "0s",
    "service_ttl": {
      "*": "0s"
    },
    "enable_truncate": false,    "only_passing": false
  },
  "log_level": "INFO",
  "node_name": "consul02.example.local",
  "server": true,
  "bind_addr": "10.0.0.57",
  "datacenter": "test",
  "ports": {
    "dns": 8600,
    "http": 8500,
    "https": -1,
    "rpc": 8400,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "http_api_response_headers": {
        "Access-Control-Allow-Origin": "*"
  },
  "rejoin_after_leave": true,
  "leave_on_terminate": false,
  "disable_update_check": false
}
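
With all three Consul servers running, a quick sanity check is to confirm that a Consul leader has actually been elected (standard Consul CLI, shown here as a hedged example; run it on any server node):

# List the Raft peers; the output marks which server is the current leader
consul operator raft list-peers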

Postgres server:

Patroni:

name: pgsql01.example.local
scope: &scope pgsql_cluster
role: master

consul:
  host: 127.0.0.1:8500

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.48:8008
  auth: 'username:flsdjkfasdjhfsd'

bootstrap:
  dcs:
    ttl: &ttl 30
    loop_wait: &loop_wait 10
    maximum_lag_on_failover: 1048576 # 1 megabyte in bytes
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        archive_mode: "on"
        wal_level: hot_standby
        archive_command: mkdir -p ../wal_archive && cp %p ../wal_archive/%f
        max_wal_senders: 10
        wal_keep_segments: 8
        archive_timeout: 1800s
        max_replication_slots: 5
        hot_standby: "on"
        wal_log_hints: "on"

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
    - host replication replicator 10.0.0.0/16 md5
    - host all all 0.0.0.0/0 md5

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.48:5432
  data_dir: /var/lib/pgsql/9.6/data
  pg_rewind:
    username: superuser
    password: 123
  pg_hba:
  - host replication replicator 10.0.0.0/16 md5
  - host all all 0.0.0.0/0 md5
  replication:
    username: replicator
    password: 123
    network:  10.0.0.0/16
  superuser:
    username: superuser
    password: flsdjkfasdjhfsd
  admin:
    username: admin
    password: 123
  restore: /opt/patroni/patroni/scripts/restore.py
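
For reference, Patroni would typically be started against this file and the cluster then inspected with patronictl, roughly like this (the config path is an assumption for illustration, not from the original setup):

# Start Patroni with the YAML config above (path is hypothetical)
patroni /etc/patroni/patroni.yml

# Show cluster members and the current leader
patronictl -c /etc/patroni/patroni.yml list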

Consul on postgres server:

{
  "start_join": ["10.0.0.55", "10.0.0.57", "10.0.0.77"],
  "disable_remote_exec": true,
  "domain": "consul.",
  "data_dir": "/etc/consul/data",
  "dns_config": {
    "allow_stale": false,
    "max_stale": "5s",
    "node_ttl": "0s",
    "service_ttl": {
      "*": "0s"
    },
    "enable_truncate": false,    "only_passing": false
  },
  "log_level": "INFO",
  "node_name": "pgsql01.example.local",
  "server": false,
  "bind_addr": 10.0.0.48",
  "datacenter": "test",
  "ports": {
    "dns": 8600,
    "http": 8500,
    "https": -1,
    "rpc": 8400,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "http_api_response_headers": {
        "Access-Control-Allow-Origin": "*"
  },
  "rejoin_after_leave": true,
  "leave_on_terminate": false,
  "disable_update_check": false
}
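
Once this agent has joined, Patroni's state can be inspected directly in Consul; by default Patroni stores its keys under service/<scope> in the Consul KV store, so for this scope the check would look roughly like this (hedged sketch):

# Dump every key Patroni has written for the pgsql_cluster scope
consul kv get -recurse service/pgsql_cluster/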

Here is the question. If I do a fresh install of the whole Consul cluster and the whole PostgreSQL cluster, everything works as expected. BUT if I reinstall all nodes in the PostgreSQL cluster, after that Patroni can't find the leader. Please help me solve this issue. Thank you.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

4 reactions
CyberDem0n commented, Oct 13, 2017

BUT if I reinstall all nodes in the PostgreSQL cluster, after that Patroni can't find the leader.

This is expected behavior. Consul keeps the information that such a cluster already exists. Since the cluster was already initialized (with initdb), Patroni will not run initdb a second time, because that operation would effectively create a new cluster.
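
That recorded state is visible in the Consul KV store; assuming Patroni's default layout under service/<scope>, the initialize key holds the system identifier written when the cluster was first initialized (a hedged example, not from the original thread):

# Show the stored system identifier of the previously initialized cluster
consul kv get service/pgsql_cluster/initialize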

There is a way to clean up this information from Consul without reinstalling it:

$ patronictl -c patronicfg.yaml remove pgsql_cluster
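
This deletes the cluster's entries from the DCS (Consul here) after asking for confirmation. For reference, the same keys could in principle be removed at the Consul level, though patronictl remove is the safer, supported path (a hedged sketch, assuming the default service/ prefix):

# Low-level equivalent: wipe the cluster's keys directly from Consul KV
consul kv delete -recurse service/pgsql_cluster/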

P.S. Where did you get this Patroni config file? Our sample files have looked different for more than a year already: https://github.com/zalando/patroni/blob/master/postgres0.yml

0 reactions
alexeyklyukin commented, Jul 23, 2018

@davecramer by and large yes, but the actual permission issue when erasing the cluster with patronictl on Kubernetes is specific to this issue.
