
ERROR: Error when fetching backup: pg_basebackup exited with code=1

See original GitHub issue

Hi, can someone please guide me on this? I am configuring Postgres 12.1 with Patroni in a cluster with 7 nodes. Every time I scale up, I end up in this situation: the master/leader starts and works, but the slaves/replicas either stop or get stuck in 'creating replica' mode.

root@103d123f2d5c:/bp2/src# patronictl -c pg_patroni.yml list
+---------+--------+----------------+--------+------------------+----+-----------+
| Cluster | Member |      Host      |  Role  |      State       | TL | Lag in MB |
+---------+--------+----------------+--------+------------------+----+-----------+
|  blue0  |  pg_0  | 127.0.0.1:5432 | Leader |     running      |  1 |           |
|  blue0  |  pg_1  | 127.0.0.1:5432 |        |     stopped      |    |   unknown |
|  blue0  |  pg_2  | 127.0.0.1:5432 |        |     stopped      |    |   unknown |
|  blue0  |  pg_3  | 127.0.0.1:5432 |        |     stopped      |    |   unknown |
|  blue0  |  pg_4  | 127.0.0.1:5432 |        |     stopped      |    |   unknown |
|  blue0  |  pg_5  | 127.0.0.1:5432 |        |     stopped      |    |   unknown |
|  blue0  |  pg_6  | 127.0.0.1:5432 |        | creating replica |    |   unknown |
+---------+--------+----------------+--------+------------------+----+-----------+

The logs constantly show errors such as these:

2020-01-02 23:04:27,006 DEBUG: Sending request(xid=717): SetData(path='/bp/blue0/members/pg_1', data=b'{"conn_url":"postgres://127.0.0.1:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","state":"stopped","role":"uninitialized","version":"1.6.3"}', version=-1)
2020-01-02 23:04:27,011 DEBUG: Received response(xid=717): ZnodeStat(czxid=12885154300, mzxid=12885161237, ctime=1577999766021, mtime=1578006267006, version=652, cversion=0, aversion=0, ephemeralOwner=31334655534829047, dataLength=150, numChildren=0, pzxid=12885154300)
2020-01-02 23:04:27,011 INFO: trying to bootstrap from leader 'pg_0'
2020-01-02 23:04:27,025 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:27,025 WARNING: Trying again in 5 seconds
2020-01-02 23:04:32,037 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:32,037 ERROR: failed to bootstrap from leader 'pg_0'
2020-01-02 23:04:32,037 INFO: Removing data directory: /bp2/data/psql
2020-01-02 23:04:37,004 INFO: Lock owner: pg_0; I am pg_1
2020-01-02 23:04:37,006 DEBUG: Sending request(xid=718): SetData(path='/bp/blue0/members/pg_1', data=b'{"conn_url":"postgres://127.0.0.1:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","state":"stopped","role":"uninitialized","version":"1.6.3"}', version=-1)
2020-01-02 23:04:37,011 DEBUG: Received response(xid=718): ZnodeStat(czxid=12885154300, mzxid=12885161248, ctime=1577999766021, mtime=1578006277007, version=653, cversion=0, aversion=0, ephemeralOwner=31334655534829047, dataLength=150, numChildren=0, pzxid=12885154300)
2020-01-02 23:04:37,012 INFO: trying to bootstrap from leader 'pg_0'
2020-01-02 23:04:37,022 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:37,023 WARNING: Trying again in 5 seconds
2020-01-02 23:04:42,036 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:42,036 ERROR: failed to bootstrap from leader 'pg_0'
2020-01-02 23:04:42,036 INFO: Removing data directory: /bp2/data/psql
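At this log level Patroni does not surface pg_basebackup's own stderr, so one way to see the underlying failure is to run an equivalent base backup by hand from the replica host. A minimal sketch, assuming the leader address and replication credentials from the config below (the target directory is illustrative):

```shell
# Run the same kind of base backup Patroni attempts during bootstrap.
# -X stream also fetches WAL; -R writes replica recovery settings; -v is verbose.
PGPASSWORD=bpadminpw /usr/lib/postgresql/12/bin/pg_basebackup \
  -h 127.0.0.1 -p 5432 -U bpadmin \
  -D /tmp/basebackup_test -X stream -R -v
echo "pg_basebackup exit code: $?"
```

Run manually like this, the actual cause (for example a pg_hba.conf rejection or an authentication failure) is printed directly, instead of only the exit code that Patroni reports.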

and here is my YAML file:

scope: blue0
namespace: /bp/
name: pg_1

log:
  level: DEBUG
  traceback_level: debug
  dir: /bp2/log/
   
restapi:
  listen: 127.0.0.1:8008
  connect_address: 127.0.0.1:8008

zookeeper:
  hosts: zookeeper:2181

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
  initdb: 
    - encoding: UTF8
    - data-checksums
  pg_hba: 
    - host replication replicator 127.0.0.1/32 md5
    - host all all 0.0.0.0/0 md5
  users:
    sbpadmin:
        password: sbpadminpw
        options:
            - createrole
            - createdb
    bpadmin:
        password: bpadminpw
        options:
            - replication
    wbpadmin:
        password: wbpadminpw
        options:
            - rewind

postgresql:
  listen: 127.0.0.1:5432
  connect_address: 127.0.0.1:5432
  config_dir: /bp2/data/psql
  data_dir: /bp2/data/psql
  bin_dir: /usr/lib/postgresql/12/bin/

  pgpass: /bp2/log/pgpass
  authentication:
    replication:
      username: bpadmin
      password: bpadminpw
    superuser:
      username: sbpadmin
      password: sbpadminpw
    rewind:
      username: wbpadmin
      password: wbpadminpw
  parameters:
    unix_socket_directories: '.'
tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
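One plausible reading of the failure given this config: the pg_hba list only allows replication for a user called replicator, while the replication credentials Patroni hands to pg_basebackup are for bpadmin, so the connection is rejected. The mismatch can be illustrated with a small self-contained sketch (a hypothetical helper, not part of Patroni):

```python
# Minimal sketch (not Patroni code): check that the user Patroni will use for
# replication actually appears in the pg_hba "replication" entries.
pg_hba = [
    "host replication replicator 127.0.0.1/32 md5",
    "host all all 0.0.0.0/0 md5",
]
replication_user = "bpadmin"  # postgresql.authentication.replication.username

def users_allowed_for_replication(hba_lines):
    """Collect usernames from host-type entries for the replication database."""
    users = set()
    for line in hba_lines:
        fields = line.split()
        # host-type entry layout: type, database, user, address, method
        if len(fields) >= 5 and fields[1] == "replication":
            users.add(fields[2])
    return users

allowed = users_allowed_for_replication(pg_hba)
print(allowed)                      # {'replicator'}
print(replication_user in allowed)  # False -> pg_basebackup auth will fail
```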

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 10

Top GitHub Comments

2 reactions · sandeepkalra commented, Jan 10, 2020

Thanks. I think I found the issue, though I don't know if my fix is correct. There were 2 problems:

  1. There was no user called 'replicator'.
  2. The pg_hba entry wasn't correct (permission-wise).

I changed the pg_hba block to the following:
pg_hba:
  - host replication $rep_username $localhost_ip/32 md5
  - host all $rep_username $localhost_ip/32 md5
  - host replication $rep_username 0.0.0.0/0 md5
  - host all $rep_username 0.0.0.0/0 md5
  - host all all 0.0.0.0/0 md5

where $rep_username and $localhost_ip were replaced with the correct username and host IP.

After this, the replication completes for replicas.
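Applied to the original config above, that fix would amount to something like the following (using bpadmin, the user actually configured under authentication.replication; the addresses are illustrative):

```yaml
  pg_hba:
    - host replication bpadmin 127.0.0.1/32 md5
    - host replication bpadmin 0.0.0.0/0 md5
    - host all all 0.0.0.0/0 md5
```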

Thanks,

1 reaction · CyberDem0n commented, Jan 4, 2020

Why do you set listen and connect_address to 127.0.0.1? How will the nodes discover each other?
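For context on this comment: each Patroni member publishes its conn_url from connect_address, so if every node advertises 127.0.0.1:5432, a replica bootstrapping "from the leader" ends up connecting to itself. A sketch of per-node settings (the hostnames are illustrative):

```yaml
# On node pg_1 — each member advertises an address the others can reach:
restapi:
  listen: 0.0.0.0:8008
  connect_address: pg1.example.internal:8008

postgresql:
  listen: 0.0.0.0:5432
  connect_address: pg1.example.internal:5432
```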
