ERROR: Error when fetching backup: pg_basebackup exited with code=1
Hi, can someone please guide me on this? I am configuring Postgres 12.1 with Patroni in a cluster with 7 nodes. Every time I scale up, I end up in this situation: the master/leader starts and works, but the slaves/replicas either stop or end up in 'creating replica' mode.
root@103d123f2d5c:/bp2/src# patronictl -c pg_patroni.yml list
+---------+--------+----------------+--------+------------------+----+-----------+
| Cluster | Member |      Host      |  Role  |      State       | TL | Lag in MB |
+---------+--------+----------------+--------+------------------+----+-----------+
|  blue0  |  pg_0  | 127.0.0.1:5432 | Leader |     running      |  1 |           |
|  blue0  |  pg_1  | 127.0.0.1:5432 |        |     stopped      |    |  unknown  |
|  blue0  |  pg_2  | 127.0.0.1:5432 |        |     stopped      |    |  unknown  |
|  blue0  |  pg_3  | 127.0.0.1:5432 |        |     stopped      |    |  unknown  |
|  blue0  |  pg_4  | 127.0.0.1:5432 |        |     stopped      |    |  unknown  |
|  blue0  |  pg_5  | 127.0.0.1:5432 |        |     stopped      |    |  unknown  |
|  blue0  |  pg_6  | 127.0.0.1:5432 |        | creating replica |    |  unknown  |
+---------+--------+----------------+--------+------------------+----+-----------+
The logs constantly show errors such as these:
2020-01-02 23:04:27,006 DEBUG: Sending request(xid=717): SetData(path='/bp/blue0/members/pg_1', data=b'{"conn_url":"postgres://127.0.0.1:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","state":"stopped","role":"uninitialized","version":"1.6.3"}', version=-1)
2020-01-02 23:04:27,011 DEBUG: Received response(xid=717): ZnodeStat(czxid=12885154300, mzxid=12885161237, ctime=1577999766021, mtime=1578006267006, version=652, cversion=0, aversion=0, ephemeralOwner=31334655534829047, dataLength=150, numChildren=0, pzxid=12885154300)
2020-01-02 23:04:27,011 INFO: trying to bootstrap from leader 'pg_0'
2020-01-02 23:04:27,025 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:27,025 WARNING: Trying again in 5 seconds
2020-01-02 23:04:32,037 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:32,037 ERROR: failed to bootstrap from leader 'pg_0'
2020-01-02 23:04:32,037 INFO: Removing data directory: /bp2/data/psql
2020-01-02 23:04:37,004 INFO: Lock owner: pg_0; I am pg_1
2020-01-02 23:04:37,006 DEBUG: Sending request(xid=718): SetData(path='/bp/blue0/members/pg_1', data=b'{"conn_url":"postgres://127.0.0.1:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","state":"stopped","role":"uninitialized","version":"1.6.3"}', version=-1)
2020-01-02 23:04:37,011 DEBUG: Received response(xid=718): ZnodeStat(czxid=12885154300, mzxid=12885161248, ctime=1577999766021, mtime=1578006277007, version=653, cversion=0, aversion=0, ephemeralOwner=31334655534829047, dataLength=150, numChildren=0, pzxid=12885154300)
2020-01-02 23:04:37,012 INFO: trying to bootstrap from leader 'pg_0'
2020-01-02 23:04:37,022 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:37,023 WARNING: Trying again in 5 seconds
2020-01-02 23:04:42,036 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:42,036 ERROR: failed to bootstrap from leader 'pg_0'
2020-01-02 23:04:42,036 INFO: Removing data directory: /bp2/data/psql
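At this log level Patroni only reports the pg_basebackup exit code, not its output, so it can help to run the equivalent command by hand from the failing replica with the same replication credentials used in the config below. This is only a rough test invocation; the leader address and the scratch directory are placeholders to substitute:

PGPASSWORD=bpadminpw /usr/lib/postgresql/12/bin/pg_basebackup \
    -h <leader-ip> -p 5432 -U bpadmin -D /tmp/basebackup_test -X stream -c fast -v

If this fails with an authentication or "no pg_hba.conf entry" error, the problem is in the HBA rules or the replication user rather than in Patroni itself.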
And here is my YAML file:
scope: blue0
namespace: /bp/
name: pg_1

log:
  level: DEBUG
  traceback_level: debug
  dir: /bp2/log/

restapi:
  listen: 127.0.0.1:8008
  connect_address: 127.0.0.1:8008

zookeeper:
  hosts: zookeeper:2181

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host all all 0.0.0.0/0 md5
  users:
    sbpadmin:
      password: sbpadminpw
      options:
        - createrole
        - createdb
    bpadmin:
      password: bpadminpw
      options:
        - replication
    wbpadmin:
      password: wbpadminpw
      options:
        - rewind

postgresql:
  listen: 127.0.0.1:5432
  connect_address: 127.0.0.1:5432
  config_dir: /bp2/data/psql
  data_dir: /bp2/data/psql
  bin_dir: /usr/lib/postgresql/12/bin/
  pgpass: /bp2/log/pgpass
  authentication:
    replication:
      username: bpadmin
      password: bpadminpw
    superuser:
      username: sbpadmin
      password: sbpadminpw
    rewind:
      username: wbpadmin
      password: wbpadminpw
  parameters:
    unix_socket_directories: '.'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
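Two details in this config would explain pg_basebackup exiting with code 1: the pg_hba rule only allows replication for the user replicator from 127.0.0.1/32, while the replication credentials Patroni actually uses belong to bpadmin, and every member advertises 127.0.0.1. A sketch of how the bootstrap pg_hba section might look instead; the 10.0.0.0/24 network is only an illustrative assumption standing in for wherever the replicas really live:

  pg_hba:
    # allow the user from postgresql.authentication.replication to connect
    # from the replicas' network (10.0.0.0/24 is an assumption)
    - host replication bpadmin 10.0.0.0/24 md5
    - host all all 0.0.0.0/0 md5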
Thanks. I think I found the issue, but I do not know if my fix is correct. There were two problems: the $rep_username and $localhost_ip placeholders had not been substituted, so I replaced them with the correct username and host IP.
After this, replication completes for the replicas.
Thanks,
Why do you set listen and connect_address to 127.0.0.1? How will the nodes discover each other?
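If the members run on different hosts or containers, a replica bootstrapping from the leader will connect to 127.0.0.1 and reach itself rather than pg_0. Each member would normally publish an address the others can actually reach; for illustration only (the 10.0.0.11 address is made up), on pg_1 something like:

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.11:8008
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.11:5432

connect_address is what Patroni writes to ZooKeeper (the conn_url/api_url visible in the log above), so it has to be reachable by the other members and by pg_basebackup. Once the addresses and HBA rules are fixed, a stuck member can also be re-initialised explicitly with: patronictl -c pg_patroni.yml reinit blue0 pg_1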