Add support for PgBackrest
Hi @CyberDem0n and team,
In August, on the Slack channel, we talked about supporting PgBackrest in Spilo. My colleague @fdalfons and I are now working on it. Let me first describe our requirements. We need to run PgBackrest in the Spilo image to collect WAL files and send them to our COS on IBM Cloud (S3-based object storage) every 10 minutes. We also have a standalone PgBackrest container that runs on a separate worker (not on the Patroni nodes, because we don’t want the backup process to overload the nodes where Patroni lives). It connects to the Patroni nodes, creates a full backup every 6 hours and an incremental backup every 30 minutes, and sends them to COS. Finally, we need the Patroni cluster to be able to bootstrap from a backup using PgBackrest on the Spilo nodes.
For now we want to discuss only the installation of PgBackrest in the Spilo image and the generation of its /etc/pgbackrest.conf
file, so that WAL files are collected and sent to COS.
The installation of PgBackrest on Spilo is easy:
RUN export DEBIAN_FRONTEND=noninteractive \
&& apt-get update \
&& apt-get install -y pgbackrest
These lines go just below this line of the Dockerfile: https://github.com/zalando/spilo/blob/3612e7f54284c8394bb7a4fd1097a0ef63e7c154/postgres-appliance/Dockerfile#L412
To generate /etc/pgbackrest/pgbackrest.conf
we need a set of environment variables like this:
ENV PGBACKREST_USE=false
ENV PGBACKREST_S3_BUCKET=
ENV PGBACKREST_S3_ENDPOINT=
ENV PGBACKREST_S3_KEY=
ENV PGBACKREST_S3_KEY_SECRET=
ENV PGBACKREST_S3_PATH=
ENV PGBACKREST_S3_REGION=
ENV PGBACKREST_CHIPHER_TYPE=
ENV PGBACKREST_CHIPHER_PASSWORD=
The final file should be something like this:
[$SCOPE]
pg1-path=$PGDATA
pg1-socket-path=/tmp
pg1-port=$PGPORT
pg2-path=$PGDATA
pg2-socket-path=/tmp
pg2-port=$PGPORT
pg3-path=$PGDATA
pg3-socket-path=/tmp
pg3-port=$PGPORT
[global]
log-level-file=detail
process-max=4
repo1-cipher-pass=$PGBACKREST_CHIPHER_PASSWORD
repo1-cipher-type=$PGBACKREST_CHIPHER_TYPE
repo1-retention-diff=2
repo1-retention-full=2
repo1-path=$PGBACKREST_S3_PATH
repo1-s3-bucket=$PGBACKREST_S3_BUCKET
repo1-s3-endpoint=$PGBACKREST_S3_ENDPOINT
repo1-s3-key=$PGBACKREST_S3_KEY
repo1-s3-key-secret=$PGBACKREST_S3_KEY_SECRET
repo1-s3-region=$PGBACKREST_S3_REGION
repo1-type=s3
[global:archive-push]
compress-level=3
For now this file comes from our own code and probably needs some rework, but it is a baseline for our discussion. As you can see, the file needs some information that is already available in Spilo, for example:
- PGDATA
- PGPORT
- SCOPE
- PGUSER_ADMIN
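A minimal sketch of how the file could be rendered from those values, assuming they are available as a plain dictionary. The function name `render_pgbackrest_conf` and the trimmed-down template are illustrative only and are not part of Spilo:

```python
from string import Template

# Shortened version of the baseline file above; only a few keys are shown.
PGBACKREST_CONF_TEMPLATE = Template("""\
[$SCOPE]
pg1-path=$PGDATA
pg1-socket-path=/tmp
pg1-port=$PGPORT

[global]
repo1-type=s3
repo1-path=$PGBACKREST_S3_PATH
repo1-s3-bucket=$PGBACKREST_S3_BUCKET
repo1-s3-endpoint=$PGBACKREST_S3_ENDPOINT
repo1-s3-region=$PGBACKREST_S3_REGION
""")


def render_pgbackrest_conf(placeholders):
    """Return the config text; raises KeyError if a variable is missing."""
    return PGBACKREST_CONF_TEMPLATE.substitute(placeholders)


# Example values only; real ones would come from the Spilo environment.
example = {
    'SCOPE': 'demo',
    'PGDATA': '/home/postgres/pgdata/pgroot/data',
    'PGPORT': 5432,
    'PGBACKREST_S3_PATH': '/backups',
    'PGBACKREST_S3_BUCKET': 'my-bucket',
    'PGBACKREST_S3_ENDPOINT': 's3.example.com',
    'PGBACKREST_S3_REGION': 'eu-de',
}

if __name__ == '__main__':
    print(render_pgbackrest_conf(example))
```

Using `substitute` rather than `safe_substitute` makes a missing variable fail loudly instead of leaving a literal `$NAME` in the generated file.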
To write this file, I clearly need to modify configure_spilo.py,
and I used the WAL-E code for inspiration. I think I need to add something like this:
elif section == 'pgbackrest':
    if placeholders['PGBACKREST_USE']:
        write_pgbackrest_environment(placeholders, '', args['force'])
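One thing to watch in this check: if `placeholders['PGBACKREST_USE']` holds the raw string from `ENV PGBACKREST_USE=false`, it is truthy in Python, so the branch would always run. Whether configure_spilo.py already normalizes the value is not shown here. The helper `env_flag` below is a hypothetical sketch of such a normalization:

```python
def env_flag(value, default=False):
    """Interpret an environment-variable value as a boolean."""
    if value is None:
        return default
    return str(value).strip().lower() in ('1', 'true', 'yes', 'on')


if __name__ == '__main__':
    # A non-empty string is always truthy, even when it spells "false".
    print(bool('false'), env_flag('false'))
```

With this in place the check would read `if env_flag(placeholders.get('PGBACKREST_USE')):`.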
Here write_pgbackrest_environment
should be something like write_wale_environment,
but it is not clear to me exactly what that function does, or whether I need to support all clouds (AWS, Google, etc.), which is outside the scope of my work.
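As far as I can tell (this is a reading of the code, not something confirmed in this thread), write_wale_environment collects the backup-related settings and writes them one file per variable into a directory that is later consumed through `envdir`. A PgBackrest counterpart that handles only the S3 case could follow the same pattern. The sketch below is hypothetical: `write_envdir` and `pgbackrest_variables` are made-up names, not Spilo functions.

```python
import os


def pgbackrest_variables(placeholders):
    """Select only the PGBACKREST_* settings from the placeholders."""
    return {k: v for k, v in placeholders.items()
            if k.startswith('PGBACKREST_')}


def write_envdir(env_dir, variables, overwrite=False):
    """Write one file per variable into env_dir; return the paths written."""
    os.makedirs(env_dir, exist_ok=True)
    written = []
    for name, value in sorted(variables.items()):
        if not value:
            continue  # skip unset or empty settings
        path = os.path.join(env_dir, name)
        if os.path.exists(path) and not overwrite:
            continue  # keep existing files unless forced
        with open(path, 'w') as f:
            f.write(str(value))
        written.append(path)
    return written
```

The `overwrite` argument plays the role of `args['force']` in the snippet above. Supporting other clouds would mean selecting a different set of variables, which is why the selection is kept separate from the writing.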
I imagine I also need something like this for bootstrapping from a backup with PgBackrest: https://github.com/zalando/spilo/blob/3612e7f54284c8394bb7a4fd1097a0ef63e7c154/postgres-appliance/scripts/configure_spilo.py#L1084-L1086
In general, I need some direction on how to change the code to meet my requirements, on how to make the code generic enough to be included in Spilo (provided this doesn’t take too much time), and on any parts I have not considered (I imagine some changes in TEMPLATE are needed as well).
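For the bootstrap part, one option is Patroni's custom bootstrap mechanism, where a named method supplies a command that populates the data directory. The following is a rough sketch of such a fragment built as a Python dictionary. The method name `pgbackrest` and the function name are assumptions, and the exact recovery settings would need to be checked against the Patroni and PgBackrest documentation:

```python
def pgbackrest_bootstrap_section(scope):
    """Build a Patroni 'bootstrap' fragment that restores via PgBackrest."""
    method = 'pgbackrest'  # hypothetical method name
    return {
        'method': method,
        method: {
            # Restore the latest backup of the stanza into the data directory.
            'command': 'pgbackrest --stanza={0} restore'.format(scope),
            # Do not let Patroni append --scope/--datadir to the command.
            'no_params': True,
            'recovery_conf': {
                'restore_command':
                    'pgbackrest --stanza={0} archive-get %f "%p"'.format(scope),
                'recovery_target_timeline': 'latest',
            },
        },
    }


if __name__ == '__main__':
    import json
    print(json.dumps(pgbackrest_bootstrap_section('demo'), indent=2))
```

This assumes the stanza name equals the cluster scope, as in the `[$SCOPE]` section of the config file above.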
Thank you in advance for any help.
Top GitHub Comments
This is interesting. Thank you.
OK, thank you. My question is: isn’t it dangerous to have the primary server do the backup? We usually try to leave backups to a sync node (where possible) or an async node, because it is expected to have no read traffic and, in general, a lighter workload.
Hi all,
I have finished implementing PgBackrest in Spilo. It works quite well for our purposes. If you want to take a look, you can check out this commit: https://github.com/sasadangelo/spilo/commit/666005248cd1eb9a69c2d4bb0e789c1b408bea7e
However, for the moment I will not open a PR, because I would like to explore the possibility of running PgBackrest as a separate container in the same Spilo Pod. That way there is no need to change the Spilo code. I already have a Docker container for PgBackrest, but at the moment it runs in a separate Pod, only to schedule and orchestrate backups on a non-master node. The goal is to have this container work in two ways:
Obviously, for that the PgBackrest container needs access to the PostgreSQL data directory via a mount point. The archive_command would then be injected via the Kubernetes YAML.
CrunchyData uses a similar approach to keep the backup code in a container separate from the Patroni/PostgreSQL one. I will let you know as soon as it is ready.
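As an illustration of the archive_command injection mentioned above, here is a small sketch that builds the PostgreSQL parameters for WAL archiving through PgBackrest. The function name is made up, and the sketch assumes the pgbackrest binary and its configuration are reachable from wherever PostgreSQL executes the command:

```python
import shlex


def pgbackrest_archive_parameters(stanza):
    """Return PostgreSQL settings that push WAL segments via PgBackrest."""
    return {
        'archive_mode': 'on',
        # %p is replaced by PostgreSQL with the path of the WAL file.
        'archive_command': 'pgbackrest --stanza={0} archive-push %p'.format(
            shlex.quote(stanza)),
    }


if __name__ == '__main__':
    for key, value in pgbackrest_archive_parameters('demo').items():
        print('{0} = {1!r}'.format(key, value))
```

The resulting dictionary could be merged into the `postgresql.parameters` section of the Patroni configuration.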