Azure backups fail to complete in spilo-14:2.1-p6
When running spilo-14:2.1-p6 in an AKS cluster via the postgres operator, I can exec into a running pod and start a backup manually with this command:
envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
However, the backup seems to get stuck, never moving past the Calling pg_stop_backup() log message:
2022-06-22 17:08:34.816 - /scripts/postgres_backup.sh - I was called as: /scripts/postgres_backup.sh /home/postgres/pgdata/pgroot/data
2022-06-22 17:08:34.956 - /scripts/postgres_backup.sh - producing a new backup
INFO: 2022/06/22 17:08:35.005656 Calling pg_start_backup()
INFO: 2022/06/22 17:08:35.645221 Starting a new tar bundle
INFO: 2022/06/22 17:08:35.645254 Walking ...
INFO: 2022/06/22 17:08:35.645578 Starting part 1 ...
INFO: 2022/06/22 17:08:35.900894 Packing ...
INFO: 2022/06/22 17:08:35.902092 Finished writing part 1.
INFO: 2022/06/22 17:08:36.398543 Starting part 2 ...
INFO: 2022/06/22 17:08:36.398577 /global/pg_control
INFO: 2022/06/22 17:08:36.407265 Finished writing part 2.
INFO: 2022/06/22 17:08:36.407287 Calling pg_stop_backup()
I cannot reproduce this on a postgres cluster using spilo-14:2.1-p5 and the same command:
2022-06-22 18:17:14.291 - /scripts/postgres_backup.sh - I was called as: /scripts/postgres_backup.sh /home/postgres/pgdata/pgroot/data
2022-06-22 18:17:14.463 - /scripts/postgres_backup.sh - producing a new backup
INFO: 2022/06/22 18:17:14.502348 Selecting the latest backup as the base for the current delta backup...
INFO: 2022/06/22 18:17:14.520890 Calling pg_start_backup()
INFO: 2022/06/22 18:17:14.722144 Starting a new tar bundle
INFO: 2022/06/22 18:17:14.722173 Walking ...
INFO: 2022/06/22 18:17:14.722396 Starting part 1 ...
INFO: 2022/06/22 18:17:15.158707 Packing ...
INFO: 2022/06/22 18:17:15.158946 Finished writing part 1.
INFO: 2022/06/22 18:17:15.798094 Starting part 2 ...
INFO: 2022/06/22 18:17:15.798136 /global/pg_control
INFO: 2022/06/22 18:17:15.805326 Finished writing part 2.
INFO: 2022/06/22 18:17:15.806011 Calling pg_stop_backup()
INFO: 2022/06/22 18:17:16.952639 Starting part 3 ...
INFO: 2022/06/22 18:17:17.157215 backup_label
INFO: 2022/06/22 18:17:17.157300 tablespace_map
INFO: 2022/06/22 18:17:17.157365 Finished writing part 3.
INFO: 2022/06/22 18:17:17.437986 Wrote backup with name base_000000040000000000000042
Aside from the different Docker image being used, there are no differences in configuration or data. I also don’t see this issue on our AWS EKS clusters running this new version of spilo. Looking at the release notes for 2.1-p6, could this be related to the upgrade to wal-g v2.0.0?
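To tell the two outcomes above apart without watching the terminal, the backup output can be captured to a file and classified by which wal-g log lines it contains. The following is only a sketch: the function name `backup_log_status`, the log path, and the 600-second limit are hypothetical choices, not part of spilo or wal-g. It relies only on the two messages that appear in the logs above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of spilo or wal-g): classify a captured
# backup log using the messages seen in the logs above.
#   completed               -> "Wrote backup with name" is present
#   stuck_at_pg_stop_backup -> pg_stop_backup() was called but no backup was written
#   incomplete              -> the log never reached pg_stop_backup()
backup_log_status() {
  local log_file="$1"
  if grep -qF 'Wrote backup with name' "$log_file"; then
    echo "completed"
  elif grep -qF 'Calling pg_stop_backup()' "$log_file"; then
    echo "stuck_at_pg_stop_backup"
  else
    echo "incomplete"
  fi
}

# Example usage inside the pod (assumed log path and time limit):
#   timeout 600 envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh \
#     "/home/postgres/pgdata/pgroot/data" > /tmp/backup.log 2>&1
#   backup_log_status /tmp/backup.log
```

Wrapping the command in `timeout` is one way to keep a hung run from blocking the shell indefinitely; the classification itself only reads the captured log.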
Top GitHub Comments
@tomasz-zylka @b0n541 Here it is 😃 https://github.com/zalando/spilo/releases/tag/2.1-p7
A bug fix for wal-g 2.0 was released today. When can we expect the fix to be included in Spilo and made available in the operator?