Azure backups fail to complete in spilo-14:2.1-p6

When running spilo-14:2.1-p6 in an AKS cluster via the postgres operator, I can exec into a running pod and start a backup manually with this command:

envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
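
To make the hang visible when reproducing it by hand, the same command can be run under the GNU coreutils timeout utility, which terminates it after a deadline and exits with status 124. This is a minimal sketch, assuming bash and coreutils timeout are available in the pod; the function name run_backup_with_deadline and the 600-second deadline are illustrative and not part of spilo.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of spilo): run a command, but give up after
# a deadline so a stalled backup is reported instead of hanging the shell.
# Usage: run_backup_with_deadline SECONDS COMMAND [ARGS...]
run_backup_with_deadline() {
  local deadline="$1"
  shift
  timeout "$deadline" "$@"
  local status=$?
  if [ "$status" -eq 124 ]; then
    # GNU timeout exits with 124 when the command ran past the deadline.
    echo "backup did not finish within ${deadline}s" >&2
  fi
  return "$status"
}

# Example: the manual backup from above with a 10-minute deadline
# (600 seconds is an arbitrary illustrative value).
# run_backup_with_deadline 600 envdir "/run/etc/wal-e.d/env" \
#   /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
```

A run that hits the deadline is terminated (SIGTERM by default), so it should be treated as a failed backup.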

However, the backup appears to hang and never gets past the "Calling pg_stop_backup()" log message:

2022-06-22 17:08:34.816 - /scripts/postgres_backup.sh - I was called as: /scripts/postgres_backup.sh /home/postgres/pgdata/pgroot/data
2022-06-22 17:08:34.956 - /scripts/postgres_backup.sh - producing a new backup
INFO: 2022/06/22 17:08:35.005656 Calling pg_start_backup()
INFO: 2022/06/22 17:08:35.645221 Starting a new tar bundle
INFO: 2022/06/22 17:08:35.645254 Walking ...
INFO: 2022/06/22 17:08:35.645578 Starting part 1 ...
INFO: 2022/06/22 17:08:35.900894 Packing ...
INFO: 2022/06/22 17:08:35.902092 Finished writing part 1.
INFO: 2022/06/22 17:08:36.398543 Starting part 2 ...
INFO: 2022/06/22 17:08:36.398577 /global/pg_control
INFO: 2022/06/22 17:08:36.407265 Finished writing part 2.
INFO: 2022/06/22 17:08:36.407287 Calling pg_stop_backup()
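
In the stuck run the output ends at "Calling pg_stop_backup()", whereas a completed run goes on to write a final part and report the backup name. A minimal sketch that classifies a captured log on that basis, assuming the log was saved to a file (for example by redirecting the script's output) and that the message texts match the ones shown in this issue; the function name backup_log_status and its three result words are hypothetical. A log captured while the backup is still running looks the same as a stalled one, so the check is only meaningful after the command has exited or been killed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a postgres_backup.sh / wal-g log on stdin and
# print one word describing how far the backup got, based only on the
# message texts that appear in the logs in this issue.
backup_log_status() {
  local log
  log="$(cat)"
  if grep -qF 'Wrote backup with name' <<<"$log"; then
    echo "completed"
  elif grep -qF 'Calling pg_stop_backup()' <<<"$log"; then
    # pg_stop_backup() was called but no backup was reported as written.
    echo "stalled-at-stop-backup"
  else
    echo "incomplete"
  fi
}

# Example: backup_log_status < backup.log
```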

I cannot reproduce this on a postgres cluster using spilo-14:2.1-p5 and the same command:

2022-06-22 18:17:14.291 - /scripts/postgres_backup.sh - I was called as: /scripts/postgres_backup.sh /home/postgres/pgdata/pgroot/data
2022-06-22 18:17:14.463 - /scripts/postgres_backup.sh - producing a new backup
INFO: 2022/06/22 18:17:14.502348 Selecting the latest backup as the base for the current delta backup...
INFO: 2022/06/22 18:17:14.520890 Calling pg_start_backup()
INFO: 2022/06/22 18:17:14.722144 Starting a new tar bundle
INFO: 2022/06/22 18:17:14.722173 Walking ...
INFO: 2022/06/22 18:17:14.722396 Starting part 1 ...
INFO: 2022/06/22 18:17:15.158707 Packing ...
INFO: 2022/06/22 18:17:15.158946 Finished writing part 1.
INFO: 2022/06/22 18:17:15.798094 Starting part 2 ...
INFO: 2022/06/22 18:17:15.798136 /global/pg_control
INFO: 2022/06/22 18:17:15.805326 Finished writing part 2.
INFO: 2022/06/22 18:17:15.806011 Calling pg_stop_backup()
INFO: 2022/06/22 18:17:16.952639 Starting part 3 ...
INFO: 2022/06/22 18:17:17.157215 backup_label
INFO: 2022/06/22 18:17:17.157300 tablespace_map
INFO: 2022/06/22 18:17:17.157365 Finished writing part 3.
INFO: 2022/06/22 18:17:17.437986 Wrote backup with name base_000000040000000000000042
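
On a successful run the final line reports the name wal-g gave the backup (base_000000040000000000000042 here), which can then be matched against the output of wal-g backup-list. A minimal sketch for extracting that name from a captured log, assuming the "Wrote backup with name <name>" wording shown above; the function name last_backup_name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the backup name from the last
# "Wrote backup with name <name>" line of a log read on stdin.
# Prints nothing and returns 1 if no such line is present.
last_backup_name() {
  local name
  name="$(sed -n 's/.*Wrote backup with name \([^[:space:]]*\).*/\1/p' | tail -n 1)"
  [ -n "$name" ] || return 1
  printf '%s\n' "$name"
}

# Example: last_backup_name < backup.log
```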

Apart from the Docker image, there are no differences in configuration or data between the two clusters. I also don’t see this issue on our AWS EKS clusters running the new spilo version. The release notes for 2.1-p6 mention the upgrade to wal-g v2.0.0; could this be related?

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

1 reaction
b0n541 commented, Aug 25, 2022

A bug fix for wal-g 2.0 has been released today. When can we expect the fix to be included in Spilo and made available in the operator?
