Pgbackrest failure for INCR and DIFF but not FULL backup
От | KK CHN |
---|---|
Тема | Pgbackrest failure for INCR and DIFF but not FULL backup |
Дата | |
Msg-id | CAKgGyB-8ap3-zna_gkpEQNfzgWzu4QwO5Y6U_-2q1JcjK=UTvw@mail.gmail.com обсуждение исходный текст |
Список | pgsql-general |
Hi folks,
I am facing a strange issue, Pgbackrest backup fails for DIFF or INCR backups but not Full backup, with the error WAL file cannot be archived before 60000 ms timeout.
The pgbackrest " stanza check " command sometimes succeeds, but sometimes fails.
I don't know why PG is unable to copy WAL files from pg_wal to /data/myarchive_dir in real time. I always observed a delay of a few minutes for a wal file from pg_wal to appear in /data/my_archive_dir.
I'hv observed in the postgresql.conf (checkpoint_timeout = 5 m, max_wal_size = 16 GB, wal_keep_size=15GB, min_wal_size=80MB etc.)
and I found pg_wal dir size is 16 GB on disk (du -h -d 1 ) and /data/archive = 1.2 T
/dev/mapper/rhel_bcga68-data 5.0T 1.8T 3.3T 35% /data
Can we suspect the 5 M is the reason for the WAL archiving delay ?
Backup to a remote RepoServer for INCR or DIFF backup always fails, I found a full backup always succeeds!!!
What is the ideal value needed to be set for "checkpoint_timeout" ?
Or this doesn't have any impact on pgbackrest failure ?
archive_command = 'pgbackrest --stanza=My_Repo archive-push %p && cp %p /data/archive/%f'
From postgresql logs I am seeing this ..
HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for errors.
INFO: archive-push command end: aborted with exception [082]
2025-05-02 12:15:17 IST LOG: archive command failed with exit code 82
2025-05-02 12:15:17 IST DETAIL: The failed archive command was: pgbackrest --stanza=My_Repo archive-push pg_wal/000000010000026300000002 && cp pg_wal/000000010000026300000002 /data/archive/000000010000026300000002
INFO: archive-push command begin 2.52.1: [pg_wal/000000010000026300000002] --archive-async --compress-type=zst --exec-id=2848559-384cf49c --log-level-console=info --log-level-file=debug --log-level-stderr=info --pg1-path= /var/lib/postgres/16/data --pg-version-force=16 --process-max=6 --repo1-host=10.x.y.202 --repo1-host-user=pgbackrest --spool-path=/var/spool/pgbackrest --stanza=My_Repo
top output on DB cluster:
top - 12:37:00 up 66 days, 17:24, 2 users, load average: 4.04, 4.72, 4.56
Tasks: 902 total, 4 running, 897 sleeping, 0 stopped, 1 zombie
%Cpu(s): 7.4 us, 1.7 sy, 0.0 ni, 89.9 id, 0.4 wa, 0.2 hi, 0.4 si, 0.0 st
MiB Mem : 31837.6 total, 706.1 free, 15243.0 used, 24741.0 buff/cache
MiB Swap: 8060.0 total, 6634.0 free, 1426.0 used. 16608.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2839363 postgre+ 20 0 8965608 7.2g 7.1g S 70.2 23.0 2:02.61 postgres
2864108 postgre+ 20 0 8967848 7.1g 7.1g S 64.9 22.8 0:30.04 postgres
2865547 postgre+ 20 0 8965432 7.1g 7.1g S 39.1 22.8 0:32.30 postgres
2865752 postgre+ 20 0 8964352 6.9g 6.9g S 16.6 22.3 0:32.94 postgres
Model name: Intel(R) Xeon(R) Gold 6430
BIOS Model name: Intel(R) Xeon(R) Gold 6430
CPU family: 6
Model: 143
Thread(s) per core: 1
Core(s) per socket: 16
These are vCPUs (16 nos) , OS RHEL 9, postgres 16, pgbackrest 2.52.1
Any hints most welcome to find the root cause / troubleshoot the pgbackrest failures for DIFF/ INCR backups.
2839363 postgre+ 20 0 8965608 7.2g 7.1g S 70.2 23.0 2:02.61 postgres
2864108 postgre+ 20 0 8967848 7.1g 7.1g S 64.9 22.8 0:30.04 postgres
2865547 postgre+ 20 0 8965432 7.1g 7.1g S 39.1 22.8 0:32.30 postgres
2865752 postgre+ 20 0 8964352 6.9g 6.9g S 16.6 22.3 0:32.94 postgres
Model name: Intel(R) Xeon(R) Gold 6430
BIOS Model name: Intel(R) Xeon(R) Gold 6430
CPU family: 6
Model: 143
Thread(s) per core: 1
Core(s) per socket: 16
These are vCPUs (16 nos) , OS RHEL 9, postgres 16, pgbackrest 2.52.1
Any hints most welcome to find the root cause / troubleshoot the pgbackrest failures for DIFF/ INCR backups.
Thank you
Krishane
For More Inputs:
For more info : you can see the full backup success here. but a diff backup fails.
full backup: 20250505-070204F
timestamp start/stop: 2025-05-05 07:02:04+05:30 / 2025-05-05 22:11:23+05:30
wal start/stop: 000000010000026F00000066 / 000000010000027300000045
database size: 503.1GB, database backup size: 503.1GB
repo1: backup size: 79GB
When I try diff backup it always fails.
[root@Repo ~]# tail -f /var/log/pgbackrest/My_Repo-backup.log
stack trace:
command/archive/find.c:walSegmentFind:191:(this: {WalSegmentFind}, walSegment: {"000000010000027B0000006C"})
command/backup/backup.c:backupArchiveCheckCopy:(backupData: {BackupData}, manifest: {Manifest})
command/backup/backup.c:cmdBackup:(void)
main.c:main:(debug log level required for parameters)
--------------------------------------------------------------------
2025-05-07 15:47:49.760 P00 INFO: backup command end: aborted with exception [082]
2025-05-07 15:47:49.760 P00 DEBUG: command/exit::exitSafe: => 82
2025-05-07 15:47:49.860 P00 DEBUG: main::main: => 82
^C
[ root@Repo ~ ~]# date
Wednesday 07 May 2025 04:06:37 PM IST
The postgres log says
=316781-61d82f85 --log-level-console=info --log-level-file=debug --log-level-stderr=info --pg1-path= /var/lib/postgres/16/data --pg-version-force=16 --process-max=3 --repo1-host=10.x.y.202 --repo1-host-user=pgbackrest --spool-path=/var/spool/pgbackrest --stanza=My_Repo
INFO: pushed WAL file '000000010000027B00000082' to the archive asynchronously
INFO: archive-push command end: completed successfully (37506ms)
INFO: archive-push command begin 2.52.1: [pg_wal/000000010000027B00000083] --archive-async --compress-type=zst --exec-id=317334-27a30b57 --log-level-console=info --log-level-file=debug --log-level-stderr=info --pg1-path= /var/lib/postgres/16/data --pg-version-force=16 --process-max=3 --repo1-host=10.x.y.202 --repo1-host-user=pgbackrest --spool-path=/var/spool/pgbackrest --stanza=My_Repo
ERROR: [082]: unable to push WAL file '000000010000027B00000083' to the archive asynchronously after 60 second(s)
HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for errors.
INFO: archive-push command end: aborted with exception [082]
2025-05-07 16:22:56 IST LOG: archive command failed with exit code 82
INFO: pushed WAL file '000000010000027B00000082' to the archive asynchronously
INFO: archive-push command end: completed successfully (37506ms)
INFO: archive-push command begin 2.52.1: [pg_wal/000000010000027B00000083] --archive-async --compress-type=zst --exec-id=317334-27a30b57 --log-level-console=info --log-level-file=debug --log-level-stderr=info --pg1-path= /var/lib/postgres/16/data --pg-version-force=16 --process-max=3 --repo1-host=10.x.y.202 --repo1-host-user=pgbackrest --spool-path=/var/spool/pgbackrest --stanza=My_Repo
ERROR: [082]: unable to push WAL file '000000010000027B00000083' to the archive asynchronously after 60 second(s)
HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for errors.
INFO: archive-push command end: aborted with exception [082]
2025-05-07 16:22:56 IST LOG: archive command failed with exit code 82
[ root@Repo ~ ~]#
В списке pgsql-general по дате отправления: