Обсуждение: BUG #19032: In restore_command %f parameter does not support WAL partial files.
BUG #19032: In restore_command %f parameter does not support WAL partial files.
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 19032 Logged by: Wojciech Szenajch Email address: wszenajch@gmail.com PostgreSQL version: 17.6 Operating system: Ubuntu 24.04.03 LTS Description: Using restore_command = 'cp /tmp/receivewal/%f %p' does not use WAL partial file during backup restore process. %p value is used by Postgres to copy WAL and history files and is filled by Postgres internally without any Administrator influence. Example: 1. Contents of /tmp/receivewal: 0000000300000001000000E6 0000000300000001000000E7 0000000300000001000000E8 0000000300000001000000E9 0000000300000001000000EA ... 000000030000000200000004 000000030000000200000005 000000030000000200000006.partial 00000003.history 2. Log from restore process using restore_command = 'cp /tmp/receivewal/%f %p' : 2025-08-25 11:15:48.100 CEST [3326] LOG: starting PostgreSQL 17.6 (Ubuntu 17.6-1.pgdg24.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, 64-bit 2025-08-25 11:15:48.100 CEST [3326] LOG: listening on IPv4 address "0.0.0.0", port 5432 2025-08-25 11:15:48.101 CEST [3326] LOG: listening on IPv6 address "::", port 5432 2025-08-25 11:15:48.111 CEST [3326] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2025-08-25 11:15:48.139 CEST [3330] LOG: database system was interrupted; last known up at 2025-08-25 10:28:23 CEST cp: cannot stat '/tmp/receivewal/00000004.history': No such file or directory 2025-08-25 11:15:51.197 CEST [3330] LOG: starting backup recovery with redo LSN 1/E7000028, checkpoint LSN 1/E7000080, on timeline ID 3 2025-08-25 11:15:51.213 CEST [3330] LOG: restored log file "00000003.history" from archive 2025-08-25 11:15:51.318 CEST [3330] LOG: restored log file "0000000300000001000000E7" from archive 2025-08-25 11:15:51.439 CEST [3330] LOG: starting archive recovery 2025-08-25 11:15:51.463 CEST [3330] LOG: redo starts at 1/E7000028 2025-08-25 11:15:51.526 CEST [3330] LOG: restored log file "0000000300000001000000E8" from archive 2025-08-25 11:15:51.647 CEST [3330] LOG: completed backup recovery with redo LSN 1/E7000028 and end LSN 1/E7000120 2025-08-25 11:15:51.647 CEST [3330] LOG: consistent recovery state reached at 1/E7000120 2025-08-25 11:15:51.647 CEST [3326] LOG: database system is ready to accept read-only connections 2025-08-25 11:15:51.860 CEST [3330] LOG: restored log file "0000000300000001000000E9" from archive 2025-08-25 11:15:52.111 CEST [3330] LOG: restored log file "0000000300000001000000EA" from archive ... 2025-08-25 11:16:04.450 CEST [3328] LOG: recovery restart point at 1/FC09F7B0 2025-08-25 11:16:04.450 CEST [3328] DETAIL: Last completed transaction was at log time 2025-08-25 10:29:45.078323+02. 2025-08-25 11:16:04.450 CEST [3328] LOG: restartpoint starting: wal 2025-08-25 11:16:04.634 CEST [3330] LOG: restored log file "000000030000000200000004" from archive 2025-08-25 11:16:04.821 CEST [3330] LOG: restored log file "000000030000000200000005" from archive cp: cannot stat '/tmp/receivewal/000000030000000200000006': No such file or directory 2025-08-25 11:16:05.218 CEST [3330] LOG: redo done at 2/5F8CD38 system usage: CPU: user: 2.09 s, system: 4.27 s, elapsed: 13.75 s 2025-08-25 11:16:05.218 CEST [3330] LOG: last completed transaction was at log time 2025-08-25 10:31:27.632146+02 2025-08-25 11:16:05.269 CEST [3330] LOG: restored log file "000000030000000200000005" from archive cp: cannot stat '/tmp/receivewal/00000004.history': No such file or directory 2025-08-25 11:16:05.437 CEST [3330] LOG: selected new timeline ID: 4 2025-08-25 11:16:05.564 CEST [3330] LOG: restored log file "00000003.history" from archive 2025-08-25 11:16:05.583 CEST [3330] LOG: archive recovery complete 2025-08-25 11:16:06.625 CEST [3328] LOG: restartpoint complete: wrote 6392 buffers (39.0%); 1 WAL file(s) added, 2 removed, 0 recycled; write=1.443 s, sync=0.568 s, total=2.175 s; sync files=13, longest=0.444 s, average=0.044 s; distance=49438 kB, estimate=88596 kB; lsn=2/20B7918, redo lsn=1/FF0E70A8 2025-08-25 11:16:06.626 CEST [3328] LOG: recovery restart point at 1/FF0E70A8 2025-08-25 11:16:06.626 CEST [3328] DETAIL: Last completed transaction was at log time 2025-08-25 10:31:27.632146+02. 2025-08-25 11:16:06.626 CEST [3328] LOG: checkpoints are occurring too frequently (2 seconds apart) 2025-08-25 11:16:06.626 CEST [3328] HINT: Consider increasing the configuration parameter "max_wal_size". 2025-08-25 11:16:06.640 CEST [3328] LOG: checkpoint starting: end-of-recovery immediate wait wal 2025-08-25 11:16:07.033 CEST [3328] LOG: checkpoint complete: wrote 4125 buffers (25.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.278 s, sync=0.057 s, total=0.407 s; sync files=32, longest=0.013 s, average=0.002 s; distance=113534 kB, estimate=113534 kB; lsn=2/5FC6C38, redo lsn=2/5FC6C38 2025-08-25 11:16:07.068 CEST [3326] LOG: database system is ready to accept connections The last used is "restored log file "000000030000000200000005" from archive". Partial file is not used because is not copied: cp: cannot stat '/tmp/receivewal/000000030000000200000006': No such file or directory. When I used the following parameter value: restore_command = 'cp /tmp/receivewal/%f %p || cp /tmp/receivewal/%f.partial %p' to include files with partial extension, file 000000030000000200000006.partial was copied and seemed to be processed during restore (maybe it had no any valuable data in my experiment). The end of the log: ... 2025-08-25 11:06:46.703 CEST [3120] LOG: restored log file "000000030000000200000005" from archive cp: cannot stat '/tmp/receivewal/000000030000000200000006': No such file or directory 2025-08-25 11:06:47.604 CEST [3120] LOG: restored log file "000000030000000200000006" from archive 2025-08-25 11:06:47.670 CEST [3118] LOG: restartpoint complete: wrote 5058 buffers (30.9%); 0 WAL file(s) added, 3 removed, 0 recycled; write=1.791 s, sync=0.506 s, total=2.469 s; sync files=5, longest=0.303 s, average=0.102 s; distance=49438 kB, estimate=81497 kB; lsn=2/20B7918, redo lsn=1/FF0E70A8 2025-08-25 11:06:47.670 CEST [3118] LOG: recovery restart point at 1/FF0E70A8 2025-08-25 11:06:47.670 CEST [3118] DETAIL: Last completed transaction was at log time 2025-08-25 10:31:27.632146+02. 2025-08-25 11:06:47.671 CEST [3118] LOG: restartpoint starting: wal 2025-08-25 11:06:48.062 CEST [3120] LOG: invalid magic number 0000 in WAL segment 000000030000000200000006, LSN 2/686C000, offset 8830976 2025-08-25 11:06:48.063 CEST [3120] LOG: redo done at 2/686BFB0 system usage: CPU: user: 2.54 s, system: 6.48 s, elapsed: 17.16 s 2025-08-25 11:06:48.063 CEST [3120] LOG: last completed transaction was at log time 2025-08-25 10:31:45.384122+02 cp: cannot stat '/tmp/receivewal/000000030000000200000006': No such file or directory 2025-08-25 11:06:48.143 CEST [3120] LOG: restored log file "000000030000000200000006" from archive cp: cannot stat '/tmp/receivewal/00000004.history': No such file or directory cp: cannot stat '/tmp/receivewal/00000004.history.partial': No such file or directory 2025-08-25 11:06:48.306 CEST [3120] LOG: selected new timeline ID: 4 2025-08-25 11:06:48.461 CEST [3120] LOG: restored log file "00000003.history" from archive 2025-08-25 11:06:48.487 CEST [3120] LOG: archive recovery complete 2025-08-25 11:06:49.671 CEST [3118] LOG: restartpoint complete: wrote 6161 buffers (37.6%); 0 WAL file(s) added, 0 removed, 1 recycled; write=1.025 s, sync=0.916 s, total=2.001 s; sync files=44, longest=0.645 s, average=0.021 s; distance=50765 kB, estimate=78424 kB; lsn=2/4E58E48, redo lsn=2/227A4F8 2025-08-25 11:06:49.671 CEST [3118] LOG: recovery restart point at 2/227A4F8 2025-08-25 11:06:49.671 CEST [3118] DETAIL: Last completed transaction was at log time 2025-08-25 10:31:45.384122+02. 2025-08-25 11:06:49.682 CEST [3118] LOG: checkpoint starting: end-of-recovery immediate wait 2025-08-25 11:06:50.385 CEST [3118] LOG: checkpoint complete: wrote 10155 buffers (62.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.600 s, sync=0.062 s, total=0.714 s; sync files=21, longest=0.013 s, average=0.003 s; distance=71622 kB, estimate=77744 kB; lsn=2/686BFF0, redo lsn=2/686BFF0 2025-08-25 11:06:50.428 CEST [3116] LOG: database system is ready to accept connections Command used for copying WAL and history files copied 000000030000000200000006.partial file and it was used in restore process but giving false reference to /tmp/receivewal/00000004.history.partial. Maybe better template for including partial files may be used. But as an Administrator I expect this %f parameter to cover all files required to restore for recovery_target_timeline = 'latest'. Especially that restore_command = 'cp /tmp/receivewal/%f %p' definition example is used in Postgres documentation as sufficient to get maximum possible restore of data. I found "restore_command = 'cp /tmp/receivewal/%f %p || cp /tmp/receivewal/%f.partial %p'" on page: https://www.cybertec-postgresql.com/en/never-lose-a-postgresql-transaction-with-pg_receivewal/#archive-recovery-and-partial-wal-segments published 10.2019, so the problem probably also exists in older Postgres versions.
Re: BUG #19032: In restore_command %f parameter does not support WAL partial files.
От
"David G. Johnston"
Дата:
On Tue, Aug 26, 2025 at 5:10 PM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 19032
Logged by: Wojciech Szenajch
Email address: wszenajch@gmail.com
PostgreSQL version: 17.6
Operating system: Ubuntu 24.04.03 LTS
Description:
Using restore_command = 'cp /tmp/receivewal/%f %p' does not use WAL partial
file during backup restore process.
While this seems un-documented in the user-facing documentation the source code makes it clear this is intended behavior. If your point-in-time-recovery flow requires the use of these files you will need to manually manipulate them.
(link to article about a custom archiving process relying on pg_receivewal)
It seems unfortunate that the pg_receivewal creates files with ".partial" extensions that are semantically different from what the core system ".partial" files are. But if your archive process is custom (i.e., uses something other than archive_command) it seems reasonable that any restore_command you need to create would also be something custom. Going by the documentation for only half of the process is going to lead to these kinds of inconsistent results.
David J.
Re: BUG #19032: In restore_command %f parameter does not support WAL partial files.
От
Wojciech Szenajch
Дата:
I used pg_basebackup for full backups. pg_receivewal for collecting latest WALs from Primary Server.
The scenario I described was for full database restoration in a situation of complete Primary database server unrecoverable crash.
So, it is not the situation when orignial Primary may return back.
I did not use archive_command, because it seemed to be obsolete when pg_receivewal is is used.
In my procedures there is nothing different from standard usage described in Postgres manual.
"pg_receivewal creates files with ".partial" extensions that are semantically different from what the core system ".partial" files" - does it mean that files are useless for database restoring from backup?
"pg_receivewal creates files with ".partial" extensions that are semantically different from what the core system ".partial" files" - does it mean that files are useless for database restoring from backup?
From pg_receivewal documentation:
"pg_receivewal streams the write-ahead log in real time as it's being generated on the server, and does not wait for segments to complete like archive_command and archive_library do. For this reason, it is not necessary to set archive_timeout when using pg_receivewal." - What is it for? Especially with this "archive_timeout" and " does not wait for segments to complete" remarks.
"pg_receivewal streams the write-ahead log in real time as it's being generated on the server, and does not wait for segments to complete like archive_command and archive_library do. For this reason, it is not necessary to set archive_timeout when using pg_receivewal." - What is it for? Especially with this "archive_timeout" and " does not wait for segments to complete" remarks.
"When using pg_receivewal instead of archive_command or archive_library as the main WAL backup method, it is strongly recommended to use replication slots" and "The archive_command or archive_library is only invoked for completed WAL segments."
In a case of server crash usable WAL part file is not accessible (putting pg_wal directory on i.e. NFS Server is not a good idea in my opinion).
It seems that advantage of Postgres database with almost continuous WAL backup compared to some commercial databases when equivalent to archive_timeout must be set to i.e. 10 minutes and end up with a lot of "empty" WALs i.e. during weekends is not true.
It is a pity, because Postgres has all tools to have this advantage, but they must be properly modified. Or there is something I do not understand and this functionality is somehow available.
Wojciech S.
śr., 27 sie 2025 o 02:48 David G. Johnston <david.g.johnston@gmail.com> napisał(a):
On Tue, Aug 26, 2025 at 5:10 PM PG Bug reporting form <noreply@postgresql.org> wrote:The following bug has been logged on the website:
Bug reference: 19032
Logged by: Wojciech Szenajch
Email address: wszenajch@gmail.com
PostgreSQL version: 17.6
Operating system: Ubuntu 24.04.03 LTS
Description:
Using restore_command = 'cp /tmp/receivewal/%f %p' does not use WAL partial
file during backup restore process.While this seems un-documented in the user-facing documentation the source code makes it clear this is intended behavior. If your point-in-time-recovery flow requires the use of these files you will need to manually manipulate them.(link to article about a custom archiving process relying on pg_receivewal)It seems unfortunate that the pg_receivewal creates files with ".partial" extensions that are semantically different from what the core system ".partial" files are. But if your archive process is custom (i.e., uses something other than archive_command) it seems reasonable that any restore_command you need to create would also be something custom. Going by the documentation for only half of the process is going to lead to these kinds of inconsistent results.David J.