May data be corrupted after an interrupted, but afterwards sucessfully replayed recovery?

Поиск
Список
Период
Сортировка
От Dietrich, Benjamin
Тема May data be corrupted after an interrupted, but afterwards sucessfully replayed recovery?
Дата
Msg-id 236a0aa64b1c41dd8e305e64ca6d38b5@uni-tuebingen.de
обсуждение исходный текст
Ответы Re: May data be corrupted after an interrupted, but afterwards sucessfully replayed recovery?
Список pgsql-admin
Hi experts,

we have a cluster that crashed because of a full 'pg_wal' disk.
When it automatically tried to restart after failure, it went into recovery,
crashed again - this time in recovery - and stopped (see logs below [1]).

The original problem is solved and the cluster now successfully started an
finished recovery.
But when I restarted the cluster it HINTed:

2025-02-20 09:26:57.615 CET [3982486] LOG:  database system was interrupted
while in recovery at 2025-02-20 03:38:24 CET
2025-02-20 09:26:57.615 CET [3982486] HINT:  This probably means that some
data is corrupted and you will have to use the last backup for recovery.


My question is now, if there is still a chance that the data is corrupted,
although the interrupted recovery was successfully redone later?
Should I recover the whole cluster from the last PITR backup to be safe? Or
can I be sure that the data is in a solid state, since recovery succeeded on
second try?

Regards,
Benjamin 


----------------------
[1] logs:

2025-02-20 03:38:18.278 CET [3767151] PANIC:  could not write to file
"pg_wal/xlogtemp.3767151": No space left on device
2025-02-20 03:38:18.315 CET [1909] LOG:  server process (PID 3767151) was
terminated by signal 6: Aborted
2025-02-20 03:38:18.315 CET [1909] LOG:  terminating any other active server
processes
2025-02-20 03:38:21.938 CET [1909] LOG:  all server processes terminated;
reinitializing
2025-02-20 03:38:23.867 CET [3863518] LOG:  database system was interrupted;
last known up at 2025-02-20 03:34:49 CET
2025-02-20 03:38:24.089 CET [3863518] LOG:  database system was not properly
shut down; automatic recovery in progress
2025-02-20 03:38:24.102 CET [3863518] LOG:  redo starts at E8A/EB648078
2025-02-20 03:38:24.119 CET [3863518] LOG:  redo done at E8A/EC19B6B0 system
usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.02 s
2025-02-20 03:38:24.136 CET [3863518] FATAL:  could not write to file
"pg_wal/xlogtemp.3863518": No space left on device
2025-02-20 03:38:24.139 CET [1909] LOG:  startup process (PID 3863518)
exited with exit code 1
2025-02-20 03:38:24.139 CET [1909] LOG:  terminating any other active server
processes
2025-02-20 03:38:24.140 CET [1909] LOG:  shutting down due to startup
process failure
2025-02-20 03:38:24.213 CET [1909] LOG:  database system is shut down
2025-02-20 09:26:57.597 CET [3982483] LOG:  starting PostgreSQL 15.10
(Debian 15.10-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian
12.2.0-14) 12.2.0, 64-bit
2025-02-20 09:26:57.597 CET [3982483] LOG:  listening on IPv4 address
"0.0.0.0", port 5432
2025-02-20 09:26:57.597 CET [3982483] LOG:  listening on IPv6 address "::",
port 5432
2025-02-20 09:26:57.598 CET [3982483] LOG:  listening on Unix socket
"/var/run/postgresql/.s.PGSQL.5432"
2025-02-20 09:26:57.615 CET [3982486] LOG:  database system was interrupted
while in recovery at 2025-02-20 03:38:24 CET
2025-02-20 09:26:57.615 CET [3982486] HINT:  This probably means that some
data is corrupted and you will have to use the last backup for recovery.
2025-02-20 09:26:57.886 CET [3982486] LOG:  database system was not properly
shut down; automatic recovery in progress
2025-02-20 09:26:57.890 CET [3982486] LOG:  redo starts at E8A/EB648078
2025-02-20 09:26:57.932 CET [3982486] LOG:  redo done at E8A/EC19B6B0 system
usage: CPU: user: 0.00 s, system: 0.01 s, elapsed: 0.04 s
2025-02-20 09:26:57.966 CET [3982484] LOG:  checkpoint starting:
end-of-recovery immediate wait
2025-02-20 09:26:58.061 CET [3982484] LOG:  checkpoint complete: wrote 390
buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.058 s,
sync=0.001 s, total=0.096 s; sync files=103, longest=0.001 s, average=0.001
s; distance=26335 kB, estimate=26335 kB
2025-02-20 09:26:58.068 CET [3982483] LOG:  database system is ready to
accept connections

Вложения

В списке pgsql-admin по дате отправления: