Re: BUG #18009: Postgres Recovery not happening
From | Thomas Munro |
---|---|
Subject | Re: BUG #18009: Postgres Recovery not happening |
Date | |
Msg-id | CA+hUKGL5ga_CaVh_ckNTgz8+3crYJYHA2RniXOFieFSrKqG9NA@mail.gmail.com |
In reply to | BUG #18009: Postgres Recovery not happening (PG Bug reporting form <noreply@postgresql.org>) |
Responses | Re: BUG #18009: Postgres Recovery not happening |
List | pgsql-bugs |
On Sat, Jul 1, 2023 at 2:29 AM PG Bug reporting form <noreply@postgresql.org> wrote:
> Operating system: AIX

> I verified in the OS side, we are not observing explicit fsync() call post
> writing to this file "000000010000000000000003". I suspect this because
> the writes are present in the VMM page cache and not getting synced up to
> the disk. Post restart of my node, DB is not coming up.

We don't usually call fsync() for WAL files (except when initially creating them), we use various methods controlled by the setting wal_sync_method[1] and on AIX we default to open_datasync (that means we open the WAL with O_DSYNC and then we expect pwrite() to return only after the data is durably on disk). Have you changed that setting?

When you say "abrupt shutdown", do you mean power loss? Perhaps you could investigate what O_DSYNC does with respect to write caches on your system and what your disk controllers etc promise about power loss.

Can you reproduce this problem with a fresh cluster, and does it go away if you use wal_sync_method=fdatasync?

It doesn't seem that likely to me that expensive AIX systems would fail at sensible volatile cache management, so that's a long shot, but we know that some other systems can fail in that way (eg Windows on consumer storage), and I'm pretty sure they can fail exactly as you described because the control file is fsync'd while the WAL is only written to volatile drive caches.

[1] https://www.postgresql.org/docs/15/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
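
To make the difference between the two wal_sync_method values mentioned above concrete, here is a rough sketch in Python of the two system-call patterns. This is not PostgreSQL's code (which is written in C), and the function names `write_with_o_dsync` and `write_then_fdatasync` are made up for illustration. It assumes a Unix-like platform where `os.pwrite` is available. Neither pattern says anything about what a volatile drive cache does with the data afterwards; it only shows what the operating system is asked to do.

```python
import os
import tempfile


def write_with_o_dsync(path: str, data: bytes, offset: int = 0) -> int:
    """open_datasync style: open with O_DSYNC, then a plain pwrite().

    The expectation is that pwrite() returns only after the data is durable.
    If the platform has no os.O_DSYNC, this falls back to an ordinary
    (non-synchronous) write, which is NOT equivalent.
    """
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_DSYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def write_then_fdatasync(path: str, data: bytes, offset: int = 0) -> int:
    """fdatasync style: ordinary pwrite(), then an explicit flush call."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        written = os.pwrite(fd, data, offset)
        # os.fdatasync is missing on some platforms; fall back to fsync there.
        sync = getattr(os, "fdatasync", os.fsync)
        sync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = b"\0" * 8192  # one 8 kB block, an arbitrary size for the demo
        print(write_with_o_dsync(os.path.join(tmp, "dsync.bin"), page))
        print(write_then_fdatasync(os.path.join(tmp, "fdatasync.bin"), page))
```

With the first pattern there is no separate fsync() or fdatasync() call to observe, which would be consistent with what the reporter saw when tracing on the OS side.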