Re: BUG #18009: Postgres Recovery not happening

From Vamshikrishna T
Subject Re: BUG #18009: Postgres Recovery not happening
Date
Msg-id CA+t6Qsnj3W+4zCY3QwayLGUhkXGRSwgbTZox3KXJEg-kajjdHQ@mail.gmail.com
In response to Re: BUG #18009: Postgres Recovery not happening  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: BUG #18009: Postgres Recovery not happening  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-bugs
Hi Thomas,

Thank you for the confirmation on fdatasync. This definitely does not affect regular AIX users, so we may not need any change.
The other thing I am interested in: it looks like Postgres 15.2 doesn't support Direct I/O or Concurrent I/O on AIX? I don't see any
setting where I can tune that. I feel DIO or CIO would be safer than cached I/O. The fsync error behaviors across
different operating systems are pretty interesting. Thanks for the info.

Thanks
 Vamshi.

On Wed, 5 Jul 2023 at 09:06, Thomas Munro <thomas.munro@gmail.com> wrote:
On Wed, Jul 5, 2023 at 1:48 AM Vamshikrishna T <tvk1271@gmail.com> wrote:
> Thank you for your immediate response. This is not on the default file system of AIX (JFS2), but on a special-purpose file system. It looks like O_DSYNC (open_datasync) is causing the issue, as it seems not to be honoured on this file system. Yes, the abrupt shutdown can be treated as a power loss.

Sounds like a fun project.  Is this alien technology that regular AIX
users won't run into and I should forget this conversation ever
happened, or is it a clue we should use wal_sync_method=fdatasync by
default on AIX?
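
For readers following along, the two WAL flush styles under discussion can be sketched roughly as below. This is an illustration in Python, not PostgreSQL's code: the function names and the sample record are hypothetical, and it assumes a Unix system where os.O_DSYNC and os.fdatasync are available.

```python
import os
import tempfile


def write_open_datasync(path: str, data: bytes) -> int:
    """Roughly what wal_sync_method=open_datasync relies on: the O_DSYNC
    flag asks the kernel to make each write durable before os.write()
    returns. No explicit flush follows, so durability depends entirely
    on the file system honouring the flag."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


def write_then_fdatasync(path: str, data: bytes) -> int:
    """Roughly what wal_sync_method=fdatasync relies on: a plain buffered
    write followed by an explicit fdatasync() on the same descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        written = os.write(fd, data)
        os.fdatasync(fd)  # explicit flush request, independent of open flags
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    record = b"hypothetical WAL record\n"  # placeholder payload
    with tempfile.TemporaryDirectory() as tmp:
        for fn in (write_open_datasync, write_then_fdatasync):
            path = os.path.join(tmp, fn.__name__)
            print(fn.__name__, "wrote", fn(path, record), "bytes")
```

If a file system silently ignores O_DSYNC, the first function can return success while the data is still only in cache; the second does not depend on the open flag at all, which is presumably why switching the setting is a useful experiment here.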

> I used wal_sync_method=fdatasync (although I am not sure whether the problem has vanished, because it is no longer reproducing), but what I observe
> is that explicit sync calls are issued on the files in the ../pg_wal/ directory immediately after the write calls complete.
>
> 2023-07-04 03:36:04.259 CDT|64a3d9b4.8701bc|LOG:  checkpoint starting: time
> 2023-07-04 03:36:06.263 CDT|64a3d9b4.8701bc|LOG:  checkpoint complete: wrote 21 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.925 s, sync=0.053 s, total=2.005 s; sync files=20, longest=0.019 s, average=0.003 s; distance=32 kB, estimate=32 kB
>
> I can see that the write caches are cleared between these two timestamps. With wal_sync_method=fdatasync, is it safe to assume that all Postgres DB writes during checkpointing are followed by an explicit OS-level sync() or fsync() call, irrespective of O_DSYNC at open time?

Yes.  Our periodic checkpoints write out all the relation data (files
like base/1234/2345 that hold tables and indexes), and then always
call fsync() (sometimes the pwrite() calls and the fsync() happen in
different processes*).  But WAL data (files like
pg_wal/000000010000000000000001) get the various wal_sync_method
behaviours, with (IMHO unfortunately) different defaults based on a
series of inconsistent platform-by-platform historical decisions...
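
The checkpoint pattern described above (buffered writes first, then an fsync() per file) can be sketched as follows. This is a simplified illustration in Python, not PostgreSQL's implementation: the `checkpoint` function, its argument layout, and the returned statistics are hypothetical, loosely modelled on the "sync files / longest / average" fields in the log line quoted earlier. Reopening each file before the fsync() stands in for the write and the sync happening through different descriptors.

```python
import os
import tempfile
import time


def checkpoint(dirty):
    """dirty maps a file path to a list of (offset, data) pages to write.

    Phase 1 writes every page with pwrite(); the data may still sit in
    the OS cache afterwards. Phase 2 reopens each file and calls fsync()
    to ask the kernel to push it to stable storage."""
    # Phase 1: buffered writes.
    for path, pages in dirty.items():
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            for offset, data in pages:
                os.pwrite(fd, data, offset)
        finally:
            os.close(fd)

    # Phase 2: explicit fsync() per file, timed individually.
    durations = []
    for path in dirty:
        fd = os.open(path, os.O_WRONLY)
        try:
            start = time.monotonic()
            os.fsync(fd)
            durations.append(time.monotonic() - start)
        finally:
            os.close(fd)

    count = len(durations)
    return {
        "sync_files": count,
        "longest": max(durations, default=0.0),
        "average": sum(durations) / count if count else 0.0,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical relation files with 8 kB pages.
        files = {
            os.path.join(tmp, "rel_a"): [(0, b"a" * 8192), (8192, b"b" * 8192)],
            os.path.join(tmp, "rel_b"): [(0, b"c" * 8192)],
        }
        print(checkpoint(files))
```

Nothing in this path depends on O_DSYNC: the relation files are opened without it, and durability comes from the later fsync() calls.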

Just BTW, if you're interested in PostgreSQL on AIX, see
https://wiki.postgresql.org/wiki/AIX .

*With interesting cross-platform consequences.  If you're a
kernel/VMM/storage person you might find
https://wiki.postgresql.org/wiki/Fsync_Errors interesting.  Of course
we have no idea what any closed source kernel does.


