Re: BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG

From: Michael Paquier
Subject: Re: BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG
Date:
Msg-id: ZLT2d0b/Zhhgh3v1@paquier.xyz
In reply to: Re: BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG  (Laurenz Albe <laurenz.albe@cybertec.at>)
List: pgsql-bugs
On Mon, Jul 17, 2023 at 09:53:32AM +0200, Laurenz Albe wrote:
> On Mon, 2023-07-17 at 05:03 +0000, PG Bug reporting form wrote:
>> The scenario: checkpoint operations had been failing on the DB
>> server for the last 8 hours, which means no successful checkpoint happened
>> on the DB server in that time. Then the DB server crashed
>> due to exhausted disk space and did not come back up after crash
>> recovery.
>
> Mistake #1: you did not monitor disk space.

max_wal_size is a critical setting to adjust.  It is usually
recommended to put pg_wal/ on its own partition so that the space
allocated for WAL records is predictable across checkpoints.  This is
not an exact science: max_wal_size is a soft limit, so one usually
needs some extra margin on the WAL partition.  There have been some
patches floating around to make it a hard limit as well, but I
don't think we've ever agreed on semantics that would be
acceptable when the upper limit is reached.
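
As an illustration of the monitoring point, here is a minimal sketch of a
check that could run from cron against a dedicated WAL partition.  It is
not part of PostgreSQL: the function names, the 20% free-space margin and
the idea of passing max_wal_size in bytes are all assumptions made for
the example.  It only compares the size of the files in pg_wal/ with the
configured soft limit and looks at the free space left on the partition.

```python
import os
import shutil


def wal_dir_bytes(wal_dir):
    """Sum the sizes of regular files directly under wal_dir.

    Subdirectories (for example archive_status/) are skipped.
    """
    total = 0
    with os.scandir(wal_dir) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def check_wal_space(wal_dir, max_wal_size_bytes, min_free_fraction=0.2):
    """Return a list of warning strings; empty when nothing looks wrong.

    max_wal_size_bytes is the configured soft limit converted to bytes by
    the caller; min_free_fraction is a hypothetical safety margin.
    """
    warnings = []
    used_by_wal = wal_dir_bytes(wal_dir)
    usage = shutil.disk_usage(wal_dir)

    # max_wal_size is a soft limit, so exceeding it is possible; staying
    # above it is a hint that checkpoints may not be completing.
    if used_by_wal > max_wal_size_bytes:
        warnings.append(
            "WAL size %d bytes exceeds max_wal_size %d bytes"
            % (used_by_wal, max_wal_size_bytes)
        )

    # Independent of the soft limit, warn before the partition fills up.
    if usage.free < min_free_fraction * usage.total:
        warnings.append(
            "only %d of %d bytes free on the WAL partition"
            % (usage.free, usage.total)
        )
    return warnings
```

A caller would pass the real pg_wal/ path and the byte value of
max_wal_size, then send any returned warnings to whatever alerting is
already in place.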
--
Michael
