Re: Incorrect handling of OOM in WAL replay leading to data loss
From: Kyotaro Horiguchi
Subject: Re: Incorrect handling of OOM in WAL replay leading to data loss
Date:
Msg-id: 20230809.170049.2032567705309253841.horikyota.ntt@gmail.com
In reply to: Re: Incorrect handling of OOM in WAL replay leading to data loss (Michael Paquier <michael@paquier.xyz>)
Replies: Re: Incorrect handling of OOM in WAL replay leading to data loss
List: pgsql-hackers
At Wed, 9 Aug 2023 16:35:09 +0900, Michael Paquier <michael@paquier.xyz> wrote in
> Or perhaps just XLOG_READER_NO_ERROR?

Looks fine.

> > 0002 shifts the behavior for the OOM case from ending recovery to
> > retrying at the same record. If the last record is really corrupted,
> > the server won't be able to finish recovery. I doubt we are good with
> > this behavior change.
>
> You mean on an incorrect xl_tot_len? Yes that could be possible.
> Another possibility would be a retry logic with an hardcoded number of
> attempts and a delay between each. Once the infrastructure is in
> place, this still deserves more discussions but we can be flexible.
> The immediate FATAL is choice.

While it is a kind of bug overall, we encountered a case where an excessively large xl_tot_len actually came from a corrupted record. [1] I'm glad to see this infrastructure coming in, and I'm on board with retrying after an OOM. However, I think we really need official steps to wrap up recovery when there is a truly broken, oversized xl_tot_len.

[1] https://www.postgresql.org/message-id/17928-aa92416a70ff44a2@postgresql.org

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
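For readers following the thread: the retry idea Michael floats above (a hardcoded number of attempts with a delay between each, rather than an immediate FATAL on allocation failure) could be sketched roughly as below. This is a minimal illustration only; the constant and function names are hypothetical and are not part of PostgreSQL's actual xlogreader API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical knobs for a bounded OOM-retry loop during WAL replay;
 * the names are illustrative, not PostgreSQL's. */
#define WAL_OOM_MAX_ATTEMPTS   3
#define WAL_OOM_RETRY_DELAY_US 100000   /* 100 ms between attempts */

/*
 * Try to allocate a buffer for a WAL record of xl_tot_len bytes.
 * On transient OOM, retry a fixed number of times with a delay,
 * so the caller can distinguish "allocation kept failing" (NULL)
 * from "the record itself is broken" and react accordingly,
 * instead of treating OOM as the end of WAL.
 */
static void *
allocate_record_buffer(size_t xl_tot_len)
{
    for (int attempt = 1; attempt <= WAL_OOM_MAX_ATTEMPTS; attempt++)
    {
        void *buf = malloc(xl_tot_len);

        if (buf != NULL)
            return buf;         /* allocation succeeded */

        if (attempt < WAL_OOM_MAX_ATTEMPTS)
            usleep(WAL_OOM_RETRY_DELAY_US);
    }

    /* All attempts failed: report OOM distinctly rather than
     * concluding recovery is complete. */
    return NULL;
}
```

A caller would then treat a NULL return as a retriable/fatal OOM condition, separate from the validation path that decides whether an oversized xl_tot_len came from a corrupted record.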