Re: Incorrect handling of OOM in WAL replay leading to data loss

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Incorrect handling of OOM in WAL replay leading to data loss
Дата
Msg-id ZNMsKS5rfn36t0Bi@paquier.xyz
обсуждение исходный текст
Ответ на Re: Incorrect handling of OOM in WAL replay leading to data loss  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: Incorrect handling of OOM in WAL replay leading to data loss  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Список pgsql-hackers
On Tue, Aug 08, 2023 at 05:44:03PM +0900, Kyotaro Horiguchi wrote:
> I like the overall direction. Though, I'm considering enclosing the
> errormsg and errorcode in a struct.

Yes, this suggestion makes sense as it simplifies all the WAL routines
that need to report back a complete error state, and there are four of
them now:
XLogPrefetcherReadRecord()
XLogReadRecord()
XLogNextRecord()
DecodeXLogRecord()

I have spent more time on 0001, polishing it and fixing a few bugs
that I have found while reviewing the whole.  Most of them were
related to mistakes in resetting the error state when expected.  I
have also expanded DecodeXLogRecord() to use an error structure
instead of only an errmsg, giving more consistency.  The error state
now relies on two structures:
+typedef enum XLogReaderErrorCode
+{
+   XLOG_READER_NONE = 0,
+   XLOG_READER_OOM,            /* out-of-memory */
+   XLOG_READER_INVALID_DATA,   /* record data */
+} XLogReaderErrorCode;
+typedef struct XLogReaderError
+{
+   /* Buffer to hold error message */
+   char       *message;
+   bool        message_deferred;
+   /* Error code when filling *message */
+   XLogReaderErrorCode code;
+} XLogReaderError;

I'm kind of happy with this layer, now.

I have also spent some time on finding a more elegant solution for the
WAL replay, relying on the new facility from 0001.  And it happens
that it is easy enough to loop if facing an out-of-memory failure when
reading a record when we are in crash recovery, as the state is
actually close to what a standby does.  The trick is that we should
not change the state and avoid tracking a continuation record.  This
is done in 0002, making replay more robust.  With the addition of the
error injection tweak in 0003, I am able to finish recovery while the
startup process loops if under memory pressure.  As mentioned
previously, there are more code paths to consider, but that's a start
to fix the data loss problems.

Comments are welcome.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Smith
Дата:
Сообщение: Re: Handle infinite recursion in logical replication setup
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Adding a LogicalRepWorker type field