Re: Incorrect handling of OOM in WAL replay leading to data loss
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Incorrect handling of OOM in WAL replay leading to data loss |
Date | |
Msg-id | 20230801.135113.1095735354684995020.horikyota.ntt@gmail.com |
In reply to | Incorrect handling of OOM in WAL replay leading to data loss (Michael Paquier <michael@paquier.xyz>) |
Responses | Re: Incorrect handling of OOM in WAL replay leading to data loss |
List | pgsql-hackers |
At Tue, 1 Aug 2023 12:43:21 +0900, Michael Paquier <michael@paquier.xyz> wrote in
> A colleague, Ethan Mertz (in CC), has discovered that we don't handle
> correctly WAL records that are failing because of an OOM when
> allocating their required space.  In the case of Ethan, we have bumped
> on the failure after an allocation failure on XLogReadRecordAlloc():
> "out of memory while trying to decode a record of length"

I believe a database server is not supposed to be run in such a
memory-constrained environment.

> In crash recovery, any records after the OOM would not be replayed.
> At quick glance, it seems to me that this can also impact standbys,
> where recovery could stop earlier than it should once a consistent
> point has been reached.

Actually, the code assumes that an OOM happens solely because of a
broken record length field. I believe we made that assumption
intentionally.

> A patch is registered in the commit fest to improve the error
> detection handling, but as far as I can see it fails to handle the OOM
> case and replaces ReadRecord() to use a WARNING in the redo loop:
> https://www.postgresql.org/message-id/20200228.160100.2210969269596489579.horikyota.ntt%40gmail.com

That patch doesn't change any behavior other than the case where the
last record is followed by zeroed trailing bytes.

> On top of my mind, any solution I can think of needs to add more
> information to XLogReaderState, where we'd either track the type of
> error that happened close to errormsg_buf which is where these errors
> are tracked, but any of that cannot be backpatched, unfortunately.

One issue with changing that behavior is that there is no simple way
to detect a broken record before loading it into memory. We might be
able to implement a fallback mechanism that, for example, loads the
record into an already-allocated buffer (which is smaller than the
specified length) just to verify whether it is corrupted. However, I
question whether it's worth the additional complexity. And I'm not
sure what would happen if the first allocation failed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
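
The fallback idea in the last paragraph of the message could look roughly like the sketch below. It is not code from PostgreSQL or from any posted patch. The record layout (a length word followed by a CRC-32C of the payload), the names `verify_record_streaming`, `on_alloc_failure` and `build_record`, and the buffer size are all hypothetical and chosen only for illustration. The point is that a record's checksum can be verified by streaming it through a small preallocated buffer, so a failed allocation can be classified as either "garbage length, treat as end of WAL" or "valid record, genuine OOM".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical layout, NOT PostgreSQL's XLogRecord:
 *   4 bytes total length (header + payload), 4 bytes CRC-32C of the
 *   payload, then the payload itself.
 */
#define REC_HDR_SIZE 8
#define SCRATCH_SIZE 64         /* small buffer that needs no large allocation */

typedef enum { REC_OK, REC_CORRUPT } RecCheck;
typedef enum { ACT_END_OF_WAL, ACT_RETRY_OOM } Action;

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c_update(uint32_t crc, const unsigned char *p, size_t n)
{
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/*
 * Verify the record at the start of src[0..avail) without allocating
 * tot_len bytes: the payload is fed through a fixed scratch buffer.
 * Simplification: a record extending past the available bytes is
 * treated as corrupt.
 */
RecCheck verify_record_streaming(const unsigned char *src, size_t avail)
{
    unsigned char scratch[SCRATCH_SIZE];
    uint32_t tot_len, expected, crc = 0xFFFFFFFFu;
    size_t off = REC_HDR_SIZE;

    if (avail < REC_HDR_SIZE)
        return REC_CORRUPT;
    memcpy(&tot_len, src, 4);
    memcpy(&expected, src + 4, 4);
    if (tot_len < REC_HDR_SIZE || tot_len > avail)
        return REC_CORRUPT;     /* garbage length field */

    while (off < tot_len) {
        size_t n = tot_len - off;
        if (n > SCRATCH_SIZE)
            n = SCRATCH_SIZE;
        memcpy(scratch, src + off, n);  /* stands in for reading WAL data */
        crc = crc32c_update(crc, scratch, n);
        off += n;
    }
    return (crc ^ 0xFFFFFFFFu) == expected ? REC_OK : REC_CORRUPT;
}

/*
 * Called after allocating tot_len bytes has failed. If the record still
 * verifies, its length was genuine, so the failure is a real OOM and must
 * not be mistaken for the end of WAL.
 */
Action on_alloc_failure(const unsigned char *src, size_t avail)
{
    return verify_record_streaming(src, avail) == REC_OK
        ? ACT_RETRY_OOM : ACT_END_OF_WAL;
}

/* Test helper: write a valid record into dst and return its total length. */
size_t build_record(unsigned char *dst, const unsigned char *payload,
                    uint32_t len)
{
    uint32_t tot_len = REC_HDR_SIZE + len;
    uint32_t crc = crc32c_update(0xFFFFFFFFu, payload, len) ^ 0xFFFFFFFFu;

    memcpy(dst, &tot_len, 4);
    memcpy(dst + 4, &crc, 4);
    memcpy(dst + REC_HDR_SIZE, payload, len);
    return tot_len;
}
```

A caller would build or read a record, and on allocation failure call `on_alloc_failure()` to decide whether to stop replay or to report the OOM and retry. The sketch shares the weakness raised in the message: it costs a second pass over the record, and it does not say what to do if even the small buffer cannot be obtained (here it lives on the stack to sidestep that).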