Incorrect handling of OOM in WAL replay leading to data loss
From | Michael Paquier |
---|---|
Subject | Incorrect handling of OOM in WAL replay leading to data loss |
Date | |
Msg-id | ZMh/WV+CuknqePQQ@paquier.xyz |
Replies |
Re: Incorrect handling of OOM in WAL replay leading to data loss
Re: Incorrect handling of OOM in WAL replay leading to data loss |
List | pgsql-hackers |
Hi all,

A colleague, Ethan Mertz (in CC), has discovered that we don't correctly handle WAL records that fail because of an OOM when allocating the space they require. In Ethan's case, we hit the problem after an allocation failure in XLogReadRecordAlloc():
"out of memory while trying to decode a record of length"

As far as I can see, PerformWalRecovery() uses LOG as elevel for its private callback in the xlogreader when going through ReadRecord(), so the failure gets reported, but recovery considers it to be the end of WAL and abruptly ends, leading to data loss. In crash recovery, any records after the OOM would not be replayed. At a quick glance, it seems to me that this can also impact standbys, where recovery could stop earlier than it should once a consistent point has been reached.

Attached is a patch that can be applied on HEAD to inject an error; then just run the attached script xlogreader_oom.bash, or something similar, to see the failure in the logs:
LOG: redo starts at 0/1913CD0
LOG: out of memory while trying to decode a record of length 57
LOG: redo done at 0/1917358 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

It also looks like recovery_prefetch may mitigate the issue a bit if we do a read in non-blocking mode, but that's not a strong guarantee either, especially if the host is under memory pressure.

A patch is registered in the commit fest to improve the error detection handling, but as far as I can see it fails to handle the OOM case and changes ReadRecord() to use a WARNING in the redo loop:
https://www.postgresql.org/message-id/20200228.160100.2210969269596489579.horikyota.ntt%40gmail.com

Off the top of my head, any solution I can think of needs to add more information to XLogReaderState, tracking the type of error that happened close to errormsg_buf, which is where these error messages are stored; unfortunately, none of that can be backpatched.
Comments? -- Michael
Attachments