Re: Race between KeepFileRestoredFromArchive() and restartpoint
От | David Steele |
---|---|
Тема | Re: Race between KeepFileRestoredFromArchive() and restartpoint |
Дата | |
Msg-id | e1f1fb79-f668-0e7c-9841-763a023a7187@pgmasters.net обсуждение исходный текст |
Ответ на | Re: Race between KeepFileRestoredFromArchive() and restartpoint (Noah Misch <noah@leadboat.com>) |
Ответы |
Re: Race between KeepFileRestoredFromArchive() and restartpoint
|
Список | pgsql-hackers |
On 7/31/22 02:17, Noah Misch wrote: > On Tue, Jul 26, 2022 at 07:21:29AM -0400, David Steele wrote: >> On 6/19/21 16:39, Noah Misch wrote: >>> On Tue, Feb 02, 2021 at 07:14:16AM -0800, Noah Misch wrote: >>>> Recycling and preallocation are wasteful during archive recovery, because >>>> KeepFileRestoredFromArchive() unlinks every entry in its path. I propose to >>>> fix the race by adding an XLogCtl flag indicating which regime currently owns >>>> the right to add long-term pg_wal directory entries. In the archive recovery >>>> regime, the checkpointer will not preallocate and will unlink old segments >>>> instead of recycling them (like wal_recycle=off). XLogFileInit() will fail. >>> >>> Here's the implementation. Patches 1-4 suffice to stop the user-visible >>> ERROR. Patch 5 avoids a spurious LOG-level message and wasted filesystem >>> writes, and it provides some future-proofing. >>> >>> I was tempted to (but did not) just remove preallocation. Creating one file >>> per checkpoint seems tiny relative to the max_wal_size=1GB default, so I >>> expect it's hard to isolate any benefit. Under the old checkpoint_segments=3 >>> default, a preallocated segment covered a respectable third of the next >>> checkpoint. Before commit 63653f7 (2002), preallocation created more files. >> >> This also seems like it would fix the link issues we are seeing in [1]. >> >> I wonder if that would make it worth a back patch? > > Perhaps. It's sad to have multiple people deep-diving into something fixed on > HEAD. On the other hand, I'm not eager to spend risk-of-backpatch points on > this. One alternative would be adding an errhint like "This is known to > happen occasionally during archive recovery, where it is harmless." That has > an unpolished look, but it's low-risk and may avoid deep-dive efforts. I think in this case a HINT might be sufficient to at least keep people from wasting time tracking down a problem that has already been fixed. However, there is another issue [1] that might argue for a back patch if this patch (as I believe) would fix the issue. Regards, -David [1] https://www.postgresql.org/message-id/CAHJZqBDxWfcd53jm0bFttuqpK3jV2YKWx%3D4W7KxNB4zzt%2B%2BqFg%40mail.gmail.com
В списке pgsql-hackers по дате отправления: