Race between KeepFileRestoredFromArchive() and restartpoint
От | Noah Misch |
---|---|
Тема | Race between KeepFileRestoredFromArchive() and restartpoint |
Дата | |
Msg-id | 20210202151416.GB3304930@rfd.leadboat.com обсуждение исходный текст |
Ответы |
Re: Race between KeepFileRestoredFromArchive() and restartpoint
|
Список | pgsql-hackers |
A race with KeepFileRestoredFromArchive() can cause a restartpoint to fail, as seen once on the buildfarm[1]. The attached patch adds a test case; it applies atop the "stop events" patch[2]. We have two systems for adding long-term pg_wal directory entries. KeepFileRestoredFromArchive() adds them during archive recovery, while InstallXLogFileSegment() does so at all times. Unfortunately, InstallXLogFileSegment() happens throughout archive recovery, via the checkpointer recycling segments and calling PreallocXlogFiles(). Multiple processes can run InstallXLogFileSegment(), which uses ControlFileLock to represent the authority to modify the directory listing of pg_wal. KeepFileRestoredFromArchive() just assumes it controls pg_wal. Recycling and preallocation are wasteful during archive recovery, because KeepFileRestoredFromArchive() unlinks every entry in its path. I propose to fix the race by adding an XLogCtl flag indicating which regime currently owns the right to add long-term pg_wal directory entries. In the archive recovery regime, the checkpointer will not preallocate and will unlink old segments instead of recycling them (like wal_recycle=off). XLogFileInit() will fail. Notable alternatives: - Release ControlFileLock at the end of XLogFileInit(), not at the end of InstallXLogFileSegment(). Add ControlFileLock acquisition to KeepFileRestoredFromArchive(). This provides adequate mutual exclusion, but XLogFileInit() could return a file descriptor for an unlinked file. That's fine for PreallocXlogFiles(), but it feels wrong. - During restartpoints, never preallocate or recycle segments. (Just delete obsolete WAL.) By denying those benefits, this presumably makes streaming recovery less efficient. - Make KeepFileRestoredFromArchive() call XLogFileInit() to open a segment, then copy bytes. This is simple, but it multiplies I/O. That might be negligible on account of caching, or it might not be. A variant, incurring extra fsyncs, would be to use durable_rename() to replace the segment we get from XLogFileInit(). - Make KeepFileRestoredFromArchive() rename without first unlinking. This avoids checkpoint failure, but a race could trigger noise from the LOG message in InstallXLogFileSegment -> durable_rename_excl. Does anyone prefer some alternative? It's probably not worth back-patching anything for a restartpoint failure this rare, because most restartpoint outcomes are not user-visible. Thanks, nm [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-10-05%2023%3A02%3A17 [2] https://postgr.es/m/CAPpHfdtSEOHX8dSk9Qp%2BZ%2B%2Bi4BGQoffKip6JDWngEA%2Bg7Z-XmQ%40mail.gmail.com
Вложения
В списке pgsql-hackers по дате отправления: