Re: race condition when writing pg_control
From | Thomas Munro |
---|---|
Subject | Re: race condition when writing pg_control |
Date | |
Msg-id | CA+hUKGJ+rud16GvO1Gg9V0P26tz7Rvz3cYVfW144b8=8ENtJ0g@mail.gmail.com |
In reply to | Re: race condition when writing pg_control (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: race condition when writing pg_control; Re: race condition when writing pg_control |
List | pgsql-hackers |
On Tue, May 5, 2020 at 9:51 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, May 5, 2020 at 5:53 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> > I believe I've discovered a race condition between the startup and
> > checkpointer processes that can cause a CRC mismatch in the pg_control
> > file. If a cluster crashes at the right time, the following error
> > appears when you attempt to restart it:
> >
> > FATAL: incorrect checksum in control file
> >
> > This appears to be caused by some code paths in xlog_redo() that
> > update ControlFile without taking the ControlFileLock. The attached
> > patch seems to be sufficient to prevent the CRC mismatch in the
> > control file, but perhaps this is a symptom of a bigger problem with
> > concurrent modifications of ControlFile->checkPointCopy.nextFullXid.
>
> This does indeed look pretty dodgy. CreateRestartPoint() running in
> the checkpointer does UpdateControlFile() to compute a checksum and
> write it out, but xlog_redo() processing
> XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} modifies that data without
> interlocking. It looks like the ancestors of that line were there
> since 35af5422f64 (2006), but back then RecoveryRestartPoint() ran
> UpdateControlFile() directly in the startup process (immediately after
> that update), so no interlocking problem. Then in cdd46c76548 (2009),
> RecoveryRestartPoint() was split up so that CreateRestartPoint() ran
> in another process.

Here's a version with a commit message added. I'll push this to all releases in a day or two if there are no objections.
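To make the interlocking problem concrete, here is a minimal sketch of the general pattern the thread describes: the side that modifies the shared control data (redo, in the startup process) and the side that checksums and writes it (the checkpointer) must hold the same lock, so the bytes written always match the stored CRC. This is an illustration only, not the attached patch. Everything in it is an assumption: `ControlData`, `redo_update()`, `update_control_file()` and `control_file_valid()` are hypothetical names, a pthread mutex stands in for `ControlFileLock`, a toy FNV-1a hash stands in for PostgreSQL's CRC-32C, and copying to memory stands in for writing pg_control.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for ControlFileData. */
typedef struct {
    uint64_t next_full_xid; /* plays the role of checkPointCopy.nextFullXid */
    uint64_t redo_ptr;      /* another field updated while replaying a checkpoint record */
    uint32_t crc;           /* checksum over the fields before it */
} ControlData;

static ControlData control; /* shared in-memory copy, like ControlFile */

/* Stand-in for ControlFileLock (an LWLock in PostgreSQL). */
static pthread_mutex_t control_lock = PTHREAD_MUTEX_INITIALIZER;

/* Toy FNV-1a checksum over the bytes preceding 'crc' (NOT CRC-32C). */
static uint32_t checksum(const ControlData *c)
{
    const unsigned char *p = (const unsigned char *) c;
    size_t n = offsetof(ControlData, crc);
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Redo side: only modify the shared copy while holding the lock. */
void redo_update(uint64_t next_full_xid, uint64_t redo_ptr)
{
    pthread_mutex_lock(&control_lock);
    control.next_full_xid = next_full_xid;
    control.redo_ptr = redo_ptr;
    pthread_mutex_unlock(&control_lock);
}

/*
 * Checkpointer side: compute the checksum and take the snapshot that
 * would be written under the same lock, so no concurrent update can
 * slip in between the two steps.
 */
void update_control_file(ControlData *out)
{
    pthread_mutex_lock(&control_lock);
    control.crc = checksum(&control);
    memcpy(out, &control, sizeof control); /* stands in for writing pg_control */
    pthread_mutex_unlock(&control_lock);
}

/* Startup-time check: does the stored checksum match the contents? */
int control_file_valid(const ControlData *c)
{
    return checksum(c) == c->crc;
}
```

If `redo_update()` skipped the lock, a field could change after `checksum()` ran but before the copy, producing exactly the "incorrect checksum in control file" failure reported above.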