STRATEGY=WAL_LOG missing checkpoint interlocks and sync
От | Noah Misch |
---|---|
Тема | STRATEGY=WAL_LOG missing checkpoint interlocks and sync |
Дата | |
Msg-id | 20240130195003.0a.nmisch@google.com обсуждение исходный текст |
Ответы |
Re: STRATEGY=WAL_LOG missing checkpoint interlocks and sync
|
Список | pgsql-bugs |
(1) CreateDirAndVersionFile() uses a code sequence: XLogInsert(), XLogFlush(), write(PG_VERSION). This is missing the interlock with checkpoints, which pd_lsn usually secures. This breaks when one takes a base backup between the XLogInsert() and the write(). The transam/README "action first and then write a WAL entry" strategy works well. createdb_failure_callback() will remove the whole directory on failure. (2) RelationMapCopy() reasons that it doesn't need RelationMappingLock. CheckPointRelationMap() relies on RelationMappingLock to avoid completing a checkpoint between the WAL and the sync. A base backup taken in the middle of write_relmap_file() has no relmap after recovery. (3) CreateDirAndVersionFile() populates the PG_VERSION file without syncing it. An OS crash after the next checkpoint may leave PG_VERSION empty or missing. Let's fsync the file immediately. I used LazyFS to confirm the defect and fix. Though I was skeptical about adding a wait event in back branches, I plan to do so anyway. Commit 368ffde got away with it, and we do have this pattern of the sync event being separate from the write event. I'm also attaching a test for (1) and (2), but I plan not to commit it. If you want to see the bugs in action, you might find it informative. I abandoned it when this complication felt like it would take awhile to resolve: + # FIXME While this is an effective test of XLOG_DBASE_CREATE_WAL_LOG, the + # XLOG_RELMAP_UPDATE makes the backup's checkpoint hang waiting for + # RelationMappingLock. To have a test that both fails with the bug and + # passes with its fix, we'd need a procedure like this: + # + # while (backup client still not done) + # { + # run backup client until it's waiting on a lock CREATE DATABASE holds; + # unpause CREATE DATABASE and re-pause it after its next lock release; + # } As a generalization of that test, it would be great to have one that tries a base backup after every XLogInsert() of a CREATE DATABASE or even every XLogInsert() of a src/test/regress run. Reintroducing (1) or (2) isn't too likely, but some part of the system could have or acquire similar bugs. Thanks, nm
Вложения
В списке pgsql-bugs по дате отправления: