Обсуждение: STRATEGY=WAL_LOG missing checkpoint interlocks and sync

Поиск
Список
Период
Сортировка

STRATEGY=WAL_LOG missing checkpoint interlocks and sync

От
Noah Misch
Дата:
(1) CreateDirAndVersionFile() uses a code sequence: XLogInsert(), XLogFlush(),
write(PG_VERSION).  This is missing the interlock with checkpoints, which
pd_lsn usually secures.  This breaks when one takes a base backup between the
XLogInsert() and the write().  The transam/README "action first and then write
a WAL entry" strategy works well.  createdb_failure_callback() will remove the
whole directory on failure.

(2) RelationMapCopy() reasons that it doesn't need RelationMappingLock.
CheckPointRelationMap() relies on RelationMappingLock to avoid completing a
checkpoint between the WAL and the sync.  A base backup taken in the middle of
write_relmap_file() has no relmap after recovery.

(3) CreateDirAndVersionFile() populates the PG_VERSION file without syncing
it.  An OS crash after the next checkpoint may leave PG_VERSION empty or
missing.  Let's fsync the file immediately.  I used LazyFS to confirm the
defect and fix.  Though I was skeptical about adding a wait event in back
branches, I plan to do so anyway.  Commit 368ffde got away with it, and we do
have this pattern of the sync event being separate from the write event.


I'm also attaching a test for (1) and (2), but I plan not to commit it.  If
you want to see the bugs in action, you might find it informative.  I
abandoned it when this complication felt like it would take awhile to resolve:

+    # FIXME While this is an effective test of XLOG_DBASE_CREATE_WAL_LOG, the
+    # XLOG_RELMAP_UPDATE makes the backup's checkpoint hang waiting for
+    # RelationMappingLock.  To have a test that both fails with the bug and
+    # passes with its fix, we'd need a procedure like this:
+    #
+    # while (backup client still not done)
+    # {
+    #    run backup client until it's waiting on a lock CREATE DATABASE holds;
+    #    unpause CREATE DATABASE and re-pause it after its next lock release;
+    # }

As a generalization of that test, it would be great to have one that tries a
base backup after every XLogInsert() of a CREATE DATABASE or even every
XLogInsert() of a src/test/regress run.  Reintroducing (1) or (2) isn't too
likely, but some part of the system could have or acquire similar bugs.

Thanks,
nm

Вложения

Re: STRATEGY=WAL_LOG missing checkpoint interlocks and sync

От
Noah Misch
Дата:
On Tue, Jan 30, 2024 at 11:50:03AM -0800, Noah Misch wrote:
> (1) CreateDirAndVersionFile() uses a code sequence: XLogInsert(), XLogFlush(),
> write(PG_VERSION).  This is missing the interlock with checkpoints, which
> pd_lsn usually secures.  This breaks when one takes a base backup between the
> XLogInsert() and the write().  The transam/README "action first and then write
> a WAL entry" strategy works well.  createdb_failure_callback() will remove the
> whole directory on failure.
> 
> (2) RelationMapCopy() reasons that it doesn't need RelationMappingLock.
> CheckPointRelationMap() relies on RelationMappingLock to avoid completing a
> checkpoint between the WAL and the sync.  A base backup taken in the middle of
> write_relmap_file() has no relmap after recovery.
> 
> (3) CreateDirAndVersionFile() populates the PG_VERSION file without syncing
> it.  An OS crash after the next checkpoint may leave PG_VERSION empty or
> missing.  Let's fsync the file immediately.  I used LazyFS to confirm the
> defect and fix.  Though I was skeptical about adding a wait event in back
> branches, I plan to do so anyway.  Commit 368ffde got away with it, and we do
> have this pattern of the sync event being separate from the write event.

Pushed at commit 0b6517a.