Re: [HACKERS] WAL logging problem in 9.4.3?
От | Noah Misch |
---|---|
Тема | Re: [HACKERS] WAL logging problem in 9.4.3? |
Дата | |
Msg-id | 20200330044101.GA2324620@rfd.leadboat.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] WAL logging problem in 9.4.3? (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Ответы |
Re: [HACKERS] WAL logging problem in 9.4.3?
Re: [HACKERS] WAL logging problem in 9.4.3? |
Список | pgsql-hackers |
I think attached v41nm is ready for commit. Would anyone like to vote against back-patching this? It's hard to justify lack of back-patch for a data-loss bug, but this is atypically invasive. (I'm repeating the question, since some folks missed my 2020-02-18 question.) Otherwise, I'll push this on Saturday. On Mon, Mar 23, 2020 at 05:20:27PM +0900, Kyotaro Horiguchi wrote: > At Sat, 21 Mar 2020 15:49:20 -0700, Noah Misch <noah@leadboat.com> wrote in > > The proximate cause is the RelFileNodeSkippingWAL() call that we added to > > MarkBufferDirtyHint(). MarkBufferDirtyHint() runs in parallel workers, but > > parallel workers have zeroes for pendingSyncHash and rd_*Subid. > > Kyotaro, can you look through the affected code and propose a strategy for > > good coexistence of parallel query with the WAL skipping mechanism? > > Bi-directional communication between leader and workers is too-much. > It wouldn't be acceptable to inhibit the problematic operations on > workers such like heap-prune or btree pin removal. If we do pending > syncs just before worker start, it won't fix the issue. > > The attached patch passes a list of pending-sync relfilenodes at > worker start. If you were to issue pending syncs and also cease skipping WAL for affected relations, that would fix the issue. Your design is better, though. I made two notable changes: - The patch was issuing syncs or FPIs every time a parallel worker exited. I changed it to skip most of smgrDoPendingSyncs() in parallel workers, like AtEOXact_RelationMap() does. - PARALLEL_KEY_PENDING_SYNCS is most similar to PARALLEL_KEY_REINDEX_STATE and PARALLEL_KEY_COMBO_CID. parallel.c, not execParallel.c, owns those. I moved PARALLEL_KEY_PENDING_SYNCS to parallel.c, which also called for style changes in the associated storage.c functions. Since pendingSyncHash is always NULL under XLogIsNeeded(), I also removed some XLogIsNeeded() tests that immediately preceded !pendingSyncHash tests.
Вложения
В списке pgsql-hackers по дате отправления: