Discussion: hio.c does visibilitymap_pin()/IO while holding buffer lock
Hi,

Starting with

commit 7db0cd2145f2bce84cac92402e205e4d2b045bf2
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date:   2021-01-17 22:11:39 +0100

    Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE

RelationGetBufferForTuple does

    /*
     * The page is empty, pin vmbuffer to set all_frozen bit.
     */
    if (options & HEAP_INSERT_FROZEN)
    {
        Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
        visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
    }

while holding a buffer lock. visibilitymap_pin() reads pages, if vmbuffer
doesn't already point to the right block.

The lock ordering rules are to lock VM pages *before* locking heap pages.

I think the reason this hasn't yet bitten us badly is that INSERT_FROZEN
effectively requires that the relation is access exclusive locked. There
shouldn't be other backends locking multiple buffers in the relation
(bgwriter / checkpointer can lock a single buffer at a time, but that's it).

I see roughly two ways forward:

1) We add a comment explaining why it's safe to violate lock ordering rules in
   this one situation

2) Change relevant code so that we only return a valid vmbuffer if we could do
   so without blocking / IO and, obviously, skip updating the VM if we
   couldn't get the buffer.

Greetings,

Andres Freund
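[Editor's note: for readers unfamiliar with the visibility map, the mapping that decides whether visibilitymap_pin() can reuse an already-pinned page or must do IO can be sketched in standalone C. The constants are modeled on the macros in visibilitymap.c assuming the default 8kB block size and 24-byte page header; the function names here are illustrative, not the actual PostgreSQL symbols.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standalone model of the heap-block-to-VM-block mapping. Each heap block
 * is tracked by two bits (all-visible, all-frozen) in the visibility map,
 * so a single 8kB VM page covers tens of thousands of heap blocks.
 */
#define BLCKSZ              8192
#define PAGE_HEADER_SIZE    24          /* MAXALIGN(SizeOfPageHeaderData) */
#define MAPSIZE             (BLCKSZ - PAGE_HEADER_SIZE)
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

typedef uint32_t BlockNumber;

/* Which VM block holds the bits for a given heap block? */
static BlockNumber
heapblk_to_mapblock(BlockNumber heapBlk)
{
    return heapBlk / HEAPBLOCKS_PER_PAGE;
}

/*
 * The cheap check underlying visibilitymap_pin(): an already-pinned
 * vmbuffer can be reused without IO only if it maps the right VM block.
 * When it doesn't, visibilitymap_pin() has to read a page - which is
 * exactly what must not happen while a heap buffer is already locked.
 */
static int
vm_pin_is_ok(BlockNumber heapBlk, BlockNumber pinnedVmBlock)
{
    return pinnedVmBlock == heapblk_to_mapblock(heapBlk);
}
```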
On 3/25/23 03:57, Andres Freund wrote:
> Hi,
>
> Starting with
>
> commit 7db0cd2145f2bce84cac92402e205e4d2b045bf2
> Author: Tomas Vondra <tomas.vondra@postgresql.org>
> Date:   2021-01-17 22:11:39 +0100
>
>     Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE
>

That's a bummer :-(

> RelationGetBufferForTuple does
>
>     /*
>      * The page is empty, pin vmbuffer to set all_frozen bit.
>      */
>     if (options & HEAP_INSERT_FROZEN)
>     {
>         Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
>         visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
>     }
>
> while holding a buffer lock. visibilitymap_pin() reads pages, if vmbuffer
> doesn't already point to the right block.
>
> The lock ordering rules are to lock VM pages *before* locking heap pages.
>
> I think the reason this hasn't yet bitten us badly is that INSERT_FROZEN
> effectively requires that the relation is access exclusive locked. There
> shouldn't be other backends locking multiple buffers in the relation
> (bgwriter / checkpointer can lock a single buffer at a time, but that's it).
>

Right. Still, it seems a bit fragile ...

> I see roughly two ways forward:
>
> 1) We add a comment explaining why it's safe to violate lock ordering rules in
>    this one situation
>

Possible, although I feel uneasy about just documenting a broken rule.
Would be better to maintain the locking order.

> 2) Change relevant code so that we only return a valid vmbuffer if we could do
>    so without blocking / IO and, obviously, skip updating the VM if we
>    couldn't get the buffer.
>

I don't recall the exact details about the vm locking/pinning, but can't
we just ensure we actually follow the proper locking order? I mean, this
only deals with new pages, requested at line ~624:

    buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);

Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
before calling ReadBufferExtended? Or am I confused and that's not
possible for some reason?
regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:
> On 3/25/23 03:57, Andres Freund wrote:
> > 2) Change relevant code so that we only return a valid vmbuffer if we could do
> >    so without blocking / IO and, obviously, skip updating the VM if we
> >    couldn't get the buffer.
>
> I don't recall the exact details about the vm locking/pinning, but can't
> we just ensure we actually follow the proper locking order? I mean, this
> only deals with new pages, requested at line ~624:
>
>     buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);
>
> Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
> before calling ReadBufferExtended? Or am I confused and that's not
> possible for some reason?

Note that this is using P_NEW. I.e. we don't know the buffer location yet.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:
>> Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
>> before calling ReadBufferExtended? Or am I confused and that's not
>> possible for some reason?

> Note that this is using P_NEW. I.e. we don't know the buffer location yet.

Maybe the relation-extension logic needs to include the ability to get
the relevant vm page?

            regards, tom lane
Hi,

On 2023-03-25 12:57:17 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:
> >> Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
> >> before calling ReadBufferExtended? Or am I confused and that's not
> >> possible for some reason?
>
> > Note that this is using P_NEW. I.e. we don't know the buffer location yet.
>
> Maybe the relation-extension logic needs to include the ability to get
> the relevant vm page?

I don't see how that's easily possible with the current lock ordering
rules. At least without giving up using RBM_ZERO_AND_LOCK for extending, or
stuffing even more things to happen with the extension lock held, which I
don't think we want to. I don't think INSERT_FROZEN is worth that price.

Perhaps we should just try to heuristically pin the right VM buffer before
trying to extend?

Thinking more about this, I think there's no inherent deadlock danger with
reading the VM while holding a buffer lock, "just" an efficiency issue. If we
avoid needing to do IO nearly all the time, by trying to pin the right page
before extending, it's probably good enough.

Greetings,

Andres Freund
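[Editor's note: the "heuristically pin before extending" idea above can be sketched as a toy model. This is not the actual patch: it just illustrates that guessing the new block as the current relation size (blocks are 0-indexed, so the next block number equals the old nblocks), pinning the VM page covering it, and verifying the guess after extension is almost always right, because one VM page covers tens of thousands of heap blocks. All names and the `Relation` struct are hypothetical.]

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Heap blocks covered by one VM page, with the default 8kB block size. */
#define HEAPBLOCKS_PER_PAGE 32672

/* Toy model of a relation that other backends may concurrently extend. */
typedef struct { BlockNumber nblocks; } Relation;

/*
 * Before taking any buffer lock, guess that the new page will be block
 * rel->nblocks and pin the VM page that covers it. The guess can go stale
 * if someone else extends first, but it is almost always still the right
 * VM page, which amortizes the IO to nearly nothing.
 */
static BlockNumber
guess_vm_block_before_extend(const Relation *rel)
{
    return rel->nblocks / HEAPBLOCKS_PER_PAGE;
}

/* After extension we know the real new block; check whether the guess holds. */
static int
vm_guess_still_valid(BlockNumber pinnedVmBlock, BlockNumber actualNewBlock)
{
    return pinnedVmBlock == actualNewBlock / HEAPBLOCKS_PER_PAGE;
}
```

When the guess fails (only near a VM-page boundary, or after heavy concurrent extension), the caller would simply skip the VM update rather than do IO under the buffer lock.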
On Sat, Mar 25, 2023 at 11:17 AM Andres Freund <andres@anarazel.de> wrote:
> Thinking more about this, I think there's no inherent deadlock danger with
> reading the VM while holding a buffer lock, "just" an efficiency issue. If we
> avoid needing to do IO nearly all the time, by trying to pin the right page
> before extending, it's probably good enough.

Uh, it was quite possible for lazy_vacuum_heap_page() to do that up until
very recently (it was fixed by my commit 980ae17310), since it would call
visibilitymap_get_status() with an exclusive buffer lock on the heap page,
which sometimes had to change the VM page. It potentially did an IO at that
point, to read a later VM page into the caller's initially-pinned one.

In other words, up until recently there was a strange idiom used by
lazy_vacuum_heap_page/lazy_vacuum_heap_rel, where we'd abuse
visibilitymap_get_status() as a replacement for calling visibilitymap_pin()
right before acquiring a heap page buffer lock. But now the second heap
pass does it the same way as the first heap pass. (Even still, I have no
reason to believe that the previous approach was all that bad; it was just
a bit ugly.)

There are still a few visibilitymap_get_status()-with-buffer-lock calls in
vacuumlazy.c, FWIW, but they don't run the risk of needing to change the
vmbuffer we have pinned with the heap page buffer lock held.

--
Peter Geoghegan
Hi,

On 2023-03-25 11:17:07 -0700, Andres Freund wrote:
> I don't see how that's easily possible with the current lock ordering
> rules. At least without giving up using RBM_ZERO_AND_LOCK for extending, or
> stuffing even more things to happen with the extension lock held, which I
> don't think we want to. I don't think INSERT_FROZEN is worth that price.

I think I might have been thinking of this too narrowly. It's extremely
unlikely that another backend would discover the page. And we can use
visibilitymap_pin_ok() to amortize the cost to almost nothing - there's a
lot of bits in an 8k block...

Here's a draft patch.

The bulk relation extension patch I am polishing has a similar issue, except
that there the problem is inserting into the FSM, instead of pinning a VM
page. Hence the patch above makes the infrastructure a bit more general than
required for the HEAP_INSERT_FROZEN case alone (where we currently shouldn't
ever have a valid otherBuffer).

The way the parameter ordering for GetVisibilityMapPins() works makes it
somewhat unwieldy - see e.g. the existing

    if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
        GetVisibilityMapPins(relation, buffer, otherBuffer,
                             targetBlock, otherBlock, vmbuffer,
                             vmbuffer_other);
    else
        GetVisibilityMapPins(relation, otherBuffer, buffer,
                             otherBlock, targetBlock, vmbuffer_other,
                             vmbuffer);

which I now duplicated in yet another place. Perhaps we just ought to switch
buffer1/block1 with buffer2/block2 inside GetVisibilityMapPins(), to avoid
duplicating that code elsewhere?

Because we now track whether the *targetBuffer* was ever unlocked, we can be
a bit more narrow about the possibility of there not being sufficient space.

The patch could be narrowed for backpatching. But as there's likely no
practical problem at this point, I wouldn't want to backpatch anyway?

Greetings,

Andres Freund
Attachments
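[Editor's note: the refactoring floated above - doing the (buffer, block) swap once inside GetVisibilityMapPins() instead of at every call site - can be sketched as follows. This is a hypothetical model, not the patch itself: pin acquisition is stubbed out, and the function only demonstrates canonicalizing the pair order so the lock-ordering rule (process the lower-numbered heap block first) is enforced in one place.]

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
typedef int Buffer;

#define InvalidBuffer      0
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Sketch: callers pass the two (buffer, block) pairs in any order; the
 * swap into canonical lower-block-first order happens here, so the two
 * mirrored call sites in the mail collapse into one.
 */
static void
get_visibility_map_pins(Buffer buf1, Buffer buf2,
                        BlockNumber block1, BlockNumber block2,
                        BlockNumber *first_processed)
{
    if (buf2 != InvalidBuffer && block2 < block1)
    {
        /* canonicalize: pair 1 is always the lower-numbered block */
        Buffer      tmpbuf = buf1;
        BlockNumber tmpblk = block1;

        buf1 = buf2;
        block1 = block2;
        buf2 = tmpbuf;
        block2 = tmpblk;
    }

    /* ... pin the VM pages for block1, then block2, here ... */

    /* expose which block would be handled first, for illustration */
    *first_processed = block1;
}
```

With this shape, a caller never has to repeat the `targetBlock <= otherBlock` dance; passing `InvalidBuffer`/`InvalidBlockNumber` for the second pair covers the single-buffer case.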
Hi,

On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
> Here's a draft patch.

Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
polish.

Greetings,

Andres Freund
Attachments
Hi,

On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
> > Here's a draft patch.
>
> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
> polish.

I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
with this.

I'm still debating with myself whether this commit (or a prerequisite commit)
should move logic dealing with the buffer ordering into
GetVisibilityMapPins(), so we don't need two blocks like this:

    if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
        GetVisibilityMapPins(relation, buffer, otherBuffer,
                             targetBlock, otherBlock, vmbuffer,
                             vmbuffer_other);
    else
        GetVisibilityMapPins(relation, otherBuffer, buffer,
                             otherBlock, targetBlock, vmbuffer_other,
                             vmbuffer);

    ...

    if (otherBuffer != InvalidBuffer)
    {
        if (GetVisibilityMapPins(relation, otherBuffer, buffer,
                                 otherBlock, targetBlock, vmbuffer_other,
                                 vmbuffer))
            unlockedTargetBuffer = true;
    }
    else
    {
        if (GetVisibilityMapPins(relation, buffer, InvalidBuffer,
                                 targetBlock, InvalidBlockNumber,
                                 vmbuffer, InvalidBuffer))
            unlockedTargetBuffer = true;
    }

Greetings,

Andres Freund
On 4/3/23 00:40, Andres Freund wrote:
> Hi,
>
> On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
>> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
>>> Here's a draft patch.
>>
>> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
>> polish.
>
> I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
> with this.
>

I guess the 0001 part was already pushed, so I should be looking only at
0002, correct?

I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm
not saying it's incorrect, but I find it hard to reason about the new
combinations of conditions :-(

I mean, it only had this condition:

    if (otherBuffer != InvalidBuffer)
    {
        ...
    }

but now it has

    if (unlockedTargetBuffer)
    {
        ...
    }
    else if (otherBuffer != InvalidBuffer)
    {
        ...
    }

    if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
    {
        ...
    }

Not sure how to improve that :-/ but it's not exactly trivial to figure out
what's going to happen.

Maybe this

    * If we unlocked the target buffer above, it's unlikely, but possible,
    * that another backend used space on this page.

might say what we're going to do in this case. I mean - I understand
some backend may use space in an unlocked page, but what does that mean for
this code? What is it going to do? (The same comment talks about the
next condition in much more detail, for example.)

Also, isn't it a bit strange the free space check now happens outside
any if condition? It used to happen in the

    if (otherBuffer != InvalidBuffer)
    {
        ...
    }

block, but now it happens outside.
> I'm still debating with myself whether this commit (or a prerequisite commit)
> should move logic dealing with the buffer ordering into
> GetVisibilityMapPins(), so we don't need two blocks like this:
>
>     if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
>         GetVisibilityMapPins(relation, buffer, otherBuffer,
>                              targetBlock, otherBlock, vmbuffer,
>                              vmbuffer_other);
>     else
>         GetVisibilityMapPins(relation, otherBuffer, buffer,
>                              otherBlock, targetBlock, vmbuffer_other,
>                              vmbuffer);
>
>     ...
>
>     if (otherBuffer != InvalidBuffer)
>     {
>         if (GetVisibilityMapPins(relation, otherBuffer, buffer,
>                                  otherBlock, targetBlock, vmbuffer_other,
>                                  vmbuffer))
>             unlockedTargetBuffer = true;
>     }
>     else
>     {
>         if (GetVisibilityMapPins(relation, buffer, InvalidBuffer,
>                                  targetBlock, InvalidBlockNumber,
>                                  vmbuffer, InvalidBuffer))
>             unlockedTargetBuffer = true;
>     }

Yeah. I haven't tried, but I imagine it'd make RelationGetBufferForTuple a
little bit simpler.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2023-04-03 14:25:59 +0200, Tomas Vondra wrote:
> On 4/3/23 00:40, Andres Freund wrote:
> > On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
> >> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
> >>> Here's a draft patch.
> >>
> >> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
> >> polish.
> >
> > I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
> > with this.
>
> I guess the 0001 part was already pushed, so I should be looking only at
> 0002, correct?

Yes.

> I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm
> not saying it's incorrect, but I find it hard to reason about the new
> combinations of conditions :-(
>
> I mean, it only had this condition:
>
>     if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> but now it has
>
>     if (unlockedTargetBuffer)
>     {
>         ...
>     }
>     else if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
>     if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> Not sure how to improve that :-/ but it's not exactly trivial to figure out
> what's going to happen.

It's not great, I agree. I tried to make it easier to read in this version by

a) changing GetVisibilityMapPins() as I proposed
b) adding a new variable "recheckVmPins", that gets set in
   if (unlockedTargetBuffer)
   and
   if (otherBuffer != InvalidBuffer)
c) reformulating comments

> Maybe this
>
>     * If we unlocked the target buffer above, it's unlikely, but possible,
>     * that another backend used space on this page.
>
> might say what we're going to do in this case. I mean - I understand
> some backend may use space in an unlocked page, but what does that mean for
> this code? What is it going to do? (The same comment talks about the
> next condition in much more detail, for example.)

There's a comment about that detail further below. But you're right, it
wasn't clear as-is. How about now?

> Also, isn't it a bit strange the free space check now happens outside
> any if condition?
> It used to happen in the
>
>     if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> block, but now it happens outside.

Well, the alternative is to repeat it in the two branches, which doesn't
seem great either. Particularly because there'll be a third branch after
the bulk extension patch.

Greetings,

Andres Freund
Attachments
Hi,

On 2023-04-03 12:00:30 -0700, Andres Freund wrote:
> It's not great, I agree. I tried to make it easier to read in this version by
>
> a) changing GetVisibilityMapPins() as I proposed
> b) adding a new variable "recheckVmPins", that gets set in
>    if (unlockedTargetBuffer)
>    and
>    if (otherBuffer != InvalidBuffer)
> c) reformulating comments

I pushed this version a couple hours ago, after a bit more polishing.

Greetings,

Andres Freund