Обсуждение: hio.c does visibilitymap_pin()/IO while holding buffer lock

Поиск
Список
Период
Сортировка

hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Andres Freund
Дата:
Hi,

Starting with

commit 7db0cd2145f2bce84cac92402e205e4d2b045bf2
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date:   2021-01-17 22:11:39 +0100

    Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE

RelationGetBufferForTuple does

    /*
     * The page is empty, pin vmbuffer to set all_frozen bit.
     */
    if (options & HEAP_INSERT_FROZEN)
    {
        Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
        visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
    }

while holding a buffer lock. visibilitymap_pin() reads pages, if vmbuffer
doesn't already point to the right block.


The lock ordering rules are to lock VM pages *before* locking heap pages.


I think the reason this hasn't yet bitten us badly, is that INSERT_FROZEN
effectively requires that the relation is access exclusive locked.  There
shouldn't be other backends locking multiple buffers in the relation (bgwriter
/ checkpointer can lock a single buffer at a time, but that's it).


I see roughly two ways forward:

1) We add a comment explaining why it's safe to violate lock ordering rules in
   this one situation

2) Change relevant code so that we only return a valid vmbuffer if we could do
   so without blocking / IO and, obviously, skip updating the VM if we
   couldn't get the buffer.

Greetings,

Andres Freund



Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Tomas Vondra
Дата:
On 3/25/23 03:57, Andres Freund wrote:
> Hi,
> 
> Starting with
> 
> commit 7db0cd2145f2bce84cac92402e205e4d2b045bf2
> Author: Tomas Vondra <tomas.vondra@postgresql.org>
> Date:   2021-01-17 22:11:39 +0100
> 
>     Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE
> 

That's a bummer :-(

> RelationGetBufferForTuple does
> 
>     /*
>      * The page is empty, pin vmbuffer to set all_frozen bit.
>      */
>     if (options & HEAP_INSERT_FROZEN)
>     {
>         Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
>         visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
>     }
> 
> while holding a buffer lock. visibilitymap_pin() reads pages, if vmbuffer
> doesn't already point to the right block.
> 
> 
> The lock ordering rules are to lock VM pages *before* locking heap pages.
> 
> 
> I think the reason this hasn't yet bitten us badly, is that INSERT_FROZEN
> effectively requires that the relation is access exclusive locked.  There
> shouldn't be other backends locking multiple buffers in the relation (bgwriter
> / checkpointer can lock a single buffer at a time, but that's it).
>

Right. Still, it seems a bit fragile ...

> 
> I see roughly two ways forward:
> 
> 1) We add a comment explaining why it's safe to violate lock ordering rules in
>    this one situation
> 

Possible, although I feel uneasy about just documenting a broken rule.
Would be better to maintain the locking order.

> 2) Change relevant code so that we only return a valid vmbuffer if we could do
>    so without blocking / IO and, obviously, skip updating the VM if we
>    couldn't get the buffer.
> 

I don't recall the exact details about the vm locking/pinning, but can't
we just ensure we actually follow the proper locking order? I mean, this
only deals with new pages, requested at line ~624:

  buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);

Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
before calling ReadBufferExtended? Or am I confused and that's not
possible for some reason?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Andres Freund
Дата:
Hi,

On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:
> On 3/25/23 03:57, Andres Freund wrote:
> > 2) Change relevant code so that we only return a valid vmbuffer if we could do
> >    so without blocking / IO and, obviously, skip updating the VM if we
> >    couldn't get the buffer.
> > 
> 
> I don't recall the exact details about the vm locking/pinning, but can't
> we just ensure we actually follow the proper locking order? I mean, this
> only deals with new pages, requested at line ~624:
> 
>   buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);
> 
> Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
> before calling ReadBufferExtended? Or am I confused and that's not
> possible for some reason?

Note that this is using P_NEW. I.e. we don't know the buffer location yet.

Greetings,

Andres Freund



Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:
>> Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
>> before calling ReadBufferExtended? Or am I confused and that's not
>> possible for some reason?

> Note that this is using P_NEW. I.e. we don't know the buffer location yet.

Maybe the relation-extension logic needs to include the ability to get
the relevant vm page?

            regards, tom lane



Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Andres Freund
Дата:
Hi,

On 2023-03-25 12:57:17 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:
> >> Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
> >> before calling ReadBufferExtended? Or am I confused and that's not
> >> possible for some reason?
> 
> > Note that this is using P_NEW. I.e. we don't know the buffer location yet.
> 
> Maybe the relation-extension logic needs to include the ability to get
> the relevant vm page?

I don't see how that's easily possible with the current lock ordering
rules. At least without giving up using RBM_ZERO_AND_LOCK for extending or
stuffing even more things to happen with the the extension lock held, which I
don't think we want to. I don't think INSERT_FROZEN is worth that price.

Perhaps we should just try to heuristically pin the right VM buffer before
trying to extend?

Thinking more about this, I think there's no inherent deadlock danger with
reading the VM while holding a buffer lock, "just" an efficiency issue. If we
avoid needing to do IO nearly all the time, by trying to pin the right page
before extending, it's probably good enough.

Greetings,

Andres Freund



Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Peter Geoghegan
Дата:
On Sat, Mar 25, 2023 at 11:17 AM Andres Freund <andres@anarazel.de> wrote:
> Thinking more about this, I think there's no inherent deadlock danger with
> reading the VM while holding a buffer lock, "just" an efficiency issue. If we
> avoid needing to do IO nearly all the time, by trying to pin the right page
> before extending, it's probably good enough.

Uh, it was quite possible for lazy_vacuum_heap_page() to do that up
until very recently (it was fixed by my commit 980ae17310). Since it
would call visibilitymap_get_status() with an exclusive buffer lock on
the heap page, which sometimes had to change the VM page. It
potentially did an IO at that point, to read in a later VM page to the
caller's initially-pinned one.

In other words, up until recently there was a strange idiom used by
lazy_vacuum_heap_page/lazy_vacuum_heap_rel, where we'd abuse
visibilitymap_get_status() as a replacement for calling
visibilitymap_pin() right before acquire a heap page buffer lock. But
now the second heap pass does it the same way as the first heap pass.
(Even still, I have no reason to believe that the previous approach
was all that bad; it was just a bit ugly.)

There are still a few visibilitymap_get_status()-with-buffer-lock
calls in vacuumlazy.c, FWIW, but they don't run the risk of needing to
change the vmbuffer we have pinned with the heap page buffer lock
held.

--
Peter Geoghegan



Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Andres Freund
Дата:
Hi,

On 2023-03-25 11:17:07 -0700, Andres Freund wrote:
> I don't see how that's easily possible with the current lock ordering
> rules. At least without giving up using RBM_ZERO_AND_LOCK for extending or
> stuffing even more things to happen with the the extension lock held, which I
> don't think we want to. I don't think INSERT_FROZEN is worth that price.

I think I might have been thinking of this too narrowly. It's extremely
unlikely that another backend would discover the page. And we can use
visibilitymap_pin_ok() to amortize the cost to almost nothing - there's a lot
of bits in an 8k block...


Here's a draft patch.


The bulk relation patch I am polishing has a similar issue, except that there
the problem is inserting into the FSM, instead of pinning a VM pageabout the
FSM. Hence the patch above makes the infrastructure a bit more general than
required for the HEAP_INSERT_FROZEN case alone (where we currently shouldn't
ever have a valid otherBuffer).


The way the parameter ordering for GetVisibilityMapPins() works make it
somewhat unwieldy - see e.g the existing
        if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
            GetVisibilityMapPins(relation, buffer, otherBuffer,
                                 targetBlock, otherBlock, vmbuffer,
                                 vmbuffer_other);
        else
            GetVisibilityMapPins(relation, otherBuffer, buffer,
                                 otherBlock, targetBlock, vmbuffer_other,
                                 vmbuffer);

Which I now duplicated in yet another place.

Perhaps we just ought to switch buffer1/block1 with buffer2/block2 inside
GetVisibilityMapPins(), to avoid duplicating that code elsewhere?


Because we now track whether the *targetBuffer* was ever unlocked, we can be a
bit more narrow about the possibility of there not being sufficient space.


The patch could be narrowed for backpatching. But as there's likely no
practical problem at this point, I wouldn't want to backpatch anyway?

Greetings,

Andres Freund

Вложения

Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Andres Freund
Дата:
Hi,

On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
> Here's a draft patch.

Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
polish.

Greetings,

Andres Freund

Вложения

Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Andres Freund
Дата:
Hi,

On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
> > Here's a draft patch.
> 
> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
> polish.

I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
with this.

I'm still debating with myself whether this commit (or a prerequisite commit)
should move logic dealing with the buffer ordering into
GetVisibilityMapPins(), so we don't need two blocks like this:


        if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
            GetVisibilityMapPins(relation, buffer, otherBuffer,
                                 targetBlock, otherBlock, vmbuffer,
                                 vmbuffer_other);
        else
            GetVisibilityMapPins(relation, otherBuffer, buffer,
                                 otherBlock, targetBlock, vmbuffer_other,
                                 vmbuffer);
...

        if (otherBuffer != InvalidBuffer)
        {
            if (GetVisibilityMapPins(relation, otherBuffer, buffer,
                                     otherBlock, targetBlock, vmbuffer_other,
                                     vmbuffer))
                unlockedTargetBuffer = true;
        }
        else
        {
            if (GetVisibilityMapPins(relation, buffer, InvalidBuffer,
                                     targetBlock, InvalidBlockNumber,
                                     vmbuffer, InvalidBuffer))
                unlockedTargetBuffer = true;
        }
    }


Greetings,

Andres Freund



Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Tomas Vondra
Дата:
On 4/3/23 00:40, Andres Freund wrote:
> Hi,
> 
> On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
>> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
>>> Here's a draft patch.
>>
>> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
>> polish.
> 
> I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
> with this.
> 

I guess the 0001 part was already pushed, so I should be looking only at
0002, correct?

I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm
not saying it's incorrect, but I find it hard to reason about the new
combinations of conditions :-(

I mean, it only had this condition:

    if (otherBuffer != InvalidBuffer)
    {
        ...
    }

but now it have

    if (unlockedTargetBuffer)
    {
       ...
    }
    else if (otherBuffer != InvalidBuffer)
    {
        ...
    }

    if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
    {
        ...
    }

Not sure how to improve that :-/ but not exactly trivial to figure out
what's going to happen.

Maybe this

 * If we unlocked the target buffer above, it's unlikely, but possible,
 * that another backend used space on this page.

might say what we're going to do in this case. I mean - I understand
some backend may use space in unlocked page, but what does that mean for
this code? What is it going to do? (The same comment talks about the
next condition in much more detail, for example.)

Also, isn't it a bit strange the free space check now happens outside
any if condition? It used to happen in the

    if (otherBuffer != InvalidBuffer)
    {
        ...
    }

block, but now it happens outside.

> I'm still debating with myself whether this commit (or a prerequisite commit)
> should move logic dealing with the buffer ordering into
> GetVisibilityMapPins(), so we don't need two blocks like this:
> 
> 
>         if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
>             GetVisibilityMapPins(relation, buffer, otherBuffer,
>                                  targetBlock, otherBlock, vmbuffer,
>                                  vmbuffer_other);
>         else
>             GetVisibilityMapPins(relation, otherBuffer, buffer,
>                                  otherBlock, targetBlock, vmbuffer_other,
>                                  vmbuffer);
> ...
> 
>         if (otherBuffer != InvalidBuffer)
>         {
>             if (GetVisibilityMapPins(relation, otherBuffer, buffer,
>                                      otherBlock, targetBlock, vmbuffer_other,
>                                      vmbuffer))
>                 unlockedTargetBuffer = true;
>         }
>         else
>         {
>             if (GetVisibilityMapPins(relation, buffer, InvalidBuffer,
>                                      targetBlock, InvalidBlockNumber,
>                                      vmbuffer, InvalidBuffer))
>                 unlockedTargetBuffer = true;
>         }
>     }
> 

Yeah. I haven't tried, but I imagine it'd make RelationGetBufferForTuple
a little bit.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Andres Freund
Дата:
Hi,

On 2023-04-03 14:25:59 +0200, Tomas Vondra wrote:
> On 4/3/23 00:40, Andres Freund wrote:
> > Hi,
> >
> > On 2023-03-28 19:17:21 -0700, Andres Freund wrote:
> >> On 2023-03-28 18:21:02 -0700, Andres Freund wrote:
> >>> Here's a draft patch.
> >>
> >> Attached is v2, with a stupid bug fixed and a bit of comment / pgindent
> >> polish.
> >
> > I'd welcome some review (Tomas?), but otherwise I'm planning to push ahead
> > with this.
> >
>
> I guess the 0001 part was already pushed, so I should be looking only at
> 0002, correct?

Yes.


> I think 0002 makes RelationGetBufferForTuple() harder to understand. I'm
> not saying it's incorrect, but I find it hard to reason about the new
> combinations of conditions :-(

> I mean, it only had this condition:
>
>     if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> but now it have
>
>     if (unlockedTargetBuffer)
>     {
>        ...
>     }
>     else if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
>     if (unlockedTargetBuffer || otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> Not sure how to improve that :-/ but not exactly trivial to figure out
> what's going to happen.

It's not great, I agree. I tried to make it easier to read in this version by
a) changing GetVisibilityMapPins() as I proposed
b) added a new variable "recheckVmPins", that gets set in
   if (unlockedTargetBuffer)
   and
   if (otherBuffer != InvalidBuffer)
c) reformulated comments


> Maybe this
>
>  * If we unlocked the target buffer above, it's unlikely, but possible,
>  * that another backend used space on this page.
>
> might say what we're going to do in this case. I mean - I understand
> some backend may use space in unlocked page, but what does that mean for
> this code? What is it going to do? (The same comment talks about the
> next condition in much more detail, for example.)

There's a comment about that detail further below. But you're right, it wasn't
clear as-is. How about now?


> Also, isn't it a bit strange the free space check now happens outside
> any if condition? It used to happen in the
>
>     if (otherBuffer != InvalidBuffer)
>     {
>         ...
>     }
>
> block, but now it happens outside.

Well, the alternative is to repeat it in the two branches, which doesn't seem
great either. Particularly because there'll be a third branch after the bulk
extension patch.

Greetings,

Andres Freund

Вложения

Re: hio.c does visibilitymap_pin()/IO while holding buffer lock

От
Andres Freund
Дата:
Hi,

On 2023-04-03 12:00:30 -0700, Andres Freund wrote:
> It's not great, I agree. I tried to make it easier to read in this version by
> a) changing GetVisibilityMapPins() as I proposed
> b) added a new variable "recheckVmPins", that gets set in
>    if (unlockedTargetBuffer)
>    and
>    if (otherBuffer != InvalidBuffer)
> c) reformulated comments

I pushed this version a couple hours ago, after a bit more polishing.

Greetings,

Andres Freund