Discussion: Visibility bug in tuple lock
Hi,
We found a bug in heap_lock_tuple(). It seems to be unsafe to follow
the t_ctid pointer of updated tuples in the presence of aborted
tuples. Vacuum can mark aborted tuples LP_UNUSED, even if there are
still visible tuples that have a t_ctid pointing to them.
We would like to address this issue in postgres. Attached is a patch
that modifies VACUUM behavior.
Could you advise us on the best way to proceed? How can we get this
patch included in postgres, or would you recommend
pursuing an alternative approach?
The following scenario leads to an assertion failure in debug builds,
or to wrong results in release builds.
1. (session 1) A tuple is being UPDATEd in an active transaction.
2. (session 2) Another session locks that tuple, for example, SELECT
.. FOR UPDATE.
3. heap_lock_tuple() calls HeapTupleSatisfiesUpdate(), which returns
TM_BeingModified.
It will record the xmax and t_ctid of the tuple.
4. Since the tuple is actively being updated, we need to wait for
session 1 to complete.
5. The UPDATE of session 1 aborts, leaving the original tuple with an
aborted xmax and t_ctid pointing to a
tuple with an aborted xmin.
6. VACUUM marks aborted tuples as unused, regardless of whether they
are still referenced by a visible tuple through t_ctid.
As a result, the aborted tuple is marked unused.
7. An INSERT happens and the tuple is placed at the recycled location
of the aborted tuple.
8. The newly INSERTed tuple is UPDATEd.
9. heap_lock_tuple() of session 2 resumes, and will next call
heap_lock_updated_tuple() using the t_ctid which was recorded in step
3.
Note that the code does not check the status of the transaction it
was waiting on. It will proceed regardless of whether that
transaction committed or aborted.
10. heap_lock_updated_tuple() will now work on the tuple inserted in
step 7. It will see that this tuple was updated, and thus the return
value of heap_lock_updated_tuple() will be TM_Updated.
11. This will eventually lead to the assertion (result ==
TM_WouldBlock) || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID)
These concrete steps reproduce the problem:
Used version REL_18_0
Run with configuration:
io_combine_limit = '1'
session 1:
drop table if exists t;
create table t (id int);
insert into t (select generate_series(1, 452));
begin;
update t set id = 1000 where id = 1;
session 2:
gdb: b heapam.c:5033
gdb: continue
select * from t where id = 1 for update;
session 1:
abort;
select * from t;
VACUUM (TRUNCATE off);
insert into t values (453) returning ctid; -- Should be (2,1)
update t set id = 454 where id = 453 returning ctid;
session 2: (should now be at the breakpoint)
gdb: continue
We get the assertion:
(result == TM_WouldBlock) || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID).
Stacktrace included as attachment.
We have also seen the error "t_xmin %u is uncommitted in tuple (%u,%u)
to be updated in table \"%s\"" in our testing.
We believe that this has a similar cause. However, we have not been
able to come up with reproduction steps for that assertion
specifically.
We think that this problem can be fixed in two different ways.
1. Check the commit status after waiting in heap_lock_updated_tuple(),
and don't proceed to check the updated tuple when the transaction has
aborted;
or check, like in heap_get_latest_tid(), whether the xmin of the
updated tuple matches the xmax of the old tuple.
2. Change vacuum, to delay marking aborted tuples dead until no
visible tuple will have a t_ctid pointing to that tuple.
For option 2 we have a patch included in the e-mail. This patch alters
the visibility check during vacuum: aborted tuples are treated
similarly to RECENTLY_DEAD tuples. They are not recycled immediately;
instead, the xmin is used to decide whether they can be marked unused.
Regards,
Jasper Smit
Oleksii Kozlov
Luc Vlaming
Spiros Agathos
ServiceNow
Attachments
On 13/10/2025 10:48, Jasper Smit wrote:
> We found a bug in heap_lock_tuple(). It seems to be unsafe to follow
> the t_ctid pointer of updated tuples in the presence of aborted
> tuples. Vacuum can mark aborted tuples LP_UNUSED, even if there are
> still visible tuples that have a t_ctid pointing to them.
>
> We would like to address this issue in postgres. Attached is a patch
> that modifies VACUUM behavior. Could you advise us on the best way
> to proceed? How can we get this patch included in postgres, or would
> you recommend pursuing an alternative approach?

You're at the right place :-). Thanks for the script and the patch!

> [scenario and repro steps trimmed]
>
> We get the assertion:
> (result == TM_WouldBlock) || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID).
> Stacktrace included as attachment.

I can reproduce this.

> We have also seen the error "t_xmin %u is uncommitted in tuple (%u,%u)
> to be updated in table \"%s\"" in our testing. We believe that this
> has a similar cause. However, we have not been able to come up with
> reproduction steps for that assertion specifically.

Yeah, sounds quite plausible that you could get that too.

> We think that this problem can be fixed in two different ways.
> 1. Check the commit status after waiting in heap_lock_updated_tuple(),
> and don't proceed checking the updated tuple when the commit has
> aborted.

Hmm, I think heap_lock_updated_tuple() would still lock incorrect
tuples in that case, which could lead to unrelated transactions
failing, or at least waiting unnecessarily.

> or check, like in heap_get_latest_tid(), if the xmin of the updated
> tuples matches the xmax of the old tuple.

+1 for that approach. heap_lock_updated_tuple() actually does that
check too, but not for the first tuple.

> 2. Change vacuum, to delay marking aborted tuples dead until no
> visible tuple will have a t_ctid pointing to that tuple.

I guess that works too, but the downside is that we can't vacuum
aborted tuples as fast.

Attached is a fix by checking the prior xmax in
heap_lock_updated_tuple() for the first tuple already. The first
commit adds your repro script as a regression test using an injection
point. It hits the assertion failure, or returns an incorrect result
if assertions are disabled. The second commit fixes the bug and makes
the test pass.

The comment on heap_lock_updated_tuple says:

> * The initial tuple is assumed to be already locked.

But AFAICS that's not true. The caller holds the heavy-weight tuple
lock, to establish priority if there are multiple concurrent lockers,
but it doesn't stamp the initial tuple's xmax until after the
heap_lock_updated_tuple() call. I guess we just need to update that
comment, but I would love to get another pair of eyes on this, this
code is hairy.

- Heikki
Thanks for looking into this and for creating the patch.

> +1 for that approach. heap_lock_updated_tuple() actually does that check
> too, but not for the first tuple.

I think this will for sure fix the problem; however, we are still
accessing completely unrelated tuples. It feels unsafe to access
tuples that might have been vacuumed/pruned away. Is it possible, for
example, that the page we are accessing has been truncated away in the
meantime?

> I guess that works too, but the downside is that we can't vacuum aborted
> tuples as fast.

I don't know why we need aborted tuples to be vacuumed so much faster
than dead tuples caused by inserts/updates. I would think that
generally you would have many more of those. In our code base we have
chosen this approach, since this fix also safeguards us against other
potentially problematic cases where the ctid pointer is followed.

>> * The initial tuple is assumed to be already locked.
>
> But AFAICS that's not true. The caller holds the heavy-weight tuple
> lock

This does not look true to me either.

I simplified your patch a little bit by extracting common code to the
heap_lock_updated_tuple() function. I also added the new test to the
Makefile.

Jasper Smit
On 12/12/2025 15:19, Jasper Smit wrote:
> Thanks, for looking into this and for creating the patch.
>
>> +1 for that approach. heap_lock_updated_tuple() actually does that check
>> too, but not for the first tuple.
>
> I think this will for sure fix the problem; however, we are still
> accessing completely unrelated tuples. It feels unsafe to access
> tuples that might have been vacuumed/pruned away. Is it possible, for
> example, that the page we are accessing has been truncated away in
> the meantime?
It should be OK, we do it in other places too. For example,
heapam_lock_tuple() follows the ctid after the call to
heap_lock_tuple(), when called with TUPLE_LOCK_FLAG_FIND_LAST_VERSION.
heap_lock_updated_tuple_rec() handles the case that the tuple is missing
altogether, and the xmin == priorXmax check ensures that it's the
correct tuple.
Vacuum acquires an AccessExclusiveLock when truncating the relation,
which ensures that no backend can be in the middle of following the
update chain. That's not great - it would be nice to not need the
AccessExclusiveLock - but we do rely on it in many places currently.
>> I guess that works too, but the downside is that we can't vacuum aborted
>> tuples as fast.
>
> I don't know why we need aborted tuples to be vacuumed so much faster
> than dead tuples caused by inserts/updates. I would think that
> generally you would have many more of those. In our code base we have
> chosen this approach, since this fix also safeguards us against
> other potentially problematic cases where the ctid pointer is followed.
Yeah, for most workloads, it's probably not important that aborted
tuples are vacuumed quickly. But I could imagine some workloads with
lots of rollbacks and long running transactions where it would matter.
Making a behavioral change like that in backbranches is a bad idea, when
we have a much more localized patch available.
>>> * The initial tuple is assumed to be already locked.
>> But AFAICS that's not true. The caller holds the heavy-weight tuple
>
> This does not look true to me either.
>
> I simplified your patch a little bit by extracting common code to the
> heap_lock_updated_tuple() function.
Thanks! That's not quite correct though. This is very subtle, but the
'tuple' argument to heap_lock_updated_tuple() points to the buffer
holding the original tuple. So doing
HeapTupleHeaderGetUpdateXid(tuple->t_data) reads the current xmax of the
tuple, which can now be different than what it was earlier, i.e.
different from the xwait that we waited for. It's important that the
'ctid' and 'priorXmax' that we use to follow the chain are read together.
I expanded the test to demonstrate what can go wrong with that. If the
original tuple was locked or updated concurrently, we try to incorrectly
follow the update chain again and get the same assertion failure.
Hmm, now that I look at it, the existing check there looks a little
suspicious too:
> /*
> * If the tuple has not been updated, or has moved into another partition
> * (effectively a delete) stop here.
> */
> if (!HeapTupleHeaderIndicatesMovedPartitions(tuple->t_data) &&
> !ItemPointerEquals(&tuple->t_self, ctid))
> {
It's possible that tuple->t_data->t_ctid is modified concurrently by
another backend, while we're doing the
HeapTupleHeaderIndicatesMovedPartitions() check. I suppose it's
harmless: if the tuple is updated concurrently, depending on the timing
we'll either exit here quickly with TM_Ok, or we proceed to follow the
ctid, will fail the priorXmax check, and return TM_Ok from there.
It seems sketchy to read the t_ctid without holding a lock on the buffer
however. It's not guaranteed to be atomic.
Attached is a new patch version:
* I kept the code to get the updating xid from a multixid in
heap_lock_updated_tuple() like you did, but it now correctly uses the
original xmax of the tuple, instead of reading it from the buffer.
* Modified that HeapTupleHeaderIndicatesMovedPartitions() call to use
the passed-in ctid instead of reading it from the buffer.
* I removed the 'tuple' pointer argument from heap_lock_updated_tuple()
altogether. Seems safer that way. The function really shouldn't be
accessing tuple->t_data because we're not holding a buffer lock. The
"ItemPointerEquals(&tuple->t_self, ctid)" was safe, because
tuple->t_self is just local memory, but I moved that to the callers so
that I could remove the 'tuple' argument.
* Fixed the comment to not claim that the initial tuple has already been
locked.
- Heikki
> Thanks! That's not quite correct though. This is very subtle, but the
> 'tuple' argument to heap_lock_updated_tuple() points to the buffer
> holding the original tuple. So doing
> HeapTupleHeaderGetUpdateXid(tuple->t_data) reads the current xmax of the
> tuple, which can now be different than what it was earlier, i.e.
> different from the xwait that we waited for. It's important that the
> 'ctid' and 'priorXmax' that we use to follow the chain are read together.

Oops, you are right, I was too quick; it has already been quite some
time since I was dealing with this problem initially.

I see you moved the check ItemPointerEquals(&tuple->t_self, ctid) out
to heap_lock_tuple(), but isn't it redundant in the check
`if (follow_updates && updated && !ItemPointerEquals(&tuple->t_self, &t_ctid))`
if `updated` is already true in this condition?

Also, in `heap_lock_updated_tuple_rec()`, is the check for
`TransactionIdIsValid(priorXmax)` now redundant too?

> /* Locking the initial tuple where those
>  * values came from is the caller's responsibility. */

I think this is still ambiguous; you can still interpret it as saying
that the initial tuple needs to be locked before the function is
called. Maybe it is better to change it to something like: "This
function will not lock the initial tuple."

Unrelated question: I was wondering why these functions take a "const
ItemPointerData *" as an argument, as opposed to just passing the ctid
by value?

Jasper
On 17/12/2025 16:51, Jasper Smit wrote:
> I see you moved the check ItemPointerEquals(&tuple->t_self, ctid) out
> to heap_lock_tuple(), but isn't it redundant in the check
> `if (follow_updates && updated && !ItemPointerEquals(&tuple->t_self, &t_ctid))`
> if `updated` is already true in this condition?

Hmm, I don't think so, HEAP_KEYS_UPDATED is also set on deleted
tuples. I tried to add an assertion for that and ran the regression
tests, but nothing hit it. I was a little surprised. I guess we just
don't have test coverage for the deletion case?

> Also, in `heap_lock_updated_tuple_rec()`, is the check for
> `TransactionIdIsValid(priorXmax)` now redundant too?

Yep. I'll replace it with an assertion.

> /* Locking the initial tuple where those
>  * values came from is the caller's responsibility. */
>
> I think this is still ambiguous; you can still interpret it as saying
> that the initial tuple needs to be locked before the function is
> called. Maybe it is better to change it to something like: "This
> function will not lock the initial tuple."

Ok.

> Unrelated question: I was wondering why these functions take a "const
> ItemPointerData *" as an argument, as opposed to just passing the
> ctid by value?

I hate it too :-). Mostly historical reasons, it's always been like
that, and that's the convention we use almost everywhere. On 32-bit
systems, passing by reference made more sense. I chatted about that
with Andres Freund just the other day, and he said that compilers are
not good at optimizing passing them by value. I'll take his word on
that.

And while we're at it, when we're passing around ItemPointerDatas, it
would make sense to not store the hi and low bits of the block number
separately; they just need to be reassembled into a 32-bit BlockNumber
before every use. I think we should have a separate in-memory
representation of ItemPointerData, packed into a uint64, and use that
in all function signatures. It's a lot of code churn, and would affect
extensions too, but I feel it probably would be worth it.

- Heikki
Hi,

On 2025-12-15 21:52:29 +0200, Heikki Linnakangas wrote:
> Attached is a new patch version:

Note that the tests included in this seem to fail reliably on 32bit:
https://cirrus-ci.com/task/5787424783073280

Greetings,

Andres Freund
On 17/12/2025 19:11, Andres Freund wrote:
> On 2025-12-15 21:52:29 +0200, Heikki Linnakangas wrote:
>> Attached is a new patch version:
>
> Note that the tests included in this seem to fail reliably on 32bit:
> https://cirrus-ci.com/task/5787424783073280

Ah yeah, the test is sensitive to how exactly the tuples land on the
pages. It'll also fail with e.g. different block sizes. If we want to
include that test permanently, we'll need to make it more robust.

- Heikki
On 17/12/2025 17:42, Heikki Linnakangas wrote:
>> I see you moved the check ItemPointerEquals(&tuple->t_self, ctid) out
>> to heap_lock_tuple(), but isn't it redundant in the check
>> `if (follow_updates && updated && !ItemPointerEquals(&tuple->t_self, &t_ctid))`
>> if `updated` is already true in this condition?
>
> Hmm, I don't think so, HEAP_KEYS_UPDATED is also set on deleted tuples.

Ah, but this codepath is taken when HEAP_KEYS_UPDATED is *not* set. I
got that backwards. So I agree the ItemPointerEquals(&tuple->t_self,
ctid) check is redundant.

This is so subtle that I'm inclined to nevertheless keep it, at least
in backbranches, just in case we're missing something. In master, we
could perhaps add an assertion for it.

Here's a new patch version. I worked some more on the test, making it
less sensitive to the tuple layout. It should now pass on 32-bit
systems and with different block sizes.

- Heikki
The test is really nice with the injection points and the dynamically
sized data.

> Ah, but this codepath is taken when HEAP_KEYS_UPDATED is *not* set. I
> got that backwards. So I agree the ItemPointerEquals(&tuple->t_self,
> ctid) check is redundant.

Ok, I did not think about deletes. So the boolean `updated` here can
mean both update and delete; that makes sense to me.

> I chatted about that with Andres Freund just the other day, and he said
> that compilers are not good at optimizing passing them by value. I'll
> take his word on that.

I believe that too, but in heap_lock_updated_tuple_rec() the first
thing we do is ItemPointerCopy(tid, &tupid); which probably makes it
just as inefficient.

Jasper