Re: Visibility bug in tuple lock

From: Heikki Linnakangas
Subject: Re: Visibility bug in tuple lock
Date:
Msg-id: 556a347e-e96e-4cf0-972d-16bde26c9b94@iki.fi
In reply to: Re: Visibility bug in tuple lock  (Jasper Smit <jbsmit@gmail.com>)
List: pgsql-hackers
On 12/12/2025 15:19, Jasper Smit wrote:
> Thanks for looking into this and for creating the patch.
> 
>> +1 for that approach. heap_lock_updated_tuple() actually does that check
>> too, but not for the first tuple.
> 
> I think this will for sure fix the problem, however we are still
> accessing completely unrelated tuples. It feels unsafe to access
> tuples that might have been vacuumed/pruned away. Is it possible for
> example that the page we are accessing has been truncated away in the
> meantime?

It should be OK; we do it in other places too. For example, 
heapam_tuple_lock() follows the ctid after the call to 
heap_lock_tuple(), when called with TUPLE_LOCK_FLAG_FIND_LAST_VERSION.

heap_lock_updated_tuple_rec() handles the case that the tuple is missing 
altogether, and the xmin == priorXmax check ensures that it's the 
correct tuple.
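
Roughly, the chain-following pattern looks like this (a simplified 
sketch of what heap_lock_updated_tuple_rec() does, not the exact code; 
error handling and the multixact cases are elided, and the variable 
names here are just illustrative):

    /* Fetch the next version by ctid; it may be gone entirely. */
    ItemPointerCopy(ctid, &mytup.t_self);
    if (!heap_fetch(rel, SnapshotAny, &mytup, &buf, false))
        return TM_Ok;        /* pruned/vacuumed away: end of the chain */

    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

    /*
     * Verify that this really is the successor of the version we came
     * from; if the line pointer was reused for an unrelated tuple, its
     * xmin won't match the xmax (priorXmax) we followed.
     */
    if (TransactionIdIsValid(priorXmax) &&
        !TransactionIdEquals(HeapTupleHeaderGetXmin(mytup.t_data), priorXmax))
    {
        UnlockReleaseBuffer(buf);
        return TM_Ok;
    }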

Vacuum acquires an AccessExclusiveLock when truncating the relation, 
which ensures that no backend can be in the middle of following the 
update chain. That's not great - it would be nice not to need the 
AccessExclusiveLock - but we do rely on it in many places currently.
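
For reference, the truncation path looks roughly like this (a heavily 
simplified sketch of lazy_truncate_heap() in vacuumlazy.c, not a 
verbatim excerpt):

    /*
     * The relation is only physically truncated after AccessExclusiveLock
     * has been obtained, so no other backend can be partway through
     * following an update chain into the pages being removed.
     */
    if (!ConditionalLockRelation(vacrel->rel, AccessExclusiveLock))
        return;     /* don't wait for readers; skip truncation this time */

    new_rel_pages = count_nondeletable_pages(vacrel, &lock_waiter_detected);
    RelationTruncate(vacrel->rel, new_rel_pages);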

>> I guess that works too, but the downside is that we can't vacuum aborted
>> tuples as fast.
> 
> I don't know why we need aborted tuples to be vacuumed so much faster
> than dead tuples caused by inserts/updates. I would think that
> generally you would have many more of those. In our code base we have
> opted for this approach, since this fix also safeguards us against
> other potentially problematic cases where the ctid pointer is followed.

Yeah, for most workloads, it's probably not important that aborted 
tuples are vacuumed quickly. But I could imagine some workloads with 
lots of rollbacks and long running transactions where it would matter. 
Making a behavioral change like that in back-branches is a bad idea 
when we have a much more localized patch available.

>>>   * The initial tuple is assumed to be already locked.
>> But AFAICS that's not true. The caller holds the heavy-weight tuple
> 
> This does not look true to me either.
> 
> I simplified your patch a little bit by extracting common code into the
> heap_lock_updated_tuple() function. 

Thanks! That's not quite correct though. This is very subtle, but the 
'tuple' argument to heap_lock_updated_tuple() points into the buffer 
holding the original tuple. So doing 
HeapTupleHeaderGetUpdateXid(tuple->t_data) reads the current xmax of the 
tuple, which can now be different from what it was earlier, i.e. 
different from the xwait that we waited for. It's important that the 
'ctid' and 'priorXmax' that we use to follow the chain are read together.
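
To illustrate the idea (just a sketch, not the attached patch, and the 
variable names are illustrative): both values need to be captured while 
the buffer content lock is still held, so that they describe one and 
the same update of the row:

    LockBuffer(buffer, BUFFER_LOCK_SHARE);
    /* read the successor pointer and the updater together, under the lock */
    ItemPointerCopy(&tuple->t_data->t_ctid, &ctid);
    priorXmax = HeapTupleHeaderGetUpdateXid(tuple->t_data);
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);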

I expanded the test to demonstrate what can go wrong with that. If the 
original tuple was locked or updated concurrently, we incorrectly try to 
follow the update chain again and get the same assertion failure.


Hmm, now that I look at it, the existing check there looks a little 
suspicious too:

>     /*
>      * If the tuple has not been updated, or has moved into another partition
>      * (effectively a delete) stop here.
>      */
>     if (!HeapTupleHeaderIndicatesMovedPartitions(tuple->t_data) &&
>         !ItemPointerEquals(&tuple->t_self, ctid))
>     {

It's possible that tuple->t_data->t_ctid is modified concurrently by 
another backend while we're doing the 
HeapTupleHeaderIndicatesMovedPartitions() check. I suppose it's 
harmless: if the tuple is updated concurrently, depending on the timing 
we'll either exit here quickly with TM_Ok, or we proceed to follow the 
ctid, fail the priorXmax check, and return TM_Ok from there.

It seems sketchy to read t_ctid without holding a lock on the buffer, 
however; the read is not guaranteed to be atomic.
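
(For reference, ItemPointerData is six bytes, so a plain load or store 
of it is not a single atomic access. Simplified from 
src/include/storage/itemptr.h:)

    typedef struct ItemPointerData
    {
        BlockIdData  ip_blkid;  /* block number, stored as two uint16 halves */
        OffsetNumber ip_posid;  /* line pointer offset within the block */
    } ItemPointerData;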

Attached is a new patch version:

* I kept the code to get the updating xid from a multixact in 
heap_lock_updated_tuple(), as you did, but it now correctly uses the 
original xmax of the tuple instead of reading it from the buffer.

* Modified that HeapTupleHeaderIndicatesMovedPartitions() call to use 
the passed-in ctid instead of reading it from the buffer.

* I removed the 'tuple' pointer argument from heap_lock_updated_tuple() 
altogether. That seems safer: the function really shouldn't be 
accessing tuple->t_data, because we're not holding a buffer lock. The 
"ItemPointerEquals(&tuple->t_self, ctid)" check was safe, because 
tuple->t_self is just local memory, but I moved it to the callers so 
that I could remove the 'tuple' argument.

* Fixed the comment to not claim that the initial tuple has already been 
locked.

- Heikki
