Re: BUG #13681: Serialization failures caused by new multixact code of 9.3 (back-patch request)

Поиск

Список

Период

Сортировка

От	Alvaro Herrera
Тема	Re: BUG #13681: Serialization failures caused by new multixact code of 9.3 (back-patch request)
Дата	18 декабря 2015 г. 17:24:25
Msg-id	20151218172413.GS2618@alvherre.pgsql обсуждение исходный текст
Ответ на	Re: BUG #13681: Serialization failures caused by new multixact code of 9.3 (back-patch request) (Kevin Grittner <kgrittn@gmail.com>)
Ответы	Re: BUG #13681: Serialization failures caused by new multixact code of 9.3 (back-patch request)
Список	pgsql-bugs

Дерево обсуждения

Kevin Grittner wrote:
> On Thu, Dec 17, 2015 at 12:31 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:

> > we have a transaction that takes a lock-only multi in table
> > users, and then when we do the second update we don't look it up
> > because ...??
>
> The referencing column value did not change.  (We would not have
> looked up on the first update either, since it also didn't change
> there.)
>
> >  And then this causes the test case not to fail because ..?
>
> The concurrent update of the referencing table is not seen as a
> write conflict (because it didn't actually change).

Okay, but I don't understand why that particular patch fixes the
problem.  That patch was only supposed to skip looking up members of
multixacts when they were not interesting; so it's just a performance
optimization which shouldn't change the behavior of anything.  However
in this case, it is changing behavior and I'm concerned that it might
have introduced other bugs.

Tracing through the test case, what happens is that s1's second update
calls SELECT FOR SHARE of the users tuple (via the RI code); this calls
ExecLockRows which calls heap_lock_tuple which calls
HeapTupleSatisfiesUpdate.  The latter returns HeapTupleUpdated, because
it sees that the tuple has been modified by a transaction that already
committed (s2).  But we already hold a for-key-share lock on the old
version of the tuple -- that HeapTupleUpdated result value should be
examined carefully by heap_lock_tuple to determine that, yes, it can
acquire the row lock without raising an error.

So, if I patch heap_lock_rows like below, the test passes too:

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 559970f..aaf8e8e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4005,7 +4005,7 @@ l3:
         UnlockReleaseBuffer(*buffer);
         elog(ERROR, "attempted to lock invisible tuple");
     }
-    else if (result == HeapTupleBeingUpdated)
+    else if (result == HeapTupleBeingUpdated || result == HeapTupleUpdated)
     {
         TransactionId xwait;
         uint16        infomask;

I think heap_lock_rows had that shape (only consider BeingUpdated as a
reason to check/wait) only because it was possible to lock a row that
was being locked by someone else, but it wasn't possible to lock a row
that had been updated by someone else -- which became possible in 9.3.
So this patch is necessary, and not just to fix this one bug.

I need to study the implications of this patch more closely, but I think
in theory it is correct.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: BUG #13681: Serialization failures caused by new multixact code of 9.3 (back-patch request)