Re: "ERROR: latch already owned" on gharial
От | Heikki Linnakangas |
---|---|
Тема | Re: "ERROR: latch already owned" on gharial |
Дата | |
Msg-id | 30a17cd9-9ae1-4aaf-b08c-f8d1a8e4287d@iki.fi обсуждение исходный текст |
Ответ на | Re: "ERROR: latch already owned" on gharial (Soumyadeep Chakraborty <soumyadeep2007@gmail.com>) |
Ответы |
Re: "ERROR: latch already owned" on gharial
|
Список | pgsql-hackers |
On 08/02/2024 04:08, Soumyadeep Chakraborty wrote: > A possible ordering of events: > > (1) DisownLatch() is called by pid Y during ProcKill() and the write for > latch->owner_pid = 0 is NOT yet flushed to shmem. > > (2) The PGPROC object for pid Y is returned to the free list. > > (3) Pid X sees the same PGPROC object on the free list and grabs it. > > (4) Pid X does sanity check inside OwnLatch during InitProcess and > still sees the > old value of latch->owner_pid = Y (and not = 0), and trips the ERROR. > > The above sequence of operations should apply to PG HEAD as well. > > Suggestion: > > Should we do a pg_memory_barrier() at the end of DisownLatch(), like in > ResetLatch(), like the one introduced in [3]? This would ensure that the write > latch->owner_pid = 0; is flushed to shmem. The attached patch does this. Hmm, there is a pair of SpinLockAcquire() and SpinLockRelease() in ProcKill(), before step 3 can happen. Comment in spin.h about SpinLockAcquire/Release: > * Load and store operations in calling code are guaranteed not to be > * reordered with respect to these operations, because they include a > * compiler barrier. (Before PostgreSQL 9.5, callers needed to use a > * volatile qualifier to access data protected by spinlocks.) That talks about a compiler barrier, though, not a memory barrier. But looking at the implementations in s_lock.h, I believe they do act as memory barrier, too. So you might indeed have that problem on 9.4, but AFAICS not on later versions. -- Heikki Linnakangas Neon (https://neon.tech)
В списке pgsql-hackers по дате отправления: