Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
Дата	8 августа 2011 г. 09:47:35
Msg-id	CA+TgmobyqhvXtizQ7ugR7M27fdtto78LB4c-x407KVVH=SxJug@mail.gmail.com обсуждение исходный текст
Ответ на	Yes, WaitLatch is vulnerable to weak-memory-ordering bugs (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
Список	pgsql-hackers

Дерево обсуждения

On Sun, Aug 7, 2011 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Any protocol of that sort has *obviously* got a race condition between
> changes of the latch state and changes of the separate flag state, if run
> in a weak-memory-ordering machine.
>
> At least on the hardware I'm testing, it seems that the key requirement
> for a failure to be seen is that there's a lot of contention for the cache
> line containing the flag, but not for the cache line containing the latch.
> This allows the write to the flag to take longer to propagate to main
> memory than the write to the latch.

This is interesting research, especially because it provides sobering
evidence of how easily a real problem could be overlooked in testing.
If it took several minutes for this to fail on an 8-CPU system, it's
easy to imagine a less egregious example sitting in our code-base for
years undetected.  (Indeed, I think we probably have a few of those
already...)

On the flip side, I'm not sure that anyone ever really expected that a
latch could be safely used this way.  Normally, I'd expect the flag to
be some piece of state protected by an LWLock, and I think that ought
to be OK provided that the lwlock is released before setting or
checking the latch.  Maybe we should try to document the contract here
a little better; I think it's just that there must be a full memory
barrier (such as LWLockRelease) in both processes involved in the
interaction.

> I'm not sure whether we need to do anything about this for 9.1; it may be
> that none of the existing uses of latches are subject to the kind of
> contention that would create a risk.  But I'm entirely convinced that we
> need to insert some memory-barrier instructions into the latch code before
> we expand our uses of latches much more.  Since we were already finding
> that memory barriers will be needed for performance reasons elsewhere,
> I think it's time to bite the bullet and develop that infrastructure.

I've been thinking about this too and actually went so far as to do
some research and put together something that I hope covers most of
the interesting cases.  The attached patch is pretty much entirely
untested, but reflects my present belief about how things ought to
work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

barrier-v1.patch

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs

Вложения