Re: s_lock() seems too aggressive for machines with many sockets
From: Robert Haas
Subject: Re: s_lock() seems too aggressive for machines with many sockets
Msg-id: CA+TgmoY-ZDmYg86TpLxvteMzD1kjWEeS_YXRUod++zVpLgYtjQ@mail.gmail.com
In reply to: Re: s_lock() seems too aggressive for machines with many sockets (Andres Freund <andres@anarazel.de>)
Responses: Re: s_lock() seems too aggressive for machines with many sockets
List: pgsql-hackers
On Wed, Jun 10, 2015 at 1:39 PM, Andres Freund <andres@anarazel.de> wrote:
> In the uncontended case lwlocks are just as fast as spinlocks now, with
> the exception of the local tracking array. They're faster if there's
> differences with read/write lockers.

If nothing else, the spinlock calls are inline, while the lwlock calls
involve a function call. So in the uncontended case I think the spinlock
stuff is like 2 instructions, an atomic and a branch.

>> I'm not altogether convinced that a simple replacement of spinlocks
>> with atomics is going to solve this problem. If it does, great. But
>> that doesn't eliminate spinning; it just moves it from the spinlock
>> loop to a CAS loop.
>
> Well, not necessarily. If you can write your algorithm in a way that
> xadd etc are used, instead of a lock cmpxchg, you're actually never
> spinning on x86 as it's guaranteed to succeed. I think that's possible
> for most of the places that currently lock buffer headers.

Well, it will succeed by looping inside the instruction, I suppose. But OK.

> (I had a version of the lwlock changes that used xadd for shared lock
> acquisition - but the code needed to back out in error cases made things
> more complicated, and the benefit on a four socket machine wasn't that
> large)

Now that we (EnterpriseDB) have this 8-socket machine, maybe we could try
your patch there, bound to varying numbers of sockets.

>> This is clearly better: when the CAS works,
>> you're done, as opposed to having to then manipulate the cache line
>> further and release the spinlock, during any of which the cache line
>> may get stolen from you.
>
> It's not just that, it's also that there's no chance of sleeping while
> holding a lock. Which doesn't happen that infrequently in contended, cpu
> bound workloads.

That's a really good point.
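[Editorial aside: the xadd-based shared-lock acquisition Andres mentions, and the back-out path that complicated it, can be sketched with C11 atomics. This is an illustrative model only, not PostgreSQL's actual LWLock layout; the struct, bit position, and function names are invented for the example.]

```c
/* Sketch: shared-lock acquisition via atomic fetch-add (xadd-style).
 * Hypothetical layout: low bits count shared holders, one high bit
 * marks an exclusive holder.  Not PostgreSQL's real LWLock struct. */
#include <stdatomic.h>
#include <stdbool.h>

#define EXCLUSIVE_FLAG (1u << 24)   /* assumed bit position */

typedef struct { atomic_uint state; } shlock;

static bool
shlock_acquire_shared(shlock *l)
{
    /* One unconditional xadd: no CAS retry loop on the fast path. */
    unsigned old = atomic_fetch_add(&l->state, 1);

    if (old & EXCLUSIVE_FLAG)
    {
        /* An exclusive holder is present: back out our increment and
         * fail.  This back-out is the extra complication mentioned
         * in the message above. */
        atomic_fetch_sub(&l->state, 1);
        return false;
    }
    return true;
}

static void
shlock_release_shared(shlock *l)
{
    atomic_fetch_sub(&l->state, 1);
}
```

On x86 the fetch-add compiles to a single lock xadd, which always completes; contention shows up only as cache-line traffic, not as a software retry loop.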
>> However, I'm worried that it will still be possible to create the same
>> kinds of performance collapses that we see with spinlocks with the CAS
>> loops with which we replace them - or even worse problems, because
>> clearly the spin_delay stuff *works*, and we're unlikely to
>> incorporate similarly sophisticated logic into every CAS loop.
>
> I doubt it's really necessary, but it shouldn't be too hard to
> generalize the sleeping logic so it could be used with little code in
> the individual callsites.

We should probably experiment with it to see whether it is needed. I am
fine with leaving it out if it isn't, but it's worth testing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
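[Editorial aside: the "generalized sleeping logic" discussed above might be factored so a CAS loop adds only a delay call per iteration. The sketch below uses invented names and a simple yield-based backoff as a stand-in for the real spin_delay machinery; it is not PostgreSQL code.]

```c
/* Sketch: a CAS loop on a buffer-header-style state word, with a
 * shared backoff helper so each callsite needs little extra code.
 * LOCKED_FLAG and the threshold are illustrative assumptions. */
#include <stdatomic.h>
#include <sched.h>

#define LOCKED_FLAG (1u << 31)  /* hypothetical lock bit */

static void
cas_delay(unsigned *spins)
{
    /* Centralized delay policy: spin a while, then yield the CPU.
     * A real version would escalate to timed sleeps, as s_lock's
     * spin_delay logic does. */
    if (++*spins > 100)
        sched_yield();
}

static void
cas_set_locked(atomic_uint *state)
{
    unsigned spins = 0;
    unsigned old = atomic_load(state);

    for (;;)
    {
        while (old & LOCKED_FLAG)
        {
            cas_delay(&spins);          /* the only per-callsite cost */
            old = atomic_load(state);
        }
        if (atomic_compare_exchange_weak(state, &old, old | LOCKED_FLAG))
            return;
        /* CAS failed: 'old' now holds the current value; retry. */
    }
}
```

The point of the factoring is that the collapse-avoidance policy lives in one place (cas_delay) rather than being re-invented in every CAS loop.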