Re: s_lock() seems too aggressive for machines with many sockets
From: Robert Haas
Subject: Re: s_lock() seems too aggressive for machines with many sockets
Msg-id: CA+TgmoY-ZDmYg86TpLxvteMzD1kjWEeS_YXRUod++zVpLgYtjQ@mail.gmail.com
In reply to: Re: s_lock() seems too aggressive for machines with many sockets (Andres Freund <andres@anarazel.de>)
Responses: Re: s_lock() seems too aggressive for machines with many sockets
List: pgsql-hackers
On Wed, Jun 10, 2015 at 1:39 PM, Andres Freund <andres@anarazel.de> wrote:
> In the uncontended case lwlocks are just as fast as spinlocks now, with
> the exception of the local tracking array. They're faster if there's
> differences with read/write lockers.

If nothing else, the spinlock calls are inline, while the lwlock calls
involve a function call. So in the uncontended case I think the spinlock
stuff is like 2 instructions, an atomic and a branch.

>> I'm not altogether convinced that a simple replacement of spinlocks
>> with atomics is going to solve this problem. If it does, great. But
>> that doesn't eliminate spinning; it just moves it from the spinlock
>> loop to a CAS loop.
>
> Well, not necessarily. If you can write your algorithm in a way that
> xadd etc are used, instead of a lock cmpxchg, you're actually never
> spinning on x86 as it's guaranteed to succeed. I think that's possible
> for most of the places that currently lock buffer headers.

Well, it will succeed by looping inside the instruction, I suppose. But OK.

> (I had a version of the lwlock changes that used xadd for shared lock
> acquisition - but the code needed to back out in error cases made things
> more complicated, and the benefit on a four socket machine wasn't that
> large)

Now that we (EnterpriseDB) have this 8-socket machine, maybe we could try
your patch there, bound to varying numbers of sockets.

>> This is clearly better: when the CAS works,
>> you're done, as opposed to having to then manipulate the cache line
>> further and release the spinlock, during any of which the cache line
>> may get stolen from you.
>
> It's not just that, it's also that there's no chance of sleeping while
> holding a lock. Which doesn't happen that infrequently in contended, cpu
> bound workloads.

That's a really good point.
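[Editorial aside: the xadd-based shared-lock acquisition Andres mentions, and the back-out path that complicated it, can be sketched with C11 atomics. This is an illustrative model only, not PostgreSQL's actual LWLock layout; the struct, bit position, and function names are invented for the example.]

```c
/* Sketch: shared-lock acquisition via atomic fetch-add (xadd-style).
 * Hypothetical layout: low bits count shared holders, one high bit
 * marks an exclusive holder.  Not PostgreSQL's real LWLock struct. */
#include <stdatomic.h>
#include <stdbool.h>

#define EXCLUSIVE_FLAG (1u << 24)   /* assumed bit position */

typedef struct { atomic_uint state; } shlock;

static bool
shlock_acquire_shared(shlock *l)
{
    /* One unconditional xadd: no CAS retry loop on the fast path. */
    unsigned old = atomic_fetch_add(&l->state, 1);

    if (old & EXCLUSIVE_FLAG)
    {
        /* An exclusive holder is present: back out our increment and
         * fail.  This back-out is the extra complication mentioned
         * in the message above. */
        atomic_fetch_sub(&l->state, 1);
        return false;
    }
    return true;
}

static void
shlock_release_shared(shlock *l)
{
    atomic_fetch_sub(&l->state, 1);
}
```

On x86 the fetch-add compiles to a single lock xadd, which always completes; contention shows up only as cache-line traffic, not as a software retry loop.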
>> However, I'm worried that it will still be possible to create the same
>> kinds of performance collapses that we see with spinlocks with the CAS
>> loops with which we replace them - or even worse problems, because
>> clearly the spin_delay stuff *works*, and we're unlikely to
>> incorporate similarly sophisticated logic into every CAS loop.
>
> I doubt it's really necessary, but it shouldn't be too hard to
> generalize the sleeping logic so it could be used with little code in
> the individual callsites.

We should probably experiment with it to see whether it is needed. I am
fine with leaving it out if it isn't, but it's worth testing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
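[Editorial aside: the "generalized sleeping logic" discussed above might be factored so a CAS loop adds only a delay call per iteration. The sketch below uses invented names and a simple yield-based backoff as a stand-in for the real spin_delay machinery; it is not PostgreSQL code.]

```c
/* Sketch: a CAS loop on a buffer-header-style state word, with a
 * shared backoff helper so each callsite needs little extra code.
 * LOCKED_FLAG and the threshold are illustrative assumptions. */
#include <stdatomic.h>
#include <sched.h>

#define LOCKED_FLAG (1u << 31)  /* hypothetical lock bit */

static void
cas_delay(unsigned *spins)
{
    /* Centralized delay policy: spin a while, then yield the CPU.
     * A real version would escalate to timed sleeps, as s_lock's
     * spin_delay logic does. */
    if (++*spins > 100)
        sched_yield();
}

static void
cas_set_locked(atomic_uint *state)
{
    unsigned spins = 0;
    unsigned old = atomic_load(state);

    for (;;)
    {
        while (old & LOCKED_FLAG)
        {
            cas_delay(&spins);          /* the only per-callsite cost */
            old = atomic_load(state);
        }
        if (atomic_compare_exchange_weak(state, &old, old | LOCKED_FLAG))
            return;
        /* CAS failed: 'old' now holds the current value; retry. */
    }
}
```

The point of the factoring is that the collapse-avoidance policy lives in one place (cas_delay) rather than being re-invented in every CAS loop.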