Re: Better LWLocks with compare-and-swap (9.4)
From | Daniel Farina |
---|---|
Subject | Re: Better LWLocks with compare-and-swap (9.4) |
Date | |
Msg-id | CAAZKuFbDf4+HPYNNpjjvgAsYzte7_SMnVLgRmYuN7UWES6KxUg@mail.gmail.com |
In reply to | Re: Better LWLocks with compare-and-swap (9.4) (Daniel Farina <daniel@heroku.com>) |
List | pgsql-hackers |
On Wed, May 15, 2013 at 3:08 PM, Daniel Farina <daniel@heroku.com> wrote:
> On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> pgbench -S is such a workload. With 9.3beta1, I'm seeing this profile, when
>> I run "pgbench -S -c64 -j64 -T60 -M prepared" on a 32-core Linux machine:
>>
>> -  64.09%  postgres  postgres           [.] tas
>>    - tas
>>       - 99.83% s_lock
>>          - 53.22% LWLockAcquire
>>             + 99.87% GetSnapshotData
>>          - 46.78% LWLockRelease
>>               GetSnapshotData
>>               + GetTransactionSnapshot
>> +   2.97%  postgres  postgres           [.] tas
>> +   1.53%  postgres  libc-2.13.so       [.] 0x119873
>> +   1.44%  postgres  postgres           [.] GetSnapshotData
>> +   1.29%  postgres  [kernel.kallsyms]  [k] arch_local_irq_enable
>> +   1.18%  postgres  postgres           [.] AllocSetAlloc
>> ...
>>
>> So, on this test, a lot of time is wasted spinning on the mutex of
>> ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a
>> surprisingly steep drop in performance once you go beyond 29 clients
>> (attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My theory
>> is that after that point all the cores are busy, and processes start to be
>> sometimes context switched while holding the spinlock, which kills
>> performance.

I accidentally cut some important words from the end of Heikki's mail, which
makes my reply pretty bizarre to understand at the outset. Apologies. He wrote:

>> [...] Has anyone else seen that pattern?

> I have. I also used Linux perf to come to this conclusion, and my
> determination was similar: a system was undergoing increasingly heavy
> load, in this case with processes >> number of processors. It was
> also a phase-change type of event: at one moment everything would be
> going great, but once a critical threshold was hit, s_lock would
> consume an enormous amount of CPU time. I figured preemption while in
> the spinlock was to blame at the time, given the extreme nature.
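To spell out the mechanism behind the profile above: each LWLock's shared state is protected by a test-and-set spinlock, so a backend that gets descheduled between tas() and the spinlock release leaves every other backend touching ProcArrayLock spinning, which matches the phase-change behaviour we both saw. The subject-line idea is to update the lock state with a single compare-and-swap instead of taking the spinlock. Below is a minimal sketch of the contrast, assuming GCC/Clang __atomic builtins; the type names, fields, and LW_EXCLUSIVE_FLAG layout are illustrative, not PostgreSQL's actual LWLock code.

```c
/*
 * Illustrative sketch only -- not PostgreSQL's actual LWLock implementation.
 * Assumes GCC/Clang __atomic builtins; names are invented for clarity.
 */
#include <stdbool.h>
#include <stdint.h>
#include <sched.h>

/* Current scheme: every LWLock operation takes a test-and-set spinlock. */
typedef struct
{
    volatile char mutex;        /* cf. tas()/s_lock() */
    int           shared_count; /* number of shared holders */
} SpinGuardedLock;

static void
spin_acquire_shared(SpinGuardedLock *lock)
{
    /*
     * If the process holding 'mutex' is context-switched out right here,
     * every other backend hammering this LWLock spins until it returns.
     */
    while (__atomic_test_and_set(&lock->mutex, __ATOMIC_ACQUIRE))
        sched_yield();          /* real s_lock() spins, then backs off and sleeps */
    lock->shared_count++;
    __atomic_store_n(&lock->mutex, 0, __ATOMIC_RELEASE);
}

/* CAS scheme: a shared acquire/release is a single atomic op on one state word. */
typedef struct
{
    uint32_t state;             /* e.g. high bit = exclusive, low bits = shared count */
} CasLock;

#define LW_EXCLUSIVE_FLAG 0x80000000u

static bool
cas_try_acquire_shared(CasLock *lock)
{
    uint32_t old = __atomic_load_n(&lock->state, __ATOMIC_RELAXED);

    while (!(old & LW_EXCLUSIVE_FLAG))
    {
        /*
         * One CAS either succeeds or refreshes 'old'; there is no window
         * in which a preempted holder leaves everyone else spinning.
         */
        if (__atomic_compare_exchange_n(&lock->state, &old, old + 1,
                                        true,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            return true;
    }
    return false;               /* exclusive holder present; caller must queue/sleep */
}

static void
cas_release_shared(CasLock *lock)
{
    __atomic_fetch_sub(&lock->state, 1, __ATOMIC_RELEASE);
}
```

With the CAS form, the worst case under oversubscription is a retried compare-and-swap rather than an unbounded spin on a descheduled lock holder.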