Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
From | Jesper Pedersen |
---|---|
Subject | Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA |
Date | |
Msg-id | 8e1dce77-6d4c-cfdb-00e0-8e9b3321aaba@redhat.com |
In response to | Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA (Sokolov Yura <funny.falcon@postgrespro.ru>) |
Responses | Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA |
List | pgsql-hackers |
Hi,

On 09/08/2017 03:35 PM, Sokolov Yura wrote:
>> I'm seeing
>>
>> -M prepared: Up to 11% improvement
>> -M prepared -S: No improvement, no regression ("noise")
>> -M prepared -N: Up to 12% improvement
>>
>> for all runs the improvement shows up the closer you get to the number
>> of CPU threads, or above. Although I'm not seeing the same
>> improvements as you on very large client counts there are definitely
>> improvements :)
>
> It is expected:
> - patch "fixes NUMA": for example, it doesn't give improvement on 1 socket
>   at all (I've tested it using numactl to bind to 1 socket)
> - and certainly it gives less improvement on 2 sockets than on 4 sockets
>   (and 28 cores vs 72 cores also gives difference),
> - one of hot points were CLogControlLock, and it were fixed with
>   "Group mode for CLOG updates" [1]
>

I'm planning to re-test that patch.

>>
>> +static inline bool
>> +LWLockAttemptLockOrQueueSelf(LWLock *lock, LWLockMode mode,
>> LWLockMode waitmode)
>>
>> I'll leave it to the Committer to decide if this method is too big to
>> be "inline".
>
> GCC 4.9 doesn't want to inline it without directive, and function call
> is then remarkable in profile.
>
> Attach contains version with all suggestions applied except remove of
> "inline".
>

Yes, ideally the method will be kept at "inline".

>> Open questions:
>> ---------------
>> * spins_per_delay as extern
>> * Calculation of skip_wait_list
>
> Currently calculation of skip_wait_list is mostly empirical (ie best
> i measured).
>

Ok, good to know.

> I strongly think that instead of spins_per_delay something dependent
> on concrete lock should be used. I tried to store it in a LWLock
> itself, but it were worse.

Yes, LWLock should be kept as small as possible, and cache line aligned
due to the cache storms, as shown by perf c2c.

> Recently I understand it should be stored in array indexed by tranche,
> but I didn't implement it yet, and therefore didn't measure.
>

Different constants for the LWLock could have an impact, but the
constants would also be dependent on machine setup, and work load.

Thanks for working on this !

Best regards,
 Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
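
Two points from the exchange above lend themselves to a small illustration: keeping the lock itself small and cache line aligned, and storing the spin count in an array indexed by tranche instead of inside the lock. The following is only a rough sketch under stated assumptions, not code from the patch and not PostgreSQL's actual definitions. Every `Sketch`/`sketch_` name, the 64-byte line size, the tranche count, and the adjustment rule (+100 when no sleep was needed, -1 otherwise) are hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE_SIZE 64   /* assumption: common x86-64 line size */
#define SKETCH_NUM_TRANCHES    8    /* hypothetical number of tranches */
#define SKETCH_MIN_SPINS       10   /* illustrative bounds and default */
#define SKETCH_MAX_SPINS       1000
#define SKETCH_DEFAULT_SPINS   100

/* Hypothetical lock: only the hot state word and a tranche id.
 * No tuning fields live here, so the struct stays small. */
typedef struct SketchLock
{
    uint32_t state;
    uint16_t tranche;
} SketchLock;

/* Pad and align each lock to one cache line, so that neighbouring locks
 * in an array never share a line (avoids false sharing). */
typedef union SketchLockPadded
{
    alignas(SKETCH_CACHE_LINE_SIZE) SketchLock lock;
    char pad[SKETCH_CACHE_LINE_SIZE];
} SketchLockPadded;

/* Spin budget kept outside the locks, one slot per tranche. */
static int sketch_spins_per_tranche[SKETCH_NUM_TRANCHES];

/* Reset every tranche to the default spin budget. */
void
sketch_init_spins(void)
{
    for (size_t i = 0; i < SKETCH_NUM_TRANCHES; i++)
        sketch_spins_per_tranche[i] = SKETCH_DEFAULT_SPINS;
}

/* Look up the spin budget for the tranche this lock belongs to. */
int
sketch_spins_for(const SketchLock *lock)
{
    assert(lock->tranche < SKETCH_NUM_TRANCHES);
    return sketch_spins_per_tranche[lock->tranche];
}

/* Adapt one tranche's budget after an acquisition attempt:
 * shrink slowly if the caller had to sleep, grow quickly if it did not.
 * The budget is clamped to [SKETCH_MIN_SPINS, SKETCH_MAX_SPINS]. */
void
sketch_adjust_spins(const SketchLock *lock, int had_to_sleep)
{
    int *spins;

    assert(lock->tranche < SKETCH_NUM_TRANCHES);
    spins = &sketch_spins_per_tranche[lock->tranche];

    if (had_to_sleep)
        *spins = (*spins - 1 < SKETCH_MIN_SPINS) ? SKETCH_MIN_SPINS : *spins - 1;
    else
        *spins = (*spins + 100 > SKETCH_MAX_SPINS) ? SKETCH_MAX_SPINS : *spins + 100;
}
```

In this layout a change to one tranche's budget does not touch the cache lines holding the locks, and locks in different tranches can converge on different budgets. The tranche array is still shared and written by all backends, so whether this helps on a given machine and workload would have to be measured.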