cache false-sharing in lwlocks
| From | Rayson Ho |
|---|---|
| Subject | cache false-sharing in lwlocks |
| Date | |
| Msg-id | 73a01bf21001110924u9d754dby96bdd14f78e2840d@mail.gmail.com |
| List | pgsql-performance |
Hi,

LWLockPadded is either 16 or 32 bytes, so modern systems (e.g. Core 2 or AMD Opteron [1]) with a cache line size of 64 bytes can get false sharing in lwlocks. I changed LWLOCK_PADDED_SIZE in src/backend/storage/lmgr/lwlock.c to 64, ran the sysbench OLTP read-only benchmark, and got a slight improvement in throughput:

Hardware: single-socket Core 2, quad-core, Q6600 @ 2.40GHz
Software: Linux 2.6.28-17, glibc 2.9, gcc 4.3.3
PostgreSQL: 8.5alpha3

sysbench parameters:
sysbench --num-threads=4 --max-requests=0 --max-time=120 --oltp-read-only=on --test=oltp

original: 3227, 3243, 3243
after:    3256, 3255, 3253

So there is a speedup of about 1.005x, or what people usually call a 0.5% improvement. However, this is a single-socket machine, so none of the cache traffic needs to go off-chip. Can someone with a multi-socket machine help me run some tests, so that we can get a better idea of how this change (patch attached) performs on bigger systems?

Thanks,
Rayson

P.S. I just googled and found similar discussions about padding LWLOCK_PADDED_SIZE, but the previous work was done on an IBM POWER system, and the benchmark used was apachebench. IMO, that setup was too complex to measure a small performance improvement in PostgreSQL.

[1] Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems Application Note
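For readers unfamiliar with the technique being discussed: the fix is to pad each lock structure out to a full cache line so that two locks never share one, which is what bumping LWLOCK_PADDED_SIZE to 64 accomplishes. The sketch below is not the PostgreSQL source; it is a minimal standalone illustration of the same union-padding idiom, using hypothetical names (DemoLock, DemoLockPadded) and assuming a 64-byte cache line as mentioned in the mail.

```c
/*
 * Minimal sketch of cache-line padding to avoid false sharing.
 * DemoLock is a stand-in for a small lock struct; DemoLockPadded
 * rounds its size up to the (assumed) 64-byte cache line, so
 * adjacent locks in an array land on separate cache lines.
 */
#include <stdio.h>

#define CACHE_LINE_SIZE 64              /* assumed cache line size */

typedef struct DemoLock
{
    volatile unsigned char mutex;       /* spinlock protecting the fields below */
    char        exclusive;              /* exclusive holder present? */
    int         shared;                 /* number of shared holders */
} DemoLock;

/* Pad each lock to a full cache line. */
typedef union DemoLockPadded
{
    DemoLock    lock;
    char        pad[CACHE_LINE_SIZE];
} DemoLockPadded;

int
main(void)
{
    DemoLockPadded locks[4];

    printf("sizeof(DemoLock)       = %zu\n", sizeof(DemoLock));
    printf("sizeof(DemoLockPadded) = %zu\n", sizeof(DemoLockPadded));
    printf("array stride           = %zu\n",
           (size_t) ((char *) &locks[1] - (char *) &locks[0]));
    return 0;
}
```

With the padded union, the array stride printed is 64 bytes, so concurrent updates to neighbouring locks no longer ping-pong the same cache line between cores; the trade-off is the extra memory per lock.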
Attachments