Re: tweaking MemSet() performance - 7.4.5
From | Manfred Spraul |
---|---|
Subject | Re: tweaking MemSet() performance - 7.4.5 |
Date | |
Msg-id | 414C567C.3060503@colorfullife.com |
In response to | Re: tweaking MemSet() performance - 7.4.5 (Marc Colosimo <mcolosimo@mitre.org>) |
Responses |
Re: tweaking MemSet() performance - 7.4.5
|
List | pgsql-hackers |
Marc Colosimo wrote:

> Oops, I used the same setting as in the old hacking message (-O2, gcc 3.3). If I understand what you are saying, then it turns out yes, PG's MemSet is faster for smaller blocksizes (see below, between 32 and 64). I just replaced the whole MemSet with memset and it is not very low when I profile.

Could you check what the OS-X memset function does internally? One trick to speed up memset is to bypass the cache and bulk-write directly from the write buffers to main memory. i386 CPUs support that, and in microbenchmarks it's roughly 3 times faster. Unfortunately it's a loss in real-world tests: typically a structure is initialized with memset and then immediately accessed. If the memset bypasses the cache, the following access will cause a cache line miss, which can be so slow that using the faster memset results in a net performance loss.

> I could squeeze more out of it if I spent more time trying to understand it (change MEMSET_LOOP_LIMIT to 32 and then add memset after that?). I'm now working on understanding Spin Locks and friends. Putting in a sync call (in s_lock.h) is really a time killer and bad for performance (it takes up 35 cycles).

That's the price you pay for weakly ordered memory access. Linux on ppc uses eieio; on ppc64, lwsync is used. Could you check if they are faster?

--
Manfred
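
For readers following the MEMSET_LOOP_LIMIT idea quoted above (inline loop for small blocks, library memset after that), here is a minimal sketch of that shape. It is not PostgreSQL's actual MemSet macro: the names `sketch_zero` and `SKETCH_LOOP_LIMIT` are made up for illustration, and the cutoff of 32 is simply the value floated in the thread, not a measured optimum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff, modeled on the MEMSET_LOOP_LIMIT idea in the thread.
 * 32 is the value suggested there; it would need benchmarking per platform. */
#define SKETCH_LOOP_LIMIT 32

/*
 * Zero-fill len bytes at dst (illustrative helper, not a PostgreSQL API).
 * Small blocks that are long-aligned and a whole number of longs are cleared
 * with an inline word loop; everything else is handed to the C library's
 * memset(). Assumes the caller's buffer may legitimately be accessed as long
 * (for example, malloc'd memory or an array of long).
 */
static void *
sketch_zero(void *dst, size_t len)
{
    if (len <= SKETCH_LOOP_LIMIT &&
        ((uintptr_t) dst % sizeof(long)) == 0 &&
        (len % sizeof(long)) == 0)
    {
        long *p = (long *) dst;
        long *end = p + len / sizeof(long);

        /* Word-at-a-time stores; these go through the cache as usual. */
        while (p < end)
            *p++ = 0;
    }
    else
    {
        /* Large, unaligned, or odd-sized block: let libc decide. */
        memset(dst, 0, len);
    }
    return dst;
}
```

Calling `sketch_zero(buf, 4 * sizeof(long))` on a `long buf[8]` takes the inline loop, while a larger or misaligned request falls through to memset. Whether the inline path actually wins depends on the compiler, the libc implementation, and the block-size distribution, which is exactly what the measurements in this thread are probing.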