Re: tweaking MemSet() performance - 7.4.5

From	Manfred Spraul
Subject	Re: tweaking MemSet() performance - 7.4.5
Date	18 September 2004, 16:38:57
Msg-id	414C567C.3060503@colorfullife.com
In reply to	Re: tweaking MemSet() performance - 7.4.5 (Marc Colosimo <mcolosimo@mitre.org>)
Responses	Re: tweaking MemSet() performance - 7.4.5
List	pgsql-hackers

Marc Colosimo wrote:

> Oops, I used the same setting as in the old hacking message (-O2, gcc 
> 3.3). If I understand what you are saying, then it turns out yes, PG's 
> MemSet is faster for smaller blocksizes (see below, between 32 and 
> 64). I just replaced the whole MemSet with memset and it is not very 
> low when I profile.
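
For readers following along, the trade-off in the quoted text can be sketched roughly as below: a word-at-a-time loop for small, aligned blocks, and the libc memset() for everything else. This is only an illustration, not PostgreSQL's actual MemSet macro; the names hybrid_zero and LOOP_LIMIT are hypothetical, and the value 64 is an assumption taken from the 32-to-64 range mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold; the thread discusses values between 32 and 64. */
#define LOOP_LIMIT 64

/*
 * Zero len bytes at start. Small blocks that are long-aligned and a whole
 * number of longs in size are cleared with a word-at-a-time loop;
 * everything else is handed to the libc memset().
 */
static void
hybrid_zero(void *start, size_t len)
{
    if (((uintptr_t) start % sizeof(long)) == 0 &&
        (len % sizeof(long)) == 0 &&
        len <= LOOP_LIMIT)
    {
        long *p = (long *) start;
        long *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
    }
    else
        memset(start, 0, len);
}
```

Where the crossover point sits depends on the compiler, the optimization level, and the platform's memset, so the threshold would have to be measured rather than assumed.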

Could you check what the OS-X memset function does internally?
One trick to speed up memset is to bypass the cache and bulk-write 
directly from write buffers to main memory. i386 CPUs support that, and 
in microbenchmarks it's about 3 times faster (or something like that). 
Unfortunately it's a loss in real-world tests: Typically a structure is 
initialized with memset and then immediately accessed. If the memset 
bypasses the cache then the following access will cause a cache line 
miss, which can be so slow that using the faster memset can result in a 
net performance loss.
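
A minimal sketch of the cache-bypassing technique described above, assuming an x86 CPU with SSE2 and a 16-byte-aligned buffer. It is not taken from any particular libc, and the function name zero_nocache is hypothetical. On other platforms, or for unaligned buffers, it simply falls back to memset().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Zero len bytes at dst. With SSE2 and a 16-byte-aligned dst, the bulk of
 * the buffer is written with non-temporal stores, which go to memory
 * without filling the cache. Otherwise the plain memset() is used.
 */
static void
zero_nocache(void *dst, size_t len)
{
#if defined(__SSE2__)
    if (((uintptr_t) dst % 16) == 0)
    {
        __m128i zero = _mm_setzero_si128();
        char   *p = (char *) dst;
        size_t  n = len / 16;
        size_t  i;

        for (i = 0; i < n; i++)
            _mm_stream_si128((__m128i *) (p + i * 16), zero);

        _mm_sfence();                       /* order the streamed stores */
        memset(p + n * 16, 0, len % 16);    /* tail shorter than 16 bytes */
        return;
    }
#endif
    memset(dst, 0, len);
}
```

If the caller reads the zeroed structure right afterwards, those reads miss the cache, which is exactly the real-world cost described above.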

> I could squeeze more out of it if I spent more time trying to 
> understand it (change MEMSET_LOOP_LIMIT to 32 and then add memset 
> after that?). I'm now working on understanding Spin Locks and 
> friends. Putting in a sync call (in s_lock.h) is really a time killer 
> and bad for performance (it takes up 35 cycles).
>
That's the price you pay for weakly ordered memory access.
Linux on ppc uses eieio, on ppc64 lwsync is used. Could you check if 
they are faster?
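
As a rough illustration of where the barriers sit in a spinlock, here is a sketch using C11 atomics. This is an assumption made for illustration only: it is not the code in s_lock.h, and the sketch_* names are hypothetical. On weakly ordered CPUs the compiler has to emit a barrier for the acquire and release orderings; on PowerPC these are typically implemented with lighter instructions such as lwsync or isync rather than a full sync.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type, for illustration only. */
typedef struct
{
    atomic_flag locked;
} sketch_lock;

#define SKETCH_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns true on success. */
static bool
sketch_try_lock(sketch_lock *l)
{
    /* acquire: later loads/stores may not move before taking the lock */
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Spin until the lock is taken. */
static void
sketch_lock_acquire(sketch_lock *l)
{
    while (!sketch_try_lock(l))
        ;                       /* busy-wait */
}

/* Release the lock. */
static void
sketch_unlock(sketch_lock *l)
{
    /* release: earlier loads/stores may not move past the unlock */
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Whether a lighter barrier is actually faster than sync on a given CPU would still have to be measured.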

--   Manfred
