Re: [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex
From | Stefan Kaltenbrunner |
---|---|
Subject | Re: [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex |
Date | |
Msg-id | 4A64B0A0.80107@kaltenbrunner.cc |
In response to | Re: [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex
|
List | pgsql-hackers |
Tom Lane wrote:
> Jeremy Kerr <jk@ozlabs.org> writes:
>> Rather than testing single bits in a loop, change AllocSetFreeIndex to
>> use the __builtin_clz() function to calculate the chunk index.
>
>> This requires a new check for __builtin_clz in the configure script.
>
>> Results in a ~2% performance increase on sysbench on PowerPC.
>
> I did some performance testing on this by extracting the
> AllocSetFreeIndex function into a standalone test program that just
> executed it a lot of times in a loop.  And there's a problem: on
> x86_64 it is not much of a win.  The code sequence that gcc generates
> for __builtin_clz is basically
>
>     bsrl    %eax, %eax
>     xorl    $31, %eax
>
> and it turns out that Intel hasn't seen fit to put a lot of effort into
> the BSR instruction.  It's constant time, all right, but on most of
> their CPUs that constant time is like 8 or 16 times slower than an ADD;
> cf http://www.intel.com/Assets/PDF/manual/248966.pdf

Hmm, interesting. I don't have the exact numbers any more, but that
patch (or a previous version of it) definitely showed a noticeable
improvement when I tested with sysbench on a current-generation Intel
Nehalem...

Stefan
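For readers following the thread, here is a minimal sketch of the two approaches being compared: the shift-and-test loop and the __builtin_clz() variant. This is not the patch itself. The function names, SKETCH_MINBITS (8-byte minimum chunk) and SKETCH_NUM_FREELISTS are assumptions made for illustration, and __builtin_clz() is a GCC/Clang builtin, so this will not build with compilers that lack it.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; not taken from the patch. */
#define SKETCH_MINBITS        3     /* smallest chunk is 1 << 3 = 8 bytes */
#define SKETCH_NUM_FREELISTS  11    /* number of power-of-two size classes */

/* Shift-and-test variant: count the shifts needed to empty the value. */
static int
free_index_loop(size_t size)
{
    int idx = 0;

    if (size > 0)
    {
        size = (size - 1) >> SKETCH_MINBITS;
        while (size != 0)
        {
            idx++;
            size >>= 1;
        }
        assert(idx < SKETCH_NUM_FREELISTS);
    }
    return idx;
}

/* clz variant: index = bit width minus the number of leading zeros. */
static int
free_index_clz(size_t size)
{
    int idx = 0;

    if (size > 0)
    {
        /* Sizes in range fit easily in an unsigned int. */
        unsigned int t = (unsigned int) ((size - 1) >> SKETCH_MINBITS);

        /* __builtin_clz(0) is undefined, so guard the zero case. */
        if (t != 0)
            idx = (int) (sizeof(unsigned int) * 8) - __builtin_clz(t);
        assert(idx < SKETCH_NUM_FREELISTS);
    }
    return idx;
}

/* Count the sizes in [0, max_size] on which the two variants disagree. */
static size_t
count_mismatches(size_t max_size)
{
    size_t bad = 0;

    for (size_t s = 0; s <= max_size; s++)
        if (free_index_loop(s) != free_index_clz(s))
            bad++;
    return bad;
}
```

Calling count_mismatches(8192) checks that both variants agree over the whole range these assumed constants allow. It says nothing about speed; timing the two functions in a tight loop on the CPUs in question would be needed to compare against the numbers discussed above.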