Re: Using POPCNT and other advanced bit manipulation instructions
From | Andres Freund |
---|---|
Subject | Re: Using POPCNT and other advanced bit manipulation instructions |
Date | |
Msg-id | 20190215165513.64ptbtt3cn3ezfxb@alap3.anarazel.de |
In reply to | Re: Using POPCNT and other advanced bit manipulation instructions (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Hi,

On 2019-02-14 16:45:38 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote:
> >> Hah, I just realized you have to add -mlzcnt in order for these builtins
> >> to use the lzcnt instructions. It goes from something like
> >>
> >>     bsrq %rax, %rax
> >>     xorq $63, %rax
>
> > I'm confused how this is a general count leading zero operation? Did you
> > use constants or something that allowed it to infer a range in the test? If
> > so the compiler probably did some optimizations allowing it to do the
> > above.
>
> No. If you compile
>
>     int myclz(unsigned long long x)
>     {
>         return __builtin_clzll(x);
>     }
>
> at -O2, on just about any x86_64 gcc, you will get
>
>     myclz:
>     .LFB1:
>         .cfi_startproc
>         bsrq %rdi, %rax
>         xorq $63, %rax
>         ret
>         .cfi_endproc

Yeah, sorry for the noise. I misremembered the bsrq mnemonic.

bsr has a latency of three cycles and xor a latency of one, so the dependent pair takes about four cycles; lzcnt alone has a latency of three. So it's mildly faster to use lzcnt (it uses fewer ports and has a shorter latency). But I doubt we have code where that's noticeable.

Greetings,

Andres Freund