Re: Using POPCNT and other advanced bit manipulation instructions
| От | Dmitry Dolgov |
|---|---|
| Тема | Re: Using POPCNT and other advanced bit manipulation instructions |
| Дата | |
| Msg-id | CA+q6zcWaOS50A16_-RsisnFaA3RDkW0=BUoZJgVgL=27+j6tFQ@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: Using POPCNT and other advanced bit manipulation instructions (David Rowley <david.rowley@2ndquadrant.com>) |
| Список | pgsql-hackers |
> On Fri, Jan 4, 2019 at 1:38 PM David Rowley <david.rowley@2ndquadrant.com> wrote: > > On Thu, 20 Dec 2018 at 23:56, Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > I've checked for Clang 6, it turns out that indeed it generates popcnt without > > any macro, but only in one place for bloom_prop_bits_set. After looking at this > > function it seems that it would be benefitial to actually use popcnt there too. > > Yeah, that's the pattern that's mentioned in > https://lemire.me/blog/2016/05/23/the-surprising-cleverness-of-modern-compilers/ > It would need to be changed to call the popcount function. This > existing makes me a bit more worried that some extension could be > using a similar pattern and end up being compiled with -mpopcnt due to > pg_config having that CFLAG. That's all fine until the binary makes > it's way over to a machine without that instruction. It surprises me, that it's not that obvious how to disable this feature for clang. I guess one should be able to turn it off by invoking opt manually: clang -S -mpopcnt -emit-llvm *.c opt -S -mattr=+popcnt <all the options without -loop-idiom> *.ll llc -mattr=+popcnt *.optimized.ll clang -mpopcnt *optimized.s But for some reason this doesn't work for me (popcnt is not appearing in the first place). > > > I am able to measure performance gains from the patch. In a 3.4GB > > > table containing a single column with just 10 statistics targets, I > > > got the following times after running ANALYZE on the table. > > > > I've tested it too a bit, and got similar results when the patched version is > > slightly faster. But then I wonder if popcnt is the best solution here, since > > after some short research I found a paper [1], where authors claim that: > > > > Maybe surprisingly, we show that a vectorized approach using SIMD > > instructions can be twice as fast as using the dedicated instructions on > > recent Intel processors. > > > > > > [1]: https://arxiv.org/pdf/1611.07612.pdf > > I can't imagine that using the number_of_ones[] array processing > 8-bits at a time would be slower than POPCNT though. Yeah, probably you're right. If I understand correctly even with the lookup table in the cache the access would be a bit slower than a POPCNT instruction.
В списке pgsql-hackers по дате отправления: