Re: Popcount optimization using AVX512
From | Noah Misch |
---|---|
Subject | Re: Popcount optimization using AVX512 |
Date | |
Msg-id | 20240210235238.eb@rfd.leadboat.com |
In response to | Re: Popcount optimization using AVX512 (Andres Freund <andres@anarazel.de>) |
Responses | Re: Popcount optimization using AVX512 (Nathan Bossart <nathandbossart@gmail.com>) |
List | pgsql-hackers |
On Fri, Feb 09, 2024 at 08:33:23PM -0800, Andres Freund wrote:
> On 2024-02-09 15:27:57 -0800, Noah Misch wrote:
> > On Fri, Feb 09, 2024 at 10:24:32AM -0800, Andres Freund wrote:
> > > On 2024-01-26 07:42:33 +0100, Alvaro Herrera wrote:
> > > > This suggests that finding a way to make the ifunc stuff work (with good
> > > > performance) is critical to this work.
> > >
> > > Ifuncs are effectively implemented as a function call via a pointer, they're
> > > not magic, unfortunately. The sole trick they provide is that you don't
> > > manually have to use the function pointer.
> >
> > The IFUNC creators introduced it so glibc could use arch-specific memcpy with
> > the instruction sequence of a non-pointer, extern function call, not the
> > instruction sequence of a function pointer call.
>
> My understanding is that the ifunc mechanism just avoids the need for repeated
> indirect calls/jumps to implement a single function call, not the use of
> indirect function calls at all. Calls into shared libraries, like libc, are
> indirected via the GOT / PLT, i.e. an indirect function call/jump. Without
> ifuncs, the target of the function call would then have to dispatch to the
> resolved function. Ifuncs allow avoiding this repeated dispatch by moving the
> dispatch to the dynamic linker stage, modifying the contents of the GOT/PLT to
> point to the right function. Thus ifuncs are an optimization when calling a
> function in a shared library that's then dispatched depending on the cpu
> capabilities.
>
> However, in our case, where the code is in the same binary, function calls
> implemented in the main binary directly (possibly via a static library) don't
> go through GOT/PLT. In such a case, use of ifuncs turns a normal direct
> function call into one going through the GOT/PLT, i.e. makes it indirect. The
> same is true for calls within a shared library if either explicit symbol
> visibility is used, or -symbolic, -Wl,-Bsymbolic or such is used. Therefore
> there's no efficiency gain of ifuncs over a call via function pointer.
>
>
> This isn't because ifunc is implemented badly or something - the reason for
> this is that dynamic relocations aren't typically implemented by patching all
> callsites (".text relocations"), which is what you would need to avoid the
> need for an indirect call to something that fundamentally cannot be a constant
> address at link time. The reason text relocations are disfavored is that
> they can make program startup quite slow, that they require allowing
> modifications to executable pages which are disliked due to the security
> implications, and that they make the code non-shareable, as the in-memory
> executable code has to differ from the on-disk code.
>
>
> I actually think ifuncs within the same binary are a tad *slower* than plain
> function pointer calls, unless -fno-plt is used. Without -fno-plt, an ifunc is
> called by 1) a direct call into the PLT, 2) loading the target address from
> the GOT, 3) making an indirect jump to that address. Whereas a "plain
> indirect function call" is just 1) load target address from variable 2) making
> an indirect jump to that address. With -fno-plt the callsites themselves load
> the address from the GOT.

That sounds more accurate than what I wrote. Thanks.
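
For reference, the "plain indirect function call" from the quoted comparison could look roughly like the sketch below. This is only an illustration under stated assumptions: every name in it (my_popcount64, popcount64_impl, popcount64_choose, cpu_has_popcnt, and the two variants) is hypothetical and not taken from the patch, and the CPU check relies on the GCC/Clang builtin __builtin_cpu_supports, which is only used here on x86. The pointer initially targets a chooser that runs once, stores the selected implementation, and forwards the call; every later call is a load of the pointer followed by an indirect call.

```c
#include <stdint.h>

/* All names below are hypothetical; this is not code from the patch. */

/* Portable fallback: clear the lowest set bit until none remain. */
static int
popcount64_slow(uint64_t word)
{
    int count = 0;

    while (word != 0)
    {
        word &= word - 1;
        count++;
    }
    return count;
}

/*
 * Stand-in for a CPU-specific variant. Without -mpopcnt the builtin may
 * still expand to generic code; a real build would compile this function
 * with the appropriate target flags.
 */
static int
popcount64_fast(uint64_t word)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(word);
#else
    return popcount64_slow(word);
#endif
}

static int popcount64_choose(uint64_t word);

/* Step 1 of the "plain" sequence loads the target address from here. */
static int (*popcount64_impl) (uint64_t) = popcount64_choose;

/* Runtime CPU check; assumed to be available only on x86 with GCC/Clang. */
static int
cpu_has_popcnt(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("popcnt");
#else
    return 0;
#endif
}

/* Runs on the first call only: pick a variant, remember it, forward. */
static int
popcount64_choose(uint64_t word)
{
    popcount64_impl = cpu_has_popcnt() ? popcount64_fast : popcount64_slow;
    return popcount64_impl(word);
}

/* Public entry point: one pointer load plus one indirect call. */
int
my_popcount64(uint64_t word)
{
    return popcount64_impl(word);
}
```

A caller just invokes my_popcount64(x). After the first call the pointer no longer targets the chooser, so the dispatch cost is paid once. An ifunc would move that one-time selection into the dynamic linker instead, but per the quoted analysis the steady-state call would still be indirect.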