Re: Popcount optimization using AVX512

From Noah Misch
Subject Re: Popcount optimization using AVX512
Date
Msg-id 20240210235238.eb@rfd.leadboat.com
In response to Re: Popcount optimization using AVX512  (Andres Freund <andres@anarazel.de>)
Responses Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Fri, Feb 09, 2024 at 08:33:23PM -0800, Andres Freund wrote:
> On 2024-02-09 15:27:57 -0800, Noah Misch wrote:
> > On Fri, Feb 09, 2024 at 10:24:32AM -0800, Andres Freund wrote:
> > > On 2024-01-26 07:42:33 +0100, Alvaro Herrera wrote:
> > > > This suggests that finding a way to make the ifunc stuff work (with good
> > > > performance) is critical to this work.
> > > 
> > > Ifuncs are effectively implemented as a function call via a pointer, they're
> > > not magic, unfortunately. The sole trick they provide is that you don't
> > > manually have to use the function pointer.
> > 
> > The IFUNC creators introduced it so glibc could use arch-specific memcpy with
> > the instruction sequence of a non-pointer, extern function call, not the
> > instruction sequence of a function pointer call.
> 
> My understanding is that the ifunc mechanism just avoids the need for repeated
> indirect calls/jumps to implement a single function call, not the use of
> indirect function calls at all. Calls into shared libraries, like libc, are
> indirected via the GOT / PLT, i.e. an indirect function call/jump.  Without
> ifuncs, the target of the function call would then have to dispatch to the
> resolved function. Ifuncs avoid this repeated dispatch by moving the
> dispatch to the dynamic linker stage, modifying the contents of the GOT/PLT to
> point to the right function. Thus ifuncs are an optimization when calling a
> function in a shared library that's then dispatched depending on the cpu
> capabilities.
> 
> However, in our case, where the code is in the same binary, function calls
> implemented in the main binary directly (possibly via a static library) don't
> go through GOT/PLT. In such a case, use of ifuncs turns a normal direct
> function call into one going through the GOT/PLT, i.e. makes it indirect. The
> same is true for calls within a shared library if either explicit symbol
> visibility is used, or -symbolic, -Wl,-Bsymbolic or such is used. Therefore
> there's no efficiency gain of ifuncs over a call via function pointer.
> 
> 
> This isn't because ifunc is implemented badly or something - the reason for
> this is that dynamic relocations aren't typically implemented by patching all
> callsites (".text relocations"), which is what you would need to avoid the
> need for an indirect call to something that fundamentally cannot be a constant
> address at link time. The reason text relocations are disfavored is that
> they can make program startup quite slow, that they require allowing
> modifications to executable pages which are disliked due to the security
> implications, and that they make the code non-shareable, as the in-memory
> executable code has to differ from the on-disk code.
> 
> 
> I actually think ifuncs within the same binary are a tad *slower* than plain
> function pointer calls, unless -fno-plt is used. Without -fno-plt, an ifunc is
> called by 1) a direct call into the PLT, 2) loading the target address from
> the GOT, 3) making an indirect jump to that address.  Whereas a "plain
> indirect function call" is just 1) load target address from variable 2) making
> an indirect jump to that address. With -fno-plt the callsites themselves load
> the address from the GOT.

That sounds more accurate than what I wrote.  Thanks.
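
For readers following along, here is a minimal sketch of the "plain indirect function call" alternative described above: a function pointer that starts out aimed at a chooser and is repointed on first use. All names here (popcount64_ptr, popcount64_choose, my_popcount64, cpu_has_fast_popcount) are hypothetical and for illustration only, not taken from any patch in this thread, and the CPU probe assumes GCC or Clang on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static int
popcount64_portable(uint64_t x)
{
    int n = 0;

    while (x)
    {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Variant that lets the compiler pick its own popcount sequence. */
static int
popcount64_builtin(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(x);
#else
    return popcount64_portable(x);
#endif
}

/* Hypothetical runtime probe; reports 0 where we cannot ask the CPU. */
static int
cpu_has_fast_popcount(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("popcnt");
#else
    return 0;
#endif
}

static int popcount64_choose(uint64_t x);

/* The "variable" from which each call loads its target address. */
static int (*popcount64_ptr) (uint64_t) = popcount64_choose;

/* First call lands here, repoints the pointer, then forwards the call. */
static int
popcount64_choose(uint64_t x)
{
    popcount64_ptr = cpu_has_fast_popcount() ? popcount64_builtin
                                             : popcount64_portable;
    assert(popcount64_ptr != popcount64_choose);
    return popcount64_ptr(x);
}

/* Every call: 1) load the address from popcount64_ptr, 2) indirect jump. */
int
my_popcount64(uint64_t x)
{
    return popcount64_ptr(x);
}
```

Only the first call pays for the probe; later calls are the two-step load-and-jump sequence. An ifunc would move the choice into the dynamic linker instead, but per the explanation above the call would still be indirect, via the GOT/PLT.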


