Re: Avoid stack frame setup in performance critical routines using tail calls
From | Andres Freund |
---|---|
Subject | Re: Avoid stack frame setup in performance critical routines using tail calls |
Date | |
Msg-id | 20210720155723.dau4xqsnfq72uih5@alap3.anarazel.de |
In reply to | Re: Avoid stack frame setup in performance critical routines using tail calls (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
Hi,

On 2021-07-20 19:37:46 +1200, David Rowley wrote:
> On Tue, 20 Jul 2021 at 19:04, Andres Freund <andres@anarazel.de> wrote:
> > > * AllocateSetAlloc.txt
> > > * palloc.txt
> > > * percent.txt
> >
> > Huh, that's interesting. You have some control flow enforcement stuff
> > turned on (the endbr64). And it looks like it has a non zero cost (or
> > maybe it's just skid). Did you enable that intentionally? If not, what
> > compiler/version/distro is it? I think at least on GCC that's
> > -fcf-protection=...
>
> It's ubuntu 21.04 with gcc 10.3 (specifically gcc version 10.3.0
> (Ubuntu 10.3.0-1ubuntu1)
>
> I've attached the same results from compiling with clang 12
> (12.0.0-3ubuntu1~21.04.1)

It looks like the ubuntu folks have changed the default for CET to on.

andres@ubuntu2020:~$ echo 'int foo(void) { return 17;}' > test.c && gcc -O2 -c -o test.o test.c && objdump -S test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   f3 0f 1e fa             endbr64
   4:   b8 11 00 00 00          mov    $0x11,%eax
   9:   c3                      retq

andres@ubuntu2020:~$ echo 'int foo(void) { return 17;}' > test.c && gcc -O2 -fcf-protection=none -c -o test.o test.c && objdump -S test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   b8 11 00 00 00          mov    $0x11,%eax
   5:   c3                      retq

Independent of this patch, it might be worth running a benchmark with
the default options, and one with -fcf-protection=none. None of my
machines support it...

$ cpuid -1|grep CET
      CET_SS: CET shadow stack                 = false
      CET_IBT: CET indirect branch tracking    = false
      XCR0 supported: CET_U state              = false
      XCR0 supported: CET_S state              = false

Here it adds about 40kB of .text, but I can't measure the CET
overhead...

Greetings,

Andres Freund
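
The CET_SS and CET_IBT lines in the cpuid output above can also be checked without the cpuid utility. What follows is only a minimal sketch, not something from the thread: it assumes GCC or clang with the `<cpuid.h>` helper header, and the names `struct cet_support`, `decode_cet` and `query_cet` are hypothetical, made up for illustration. It reads CPUID leaf 7, subleaf 0, where ECX bit 7 reports shadow stack support and EDX bit 20 reports indirect branch tracking. It does not look at the two XCR0 state lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/clang wrapper around the CPUID instruction */
#endif

/* Hypothetical helper type, introduced only for this sketch. */
struct cet_support {
    bool shadow_stack;              /* CET_SS:  leaf 7, subleaf 0, ECX bit 7  */
    bool indirect_branch_tracking;  /* CET_IBT: leaf 7, subleaf 0, EDX bit 20 */
};

/* Decode the two CET feature bits from raw leaf-7 ECX/EDX values. */
struct cet_support decode_cet(unsigned int ecx, unsigned int edx)
{
    struct cet_support s;
    s.shadow_stack = ((ecx >> 7) & 1u) != 0;
    s.indirect_branch_tracking = ((edx >> 20) & 1u) != 0;
    return s;
}

/*
 * Query the CPU this code runs on. Returns false (and reports both
 * features as absent) when leaf 7 is unavailable or the target is not x86.
 */
bool query_cet(struct cet_support *out)
{
    assert(out != NULL);
    *out = decode_cet(0, 0);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    *out = decode_cet(ecx, edx);
    return true;
#else
    return false;
#endif
}
```

Calling `query_cet()` from a small `main()` and printing the two fields should mirror the first two lines of the cpuid output. Note that this only reports what the hardware advertises; whether the kernel and toolchain actually enforce CET is a separate question.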