On Fri, Dec 15, 2023 at 5:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I did check that the v1 patch successfully inlines newNode() and
> reduces it to just a MemoryContextAllocZeroAligned call, so it's
> correct that modern compilers do that better than whatever I tested
> in 2008. But I wonder what is happening in v2 to reduce the code
> size so much. MemoryContextAllocZeroAligned is not 20kB by itself.
I poked at this a bit and it seems to come from what Heikki said
upthread about fewer instructions before the calls: Running objdump on
v1 and v2 copyfuncs.o and diff'ing shows there are fewer MOV
instructions (some extraneous stuff removed):
e9 da 5f 00 00 jmp <_copyReindexStmt>
- 48 8b 05 00 00 00 00 mov rax,QWORD PTR [rip+0x0]
- be 18 00 00 00 mov esi,0x18
- 48 8b 38 mov rdi,QWORD PTR [rax]
- e8 00 00 00 00 call MemoryContextAllocZeroAligned-0x4
+ bf 18 00 00 00 mov edi,0x18
+ e8 00 00 00 00 call palloc0-0x4
That's 10 bytes savings.
- 48 8b 05 00 00 00 00 mov rax,QWORD PTR [rip+0x0]
- 48 8b 38 mov rdi,QWORD PTR [rax]
- e8 00 00 00 00 call MemoryContextAllocZeroAligned-0x4
+ e8 00 00 00 00 call palloc0-0x4
...another 10 bytes. Over and over again.
Because of the size differences, the compiler is inlining more: e.g.
in v1 _copyFieldStore has 4 call sites, but in v2 it got inlined.
About the patch, I'm wondering if this whitespace is intentional, but
it's otherwise straightforward:
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -132,6 +132,7 @@ typedef struct Node
#define nodeTag(nodeptr) (((const Node*)(nodeptr))->type)
+
/*