Discussion: Re: Why our Valgrind reports suck
Andres Freund <andres@anarazel.de> writes:
> On 2025-05-08 22:04:06 -0400, Tom Lane wrote:
>> A nearby thread [1] reminded me to wonder why we seem to have
>> so many false-positive leaks reported by Valgrind these days.

> Huh. We use the memory pool client requests to inform valgrind about
> memory contexts. I seem to recall that that "hid" many leak warnings
> from valgrind. I wonder if we somehow broke (or weakened) that.

The problem with dynahash has been there since day one, so I think
we've just gotten used to ignoring "leak" reports associated with hash
tables --- I have, anyway.  But the other triggers I listed have
appeared within the last five-ish years, if memory serves, and we just
didn't notice because of the existing dynahash noise.

> We currently don't reset TopMemoryContext at exit, which, obviously,
> does massively increase the number of leaks. But OTOH, without that
> there's not a whole lot of value in the leak check...

Hmm.  Yeah, we could just reset or delete TopMemoryContext, but surely
that would be counterproductive: it would mask any real leak of
palloc'd blocks.  I'm a little suspicious of your other idea of
shutting down the caches, for the same reason: I wonder if it wouldn't
hide leaks rather than help find them.

One thing I noticed while reading the Valgrind manual is that they
describe a facility for "two level" tracking of custom allocators such
as ours.  Apparently, what you're really supposed to do is use
VALGRIND_MEMPOOL_ALLOC to mark the malloc blocks that the allocator
works in, and VALGRIND_MALLOCLIKE_BLOCK to mark the sub-blocks handed
out by the allocator.  I wonder if this feature postdates our
implementation of Valgrind support, and I wonder even more if using it
would improve our results.
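[The "two level" scheme sketched above can be illustrated with a toy arena allocator. This is a minimal sketch, not PostgreSQL code: the `Arena` type and `arena_*` functions are invented for illustration. The fallback macros let it compile without Valgrind; under Valgrind the real client-request macros come from <valgrind/valgrind.h>.]

```c
#include <stdbool.h>
#include <stdlib.h>

/* No-op fallbacks so this compiles without Valgrind installed. */
#ifndef VALGRIND_CREATE_MEMPOOL
#define VALGRIND_CREATE_MEMPOOL(pool, rzB, is_zeroed) do {} while (0)
#define VALGRIND_MEMPOOL_ALLOC(pool, addr, size) do {} while (0)
#define VALGRIND_DESTROY_MEMPOOL(pool) do {} while (0)
#endif

/* A toy bump allocator that carves sub-blocks out of one malloc'ed slab. */
typedef struct Arena
{
	char	   *base;			/* the malloc'ed block the allocator works in */
	size_t		used;
	size_t		size;
} Arena;

Arena *
arena_create(size_t size)
{
	Arena	   *a = malloc(sizeof(Arena));

	a->base = malloc(size);
	a->used = 0;
	a->size = size;
	/* Level 1: announce the arena as a custom-allocator "pool" */
	VALGRIND_CREATE_MEMPOOL(a, 0, false);
	return a;
}

void *
arena_alloc(Arena *a, size_t size)
{
	void	   *p;

	if (a->used + size > a->size)
		return NULL;			/* toy allocator: no additional blocks */
	p = a->base + a->used;
	a->used += size;
	/* Level 2: mark just this carved-out sub-block as "allocated" */
	VALGRIND_MEMPOOL_ALLOC(a, p, size);
	return p;
}

void
arena_destroy(Arena *a)
{
	/* Discards all of the pool's chunks in one go */
	VALGRIND_DESTROY_MEMPOOL(a);
	free(a->base);
	free(a);
}
```

[Under Memcheck, only the sub-blocks announced at level 2 count as allocated chunks for leak tracking, which is exactly the behavior the rest of this thread wrestles with.]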
I did experiment with marking context headers as accessible with
VALGRIND_MEMPOOL_ALLOC, and that made the complaints about
MemoryContextCopyAndSetIdentifier strings go away, confirming that
Valgrind is simply not considering the context->ident pointers.
Unfortunately it also added a bunch of other failures, so that
evidently is Not The Right Thing.

I suspect what is going on is related to this bit in valgrind.h:

   For Memcheck users: if you use VALGRIND_MALLOCLIKE_BLOCK to carve out
   custom blocks from within a heap block, B, that has been allocated with
   malloc/calloc/new/etc, then block B will be *ignored* during leak-checking
   -- the custom blocks will take precedence.

We're not using VALGRIND_MALLOCLIKE_BLOCK (yet, anyway), but I'm
suspecting that Valgrind probably also ignores heap blocks that match
VALGRIND_CREATE_MEMPOOL requests, except for the portions thereof that
are covered by VALGRIND_MEMPOOL_ALLOC requests.

Anyway, I'm now feeling motivated to go try some experiments.
Watch this space ...

			regards, tom lane
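[The valgrind.h behavior quoted in this message can be made concrete with a small sketch. This is illustrative code, not PostgreSQL code: the `Slab` type and `slab_carve` function are invented names. The fallback macros let it compile without Valgrind; the real macros live in <valgrind/valgrind.h>.]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* No-op fallbacks so this compiles without Valgrind installed. */
#ifndef VALGRIND_MALLOCLIKE_BLOCK
#define VALGRIND_MALLOCLIKE_BLOCK(addr, sizeB, rzB, is_zeroed) do {} while (0)
#define VALGRIND_FREELIKE_BLOCK(addr, rzB) do {} while (0)
#endif

#define SLAB_SIZE 1024

/* One heap block "B", carved into custom sub-blocks. */
typedef struct Slab
{
	size_t		used;			/* management data living inside block B */
	char		space[SLAB_SIZE];
} Slab;

void *
slab_carve(Slab *slab, size_t size)
{
	void	   *p;

	if (slab->used + size > SLAB_SIZE)
		return NULL;
	p = slab->space + slab->used;
	slab->used += size;

	/*
	 * Per the valgrind.h excerpt: once any such custom block exists inside
	 * B, Memcheck ignores B itself during leak checking, so pointers stored
	 * in B's uncovered parts (like the "used" header here) stop acting as
	 * roots unless they, too, are covered by custom-block annotations.
	 */
	VALGRIND_MALLOCLIKE_BLOCK(p, size, 0, false);
	return p;
}
```

[That "uncovered parts stop acting as roots" effect is the hypothesized reason the mempool-based annotations produced new failures.]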
I wrote:
> One thing I noticed while reading the Valgrind manual is that
> they describe a facility for "two level" tracking of custom
> allocators such as ours.

And, since there's nothing new under the sun around here, we already
had a discussion about that back in 2021:

https://www.postgresql.org/message-id/flat/3471359.1615937770%40sss.pgh.pa.us

That thread seems to have led to fixing some specific bugs, but we
never committed any of the discussed valgrind infrastructure
improvements.  I'll have a go at resurrecting that...

			regards, tom lane
I wrote:
> And, since there's nothing new under the sun around here,
> we already had a discussion about that back in 2021:
> https://www.postgresql.org/message-id/flat/3471359.1615937770%40sss.pgh.pa.us
> That thread seems to have led to fixing some specific bugs,
> but we never committed any of the discussed valgrind infrastructure
> improvements.  I'll have a go at resurrecting that...

Okay, here is a patch series that updates the
0001-Make-memory-contexts-themselves-more-visible-to-valg.patch
patch you posted in that thread, and makes various follow-up fixes
that either fix or paper over various leaks.  Some of it is
committable I think, but other parts are just WIP.

Anyway, as of the 0010 patch we can run through the core regression
tests and see no more than a couple of kilobytes total reported
leakage in any process, except for two tests that expose leaks in
TS dictionary building.  (That's fixable, but I ran out of time, and
I wanted to get this posted before Montreal.)  There is work left to
do before we can remove the suppressions added in 0002, but this is
already huge progress compared to where we were.

A couple of these patches are bug fixes that need to be applied and
even back-patched.  In particular, I had not realized that autovacuum
leaks a nontrivial amount of memory per relation processed (cf 0009),
and apparently has done so for a few releases now.  That is horrid in
databases with many tables, and I'm surprised that we've not gotten
complaints about it.

			regards, tom lane


From d4ca3e2e4824ff1f11ee5324b4ba46c4dd971ce9 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 12:39:59 -0400
Subject: [PATCH v1 01/11] Improve our support for Valgrind's leak tracking.

When determining whether an allocated chunk is still reachable,
Valgrind will consider only pointers within what it believes to be
allocated chunks.
Normally, all of a block obtained from malloc() would be considered
"allocated" --- but it turns out that if we use VALGRIND_MEMPOOL_ALLOC
to designate sub-section(s) of a malloc'ed block as allocated, all the
rest of that malloc'ed block is ignored.  This leads to lots of false
positives of course.  In particular, in any multi-malloc-block context,
all but the primary block were reported as leaked.  We also had a
problem with context "ident" strings, which were reported as leaked
unless there was some other pointer to them besides the one in the
context header.

To fix, we need to use VALGRIND_MEMPOOL_ALLOC to designate a context's
management structs (the context struct itself and any per-block headers)
as allocated chunks.  That forces moving the VALGRIND_CREATE_MEMPOOL/
VALGRIND_DESTROY_MEMPOOL calls into the per-context-type code, so that
the pool identifier can be made as soon as we've allocated the initial
block, but otherwise it's fairly straightforward.  Note that in
Valgrind's eyes there is no distinction between these allocations and
the allocations that the mmgr modules hand out to user code.  That's
fine for now, but perhaps someday we'll want to do better yet.
Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
Discussion: https://postgr.es/m/20210317181531.7oggpqevzz6bka3g@alap3.anarazel.de
---
 src/backend/utils/mmgr/aset.c       | 69 ++++++++++++++++++++++++++++-
 src/backend/utils/mmgr/bump.c       | 31 ++++++++++++-
 src/backend/utils/mmgr/generation.c | 29 ++++++++++++
 src/backend/utils/mmgr/mcxt.c       | 15 ++++---
 src/backend/utils/mmgr/slab.c       | 32 +++++++++++++
 src/include/utils/memdebug.h        |  1 +
 6 files changed, 168 insertions(+), 9 deletions(-)

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 666ecd8f78d..21490213842 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -103,6 +103,8 @@
 #define ALLOC_BLOCKHDRSZ	MAXALIGN(sizeof(AllocBlockData))
 #define ALLOC_CHUNKHDRSZ	sizeof(MemoryChunk)
+#define FIRST_BLOCKHDRSZ	(MAXALIGN(sizeof(AllocSetContext)) + \
+							 ALLOC_BLOCKHDRSZ)
 
 typedef struct AllocBlockData *AllocBlock;	/* forward reference */
@@ -458,6 +460,21 @@ AllocSetContextCreateInternal(MemoryContext parent,
	 * we'd leak the header/initial block if we ereport in this stretch.
	 */
 
+	/* Create a Valgrind "pool" associated with the context */
+	VALGRIND_CREATE_MEMPOOL(set, 0, false);
+
+	/*
+	 * Create a Valgrind "chunk" covering both the AllocSetContext struct and
+	 * the keeper block's header.  (It would be more sensible for these to be
+	 * two separate chunks, but doing that seems to tickle bugs in some
+	 * versions of Valgrind.)  We must have these chunks, and also a chunk for
+	 * each subsequently-added block header, so that Valgrind considers the
+	 * pointers within them while checking for leaked memory.  Note that
+	 * Valgrind doesn't distinguish between these chunks and those created by
+	 * mcxt.c for the user-accessible-data chunks we allocate.
+	 */
+	VALGRIND_MEMPOOL_ALLOC(set, set, FIRST_BLOCKHDRSZ);
+
 	/* Fill in the initial block's block header */
 	block = KeeperBlock(set);
 	block->aset = set;
@@ -585,6 +602,14 @@ AllocSetReset(MemoryContext context)
 #ifdef CLOBBER_FREED_MEMORY
 			wipe_mem(block, block->freeptr - ((char *) block));
 #endif
+
+			/*
+			 * We need to free Valgrind's block-header chunks explicitly,
+			 * although the user-data chunks within will go away in the TRIM
+			 * below.  Otherwise Valgrind complains about leaked allocations.
+			 */
+			VALGRIND_MEMPOOL_FREE(set, block);
+
 			free(block);
 		}
 		block = next;
@@ -592,6 +617,14 @@ AllocSetReset(MemoryContext context)
 
 	Assert(context->mem_allocated == keepersize);
 
+	/*
+	 * Instruct Valgrind to throw away all the chunks associated with this
+	 * context, except for the one covering the AllocSetContext and
+	 * keeper-block header.  This gets rid of the chunks for whatever user
+	 * data is getting discarded by the context reset.
+	 */
+	VALGRIND_MEMPOOL_TRIM(set, set, FIRST_BLOCKHDRSZ);
+
 	/* Reset block size allocation sequence, too */
 	set->nextBlockSize = set->initBlockSize;
 }
@@ -648,6 +681,9 @@ AllocSetDelete(MemoryContext context)
 			freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
 			freelist->num_free--;
 
+			/* Destroy the context's Valgrind pool --- see notes below */
+			VALGRIND_DESTROY_MEMPOOL(oldset);
+
 			/* All that remains is to free the header/initial block */
 			free(oldset);
 		}
@@ -675,13 +711,24 @@ AllocSetDelete(MemoryContext context)
 #endif
 
 		if (!IsKeeperBlock(set, block))
+		{
+			/* As in AllocSetReset, free block-header chunks explicitly */
+			VALGRIND_MEMPOOL_FREE(set, block);
 			free(block);
+		}
 
 		block = next;
 	}
 
	Assert(context->mem_allocated == keepersize);
 
+	/*
+	 * Destroy the Valgrind pool.  We don't seem to need to explicitly free
+	 * the initial block's header chunk, nor any user-data chunks that
+	 * Valgrind still knows about.
+	 */
+	VALGRIND_DESTROY_MEMPOOL(set);
+
 	/* Finally, free the context header, including the keeper block */
 	free(set);
 }
@@ -716,6 +763,9 @@ AllocSetAllocLarge(MemoryContext context, Size size, int flags)
 	if (block == NULL)
 		return MemoryContextAllocationFailure(context, size, flags);
 
+	/* Make a Valgrind chunk covering the new block's header */
+	VALGRIND_MEMPOOL_ALLOC(set, block, ALLOC_BLOCKHDRSZ);
+
 	context->mem_allocated += blksize;
 
 	block->aset = set;
@@ -922,6 +972,9 @@ AllocSetAllocFromNewBlock(MemoryContext context, Size size, int flags,
 	if (block == NULL)
 		return MemoryContextAllocationFailure(context, size, flags);
 
+	/* Make a Valgrind chunk covering the new block's header */
+	VALGRIND_MEMPOOL_ALLOC(set, block, ALLOC_BLOCKHDRSZ);
+
 	context->mem_allocated += blksize;
 
 	block->aset = set;
@@ -1104,6 +1157,10 @@ AllocSetFree(void *pointer)
 #ifdef CLOBBER_FREED_MEMORY
 		wipe_mem(block, block->freeptr - ((char *) block));
 #endif
+
+		/* As in AllocSetReset, free block-header chunks explicitly */
+		VALGRIND_MEMPOOL_FREE(set, block);
+
 		free(block);
 	}
 	else
@@ -1184,6 +1241,7 @@ AllocSetRealloc(void *pointer, Size size, int flags)
		 * realloc() to make the containing block bigger, or smaller, with
		 * minimum space wastage.
		 */
+		AllocBlock	newblock;
		Size		chksize;
		Size		blksize;
		Size		oldblksize;
@@ -1223,14 +1281,21 @@ AllocSetRealloc(void *pointer, Size size, int flags)
		blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
		oldblksize = block->endptr - ((char *) block);
 
-		block = (AllocBlock) realloc(block, blksize);
-		if (block == NULL)
+		newblock = (AllocBlock) realloc(block, blksize);
+		if (newblock == NULL)
		{
			/* Disallow access to the chunk header. */
			VALGRIND_MAKE_MEM_NOACCESS(chunk, ALLOC_CHUNKHDRSZ);
			return MemoryContextAllocationFailure(&set->header, size, flags);
		}
 
+		/*
+		 * Move Valgrind's block-header chunk explicitly.  (mcxt.c will take
+		 * care of moving the chunk for the user data.)
+		 */
+		VALGRIND_MEMPOOL_CHANGE(set, block, newblock, ALLOC_BLOCKHDRSZ);
+		block = newblock;
+
		/* updated separately, not to underflow when (oldblksize > blksize) */
		set->header.mem_allocated -= oldblksize;
		set->header.mem_allocated += blksize;
diff --git a/src/backend/utils/mmgr/bump.c b/src/backend/utils/mmgr/bump.c
index f7a37d1b3e8..271bf78c583 100644
--- a/src/backend/utils/mmgr/bump.c
+++ b/src/backend/utils/mmgr/bump.c
@@ -45,7 +45,9 @@
 #include "utils/memutils_memorychunk.h"
 #include "utils/memutils_internal.h"
 
-#define Bump_BLOCKHDRSZ	MAXALIGN(sizeof(BumpBlock))
+#define Bump_BLOCKHDRSZ	MAXALIGN(sizeof(BumpBlock))
+#define FIRST_BLOCKHDRSZ	(MAXALIGN(sizeof(BumpContext)) + \
+							 Bump_BLOCKHDRSZ)
 
 /* No chunk header unless built with MEMORY_CONTEXT_CHECKING */
 #ifdef MEMORY_CONTEXT_CHECKING
@@ -189,6 +191,12 @@ BumpContextCreate(MemoryContext parent, const char *name, Size minContextSize,
	 * Avoid writing code that can fail between here and MemoryContextCreate;
	 * we'd leak the header and initial block if we ereport in this stretch.
	 */
+
+	/* See comments about Valgrind interactions in aset.c */
+	VALGRIND_CREATE_MEMPOOL(set, 0, false);
+	/* This chunk covers the BumpContext and the keeper block header */
+	VALGRIND_MEMPOOL_ALLOC(set, set, FIRST_BLOCKHDRSZ);
+
 	dlist_init(&set->blocks);
 
 	/* Fill in the initial block's block header */
@@ -262,6 +270,14 @@ BumpReset(MemoryContext context)
 		BumpBlockFree(set, block);
 	}
 
+	/*
+	 * Instruct Valgrind to throw away all the chunks associated with this
+	 * context, except for the one covering the BumpContext and keeper-block
+	 * header.  This gets rid of the chunks for whatever user data is getting
+	 * discarded by the context reset.
+	 */
+	VALGRIND_MEMPOOL_TRIM(set, set, FIRST_BLOCKHDRSZ);
+
 	/* Reset block size allocation sequence, too */
 	set->nextBlockSize = set->initBlockSize;
 
@@ -279,6 +295,10 @@ BumpDelete(MemoryContext context)
 {
 	/* Reset to release all releasable BumpBlocks */
 	BumpReset(context);
+
+	/* Destroy the Valgrind pool -- see notes in aset.c */
+	VALGRIND_DESTROY_MEMPOOL(context);
+
 	/* And free the context header and keeper block */
 	free(context);
 }
@@ -318,6 +338,9 @@ BumpAllocLarge(MemoryContext context, Size size, int flags)
 	if (block == NULL)
 		return MemoryContextAllocationFailure(context, size, flags);
 
+	/* Make a Valgrind chunk covering the new block's header */
+	VALGRIND_MEMPOOL_ALLOC(set, block, Bump_BLOCKHDRSZ);
+
 	context->mem_allocated += blksize;
 
 	/* the block is completely full */
@@ -455,6 +478,9 @@ BumpAllocFromNewBlock(MemoryContext context, Size size, int flags,
 	if (block == NULL)
 		return MemoryContextAllocationFailure(context, size, flags);
 
+	/* Make a Valgrind chunk covering the new block's header */
+	VALGRIND_MEMPOOL_ALLOC(set, block, Bump_BLOCKHDRSZ);
+
 	context->mem_allocated += blksize;
 
 	/* initialize the new block */
@@ -606,6 +632,9 @@ BumpBlockFree(BumpContext *set, BumpBlock *block)
 	wipe_mem(block, ((char *) block->endptr - (char *) block));
 #endif
 
+	/* As in aset.c, free block-header chunks explicitly */
+	VALGRIND_MEMPOOL_FREE(set, block);
+
 	free(block);
 }
 
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 18679ad4f1e..e476df534d7 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -45,6 +45,8 @@
 
 #define Generation_BLOCKHDRSZ	MAXALIGN(sizeof(GenerationBlock))
 #define Generation_CHUNKHDRSZ	sizeof(MemoryChunk)
+#define FIRST_BLOCKHDRSZ	(MAXALIGN(sizeof(GenerationContext)) + \
+							 Generation_BLOCKHDRSZ)
 
 #define Generation_CHUNK_FRACTION	8
 
@@ -221,6 +223,12 @@ GenerationContextCreate(MemoryContext parent,
	 * Avoid writing code that can fail between here and MemoryContextCreate;
	 * we'd leak the header if we ereport in this stretch.
	 */
+
+	/* See comments about Valgrind interactions in aset.c */
+	VALGRIND_CREATE_MEMPOOL(set, 0, false);
+	/* This chunk covers the GenerationContext and the keeper block header */
+	VALGRIND_MEMPOOL_ALLOC(set, set, FIRST_BLOCKHDRSZ);
+
 	dlist_init(&set->blocks);
 
 	/* Fill in the initial block's block header */
@@ -309,6 +317,14 @@ GenerationReset(MemoryContext context)
 		GenerationBlockFree(set, block);
 	}
 
+	/*
+	 * Instruct Valgrind to throw away all the chunks associated with this
+	 * context, except for the one covering the GenerationContext and
+	 * keeper-block header.  This gets rid of the chunks for whatever user
+	 * data is getting discarded by the context reset.
+	 */
+	VALGRIND_MEMPOOL_TRIM(set, set, FIRST_BLOCKHDRSZ);
+
 	/* set it so new allocations to make use of the keeper block */
 	set->block = KeeperBlock(set);
 
@@ -329,6 +345,10 @@ GenerationDelete(MemoryContext context)
 {
 	/* Reset to release all releasable GenerationBlocks */
 	GenerationReset(context);
+
+	/* Destroy the Valgrind pool -- see notes in aset.c */
+	VALGRIND_DESTROY_MEMPOOL(context);
+
 	/* And free the context header and keeper block */
 	free(context);
 }
@@ -365,6 +385,9 @@ GenerationAllocLarge(MemoryContext context, Size size, int flags)
 	if (block == NULL)
 		return MemoryContextAllocationFailure(context, size, flags);
 
+	/* Make a Valgrind chunk covering the new block's header */
+	VALGRIND_MEMPOOL_ALLOC(set, block, Generation_BLOCKHDRSZ);
+
 	context->mem_allocated += blksize;
 
 	/* block with a single (used) chunk */
@@ -487,6 +510,9 @@ GenerationAllocFromNewBlock(MemoryContext context, Size size, int flags,
 	if (block == NULL)
 		return MemoryContextAllocationFailure(context, size, flags);
 
+	/* Make a Valgrind chunk covering the new block's header */
+	VALGRIND_MEMPOOL_ALLOC(set, block, Generation_BLOCKHDRSZ);
+
 	context->mem_allocated += blksize;
 
 	/* initialize the new block */
@@ -677,6 +703,9 @@ GenerationBlockFree(GenerationContext *set, GenerationBlock *block)
 	wipe_mem(block, block->blksize);
 #endif
+
+	/* As in aset.c, free block-header chunks explicitly */
+	VALGRIND_MEMPOOL_FREE(set, block);
+
 	free(block);
 }
 
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 7d28ca706eb..6fce916d11e 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -8,6 +8,15 @@
 * context-type-specific operations via the function pointers in a
 * context's MemoryContextMethods struct.
 *
+ * A note about Valgrind support: the context-type-specific code is
+ * responsible for creating and deleting a Valgrind "pool" for each context,
+ * and also for creating Valgrind "chunks" to cover its management data
+ * structures such as block headers.  (There must be a chunk that includes
+ * every pointer we want Valgrind to consider for leak-tracking purposes.)
+ * This module creates and deletes chunks that cover the caller-visible
+ * allocated areas.  However, the context-type-specific code must handle
+ * cleaning up caller-visible chunks during memory context reset operations.
+ *
 *
 * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
@@ -449,8 +458,6 @@ MemoryContextResetOnly(MemoryContext context)
 
 		context->methods->reset(context);
 		context->isReset = true;
-		VALGRIND_DESTROY_MEMPOOL(context);
-		VALGRIND_CREATE_MEMPOOL(context, 0, false);
 	}
 }
 
@@ -557,8 +564,6 @@ MemoryContextDeleteOnly(MemoryContext context)
 	context->ident = NULL;
 
 	context->methods->delete_context(context);
-
-	VALGRIND_DESTROY_MEMPOOL(context);
 }
 
 /*
@@ -1212,8 +1217,6 @@ MemoryContextCreate(MemoryContext node,
 		node->nextchild = NULL;
 		node->allowInCritSection = false;
 	}
-
-	VALGRIND_CREATE_MEMPOOL(node, 0, false);
 }
 
 /*
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index d32c0d318fb..563ac728db6 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -377,6 +377,11 @@ SlabContextCreate(MemoryContext parent,
	 * we'd leak the header if we ereport in this stretch.
	 */
 
+	/* See comments about Valgrind interactions in aset.c */
+	VALGRIND_CREATE_MEMPOOL(slab, 0, false);
+	/* This chunk covers the SlabContext only */
+	VALGRIND_MEMPOOL_ALLOC(slab, slab, sizeof(SlabContext));
+
 	/* Fill in SlabContext-specific header fields */
 	slab->chunkSize = (uint32) chunkSize;
 	slab->fullChunkSize = (uint32) fullChunkSize;
@@ -451,6 +456,10 @@ SlabReset(MemoryContext context)
 #ifdef CLOBBER_FREED_MEMORY
 			wipe_mem(block, slab->blockSize);
 #endif
+
+			/* As in aset.c, free block-header chunks explicitly */
+			VALGRIND_MEMPOOL_FREE(slab, block);
+
 			free(block);
 			context->mem_allocated -= slab->blockSize;
 		}
@@ -467,11 +476,23 @@ SlabReset(MemoryContext context)
 #ifdef CLOBBER_FREED_MEMORY
 			wipe_mem(block, slab->blockSize);
 #endif
+
+			/* As in aset.c, free block-header chunks explicitly */
+			VALGRIND_MEMPOOL_FREE(slab, block);
+
 			free(block);
 			context->mem_allocated -= slab->blockSize;
 		}
 	}
 
+	/*
+	 * Instruct Valgrind to throw away all the chunks associated with this
+	 * context, except for the one covering the SlabContext.  This gets rid of
+	 * the chunks for whatever user data is getting discarded by the context
+	 * reset.
+	 */
+	VALGRIND_MEMPOOL_TRIM(slab, slab, sizeof(SlabContext));
+
 	slab->curBlocklistIndex = 0;
 
 	Assert(context->mem_allocated == 0);
@@ -486,6 +507,10 @@ SlabDelete(MemoryContext context)
 {
 	/* Reset to release all the SlabBlocks */
 	SlabReset(context);
+
+	/* Destroy the Valgrind pool -- see notes in aset.c */
+	VALGRIND_DESTROY_MEMPOOL(context);
+
 	/* And free the context header */
 	free(context);
 }
@@ -567,6 +592,9 @@ SlabAllocFromNewBlock(MemoryContext context, Size size, int flags)
 	if (unlikely(block == NULL))
 		return MemoryContextAllocationFailure(context, size, flags);
 
+	/* Make a Valgrind chunk covering the new block's header */
+	VALGRIND_MEMPOOL_ALLOC(slab, block, Slab_BLOCKHDRSZ);
+
 	block->slab = slab;
 	context->mem_allocated += slab->blockSize;
@@ -795,6 +823,10 @@ SlabFree(void *pointer)
 #ifdef CLOBBER_FREED_MEMORY
 		wipe_mem(block, slab->blockSize);
 #endif
+
+		/* As in aset.c, free block-header chunks explicitly */
+		VALGRIND_MEMPOOL_FREE(slab, block);
+
 		free(block);
 		slab->header.mem_allocated -= slab->blockSize;
 	}
diff --git a/src/include/utils/memdebug.h b/src/include/utils/memdebug.h
index 7309271834b..80692dcef93 100644
--- a/src/include/utils/memdebug.h
+++ b/src/include/utils/memdebug.h
@@ -29,6 +29,7 @@
 #define VALGRIND_MEMPOOL_ALLOC(context, addr, size) do {} while (0)
 #define VALGRIND_MEMPOOL_FREE(context, addr) do {} while (0)
 #define VALGRIND_MEMPOOL_CHANGE(context, optr, nptr, size) do {} while (0)
+#define VALGRIND_MEMPOOL_TRIM(context, addr, size) do {} while (0)
 
 #endif
-- 
2.43.5


From fb80149e0517b33dd265ca5b1f559b0f60405a5a Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 12:52:32 -0400
Subject: [PATCH v1 02/11] Temporarily suppress some Valgrind complaints.
This isn't meant for commit, but it is useful to reduce the large
amount of leak-check noise generated by a couple of known problems:

Temporary buffers, as well as some other structures, are allocated in
a way that causes us not to hold any pointer to what Valgrind thinks
is the start of the chunk.  That causes "possible leak" reports.

PL/pgSQL and SQL-function parsing leak stuff into the long-lived
function cache context.  This isn't really a huge problem, since the
cruft will be recovered if we have to re-parse the function.  But it
leads to a lot of complaints from Valgrind.

We need better answers for these things, but for the moment hide them
to allow sorting through other leakage problems.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/tools/valgrind.supp | 42 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..7d4fb0b2c1d 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,45 @@
 	Memcheck:Cond
 	fun:PyObject_Realloc
 }
+
+# Suppress complaints about aligned allocations of temp buffers.
+# This ought to be fixed properly, instead.
+{
+	hide_temp_buffer_allocations
+	Memcheck:Leak
+	match-leak-kinds: possible
+	fun:MemoryContextAlloc
+	fun:GetLocalBufferStorage
+}
+
+# Suppress complaints about aligned allocations of other stuff.
+# This ought to be fixed properly, instead.
+{
+	hide_alignment_leakage
+	Memcheck:Leak
+	match-leak-kinds: possible
+	fun:MemoryContextAllocExtended
+	fun:MemoryContextAllocAligned
+}
+
+# Suppress complaints about stuff leaked by PL/pgSQL parsing.
+# This ought to be fixed properly, instead.
+{
+	hide_plpgsql_parsing_leaks
+	Memcheck:Leak
+	match-leak-kinds: definite,possible,indirect
+
+	...
+	fun:plpgsql_compile_callback
+}
+
+# And likewise for stuff leaked by SQL-function parsing.
+# This ought to be fixed properly, instead.
+{
+	hide_sql_parsing_leaks
+	Memcheck:Leak
+	match-leak-kinds: definite,possible,indirect
+
+	...
+	fun:sql_compile_callback
+}
-- 
2.43.5


From 2f7d138a0a222e9f598a6f54c233c2cc7f6391c9 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 12:56:23 -0400
Subject: [PATCH v1 03/11] Silence complaints about leaked dynahash storage.

Because dynahash.c never frees hashtable storage except by deleting
the whole hashtable context, it doesn't bother to track the individual
blocks of elements allocated by element_alloc().  This results in
"possibly lost" complaints from Valgrind except when the first element
of each block is actively in use.  (Otherwise it'll be on a freelist,
but very likely only reachable via "interior pointers" within element
blocks, which doesn't satisfy Valgrind.)

To fix, if we're building with USE_VALGRIND, expend an extra pointer's
worth of space in each element block so that we can chain them all
together from the HTAB header.  Skip this in shared hashtables though:
Valgrind doesn't track those, and we'd need additional locking to make
it safe to manipulate a shared chain.

While here, update a comment obsoleted by 9c911ec06.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/backend/utils/hash/dynahash.c | 52 +++++++++++++++++++++++++++----
 1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 1ad155d446e..f81f9c199e4 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -22,10 +22,11 @@
 * lookup key's hash value as a partition number --- this will work because
 * of the way calc_bucket() maps hash values to bucket numbers.
 *
- * For hash tables in shared memory, the memory allocator function should
- * match malloc's semantics of returning NULL on failure.  For hash tables
- * in local memory, we typically use palloc() which will throw error on
- * failure.  The code in this file has to cope with both cases.
+ * The memory allocator function should match malloc's semantics of returning
+ * NULL on failure.  (This is essential for hash tables in shared memory.
+ * For hash tables in local memory, we used to use palloc() which will throw
+ * error on failure; but we no longer do, so it's untested whether this
+ * module will still cope with that behavior.)
 *
 * dynahash.c provides support for these types of lookup keys:
 *
@@ -98,6 +99,7 @@
 
 #include "access/xact.h"
 #include "common/hashfn.h"
+#include "lib/ilist.h"
 #include "port/pg_bitutils.h"
 #include "storage/shmem.h"
 #include "storage/spin.h"
@@ -236,6 +238,16 @@ struct HTAB
 	Size		keysize;		/* hash key length in bytes */
 	long		ssize;			/* segment size --- must be power of 2 */
 	int			sshift;			/* segment shift = log2(ssize) */
+
+	/*
+	 * In a USE_VALGRIND build, non-shared hashtables keep an slist chain of
+	 * all the element blocks they have allocated.  This pacifies Valgrind,
+	 * which would otherwise often claim that the element blocks are "possibly
+	 * lost" for lack of any non-interior pointers to their starts.
+	 */
+#ifdef USE_VALGRIND
+	slist_head	element_blocks;
+#endif
 };
 
 /*
@@ -1708,6 +1720,8 @@ element_alloc(HTAB *hashp, int nelem, int freelist_idx)
 {
 	HASHHDR    *hctl = hashp->hctl;
 	Size		elementSize;
+	Size		requestSize;
+	char	   *allocedBlock;
 	HASHELEMENT *firstElement;
 	HASHELEMENT *tmpElement;
 	HASHELEMENT *prevElement;
@@ -1719,12 +1733,38 @@ element_alloc(HTAB *hashp, int nelem, int freelist_idx)
 
 	/* Each element has a HASHELEMENT header plus user data. */
 	elementSize = MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(hctl->entrysize);
+	requestSize = nelem * elementSize;
+
+	/* Add space for slist_node list link if we need one. */
+#ifdef USE_VALGRIND
+	if (!hashp->isshared)
+		requestSize += MAXALIGN(sizeof(slist_node));
+#endif
+
+	/* Allocate the memory. */
 	CurrentDynaHashCxt = hashp->hcxt;
-	firstElement = (HASHELEMENT *) hashp->alloc(nelem * elementSize);
+	allocedBlock = hashp->alloc(requestSize);
 
-	if (!firstElement)
+	if (!allocedBlock)
 		return false;
 
+	/*
+	 * If USE_VALGRIND, each allocated block of elements of a non-shared
+	 * hashtable is chained into a list, so that Valgrind won't think it's
+	 * been leaked.
+	 */
+#ifdef USE_VALGRIND
+	if (hashp->isshared)
+		firstElement = (HASHELEMENT *) allocedBlock;
+	else
+	{
+		slist_push_head(&hashp->element_blocks, (slist_node *) allocedBlock);
+		firstElement = (HASHELEMENT *) (allocedBlock + MAXALIGN(sizeof(slist_node)));
+	}
+#else
+	firstElement = (HASHELEMENT *) allocedBlock;
+#endif
+
 	/* prepare to link all the new entries into the freelist */
 	prevElement = NULL;
 	tmpElement = firstElement;
-- 
2.43.5


From 8f3915908e39ece6572198decb16d91a76de6947 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 13:09:53 -0400
Subject: [PATCH v1 04/11] Silence complaints about leaked catcache entries.

Put the dlist_node fields of catctup and catclist structs first.
This ensures that the dlist pointers point to the starts of these
palloc blocks, and thus that Valgrind won't consider them "possibly
lost".

This is a hack, since it implies a similar requirement for every other
struct that we chain into slists or dlists (unless there are other
pointers to them, as there are for the CatCache structs).  But there
is no other obvious alternative.  Fixing these two places seems to be
enough to silence all such Valgrind complaints in simple test cases.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/include/utils/catcache.h | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index 277ec33c00b..00808e23f49 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -87,6 +87,14 @@ typedef struct catcache
 
 typedef struct catctup
 {
+	/*
+	 * Each tuple in a cache is a member of a dlist that stores the elements
+	 * of its hash bucket.  We keep each dlist in LRU order to speed repeated
+	 * lookups.  Keep the dlist_node field first so that Valgrind understands
+	 * the struct is reachable.
+	 */
+	dlist_node	cache_elem;		/* list member of per-bucket list */
+
 	int			ct_magic;		/* for identifying CatCTup entries */
 #define CT_MAGIC	0x57261502
 
@@ -98,13 +106,6 @@ typedef struct catctup
	 */
 	Datum		keys[CATCACHE_MAXKEYS];
 
-	/*
-	 * Each tuple in a cache is a member of a dlist that stores the elements
-	 * of its hash bucket.  We keep each dlist in LRU order to speed repeated
-	 * lookups.
-	 */
-	dlist_node	cache_elem;		/* list member of per-bucket list */
-
 	/*
 	 * A tuple marked "dead" must not be returned by subsequent searches.
 	 * However, it won't be physically deleted from the cache until its
@@ -158,13 +159,17 @@ typedef struct catctup
 */
 typedef struct catclist
 {
+	/*
+	 * Keep the dlist_node field first so that Valgrind understands the struct
+	 * is reachable.
+	 */
+	dlist_node	cache_elem;		/* list member of per-catcache list */
+
 	int			cl_magic;		/* for identifying CatCList entries */
 #define CL_MAGIC	0x52765103
 
 	uint32		hash_value;		/* hash value for lookup keys */
 
-	dlist_node	cache_elem;		/* list member of per-catcache list */
-
 	/*
 	 * Lookup keys for the entry, with the first nkeys elements being valid.
 	 * All by-reference are separately allocated.
-- 2.43.5 From 206cff7a169626dc32c411751f472c2147cd7a98 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun, 11 May 2025 13:14:52 -0400 Subject: [PATCH v1 05/11] Silence complaints about leaked CatCache structs. In USE_VALGRIND builds, don't attempt to cache-align CatCache structs. That prevents them from being reported as "possibly lost" because all the pointers to them look like interior pointers to Valgrind. This is a hack, not meant for final commit. The same hazard exists for every other use of MemoryContextAllocAligned, so we really ought to fix it there instead of touching this code. But this seems to be enough to silence all such Valgrind complaints in simple test cases, so just do this much for now. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us --- src/backend/utils/cache/catcache.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index 657648996c2..9e82d9fec68 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -923,12 +923,19 @@ InitCatCache(int id, } /* - * Allocate a new cache structure, aligning to a cacheline boundary + * Allocate a new cache structure. Normally we align it to a cacheline + * boundary, but skip that in USE_VALGRIND builds because it will confuse + * Valgrind. (XXX really we should fix that problem in + * MemoryContextAllocAligned, not here.) 
* * Note: we rely on zeroing to initialize all the dlist headers correctly */ +#ifndef USE_VALGRIND cp = (CatCache *) palloc_aligned(sizeof(CatCache), PG_CACHE_LINE_SIZE, MCXT_ALLOC_ZERO); +#else + cp = (CatCache *) palloc0(sizeof(CatCache)); +#endif cp->cc_bucket = palloc0(nbuckets * sizeof(dlist_head)); /* -- 2.43.5 From 6dbb1a4c32f394a7dc63073027bc942a5c1798a3 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun, 11 May 2025 13:19:35 -0400 Subject: [PATCH v1 06/11] Silence complaints about save_ps_display_args. Valgrind seems not to consider the global "environ" variable as a valid root pointer; so when we allocate a new environment array, it claims that data is leaked. To fix that, keep our own statically-allocated copy of the pointer. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us --- src/backend/utils/misc/ps_status.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c index e08b26e8c14..4df25944deb 100644 --- a/src/backend/utils/misc/ps_status.c +++ b/src/backend/utils/misc/ps_status.c @@ -100,6 +100,17 @@ static void flush_ps_display(void); static int save_argc; static char **save_argv; +/* + * Valgrind seems not to consider the global "environ" variable as a valid + * root pointer; so when we allocate a new environment array, it claims that + * data is leaked. To fix that, keep our own statically-allocated copy of the + * pointer. (Oddly, this doesn't seem to be a problem for "argv".) + */ +#if defined(PS_USE_CLOBBER_ARGV) && defined(USE_VALGRIND) +extern char **ps_status_new_environ; +char **ps_status_new_environ; +#endif + /* * Call this early in startup to save the original argc/argv values. @@ -206,6 +217,11 @@ save_ps_display_args(int argc, char **argv) } new_environ[i] = NULL; environ = new_environ; + + /* See notes about Valgrind above. 
*/ +#ifdef USE_VALGRIND + ps_status_new_environ = new_environ; +#endif } /* -- 2.43.5 From baf462f1854ffc8a38b0c4bdabb4575e36cea0f6 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun, 11 May 2025 13:20:59 -0400 Subject: [PATCH v1 07/11] Don't leak the startup-packet buffer in ProcessStartupPacket. This is the first actual code bug fix in this patch series. I only bothered to free the buffer in the successful-exit code paths, not the error-exit ones, since we'd quit soon anyway in the latter cases. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us --- src/backend/tcop/backend_startup.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/src/backend/tcop/backend_startup.c b/src/backend/tcop/backend_startup.c index a7d1fec981f..0b956627934 100644 --- a/src/backend/tcop/backend_startup.c +++ b/src/backend/tcop/backend_startup.c @@ -627,6 +627,8 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done) errmsg("received unencrypted data after SSL request"), errdetail("This could be either a client-software bug or evidence of an attempted man-in-the-middle attack."))); + pfree(buf); + /* * regular startup packet, cancel, etc packet should follow, but not * another SSL negotiation request, and a GSS request should only @@ -681,6 +683,8 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done) errmsg("received unencrypted data after GSSAPI encryption request"), errdetail("This could be either a client-software bug or evidence of an attempted man-in-the-middle attack."))); + pfree(buf); + /* * regular startup packet, cancel, etc packet should follow, but not * another GSS negotiation request, and an SSL request should only @@ -858,6 +862,8 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done) if (am_walsender && !am_db_walsender) port->database_name[0] = '\0'; + pfree(buf); + /* * Done filling the Port structure */ -- 2.43.5 From c8e124f6c6d209e39072e6c2859c476bce8c082e Mon
Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun, 11 May 2025 13:32:17 -0400 Subject: [PATCH v1 08/11] Fix some extremely broken code from 525392d57. This uselessly makes two copies of the plan tree during BuildCachedPlan; there should be only one, and it should be in the stmt_context not plan_context. Also, UpdateCachedPlan has a fragile and undocumented assumption that the length of the stmt_list can't change. (The XXX comments in that function have a very strong whiff of code that wasn't really ready to commit, but this is enough fixing to silence Valgrind's complaints about leaked data in the plan_context.) Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us --- src/backend/utils/cache/plancache.c | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c index 9bcbc4c3e97..53e3255ad7f 100644 --- a/src/backend/utils/cache/plancache.c +++ b/src/backend/utils/cache/plancache.c @@ -1114,17 +1114,18 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist, ALLOCSET_START_SMALL_SIZES); MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string); - stmt_context = AllocSetContextCreate(CurrentMemoryContext, + stmt_context = AllocSetContextCreate(plan_context, "CachedPlan PlannedStmts", ALLOCSET_START_SMALL_SIZES); MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string); - MemoryContextSetParent(stmt_context, plan_context); + /* + * Copy plans into the stmt_context. 
+ */ MemoryContextSwitchTo(stmt_context); plist = copyObject(plist); MemoryContextSwitchTo(plan_context); - plist = list_copy(plist); } else plan_context = CurrentMemoryContext; @@ -1207,8 +1208,6 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index, { List *query_list = plansource->query_list, *plan_list; - ListCell *l1, - *l2; CachedPlan *plan = plansource->gplan; MemoryContext oldcxt; @@ -1260,12 +1259,7 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index, MemoryContextReset(plan->stmt_context); oldcxt = MemoryContextSwitchTo(plan->stmt_context); - forboth(l1, plan_list, l2, plan->stmt_list) - { - PlannedStmt *plannedstmt = lfirst(l1); - - lfirst(l2) = copyObject(plannedstmt); - } + plan->stmt_list = copyObject(plan_list); MemoryContextSwitchTo(oldcxt); /* -- 2.43.5 From f5b72b015066053ed3396f0ea010ae470911f3b6 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun, 11 May 2025 13:38:31 -0400 Subject: [PATCH v1 09/11] Avoid per-relation leakage in autovacuum. PgStat_StatTabEntry and AutoVacOpts structs were leaked until the end of the autovacuum worker's run, which is bad news if there are a lot of relations in the database. Note: pfree'ing the PgStat_StatTabEntry structs here seems a bit risky, because pgstat_fetch_stat_tabentry_ext does not guarantee anything about whether its result is long-lived. That API could use a re-think. Also, free a couple of structures retail when USE_VALGRIND. This doesn't have any effect on the process's total memory consumption, but it does reduce noise in valgrind leakage reports. 
Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us --- src/backend/postmaster/autovacuum.c | 47 ++++++++++++++++++++++++----- 1 file changed, 40 insertions(+), 7 deletions(-) diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 4d4a1a3197e..ebe8af1056d 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -2077,6 +2077,12 @@ do_autovacuum(void) } } } + + /* Release stuff to avoid per-relation leakage */ + if (relopts) + pfree(relopts); + if (tabentry) + pfree(tabentry); } table_endscan(relScan); @@ -2093,7 +2099,8 @@ do_autovacuum(void) Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple); PgStat_StatTabEntry *tabentry; Oid relid; - AutoVacOpts *relopts = NULL; + AutoVacOpts *relopts; + bool free_relopts = false; bool dovacuum; bool doanalyze; bool wraparound; @@ -2111,7 +2118,9 @@ do_autovacuum(void) * main rel */ relopts = extract_autovac_opts(tuple, pg_class_desc); - if (relopts == NULL) + if (relopts) + free_relopts = true; + else { av_relation *hentry; bool found; @@ -2132,6 +2141,12 @@ do_autovacuum(void) /* ignore analyze for toast tables */ if (dovacuum) table_oids = lappend_oid(table_oids, relid); + + /* Release stuff to avoid leakage */ + if (free_relopts) + pfree(relopts); + if (tabentry) + pfree(tabentry); } table_endscan(relScan); @@ -2503,6 +2518,8 @@ deleted: pg_atomic_test_set_flag(&MyWorkerInfo->wi_dobalance); } + list_free(table_oids); + /* * Perform additional work items, as requested by backends. */ @@ -2545,8 +2562,15 @@ deleted: /* * We leak table_toast_map here (among other things), but since we're - * going away soon, it's not a problem. + * going away soon, it's not a problem normally. But when using Valgrind, + * release some stuff to reduce complaints about leaked storage. 
*/ +#ifdef USE_VALGRIND + hash_destroy(table_toast_map); + FreeTupleDesc(pg_class_desc); + if (bstrategy) + pfree(bstrategy); +#endif /* * Update pg_database.datfrozenxid, and truncate pg_xact if possible. We @@ -2684,8 +2708,8 @@ deleted2: /* * extract_autovac_opts * - * Given a relation's pg_class tuple, return the AutoVacOpts portion of - * reloptions, if set; otherwise, return NULL. + * Given a relation's pg_class tuple, return a palloc'd copy of the + * AutoVacOpts portion of reloptions, if set; otherwise, return NULL. * * Note: callers do not have a relation lock on the table at this point, * so the table could have been dropped, and its catalog rows gone, after @@ -2734,6 +2758,7 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map, autovac_table *tab = NULL; bool wraparound; AutoVacOpts *avopts; + bool free_avopts = false; /* fetch the relation's relcache entry */ classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid)); @@ -2746,8 +2771,10 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map, * main table reloptions if the toast table itself doesn't have. 
*/ avopts = extract_autovac_opts(classTup, pg_class_desc); - if (classForm->relkind == RELKIND_TOASTVALUE && - avopts == NULL && table_toast_map != NULL) + if (avopts) + free_avopts = true; + else if (classForm->relkind == RELKIND_TOASTVALUE && + table_toast_map != NULL) { av_relation *hentry; bool found; @@ -2856,6 +2883,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map, avopts->vacuum_cost_delay >= 0)); } + if (free_avopts) + pfree(avopts); heap_freetuple(classTup); return tab; } @@ -2887,6 +2916,10 @@ recheck_relation_needs_vacanalyze(Oid relid, effective_multixact_freeze_max_age, dovacuum, doanalyze, wraparound); + /* Release tabentry to avoid leakage */ + if (tabentry) + pfree(tabentry); + /* ignore ANALYZE for toast tables */ if (classForm->relkind == RELKIND_TOASTVALUE) *doanalyze = false; -- 2.43.5 From 9815596aaecaec254c638e1a150cac7860954f1c Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun, 11 May 2025 13:44:22 -0400 Subject: [PATCH v1 10/11] Silence complaints about leaks in load_domaintype_info(). This code intentionally leaks some intermediate cruft into the long-lived DomainConstraintCache's memory context, reasoning that the amount of leakage will typically not be much so it's not worth doing a copyObject() of the final tree to avoid that. But Valgrind knows nothing of engineering tradeoffs and complains anyway. So modify the code to do it the slow clean way when USE_VALGRIND. 
Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us --- src/backend/utils/cache/typcache.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c index f9aec38a11f..bf0e362eae6 100644 --- a/src/backend/utils/cache/typcache.c +++ b/src/backend/utils/cache/typcache.c @@ -1189,8 +1189,20 @@ load_domaintype_info(TypeCacheEntry *typentry) dcc->dccRefCount = 0; } - /* Create node trees in DomainConstraintCache's context */ + /* + * The clean way to do the next bit is to run stringToNode() and + * expression_planner() in the short-lived caller's context, then + * copy the result into the DomainConstraintCache's dccContext. + * Under Valgrind, we do it that way to silence complaints about + * leaking cruft into dccContext. However, the amount of stuff + * leaked is usually pretty minimal, and the cost of the extra + * copyObject() call is not negligible. So when not catering to + * Valgrind, we do it all in dccContext and accept some leakage. 
+ */ +#ifndef USE_VALGRIND + /* Do all this work in DomainConstraintCache's context */ oldcxt = MemoryContextSwitchTo(dcc->dccContext); +#endif check_expr = (Expr *) stringToNode(constring); @@ -1206,10 +1218,19 @@ load_domaintype_info(TypeCacheEntry *typentry) */ check_expr = expression_planner(check_expr); +#ifdef USE_VALGRIND + /* Create only the minimally needed stuff in dccContext */ + oldcxt = MemoryContextSwitchTo(dcc->dccContext); +#endif + r = makeNode(DomainConstraintState); r->constrainttype = DOM_CONSTRAINT_CHECK; r->name = pstrdup(NameStr(c->conname)); +#ifdef USE_VALGRIND + r->check_expr = copyObject(check_expr); +#else r->check_expr = check_expr; +#endif r->check_exprstate = NULL; MemoryContextSwitchTo(oldcxt); -- 2.43.5 From 7adfb9e9dee4725c8950a49dabd9cac5999bb691 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun, 11 May 2025 14:50:10 -0400 Subject: [PATCH v1 11/11] WIP: reduce leakages during PL/pgSQL function compilation. format_procedure leaks memory, so run it in a short-lived context not the session-lifespan cache context for the PL/pgSQL function. parse_datatype called the core parser in the function's long-lived cache context, thus leaking potentially a lot of storage into that context. We were also being a bit careless with the TypeName structures made in that code path and others. Most of the time we don't need to retain the TypeName, so make sure it is made in the short-lived temp context, and copy it only if we do need to retain it. These are far from the only leaks in PL/pgSQL compilation, but they're the biggest as far as I've seen. However, this is just WIP and may not get committed in this form at all: it's not clear that this line of attack is enough to remove all leaks in this area. It might be better to try to run exclusively within the short-term context and explicitly allocate what needs to go into the function parsetree. 
Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us --- src/pl/plpgsql/src/pl_comp.c | 28 ++++++++++++++++++++++------ src/pl/plpgsql/src/pl_gram.y | 8 +++++++- src/tools/valgrind.supp | 11 ----------- 3 files changed, 29 insertions(+), 18 deletions(-) diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c index 519f7695d7c..5f3e8646fbb 100644 --- a/src/pl/plpgsql/src/pl_comp.c +++ b/src/pl/plpgsql/src/pl_comp.c @@ -177,6 +177,7 @@ plpgsql_compile_callback(FunctionCallInfo fcinfo, yyscan_t scanner; Datum prosrcdatum; char *proc_source; + char *proc_signature; HeapTuple typeTup; Form_pg_type typeStruct; PLpgSQL_variable *var; @@ -223,6 +224,9 @@ plpgsql_compile_callback(FunctionCallInfo fcinfo, plpgsql_check_syntax = forValidator; plpgsql_curr_compile = function; + /* format_procedure leaks memory, so run it in temp context */ + proc_signature = format_procedure(fcinfo->flinfo->fn_oid); + /* * All the permanent output of compilation (e.g. parse tree) is kept in a * per-function memory context, so it can be reclaimed easily. @@ -232,7 +236,7 @@ plpgsql_compile_callback(FunctionCallInfo fcinfo, ALLOCSET_DEFAULT_SIZES); plpgsql_compile_tmp_cxt = MemoryContextSwitchTo(func_cxt); - function->fn_signature = format_procedure(fcinfo->flinfo->fn_oid); + function->fn_signature = pstrdup(proc_signature); MemoryContextSetIdentifier(func_cxt, function->fn_signature); function->fn_oid = fcinfo->flinfo->fn_oid; function->fn_input_collation = fcinfo->fncollation; @@ -1658,6 +1662,11 @@ plpgsql_parse_wordrowtype(char *ident) { Oid classOid; Oid typOid; + TypeName *typName; + MemoryContext oldCxt; + + /* Avoid memory leaks in long-term function context */ + oldCxt = MemoryContextSwitchTo(plpgsql_compile_tmp_cxt); /* * Look up the relation. 
Note that because relation rowtypes have the @@ -1680,9 +1689,12 @@ plpgsql_parse_wordrowtype(char *ident) errmsg("relation \"%s\" does not have a composite type", ident))); + typName = makeTypeName(ident); + + MemoryContextSwitchTo(oldCxt); + /* Build and return the row type struct */ - return plpgsql_build_datatype(typOid, -1, InvalidOid, - makeTypeName(ident)); + return plpgsql_build_datatype(typOid, -1, InvalidOid, typName); } /* ---------- @@ -1696,6 +1708,7 @@ plpgsql_parse_cwordrowtype(List *idents) Oid classOid; Oid typOid; RangeVar *relvar; + TypeName *typName; MemoryContext oldCxt; /* @@ -1718,11 +1731,12 @@ plpgsql_parse_cwordrowtype(List *idents) errmsg("relation \"%s\" does not have a composite type", relvar->relname))); + typName = makeTypeNameFromNameList(idents); + MemoryContextSwitchTo(oldCxt); /* Build and return the row type struct */ - return plpgsql_build_datatype(typOid, -1, InvalidOid, - makeTypeNameFromNameList(idents)); + return plpgsql_build_datatype(typOid, -1, InvalidOid, typName); } /* @@ -1937,6 +1951,8 @@ plpgsql_build_recfield(PLpgSQL_rec *rec, const char *fldname) * origtypname is the parsed form of what the user wrote as the type name. * It can be NULL if the type could not be a composite type, or if it was * identified by OID to begin with (e.g., it's a function argument type). + * origtypname is in short-lived storage and must be copied if we choose + * to incorporate it into the function's parse tree. 
*/ PLpgSQL_type * plpgsql_build_datatype(Oid typeOid, int32 typmod, @@ -2055,7 +2071,7 @@ build_datatype(HeapTuple typeTup, int32 typmod, errmsg("type %s is not composite", format_type_be(typ->typoid)))); - typ->origtypname = origtypname; + typ->origtypname = copyObject(origtypname); typ->tcache = typentry; typ->tupdesc_id = typentry->tupDesc_identifier; } diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y index 5612e66d023..6d53da4e79a 100644 --- a/src/pl/plpgsql/src/pl_gram.y +++ b/src/pl/plpgsql/src/pl_gram.y @@ -3848,6 +3848,7 @@ parse_datatype(const char *string, int location, yyscan_t yyscanner) int32 typmod; sql_error_callback_arg cbarg; ErrorContextCallback syntax_errcontext; + MemoryContext oldCxt; cbarg.location = location; cbarg.yyscanner = yyscanner; @@ -3857,9 +3858,14 @@ parse_datatype(const char *string, int location, yyscan_t yyscanner) syntax_errcontext.previous = error_context_stack; error_context_stack = &syntax_errcontext; - /* Let the main parser try to parse it under standard SQL rules */ + /* + * Let the main parser try to parse it under standard SQL rules. The + * parser leaks memory, so run it in temp context. + */ + oldCxt = MemoryContextSwitchTo(plpgsql_compile_tmp_cxt); typeName = typeStringToTypeName(string, NULL); typenameTypeIdAndMod(NULL, typeName, &type_id, &typmod); + MemoryContextSwitchTo(oldCxt); /* Restore former ereport callback */ error_context_stack = syntax_errcontext.previous; diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp index 7d4fb0b2c1d..80268d6eed8 100644 --- a/src/tools/valgrind.supp +++ b/src/tools/valgrind.supp @@ -201,17 +201,6 @@ fun:MemoryContextAllocAligned } -# Suppress complaints about stuff leaked by PL/pgSQL parsing. -# This ought to be fixed properly, instead. -{ - hide_plpgsql_parsing_leaks - Memcheck:Leak - match-leak-kinds: definite,possible,indirect - - ... - fun:plpgsql_compile_callback -} - # And likewise for stuff leaked by SQL-function parsing. 
# This ought to be fixed properly, instead. { -- 2.43.5
I wrote: > Okay, here is a patch series that updates the > 0001-Make-memory-contexts-themselves-more-visible-to-valg.patch > patch you posted in that thread, I forgot to mention that I did try to implement the "two-level pool" scheme that the Valgrind documentation talks about, and could not make it work. There seem to be undocumented interactions between the outer and inner chunks, and it's not real clear to me that there's not outright bugs. Anyway, AFAICS that scheme would bring us no immediate advantages anyway, compared to the flat approach of just adding mempool chunks for the allocators' management areas. regards, tom lane
On 2025-May-11, Tom Lane wrote: > Okay, here is a patch series that updates the > 0001-Make-memory-contexts-themselves-more-visible-to-valg.patch > patch you posted in that thread, and makes various follow-up > fixes that either fix or paper over various leaks. Wow, that's a lot of extra fixes. I did a quick test run with all patches save the last, then ran the tests "test_setup plpgsql tablespace" to see if I'd get a leak report about plpgsql (to verify the setup was correct), and I did. Rerunning after applying that patch silences what seems to be most of them. One source of leaks still present is LLVM. The numbers are not large:
140303.log:==00:00:12:42.244 140303== possibly lost: 18,336 bytes in 207 blocks
140442.log:==00:00:13:43.070 140442== possibly lost: 18,456 bytes in 210 blocks
140729.log:==00:00:16:09.802 140729== possibly lost: 64,120 bytes in 840 blocks
140892.log:==00:00:17:39.300 140892== possibly lost: 18,128 bytes in 202 blocks
141001.log:==00:00:18:31.631 141001== possibly lost: 18,128 bytes in 202 blocks
141031.log:==00:00:19:03.528 141031== possibly lost: 64,232 bytes in 717 blocks
141055.log:==00:00:20:11.536 141055== possibly lost: 18,128 bytes in 202 blocks
141836.log:==00:00:29:10.344 141836== definitely lost: 29,666 bytes in 3,175 blocks
141836.log:==00:00:29:10.344 141836== indirectly lost: 13,977 bytes in 1,138 blocks
142061.log:==00:00:32:26.264 142061== possibly lost: 18,128 bytes in 202 blocks
(The installcheck run is still going, but 90 tests in, those are the only reports of ~10kB or more). > In particular, I had not realized that autovacuum > leaks a nontrivial amount of memory per relation processed (cf 0009), > and apparently has done for a few releases now. This is horrid in > databases with many tables, and I'm surprised that we've not gotten > complaints about it.
In PGConf Germany we got a lightning talk[1] that reported a problem that might be explained by this: with 10 million relations, the OOM killer gets invoked on autovacuum workers in the reported case, so essentially autovacuum doesn't work at all. So clearly there is somebody who would appreciate having this problem fixed. He actually blames it on relcache, but who knows how correct that is. Not much to see on the slides other than that they turn to vacuumdb instead, but they're here: [1] https://www.postgresql.eu/events/pgconfde2025/sessions/session/6625/slides/681/06%20-%20LT%20-%20Jonas%20Marasus%20-%20War%20Story%20How%20Big%20Is%20Too%20Big%20For%20a%20Schema.pdf -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Computing is too important to be left to men." (Karen Spärck Jones)
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes: > On 2025-May-11, Tom Lane wrote: >> In particular, I had not realized that autovacuum >> leaks a nontrivial amount of memory per relation processed (cf 0009), >> and apparently has done for a few releases now. This is horrid in >> databases with many tables, and I'm surprised that we've not gotten >> complaints about it. > In PGConf Germany we got a lightning talk[1] that reported a problem > that might be explained by this: with 10 million of relations, the OOM > killer gets invoked on autovacuum workers on the reported case, so > essentially autovacuum doesn't work at all. So clearly there is somebody > that would appreciate that this problem is fixed. Interesting. > He actually blames it on relcache, but who knows how correct that is. I would not be surprised if the relcache is bloated too, but Valgrind wouldn't think that's a leak. I wonder if it'd be worth setting up a mechanism for autovacuum to drop the relcache entry for a table once it's done with it. regards, tom lane
On Mon, May 12, 2025 at 12:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> And, since there's nothing new under the sun around here,
> we already had a discussion about that back in 2021:
> https://www.postgresql.org/message-id/flat/3471359.1615937770%40sss.pgh.pa.us
> That thread seems to have led to fixing some specific bugs,
> but we never committed any of the discussed valgrind infrastructure
> improvements. I'll have a go at resurrecting that...
Okay, here is a patch series that updates the
0001-Make-memory-contexts-themselves-more-visible-to-valg.patch
patch you posted in that thread, and makes various follow-up
fixes that either fix or paper over various leaks. Some of it
is committable I think, but other parts are just WIP. Anyway,
as of the 0010 patch we can run through the core regression tests
and see no more than a couple of kilobytes total reported leakage
in any process, except for two tests that expose leaks in TS
dictionary building. (That's fixable but I ran out of time,
and I wanted to get this posted before Montreal.) There is
work left to do before we can remove the suppressions added in
0002, but this is already huge progress compared to where we were.
A couple of these patches are bug fixes that need to be applied and
even back-patched. In particular, I had not realized that autovacuum
leaks a nontrivial amount of memory per relation processed (cf 0009),
and apparently has done for a few releases now. This is horrid in
databases with many tables, and I'm surprised that we've not gotten
complaints about it.
regards, tom lane
Thanks for sharing the patch series. I've applied the patches on my end and rerun the tests. Valgrind now reports only 8 bytes of leakage, and the previously noisy output is almost entirely gone.
Here's valgrind output:
==00:00:01:50.385 90463== LEAK SUMMARY:
==00:00:01:50.385 90463== definitely lost: 8 bytes in 1 blocks
==00:00:01:50.385 90463== indirectly lost: 0 bytes in 0 blocks
==00:00:01:50.385 90463== possibly lost: 0 bytes in 0 blocks
==00:00:01:50.385 90463== still reachable: 1,182,132 bytes in 2,989 blocks
==00:00:01:50.385 90463== suppressed: 0 bytes in 0 blocks
==00:00:01:50.385 90463== Rerun with --leak-check=full to see details of leaked memory
==00:00:01:50.385 90463==
==00:00:01:50.385 90463== For lists of detected and suppressed errors, rerun with: -s
==00:00:01:50.385 90463== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 34 from 3)
Regards,
Yasir Hussain
Data Bene
I wrote: > Here's a v2 patchset that reaches the goal of zero reported leaks > in the core regression tests, with some caveats: Oh, another caveat is that I ran this with a fairly minimalistic set of build options. In a more complete build, I observed a small leak in xml.c, which I posted a fix for in another thread [1]. I also see the leakage Alvaro mentioned with --with-llvm. I'm not sure how real that is though, because AFAICS all of it is reported as "possibly lost", not "definitely lost" or "indirectly lost". In Valgrind-speak that means there are pointers leading to the chunk, just not pointing right at its start. So this could be the result of allocation tricks being played by the C++ compiler. The Valgrind manual talks about some heuristics they use to handle C++ coding patterns, but they don't seem to help in my environment. In any case, the allocation points are mostly far enough down into LLVM functions that if the leaks are real, I'd tend to call them LLVM's bugs not ours. I haven't really tried our non-core test suites yet. Out of curiosity I ran the plperl, plpython, and pltcl suites. All of them show pretty enormous amounts of "possibly lost" data, which again seems likely to be an artifact of allocation games within those libraries rather than real leaks. I wonder if they have "valgrind friendly" build options that we'd need to use to get sane results? regards, tom lane [1] https://www.postgresql.org/message-id/1358967.1747858817%40sss.pgh.pa.us
Hi, 0001: This is rather wild, I really have only the dimmest memory of that other thread even though I apparently resorted to reading valgrind's source code... I think the vchunk/vpool naming, while not necessarily elegant, is helpful. 0002: Makes sense. 0003: 0004: 0005: Ugh, that's rather gross. Not that I have a better idea. So it's probably worth doing ... 0006: LGTM 0007: + /* Run the rest in xact context to avoid Valgrind leak complaints */ + MemoryContextSwitchTo(TopTransactionContext); It seems like it also protects at least somewhat against actual leaks? 0008: LGTM 0009: Seems like we really ought to do more here. But it's a substantial improvement, so let's not let perfect be the enemy of good. 0010: 0011: Not great, but better than not doing anything. 0012: Hm. Kinda feels like we should just always do it the USE_VALGRIND approach. ISTM that if domain typecache entry building is a bottleneck, there are way bigger problems than a copyObject()... 0014: I guess I personally would try to put the freeing code into a helper, but it's a close call. 0015: I'd probably put the list_free() after the LWLockRelease(LogicalRepWorkerLock)? 0016: Agreed with the concern stated in the commit message, but addressing that would obviously be a bigger project. 0017: A tad weird to leave the comments above the removed = NILs in place, even though it's obviously still correct. 0018: LGTM. > But this is the last step to get to zero reported leaks in a run of the core > regression tests, so let's do it. I assume that's just about the core tests, not more? I.e. I can't make skink enable leak checking? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes:
> [ review ]

Thanks for the comments!  I'll go through them and post an updated
version tomorrow.  The cfbot is already nagging me for a rebase now
that 0013 is moot.

>> But this is the last step to get to zero reported leaks in a run of
>> the core regression tests, so let's do it.

> I assume that's just about the core tests, not more?  I.e. I can't
> make skink enable leak checking?

No, we're not there yet.  I've identified some other backend issues
(in postgres_fdw in particular), and I've not looked at frontend
programs at all.  For most frontend programs, I'm dubious how much we
care.  Actually the big problem is I don't know what to do about
plperl/plpython/pltcl.  I suppose the big-hammer approach would be to
put in suppression patterns covering those, at least till such time
as someone has a better idea.

I'm envisioning this patch series as v19 work, were you thinking we
should be more aggressive?

			regards, tom lane
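[Editor's note: a big-hammer suppression covering possibly-lost reports from one of those libraries might look roughly like the sketch below.  The suppression name and object path are illustrative; real patterns would be generated from actual Memcheck output with --gen-suppressions=all.]

```
{
   plperl_possibly_lost
   Memcheck:Leak
   match-leak-kinds: possible
   ...
   obj:*/plperl.so
}
```

[The "..." frame wildcard matches any number of intervening stack frames, so everything possibly-lost with plperl.so anywhere in the allocation stack would be silenced -- which is exactly the "at least till someone has a better idea" trade-off described above.]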
Hi,

On 2025-05-22 21:48:24 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>>> But this is the last step to get to zero reported leaks in a run of
>>> the core regression tests, so let's do it.

>> I assume that's just about the core tests, not more?  I.e. I can't
>> make skink enable leak checking?

> No, we're not there yet.  I've identified some other backend issues
> (in postgres_fdw in particular), and I've not looked at frontend
> programs at all.  For most frontend programs, I'm dubious how much
> we care.

Skink only tests backend stuff anyway, but the other backend issues
make it a no-go for now...

> I'm envisioning this patch series as v19 work, were you
> thinking we should be more aggressive?

Mostly agreed - but I am wondering if the AV fix should be backpatched?

Greetings,

Andres Freund
On Fri, May 23, 2025 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:
>> I'm envisioning this patch series as v19 work, were you
>> thinking we should be more aggressive?
>
> Mostly agreed - but I am wondering if the AV fix should be backpatched?

I think that it probably should be.

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, May 23, 2025 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:
>>> I'm envisioning this patch series as v19 work, were you
>>> thinking we should be more aggressive?

>> Mostly agreed - but I am wondering if the AV fix should be backpatched?

> I think that it probably should be.

Yeah, I agree, but didn't get to that yet.  I've also gone ahead with
committing fixes for a couple of clear bugs that I located while
doing this.

			regards, tom lane