Discussion: Re: Why our Valgrind reports suck

Re: Why our Valgrind reports suck

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2025-05-08 22:04:06 -0400, Tom Lane wrote:
>> A nearby thread [1] reminded me to wonder why we seem to have
>> so many false-positive leaks reported by Valgrind these days.

> Huh. We use the memory pool client requests to inform valgrind about memory
> contexts. I seem to recall that that "hid" many leak warnings from valgrind. I
> wonder if we somehow broke (or weakened) that.

The problem with dynahash has been there since day one, so I think
we've just gotten used to ignoring "leak" reports associated with
hash tables --- I have, anyway.  But the other triggers I listed
have appeared within the last five-ish years, if memory serves,
and we just didn't notice because of the existing dynahash noise.

> We currently don't reset TopMemoryContext at exit, which, obviously, does
> massively increase the number of leaks. But OTOH, without that there's not a
> whole lot of value in the leak check...

Hmm.  Yeah, we could just reset or delete TopMemoryContext, but
surely that would be counterproductive.  It would mask any real
leak of palloc'd blocks.  I'm a little suspicious of your other
idea of shutting down the caches, for the same reason: I wonder
if it wouldn't hide leaks rather than help find them.

One thing I noticed while reading the Valgrind manual is that
they describe a facility for "two level" tracking of custom
allocators such as ours.  Apparently, what you're really supposed
to do is use VALGRIND_MEMPOOL_ALLOC to mark the malloc blocks
that the allocator works in, and VALGRIND_MALLOCLIKE_BLOCK to
mark the sub-blocks handed out by the allocator.  I wonder if this
feature postdates our implementation of Valgrind support, and I
wonder even more if using it would improve our results.
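
As far as I can tell from the manual, the intended call sequence is
roughly the fragment below.  (Purely illustrative --- the names and
sizes are invented and this is not our mmgr code; it just shows the
client requests the manual describes, in order.)

    #include <stdbool.h>
    #include <stdlib.h>
    #include <valgrind/valgrind.h>

    int
    main(void)
    {
        /* The pool handle is just an address Valgrind uses as a tag. */
        char       *pool = malloc(1);

        VALGRIND_CREATE_MEMPOOL(pool, 0, false);

        /* Level 1: a big malloc'd block the custom allocator carves up. */
        char       *block = malloc(8192);

        VALGRIND_MEMPOOL_ALLOC(pool, block, 8192);

        /* Level 2: a sub-block handed out to a caller. */
        char       *chunk = block + 128;

        VALGRIND_MALLOCLIKE_BLOCK(chunk, 64, 0, false);

        /* Teardown, in reverse. */
        VALGRIND_FREELIKE_BLOCK(chunk, 0);
        VALGRIND_MEMPOOL_FREE(pool, block);
        VALGRIND_DESTROY_MEMPOOL(pool);
        free(block);
        free(pool);
        return 0;
    }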

I did experiment with marking context headers as accessible with
VALGRIND_MEMPOOL_ALLOC, and that made the complaints about
MemoryContextCopyAndSetIdentifier strings go away, confirming
that Valgrind is simply not considering the context->ident
pointers.  Unfortunately it also added a bunch of other failures,
so that evidently is Not The Right Thing.  I suspect what is
going on is related to this bit in valgrind.h:

   For Memcheck users: if you use VALGRIND_MALLOCLIKE_BLOCK to carve out
   custom blocks from within a heap block, B, that has been allocated with
   malloc/calloc/new/etc, then block B will be *ignored* during leak-checking
   -- the custom blocks will take precedence.

We're not using VALGRIND_MALLOCLIKE_BLOCK (yet, anyway), but I'm
suspecting that Valgrind probably also ignores heap blocks that
match VALGRIND_CREATE_MEMPOOL requests, except for the portions
thereof that are covered by VALGRIND_MEMPOOL_ALLOC requests.

Anyway, I'm now feeling motivated to go try some experiments.
Watch this space ...

            regards, tom lane



Re: Why our Valgrind reports suck

From:
Tom Lane
Date:
I wrote:
> One thing I noticed while reading the Valgrind manual is that
> they describe a facility for "two level" tracking of custom
> allocators such as ours.

And, since there's nothing new under the sun around here,
we already had a discussion about that back in 2021:

https://www.postgresql.org/message-id/flat/3471359.1615937770%40sss.pgh.pa.us

That thread seems to have led to fixing some specific bugs,
but we never committed any of the discussed valgrind infrastructure
improvements.  I'll have a go at resurrecting that...

            regards, tom lane



Re: Why our Valgrind reports suck

From:
Tom Lane
Date:
I wrote:
> And, since there's nothing new under the sun around here,
> we already had a discussion about that back in 2021:
> https://www.postgresql.org/message-id/flat/3471359.1615937770%40sss.pgh.pa.us
> That thread seems to have led to fixing some specific bugs,
> but we never committed any of the discussed valgrind infrastructure
> improvements.  I'll have a go at resurrecting that...

Okay, here is a patch series that updates the
0001-Make-memory-contexts-themselves-more-visible-to-valg.patch
patch you posted in that thread, and makes various follow-up
fixes that either fix or paper over various leaks.  Some of it
is committable I think, but other parts are just WIP.  Anyway,
as of the 0010 patch we can run through the core regression tests
and see no more than a couple of kilobytes total reported leakage
in any process, except for two tests that expose leaks in TS
dictionary building.  (That's fixable but I ran out of time,
and I wanted to get this posted before Montreal.)  There is
work left to do before we can remove the suppressions added in
0002, but this is already huge progress compared to where we were.

A couple of these patches are bug fixes that need to be applied and
even back-patched.  In particular, I had not realized that autovacuum
leaks a nontrivial amount of memory per relation processed (cf 0009),
and apparently has done for a few releases now.  This is horrid in
databases with many tables, and I'm surprised that we've not gotten
complaints about it.

            regards, tom lane

From d4ca3e2e4824ff1f11ee5324b4ba46c4dd971ce9 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 12:39:59 -0400
Subject: [PATCH v1 01/11] Improve our support for Valgrind's leak tracking.

When determining whether an allocated chunk is still reachable,
Valgrind will consider only pointers within what it believes to be
allocated chunks.  Normally, all of a block obtained from malloc()
would be considered "allocated" --- but it turns out that if we use
VALGRIND_MEMPOOL_ALLOC to designate sub-section(s) of a malloc'ed
block as allocated, all the rest of that malloc'ed block is ignored.
This leads to lots of false positives of course.  In particular,
in any multi-malloc-block context, all but the primary block were
reported as leaked.  We also had a problem with context "ident"
strings, which were reported as leaked unless there was some other
pointer to them besides the one in the context header.

To fix, we need to use VALGRIND_MEMPOOL_ALLOC to designate
a context's management structs (the context struct itself and
any per-block headers) as allocated chunks.  That forces moving
the VALGRIND_CREATE_MEMPOOL/VALGRIND_DESTROY_MEMPOOL calls into
the per-context-type code, so that the pool identifier can be
made as soon as we've allocated the initial block, but otherwise
it's fairly straightforward.  Note that in Valgrind's eyes there
is no distinction between these allocations and the allocations
that the mmgr modules hand out to user code.  That's fine for
now, but perhaps someday we'll want to do better yet.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
Discussion: https://postgr.es/m/20210317181531.7oggpqevzz6bka3g@alap3.anarazel.de
---
 src/backend/utils/mmgr/aset.c       | 69 ++++++++++++++++++++++++++++-
 src/backend/utils/mmgr/bump.c       | 31 ++++++++++++-
 src/backend/utils/mmgr/generation.c | 29 ++++++++++++
 src/backend/utils/mmgr/mcxt.c       | 15 ++++---
 src/backend/utils/mmgr/slab.c       | 32 +++++++++++++
 src/include/utils/memdebug.h        |  1 +
 6 files changed, 168 insertions(+), 9 deletions(-)

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 666ecd8f78d..21490213842 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -103,6 +103,8 @@

 #define ALLOC_BLOCKHDRSZ    MAXALIGN(sizeof(AllocBlockData))
 #define ALLOC_CHUNKHDRSZ    sizeof(MemoryChunk)
+#define FIRST_BLOCKHDRSZ    (MAXALIGN(sizeof(AllocSetContext)) + \
+                             ALLOC_BLOCKHDRSZ)

 typedef struct AllocBlockData *AllocBlock;    /* forward reference */

@@ -458,6 +460,21 @@ AllocSetContextCreateInternal(MemoryContext parent,
      * we'd leak the header/initial block if we ereport in this stretch.
      */

+    /* Create a Valgrind "pool" associated with the context */
+    VALGRIND_CREATE_MEMPOOL(set, 0, false);
+
+    /*
+     * Create a Valgrind "chunk" covering both the AllocSetContext struct and
+     * the keeper block's header.  (It would be more sensible for these to be
+     * two separate chunks, but doing that seems to tickle bugs in some
+     * versions of Valgrind.)  We must have these chunks, and also a chunk for
+     * each subsequently-added block header, so that Valgrind considers the
+     * pointers within them while checking for leaked memory.  Note that
+     * Valgrind doesn't distinguish between these chunks and those created by
+     * mcxt.c for the user-accessible-data chunks we allocate.
+     */
+    VALGRIND_MEMPOOL_ALLOC(set, set, FIRST_BLOCKHDRSZ);
+
     /* Fill in the initial block's block header */
     block = KeeperBlock(set);
     block->aset = set;
@@ -585,6 +602,14 @@ AllocSetReset(MemoryContext context)
 #ifdef CLOBBER_FREED_MEMORY
             wipe_mem(block, block->freeptr - ((char *) block));
 #endif
+
+            /*
+             * We need to free Valgrind's block-header chunks explicitly,
+             * although the user-data chunks within will go away in the TRIM
+             * below.  Otherwise Valgrind complains about leaked allocations.
+             */
+            VALGRIND_MEMPOOL_FREE(set, block);
+
             free(block);
         }
         block = next;
@@ -592,6 +617,14 @@ AllocSetReset(MemoryContext context)

     Assert(context->mem_allocated == keepersize);

+    /*
+     * Instruct Valgrind to throw away all the chunks associated with this
+     * context, except for the one covering the AllocSetContext and
+     * keeper-block header.  This gets rid of the chunks for whatever user
+     * data is getting discarded by the context reset.
+     */
+    VALGRIND_MEMPOOL_TRIM(set, set, FIRST_BLOCKHDRSZ);
+
     /* Reset block size allocation sequence, too */
     set->nextBlockSize = set->initBlockSize;
 }
@@ -648,6 +681,9 @@ AllocSetDelete(MemoryContext context)
                 freelist->first_free = (AllocSetContext *) oldset->header.nextchild;
                 freelist->num_free--;

+                /* Destroy the context's Valgrind pool --- see notes below */
+                VALGRIND_DESTROY_MEMPOOL(oldset);
+
                 /* All that remains is to free the header/initial block */
                 free(oldset);
             }
@@ -675,13 +711,24 @@ AllocSetDelete(MemoryContext context)
 #endif

         if (!IsKeeperBlock(set, block))
+        {
+            /* As in AllocSetReset, free block-header chunks explicitly */
+            VALGRIND_MEMPOOL_FREE(set, block);
             free(block);
+        }

         block = next;
     }

     Assert(context->mem_allocated == keepersize);

+    /*
+     * Destroy the Valgrind pool.  We don't seem to need to explicitly free
+     * the initial block's header chunk, nor any user-data chunks that
+     * Valgrind still knows about.
+     */
+    VALGRIND_DESTROY_MEMPOOL(set);
+
     /* Finally, free the context header, including the keeper block */
     free(set);
 }
@@ -716,6 +763,9 @@ AllocSetAllocLarge(MemoryContext context, Size size, int flags)
     if (block == NULL)
         return MemoryContextAllocationFailure(context, size, flags);

+    /* Make a Valgrind chunk covering the new block's header */
+    VALGRIND_MEMPOOL_ALLOC(set, block, ALLOC_BLOCKHDRSZ);
+
     context->mem_allocated += blksize;

     block->aset = set;
@@ -922,6 +972,9 @@ AllocSetAllocFromNewBlock(MemoryContext context, Size size, int flags,
     if (block == NULL)
         return MemoryContextAllocationFailure(context, size, flags);

+    /* Make a Valgrind chunk covering the new block's header */
+    VALGRIND_MEMPOOL_ALLOC(set, block, ALLOC_BLOCKHDRSZ);
+
     context->mem_allocated += blksize;

     block->aset = set;
@@ -1104,6 +1157,10 @@ AllocSetFree(void *pointer)
 #ifdef CLOBBER_FREED_MEMORY
         wipe_mem(block, block->freeptr - ((char *) block));
 #endif
+
+        /* As in AllocSetReset, free block-header chunks explicitly */
+        VALGRIND_MEMPOOL_FREE(set, block);
+
         free(block);
     }
     else
@@ -1184,6 +1241,7 @@ AllocSetRealloc(void *pointer, Size size, int flags)
          * realloc() to make the containing block bigger, or smaller, with
          * minimum space wastage.
          */
+        AllocBlock    newblock;
         Size        chksize;
         Size        blksize;
         Size        oldblksize;
@@ -1223,14 +1281,21 @@ AllocSetRealloc(void *pointer, Size size, int flags)
         blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
         oldblksize = block->endptr - ((char *) block);

-        block = (AllocBlock) realloc(block, blksize);
-        if (block == NULL)
+        newblock = (AllocBlock) realloc(block, blksize);
+        if (newblock == NULL)
         {
             /* Disallow access to the chunk header. */
             VALGRIND_MAKE_MEM_NOACCESS(chunk, ALLOC_CHUNKHDRSZ);
             return MemoryContextAllocationFailure(&set->header, size, flags);
         }

+        /*
+         * Move Valgrind's block-header chunk explicitly.  (mcxt.c will take
+         * care of moving the chunk for the user data.)
+         */
+        VALGRIND_MEMPOOL_CHANGE(set, block, newblock, ALLOC_BLOCKHDRSZ);
+        block = newblock;
+
         /* updated separately, not to underflow when (oldblksize > blksize) */
         set->header.mem_allocated -= oldblksize;
         set->header.mem_allocated += blksize;
diff --git a/src/backend/utils/mmgr/bump.c b/src/backend/utils/mmgr/bump.c
index f7a37d1b3e8..271bf78c583 100644
--- a/src/backend/utils/mmgr/bump.c
+++ b/src/backend/utils/mmgr/bump.c
@@ -45,7 +45,9 @@
 #include "utils/memutils_memorychunk.h"
 #include "utils/memutils_internal.h"

-#define Bump_BLOCKHDRSZ    MAXALIGN(sizeof(BumpBlock))
+#define Bump_BLOCKHDRSZ        MAXALIGN(sizeof(BumpBlock))
+#define FIRST_BLOCKHDRSZ    (MAXALIGN(sizeof(BumpContext)) + \
+                             Bump_BLOCKHDRSZ)

 /* No chunk header unless built with MEMORY_CONTEXT_CHECKING */
 #ifdef MEMORY_CONTEXT_CHECKING
@@ -189,6 +191,12 @@ BumpContextCreate(MemoryContext parent, const char *name, Size minContextSize,
      * Avoid writing code that can fail between here and MemoryContextCreate;
      * we'd leak the header and initial block if we ereport in this stretch.
      */
+
+    /* See comments about Valgrind interactions in aset.c */
+    VALGRIND_CREATE_MEMPOOL(set, 0, false);
+    /* This chunk covers the BumpContext and the keeper block header */
+    VALGRIND_MEMPOOL_ALLOC(set, set, FIRST_BLOCKHDRSZ);
+
     dlist_init(&set->blocks);

     /* Fill in the initial block's block header */
@@ -262,6 +270,14 @@ BumpReset(MemoryContext context)
             BumpBlockFree(set, block);
     }

+    /*
+     * Instruct Valgrind to throw away all the chunks associated with this
+     * context, except for the one covering the BumpContext and keeper-block
+     * header.  This gets rid of the chunks for whatever user data is getting
+     * discarded by the context reset.
+     */
+    VALGRIND_MEMPOOL_TRIM(set, set, FIRST_BLOCKHDRSZ);
+
     /* Reset block size allocation sequence, too */
     set->nextBlockSize = set->initBlockSize;

@@ -279,6 +295,10 @@ BumpDelete(MemoryContext context)
 {
     /* Reset to release all releasable BumpBlocks */
     BumpReset(context);
+
+    /* Destroy the Valgrind pool -- see notes in aset.c */
+    VALGRIND_DESTROY_MEMPOOL(context);
+
     /* And free the context header and keeper block */
     free(context);
 }
@@ -318,6 +338,9 @@ BumpAllocLarge(MemoryContext context, Size size, int flags)
     if (block == NULL)
         return MemoryContextAllocationFailure(context, size, flags);

+    /* Make a Valgrind chunk covering the new block's header */
+    VALGRIND_MEMPOOL_ALLOC(set, block, Bump_BLOCKHDRSZ);
+
     context->mem_allocated += blksize;

     /* the block is completely full */
@@ -455,6 +478,9 @@ BumpAllocFromNewBlock(MemoryContext context, Size size, int flags,
     if (block == NULL)
         return MemoryContextAllocationFailure(context, size, flags);

+    /* Make a Valgrind chunk covering the new block's header */
+    VALGRIND_MEMPOOL_ALLOC(set, block, Bump_BLOCKHDRSZ);
+
     context->mem_allocated += blksize;

     /* initialize the new block */
@@ -606,6 +632,9 @@ BumpBlockFree(BumpContext *set, BumpBlock *block)
     wipe_mem(block, ((char *) block->endptr - (char *) block));
 #endif

+    /* As in aset.c, free block-header chunks explicitly */
+    VALGRIND_MEMPOOL_FREE(set, block);
+
     free(block);
 }

diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 18679ad4f1e..e476df534d7 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -45,6 +45,8 @@

 #define Generation_BLOCKHDRSZ    MAXALIGN(sizeof(GenerationBlock))
 #define Generation_CHUNKHDRSZ    sizeof(MemoryChunk)
+#define FIRST_BLOCKHDRSZ        (MAXALIGN(sizeof(GenerationContext)) + \
+                                 Generation_BLOCKHDRSZ)

 #define Generation_CHUNK_FRACTION    8

@@ -221,6 +223,12 @@ GenerationContextCreate(MemoryContext parent,
      * Avoid writing code that can fail between here and MemoryContextCreate;
      * we'd leak the header if we ereport in this stretch.
      */
+
+    /* See comments about Valgrind interactions in aset.c */
+    VALGRIND_CREATE_MEMPOOL(set, 0, false);
+    /* This chunk covers the GenerationContext and the keeper block header */
+    VALGRIND_MEMPOOL_ALLOC(set, set, FIRST_BLOCKHDRSZ);
+
     dlist_init(&set->blocks);

     /* Fill in the initial block's block header */
@@ -309,6 +317,14 @@ GenerationReset(MemoryContext context)
             GenerationBlockFree(set, block);
     }

+    /*
+     * Instruct Valgrind to throw away all the chunks associated with this
+     * context, except for the one covering the GenerationContext and
+     * keeper-block header.  This gets rid of the chunks for whatever user
+     * data is getting discarded by the context reset.
+     */
+    VALGRIND_MEMPOOL_TRIM(set, set, FIRST_BLOCKHDRSZ);
+
     /* set it so new allocations to make use of the keeper block */
     set->block = KeeperBlock(set);

@@ -329,6 +345,10 @@ GenerationDelete(MemoryContext context)
 {
     /* Reset to release all releasable GenerationBlocks */
     GenerationReset(context);
+
+    /* Destroy the Valgrind pool -- see notes in aset.c */
+    VALGRIND_DESTROY_MEMPOOL(context);
+
     /* And free the context header and keeper block */
     free(context);
 }
@@ -365,6 +385,9 @@ GenerationAllocLarge(MemoryContext context, Size size, int flags)
     if (block == NULL)
         return MemoryContextAllocationFailure(context, size, flags);

+    /* Make a Valgrind chunk covering the new block's header */
+    VALGRIND_MEMPOOL_ALLOC(set, block, Generation_BLOCKHDRSZ);
+
     context->mem_allocated += blksize;

     /* block with a single (used) chunk */
@@ -487,6 +510,9 @@ GenerationAllocFromNewBlock(MemoryContext context, Size size, int flags,
     if (block == NULL)
         return MemoryContextAllocationFailure(context, size, flags);

+    /* Make a Valgrind chunk covering the new block's header */
+    VALGRIND_MEMPOOL_ALLOC(set, block, Generation_BLOCKHDRSZ);
+
     context->mem_allocated += blksize;

     /* initialize the new block */
@@ -677,6 +703,9 @@ GenerationBlockFree(GenerationContext *set, GenerationBlock *block)
     wipe_mem(block, block->blksize);
 #endif

+    /* As in aset.c, free block-header chunks explicitly */
+    VALGRIND_MEMPOOL_FREE(set, block);
+
     free(block);
 }

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 7d28ca706eb..6fce916d11e 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -8,6 +8,15 @@
  * context-type-specific operations via the function pointers in a
  * context's MemoryContextMethods struct.
  *
+ * A note about Valgrind support: the context-type-specific code is
+ * responsible for creating and deleting a Valgrind "pool" for each context,
+ * and also for creating Valgrind "chunks" to cover its management data
+ * structures such as block headers.  (There must be a chunk that includes
+ * every pointer we want Valgrind to consider for leak-tracking purposes.)
+ * This module creates and deletes chunks that cover the caller-visible
+ * allocated areas.  However, the context-type-specific code must handle
+ * cleaning up caller-visible chunks during memory context reset operations.
+ *
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -449,8 +458,6 @@ MemoryContextResetOnly(MemoryContext context)

         context->methods->reset(context);
         context->isReset = true;
-        VALGRIND_DESTROY_MEMPOOL(context);
-        VALGRIND_CREATE_MEMPOOL(context, 0, false);
     }
 }

@@ -557,8 +564,6 @@ MemoryContextDeleteOnly(MemoryContext context)
     context->ident = NULL;

     context->methods->delete_context(context);
-
-    VALGRIND_DESTROY_MEMPOOL(context);
 }

 /*
@@ -1212,8 +1217,6 @@ MemoryContextCreate(MemoryContext node,
         node->nextchild = NULL;
         node->allowInCritSection = false;
     }
-
-    VALGRIND_CREATE_MEMPOOL(node, 0, false);
 }

 /*
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index d32c0d318fb..563ac728db6 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -377,6 +377,11 @@ SlabContextCreate(MemoryContext parent,
      * we'd leak the header if we ereport in this stretch.
      */

+    /* See comments about Valgrind interactions in aset.c */
+    VALGRIND_CREATE_MEMPOOL(slab, 0, false);
+    /* This chunk covers the SlabContext only */
+    VALGRIND_MEMPOOL_ALLOC(slab, slab, sizeof(SlabContext));
+
     /* Fill in SlabContext-specific header fields */
     slab->chunkSize = (uint32) chunkSize;
     slab->fullChunkSize = (uint32) fullChunkSize;
@@ -451,6 +456,10 @@ SlabReset(MemoryContext context)
 #ifdef CLOBBER_FREED_MEMORY
         wipe_mem(block, slab->blockSize);
 #endif
+
+        /* As in aset.c, free block-header chunks explicitly */
+        VALGRIND_MEMPOOL_FREE(slab, block);
+
         free(block);
         context->mem_allocated -= slab->blockSize;
     }
@@ -467,11 +476,23 @@ SlabReset(MemoryContext context)
 #ifdef CLOBBER_FREED_MEMORY
             wipe_mem(block, slab->blockSize);
 #endif
+
+            /* As in aset.c, free block-header chunks explicitly */
+            VALGRIND_MEMPOOL_FREE(slab, block);
+
             free(block);
             context->mem_allocated -= slab->blockSize;
         }
     }

+    /*
+     * Instruct Valgrind to throw away all the chunks associated with this
+     * context, except for the one covering the SlabContext.  This gets rid of
+     * the chunks for whatever user data is getting discarded by the context
+     * reset.
+     */
+    VALGRIND_MEMPOOL_TRIM(slab, slab, sizeof(SlabContext));
+
     slab->curBlocklistIndex = 0;

     Assert(context->mem_allocated == 0);
@@ -486,6 +507,10 @@ SlabDelete(MemoryContext context)
 {
     /* Reset to release all the SlabBlocks */
     SlabReset(context);
+
+    /* Destroy the Valgrind pool -- see notes in aset.c */
+    VALGRIND_DESTROY_MEMPOOL(context);
+
     /* And free the context header */
     free(context);
 }
@@ -567,6 +592,9 @@ SlabAllocFromNewBlock(MemoryContext context, Size size, int flags)
         if (unlikely(block == NULL))
             return MemoryContextAllocationFailure(context, size, flags);

+        /* Make a Valgrind chunk covering the new block's header */
+        VALGRIND_MEMPOOL_ALLOC(slab, block, Slab_BLOCKHDRSZ);
+
         block->slab = slab;
         context->mem_allocated += slab->blockSize;

@@ -795,6 +823,10 @@ SlabFree(void *pointer)
 #ifdef CLOBBER_FREED_MEMORY
             wipe_mem(block, slab->blockSize);
 #endif
+
+            /* As in aset.c, free block-header chunks explicitly */
+            VALGRIND_MEMPOOL_FREE(slab, block);
+
             free(block);
             slab->header.mem_allocated -= slab->blockSize;
         }
diff --git a/src/include/utils/memdebug.h b/src/include/utils/memdebug.h
index 7309271834b..80692dcef93 100644
--- a/src/include/utils/memdebug.h
+++ b/src/include/utils/memdebug.h
@@ -29,6 +29,7 @@
 #define VALGRIND_MEMPOOL_ALLOC(context, addr, size)            do {} while (0)
 #define VALGRIND_MEMPOOL_FREE(context, addr)                do {} while (0)
 #define VALGRIND_MEMPOOL_CHANGE(context, optr, nptr, size)    do {} while (0)
+#define VALGRIND_MEMPOOL_TRIM(context, addr, size)            do {} while (0)
 #endif


--
2.43.5

From fb80149e0517b33dd265ca5b1f559b0f60405a5a Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 12:52:32 -0400
Subject: [PATCH v1 02/11] Temporarily suppress some Valgrind complaints.

This isn't meant for commit, but it is useful to reduce the large
amount of leak-check noise generated by a couple of known problems:

Temporary buffers, as well as some other structures, are allocated in
a way that causes us not to hold any pointer to what Valgrind thinks
is the start of the chunk.  That causes "possible leak" reports.

PL/pgSQL and SQL-function parsing leak stuff into the long-lived
function cache context.  This isn't really a huge problem, since
the cruft will be recovered if we have to re-parse the function.
But it leads to a lot of complaints from Valgrind.

We need better answers for these things, but for the moment hide
them to allow sorting through other leakage problems.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/tools/valgrind.supp | 42 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..7d4fb0b2c1d 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,45 @@
    Memcheck:Cond
    fun:PyObject_Realloc
 }
+
+# Suppress complaints about aligned allocations of temp buffers.
+# This ought to be fixed properly, instead.
+{
+   hide_temp_buffer_allocations
+   Memcheck:Leak
+   match-leak-kinds: possible
+   fun:MemoryContextAlloc
+   fun:GetLocalBufferStorage
+}
+
+# Suppress complaints about aligned allocations of other stuff.
+# This ought to be fixed properly, instead.
+{
+   hide_alignment_leakage
+   Memcheck:Leak
+   match-leak-kinds: possible
+   fun:MemoryContextAllocExtended
+   fun:MemoryContextAllocAligned
+}
+
+# Suppress complaints about stuff leaked by PL/pgSQL parsing.
+# This ought to be fixed properly, instead.
+{
+   hide_plpgsql_parsing_leaks
+   Memcheck:Leak
+   match-leak-kinds: definite,possible,indirect
+
+   ...
+   fun:plpgsql_compile_callback
+}
+
+# And likewise for stuff leaked by SQL-function parsing.
+# This ought to be fixed properly, instead.
+{
+   hide_sql_parsing_leaks
+   Memcheck:Leak
+   match-leak-kinds: definite,possible,indirect
+
+   ...
+   fun:sql_compile_callback
+}
--
2.43.5

From 2f7d138a0a222e9f598a6f54c233c2cc7f6391c9 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 12:56:23 -0400
Subject: [PATCH v1 03/11] Silence complaints about leaked dynahash storage.

Because dynahash.c never frees hashtable storage except by deleting
the whole hashtable context, it doesn't bother to track the individual
blocks of elements allocated by element_alloc().  This results in
"possibly lost" complaints from Valgrind except when the first element
of each block is actively in use.  (Otherwise it'll be on a freelist,
but very likely only reachable via "interior pointers" within element
blocks, which doesn't satisfy Valgrind.)

To fix, if we're building with USE_VALGRIND, expend an extra pointer's
worth of space in each element block so that we can chain them all
together from the HTAB header.  Skip this in shared hashtables though:
Valgrind doesn't track those, and we'd need additional locking to make
it safe to manipulate a shared chain.

While here, update a comment obsoleted by 9c911ec06.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/backend/utils/hash/dynahash.c | 52 +++++++++++++++++++++++++++----
 1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 1ad155d446e..f81f9c199e4 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -22,10 +22,11 @@
  * lookup key's hash value as a partition number --- this will work because
  * of the way calc_bucket() maps hash values to bucket numbers.
  *
- * For hash tables in shared memory, the memory allocator function should
- * match malloc's semantics of returning NULL on failure.  For hash tables
- * in local memory, we typically use palloc() which will throw error on
- * failure.  The code in this file has to cope with both cases.
+ * The memory allocator function should match malloc's semantics of returning
+ * NULL on failure.  (This is essential for hash tables in shared memory.
+ * For hash tables in local memory, we used to use palloc() which will throw
+ * error on failure; but we no longer do, so it's untested whether this
+ * module will still cope with that behavior.)
  *
  * dynahash.c provides support for these types of lookup keys:
  *
@@ -98,6 +99,7 @@

 #include "access/xact.h"
 #include "common/hashfn.h"
+#include "lib/ilist.h"
 #include "port/pg_bitutils.h"
 #include "storage/shmem.h"
 #include "storage/spin.h"
@@ -236,6 +238,16 @@ struct HTAB
     Size        keysize;        /* hash key length in bytes */
     long        ssize;            /* segment size --- must be power of 2 */
     int            sshift;            /* segment shift = log2(ssize) */
+
+    /*
+     * In a USE_VALGRIND build, non-shared hashtables keep an slist chain of
+     * all the element blocks they have allocated.  This pacifies Valgrind,
+     * which would otherwise often claim that the element blocks are "possibly
+     * lost" for lack of any non-interior pointers to their starts.
+     */
+#ifdef USE_VALGRIND
+    slist_head    element_blocks;
+#endif
 };

 /*
@@ -1708,6 +1720,8 @@ element_alloc(HTAB *hashp, int nelem, int freelist_idx)
 {
     HASHHDR    *hctl = hashp->hctl;
     Size        elementSize;
+    Size        requestSize;
+    char       *allocedBlock;
     HASHELEMENT *firstElement;
     HASHELEMENT *tmpElement;
     HASHELEMENT *prevElement;
@@ -1719,12 +1733,38 @@ element_alloc(HTAB *hashp, int nelem, int freelist_idx)
     /* Each element has a HASHELEMENT header plus user data. */
     elementSize = MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(hctl->entrysize);

+    requestSize = nelem * elementSize;
+
+    /* Add space for slist_node list link if we need one. */
+#ifdef USE_VALGRIND
+    if (!hashp->isshared)
+        requestSize += MAXALIGN(sizeof(slist_node));
+#endif
+
+    /* Allocate the memory. */
     CurrentDynaHashCxt = hashp->hcxt;
-    firstElement = (HASHELEMENT *) hashp->alloc(nelem * elementSize);
+    allocedBlock = hashp->alloc(requestSize);

-    if (!firstElement)
+    if (!allocedBlock)
         return false;

+    /*
+     * If USE_VALGRIND, each allocated block of elements of a non-shared
+     * hashtable is chained into a list, so that Valgrind won't think it's
+     * been leaked.
+     */
+#ifdef USE_VALGRIND
+    if (hashp->isshared)
+        firstElement = (HASHELEMENT *) allocedBlock;
+    else
+    {
+        slist_push_head(&hashp->element_blocks, (slist_node *) allocedBlock);
+        firstElement = (HASHELEMENT *) (allocedBlock + MAXALIGN(sizeof(slist_node)));
+    }
+#else
+    firstElement = (HASHELEMENT *) allocedBlock;
+#endif
+
     /* prepare to link all the new entries into the freelist */
     prevElement = NULL;
     tmpElement = firstElement;
--
2.43.5

From 8f3915908e39ece6572198decb16d91a76de6947 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 13:09:53 -0400
Subject: [PATCH v1 04/11] Silence complaints about leaked catcache entries.

Put the dlist_node fields of catctup and catclist structs first.
This ensures that the dlist pointers point to the starts of these
palloc blocks, and thus that Valgrind won't consider them
"possibly lost".

This is a hack, since it implies a similar requirement for every other
struct that we chain into slists or dlists (unless there are other
pointers to them, as there are for the CatCache structs).  But there
is no other obvious alternative.  Fixing these two places seems to be
enough to silence all such Valgrind complaints in simple test cases.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/include/utils/catcache.h | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index 277ec33c00b..00808e23f49 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -87,6 +87,14 @@ typedef struct catcache

 typedef struct catctup
 {
+    /*
+     * Each tuple in a cache is a member of a dlist that stores the elements
+     * of its hash bucket.  We keep each dlist in LRU order to speed repeated
+     * lookups.  Keep the dlist_node field first so that Valgrind understands
+     * the struct is reachable.
+     */
+    dlist_node    cache_elem;        /* list member of per-bucket list */
+
     int            ct_magic;        /* for identifying CatCTup entries */
 #define CT_MAGIC   0x57261502

@@ -98,13 +106,6 @@ typedef struct catctup
      */
     Datum        keys[CATCACHE_MAXKEYS];

-    /*
-     * Each tuple in a cache is a member of a dlist that stores the elements
-     * of its hash bucket.  We keep each dlist in LRU order to speed repeated
-     * lookups.
-     */
-    dlist_node    cache_elem;        /* list member of per-bucket list */
-
     /*
      * A tuple marked "dead" must not be returned by subsequent searches.
      * However, it won't be physically deleted from the cache until its
@@ -158,13 +159,17 @@ typedef struct catctup
  */
 typedef struct catclist
 {
+    /*
+     * Keep the dlist_node field first so that Valgrind understands the struct
+     * is reachable.
+     */
+    dlist_node    cache_elem;        /* list member of per-catcache list */
+
     int            cl_magic;        /* for identifying CatCList entries */
 #define CL_MAGIC   0x52765103

     uint32        hash_value;        /* hash value for lookup keys */

-    dlist_node    cache_elem;        /* list member of per-catcache list */
-
     /*
      * Lookup keys for the entry, with the first nkeys elements being valid.
      * All by-reference are separately allocated.
--
2.43.5

From 206cff7a169626dc32c411751f472c2147cd7a98 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 13:14:52 -0400
Subject: [PATCH v1 05/11] Silence complaints about leaked CatCache structs.

In USE_VALGRIND builds, don't attempt to cache-align CatCache
structs.  That prevents them from being reported as
"possibly lost" because all the pointers to them look like
interior pointers to Valgrind.

This is a hack, not meant for final commit.  The same hazard exists
for every other use of MemoryContextAllocAligned, so we really ought
to fix it there instead of touching this code.  But this seems to be
enough to silence all such Valgrind complaints in simple test cases,
so just do this much for now.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/backend/utils/cache/catcache.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 657648996c2..9e82d9fec68 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -923,12 +923,19 @@ InitCatCache(int id,
     }

     /*
-     * Allocate a new cache structure, aligning to a cacheline boundary
+     * Allocate a new cache structure.  Normally we align it to a cacheline
+     * boundary, but skip that in USE_VALGRIND builds because it will confuse
+     * Valgrind.  (XXX really we should fix that problem in
+     * MemoryContextAllocAligned, not here.)
      *
      * Note: we rely on zeroing to initialize all the dlist headers correctly
      */
+#ifndef USE_VALGRIND
     cp = (CatCache *) palloc_aligned(sizeof(CatCache), PG_CACHE_LINE_SIZE,
                                      MCXT_ALLOC_ZERO);
+#else
+    cp = (CatCache *) palloc0(sizeof(CatCache));
+#endif
     cp->cc_bucket = palloc0(nbuckets * sizeof(dlist_head));

     /*
--
2.43.5

From 6dbb1a4c32f394a7dc63073027bc942a5c1798a3 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 13:19:35 -0400
Subject: [PATCH v1 06/11] Silence complaints about save_ps_display_args.

Valgrind seems not to consider the global "environ" variable as a
valid root pointer; so when we allocate a new environment array,
it claims that data is leaked.  To fix that, keep our own
statically-allocated copy of the pointer.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/backend/utils/misc/ps_status.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index e08b26e8c14..4df25944deb 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -100,6 +100,17 @@ static void flush_ps_display(void);
 static int    save_argc;
 static char **save_argv;

+/*
+ * Valgrind seems not to consider the global "environ" variable as a valid
+ * root pointer; so when we allocate a new environment array, it claims that
+ * data is leaked.  To fix that, keep our own statically-allocated copy of the
+ * pointer.  (Oddly, this doesn't seem to be a problem for "argv".)
+ */
+#if defined(PS_USE_CLOBBER_ARGV) && defined(USE_VALGRIND)
+extern char **ps_status_new_environ;
+char      **ps_status_new_environ;
+#endif
+

 /*
  * Call this early in startup to save the original argc/argv values.
@@ -206,6 +217,11 @@ save_ps_display_args(int argc, char **argv)
         }
         new_environ[i] = NULL;
         environ = new_environ;
+
+        /* See notes about Valgrind above. */
+#ifdef USE_VALGRIND
+        ps_status_new_environ = new_environ;
+#endif
     }

     /*
--
2.43.5

From baf462f1854ffc8a38b0c4bdabb4575e36cea0f6 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 13:20:59 -0400
Subject: [PATCH v1 07/11] Don't leak the startup-packet buffer in
 ProcessStartupPacket.

This is the first actual code bug fix in this patch series.

I only bothered to free the buffer in the successful-exit code
paths, not the error-exit ones, since we'd quit soon anyway in
the latter cases.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/backend/tcop/backend_startup.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/tcop/backend_startup.c b/src/backend/tcop/backend_startup.c
index a7d1fec981f..0b956627934 100644
--- a/src/backend/tcop/backend_startup.c
+++ b/src/backend/tcop/backend_startup.c
@@ -627,6 +627,8 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done)
                      errmsg("received unencrypted data after SSL request"),
                      errdetail("This could be either a client-software bug or evidence of an attempted
man-in-the-middleattack."))); 

+        pfree(buf);
+
         /*
          * regular startup packet, cancel, etc packet should follow, but not
          * another SSL negotiation request, and a GSS request should only
@@ -681,6 +683,8 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done)
                      errmsg("received unencrypted data after GSSAPI encryption request"),
                      errdetail("This could be either a client-software bug or evidence of an attempted
man-in-the-middleattack."))); 

+        pfree(buf);
+
         /*
          * regular startup packet, cancel, etc packet should follow, but not
          * another GSS negotiation request, and an SSL request should only
@@ -858,6 +862,8 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done)
     if (am_walsender && !am_db_walsender)
         port->database_name[0] = '\0';

+    pfree(buf);
+
     /*
      * Done filling the Port structure
      */
--
2.43.5

From c8e124f6c6d209e39072e6c2859c476bce8c082e Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 13:32:17 -0400
Subject: [PATCH v1 08/11] Fix some extremely broken code from 525392d57.

This uselessly makes two copies of the plan tree during
BuildCachedPlan; there should be only one, and it should be
in the stmt_context not plan_context.   Also, UpdateCachedPlan
has a fragile and undocumented assumption that the length of
the stmt_list can't change.  (The XXX comments in that function
have a very strong whiff of code that wasn't really ready to
commit, but this is enough fixing to silence Valgrind's complaints
about leaked data in the plan_context.)

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/backend/utils/cache/plancache.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 9bcbc4c3e97..53e3255ad7f 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1114,17 +1114,18 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
                                              ALLOCSET_START_SMALL_SIZES);
         MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);

-        stmt_context = AllocSetContextCreate(CurrentMemoryContext,
+        stmt_context = AllocSetContextCreate(plan_context,
                                              "CachedPlan PlannedStmts",
                                              ALLOCSET_START_SMALL_SIZES);
         MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
-        MemoryContextSetParent(stmt_context, plan_context);

+        /*
+         * Copy plans into the stmt_context.
+         */
         MemoryContextSwitchTo(stmt_context);
         plist = copyObject(plist);

         MemoryContextSwitchTo(plan_context);
-        plist = list_copy(plist);
     }
     else
         plan_context = CurrentMemoryContext;
@@ -1207,8 +1208,6 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
 {
     List       *query_list = plansource->query_list,
                *plan_list;
-    ListCell   *l1,
-               *l2;
     CachedPlan *plan = plansource->gplan;
     MemoryContext oldcxt;

@@ -1260,12 +1259,7 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index,

     MemoryContextReset(plan->stmt_context);
     oldcxt = MemoryContextSwitchTo(plan->stmt_context);
-    forboth(l1, plan_list, l2, plan->stmt_list)
-    {
-        PlannedStmt *plannedstmt = lfirst(l1);
-
-        lfirst(l2) = copyObject(plannedstmt);
-    }
+    plan->stmt_list = copyObject(plan_list);
     MemoryContextSwitchTo(oldcxt);

     /*
--
2.43.5

From f5b72b015066053ed3396f0ea010ae470911f3b6 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 13:38:31 -0400
Subject: [PATCH v1 09/11] Avoid per-relation leakage in autovacuum.

PgStat_StatTabEntry and AutoVacOpts structs were leaked until
the end of the autovacuum worker's run, which is bad news if
there are a lot of relations in the database.

Note: pfree'ing the PgStat_StatTabEntry structs here seems
a bit risky, because pgstat_fetch_stat_tabentry_ext does not
guarantee anything about whether its result is long-lived.
That API could use a re-think.

Also, free a couple of structures retail when USE_VALGRIND.
This doesn't have any effect on the process's total memory
consumption, but it does reduce noise in valgrind leakage
reports.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/backend/postmaster/autovacuum.c | 47 ++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 4d4a1a3197e..ebe8af1056d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2077,6 +2077,12 @@ do_autovacuum(void)
                 }
             }
         }
+
+        /* Release stuff to avoid per-relation leakage */
+        if (relopts)
+            pfree(relopts);
+        if (tabentry)
+            pfree(tabentry);
     }

     table_endscan(relScan);
@@ -2093,7 +2099,8 @@ do_autovacuum(void)
         Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
         PgStat_StatTabEntry *tabentry;
         Oid            relid;
-        AutoVacOpts *relopts = NULL;
+        AutoVacOpts *relopts;
+        bool        free_relopts = false;
         bool        dovacuum;
         bool        doanalyze;
         bool        wraparound;
@@ -2111,7 +2118,9 @@ do_autovacuum(void)
          * main rel
          */
         relopts = extract_autovac_opts(tuple, pg_class_desc);
-        if (relopts == NULL)
+        if (relopts)
+            free_relopts = true;
+        else
         {
             av_relation *hentry;
             bool        found;
@@ -2132,6 +2141,12 @@ do_autovacuum(void)
         /* ignore analyze for toast tables */
         if (dovacuum)
             table_oids = lappend_oid(table_oids, relid);
+
+        /* Release stuff to avoid leakage */
+        if (free_relopts)
+            pfree(relopts);
+        if (tabentry)
+            pfree(tabentry);
     }

     table_endscan(relScan);
@@ -2503,6 +2518,8 @@ deleted:
         pg_atomic_test_set_flag(&MyWorkerInfo->wi_dobalance);
     }

+    list_free(table_oids);
+
     /*
      * Perform additional work items, as requested by backends.
      */
@@ -2545,8 +2562,15 @@ deleted:

     /*
      * We leak table_toast_map here (among other things), but since we're
-     * going away soon, it's not a problem.
+     * going away soon, it's not a problem normally.  But when using Valgrind,
+     * release some stuff to reduce complaints about leaked storage.
      */
+#ifdef USE_VALGRIND
+    hash_destroy(table_toast_map);
+    FreeTupleDesc(pg_class_desc);
+    if (bstrategy)
+        pfree(bstrategy);
+#endif

     /*
      * Update pg_database.datfrozenxid, and truncate pg_xact if possible. We
@@ -2684,8 +2708,8 @@ deleted2:
 /*
  * extract_autovac_opts
  *
- * Given a relation's pg_class tuple, return the AutoVacOpts portion of
- * reloptions, if set; otherwise, return NULL.
+ * Given a relation's pg_class tuple, return a palloc'd copy of the
+ * AutoVacOpts portion of reloptions, if set; otherwise, return NULL.
  *
  * Note: callers do not have a relation lock on the table at this point,
  * so the table could have been dropped, and its catalog rows gone, after
@@ -2734,6 +2758,7 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
     autovac_table *tab = NULL;
     bool        wraparound;
     AutoVacOpts *avopts;
+    bool        free_avopts = false;

     /* fetch the relation's relcache entry */
     classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
@@ -2746,8 +2771,10 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
      * main table reloptions if the toast table itself doesn't have.
      */
     avopts = extract_autovac_opts(classTup, pg_class_desc);
-    if (classForm->relkind == RELKIND_TOASTVALUE &&
-        avopts == NULL && table_toast_map != NULL)
+    if (avopts)
+        free_avopts = true;
+    else if (classForm->relkind == RELKIND_TOASTVALUE &&
+             table_toast_map != NULL)
     {
         av_relation *hentry;
         bool        found;
@@ -2856,6 +2883,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
                          avopts->vacuum_cost_delay >= 0));
     }

+    if (free_avopts)
+        pfree(avopts);
     heap_freetuple(classTup);
     return tab;
 }
@@ -2887,6 +2916,10 @@ recheck_relation_needs_vacanalyze(Oid relid,
                               effective_multixact_freeze_max_age,
                               dovacuum, doanalyze, wraparound);

+    /* Release tabentry to avoid leakage */
+    if (tabentry)
+        pfree(tabentry);
+
     /* ignore ANALYZE for toast tables */
     if (classForm->relkind == RELKIND_TOASTVALUE)
         *doanalyze = false;
--
2.43.5

From 9815596aaecaec254c638e1a150cac7860954f1c Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 13:44:22 -0400
Subject: [PATCH v1 10/11] Silence complaints about leaks in
 load_domaintype_info().

This code intentionally leaks some intermediate cruft into the
long-lived DomainConstraintCache's memory context, reasoning that the
amount of leakage will typically not be much so it's not worth doing
a copyObject() of the final tree to avoid that.  But Valgrind knows
nothing of engineering tradeoffs and complains anyway.  So modify the
code to do it the slow clean way when USE_VALGRIND.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/backend/utils/cache/typcache.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index f9aec38a11f..bf0e362eae6 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -1189,8 +1189,20 @@ load_domaintype_info(TypeCacheEntry *typentry)
                 dcc->dccRefCount = 0;
             }

-            /* Create node trees in DomainConstraintCache's context */
+            /*
+             * The clean way to do the next bit is to run stringToNode() and
+             * expression_planner() in the short-lived caller's context, then
+             * copy the result into the DomainConstraintCache's dccContext.
+             * Under Valgrind, we do it that way to silence complaints about
+             * leaking cruft into dccContext.  However, the amount of stuff
+             * leaked is usually pretty minimal, and the cost of the extra
+             * copyObject() call is not negligible.  So when not catering to
+             * Valgrind, we do it all in dccContext and accept some leakage.
+             */
+#ifndef USE_VALGRIND
+            /* Do all this work in DomainConstraintCache's context */
             oldcxt = MemoryContextSwitchTo(dcc->dccContext);
+#endif

             check_expr = (Expr *) stringToNode(constring);

@@ -1206,10 +1218,19 @@ load_domaintype_info(TypeCacheEntry *typentry)
              */
             check_expr = expression_planner(check_expr);

+#ifdef USE_VALGRIND
+            /* Create only the minimally needed stuff in dccContext */
+            oldcxt = MemoryContextSwitchTo(dcc->dccContext);
+#endif
+
             r = makeNode(DomainConstraintState);
             r->constrainttype = DOM_CONSTRAINT_CHECK;
             r->name = pstrdup(NameStr(c->conname));
+#ifdef USE_VALGRIND
+            r->check_expr = copyObject(check_expr);
+#else
             r->check_expr = check_expr;
+#endif
             r->check_exprstate = NULL;

             MemoryContextSwitchTo(oldcxt);
--
2.43.5

From 7adfb9e9dee4725c8950a49dabd9cac5999bb691 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 11 May 2025 14:50:10 -0400
Subject: [PATCH v1 11/11] WIP: reduce leakages during PL/pgSQL function
 compilation.

format_procedure leaks memory, so run it in a short-lived context
not the session-lifespan cache context for the PL/pgSQL function.

parse_datatype called the core parser in the function's long-lived
cache context, thus leaking potentially a lot of storage into that
context.  We were also being a bit careless with the TypeName
structures made in that code path and others.  Most of the time we
don't need to retain the TypeName, so make sure it is made in the
short-lived temp context, and copy it only if we do need to retain it.

These are far from the only leaks in PL/pgSQL compilation, but
they're the biggest as far as I've seen.  However, this is just
WIP and may not get committed in this form at all: it's not clear
that this line of attack is enough to remove all leaks in this
area.  It might be better to try to run exclusively within
the short-term context and explicitly allocate what needs to
go into the function parsetree.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
---
 src/pl/plpgsql/src/pl_comp.c | 28 ++++++++++++++++++++++------
 src/pl/plpgsql/src/pl_gram.y |  8 +++++++-
 src/tools/valgrind.supp      | 11 -----------
 3 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index 519f7695d7c..5f3e8646fbb 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -177,6 +177,7 @@ plpgsql_compile_callback(FunctionCallInfo fcinfo,
     yyscan_t    scanner;
     Datum        prosrcdatum;
     char       *proc_source;
+    char       *proc_signature;
     HeapTuple    typeTup;
     Form_pg_type typeStruct;
     PLpgSQL_variable *var;
@@ -223,6 +224,9 @@ plpgsql_compile_callback(FunctionCallInfo fcinfo,
     plpgsql_check_syntax = forValidator;
     plpgsql_curr_compile = function;

+    /* format_procedure leaks memory, so run it in temp context */
+    proc_signature = format_procedure(fcinfo->flinfo->fn_oid);
+
     /*
      * All the permanent output of compilation (e.g. parse tree) is kept in a
      * per-function memory context, so it can be reclaimed easily.
@@ -232,7 +236,7 @@ plpgsql_compile_callback(FunctionCallInfo fcinfo,
                                      ALLOCSET_DEFAULT_SIZES);
     plpgsql_compile_tmp_cxt = MemoryContextSwitchTo(func_cxt);

-    function->fn_signature = format_procedure(fcinfo->flinfo->fn_oid);
+    function->fn_signature = pstrdup(proc_signature);
     MemoryContextSetIdentifier(func_cxt, function->fn_signature);
     function->fn_oid = fcinfo->flinfo->fn_oid;
     function->fn_input_collation = fcinfo->fncollation;
@@ -1658,6 +1662,11 @@ plpgsql_parse_wordrowtype(char *ident)
 {
     Oid            classOid;
     Oid            typOid;
+    TypeName   *typName;
+    MemoryContext oldCxt;
+
+    /* Avoid memory leaks in long-term function context */
+    oldCxt = MemoryContextSwitchTo(plpgsql_compile_tmp_cxt);

     /*
      * Look up the relation.  Note that because relation rowtypes have the
@@ -1680,9 +1689,12 @@ plpgsql_parse_wordrowtype(char *ident)
                  errmsg("relation \"%s\" does not have a composite type",
                         ident)));

+    typName = makeTypeName(ident);
+
+    MemoryContextSwitchTo(oldCxt);
+
     /* Build and return the row type struct */
-    return plpgsql_build_datatype(typOid, -1, InvalidOid,
-                                  makeTypeName(ident));
+    return plpgsql_build_datatype(typOid, -1, InvalidOid, typName);
 }

 /* ----------
@@ -1696,6 +1708,7 @@ plpgsql_parse_cwordrowtype(List *idents)
     Oid            classOid;
     Oid            typOid;
     RangeVar   *relvar;
+    TypeName   *typName;
     MemoryContext oldCxt;

     /*
@@ -1718,11 +1731,12 @@ plpgsql_parse_cwordrowtype(List *idents)
                  errmsg("relation \"%s\" does not have a composite type",
                         relvar->relname)));

+    typName = makeTypeNameFromNameList(idents);
+
     MemoryContextSwitchTo(oldCxt);

     /* Build and return the row type struct */
-    return plpgsql_build_datatype(typOid, -1, InvalidOid,
-                                  makeTypeNameFromNameList(idents));
+    return plpgsql_build_datatype(typOid, -1, InvalidOid, typName);
 }

 /*
@@ -1937,6 +1951,8 @@ plpgsql_build_recfield(PLpgSQL_rec *rec, const char *fldname)
  * origtypname is the parsed form of what the user wrote as the type name.
  * It can be NULL if the type could not be a composite type, or if it was
  * identified by OID to begin with (e.g., it's a function argument type).
+ * origtypname is in short-lived storage and must be copied if we choose
+ * to incorporate it into the function's parse tree.
  */
 PLpgSQL_type *
 plpgsql_build_datatype(Oid typeOid, int32 typmod,
@@ -2055,7 +2071,7 @@ build_datatype(HeapTuple typeTup, int32 typmod,
                      errmsg("type %s is not composite",
                             format_type_be(typ->typoid))));

-        typ->origtypname = origtypname;
+        typ->origtypname = copyObject(origtypname);
         typ->tcache = typentry;
         typ->tupdesc_id = typentry->tupDesc_identifier;
     }
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 5612e66d023..6d53da4e79a 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -3848,6 +3848,7 @@ parse_datatype(const char *string, int location, yyscan_t yyscanner)
     int32        typmod;
     sql_error_callback_arg cbarg;
     ErrorContextCallback syntax_errcontext;
+    MemoryContext oldCxt;

     cbarg.location = location;
     cbarg.yyscanner = yyscanner;
@@ -3857,9 +3858,14 @@ parse_datatype(const char *string, int location, yyscan_t yyscanner)
     syntax_errcontext.previous = error_context_stack;
     error_context_stack = &syntax_errcontext;

-    /* Let the main parser try to parse it under standard SQL rules */
+    /*
+     * Let the main parser try to parse it under standard SQL rules.  The
+     * parser leaks memory, so run it in temp context.
+     */
+    oldCxt = MemoryContextSwitchTo(plpgsql_compile_tmp_cxt);
     typeName = typeStringToTypeName(string, NULL);
     typenameTypeIdAndMod(NULL, typeName, &type_id, &typmod);
+    MemoryContextSwitchTo(oldCxt);

     /* Restore former ereport callback */
     error_context_stack = syntax_errcontext.previous;
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7d4fb0b2c1d..80268d6eed8 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -201,17 +201,6 @@
    fun:MemoryContextAllocAligned
 }

-# Suppress complaints about stuff leaked by PL/pgSQL parsing.
-# This ought to be fixed properly, instead.
-{
-   hide_plpgsql_parsing_leaks
-   Memcheck:Leak
-   match-leak-kinds: definite,possible,indirect
-
-   ...
-   fun:plpgsql_compile_callback
-}
-
 # And likewise for stuff leaked by SQL-function parsing.
 # This ought to be fixed properly, instead.
 {
--
2.43.5


Re: Why our Valgrind reports suck

From
Tom Lane
Date:
I wrote:
> Okay, here is a patch series that updates the
> 0001-Make-memory-contexts-themselves-more-visible-to-valg.patch
> patch you posted in that thread,

I forgot to mention that I did try to implement the "two-level
pool" scheme that the Valgrind documentation talks about, and
could not make it work.  There seem to be undocumented interactions
between the outer and inner chunks, and it's not really clear to me
that there aren't outright bugs.  In any case, AFAICS that scheme
would bring us no immediate advantage compared to the flat approach
of just adding mempool chunks for the allocators' management areas.
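
To make that concrete, here's roughly what the flat approach amounts
to, sketched on a toy bump allocator rather than our actual
memory-context code (the Arena type and its fields are invented for
illustration, and error handling is omitted):

#include <stdlib.h>
#include <valgrind/valgrind.h>

typedef struct Arena
{
    char       *base;           /* start of the chunk area */
    size_t      used;           /* bytes handed out so far */
    size_t      size;           /* total size of the chunk area */
} Arena;

Arena *
arena_create(size_t size)
{
    /* One malloc covers both the management header and the chunk area. */
    Arena      *a = malloc(sizeof(Arena) + size);

    a->base = (char *) (a + 1);
    a->used = 0;
    a->size = size;

    /* Tell Memcheck this object is a custom pool ... */
    VALGRIND_CREATE_MEMPOOL(a, 0, 0);

    /*
     * ... and also mark the management header itself as a chunk of that
     * pool, so that pointers stored in the header (such as a context's
     * ident pointer) get followed during leak checking, instead of the
     * whole underlying malloc block being ignored.
     */
    VALGRIND_MEMPOOL_ALLOC(a, a, sizeof(Arena));

    return a;
}

void *
arena_alloc(Arena *a, size_t size)
{
    void       *p = a->base + a->used;

    a->used += size;            /* no overflow or exhaustion checks here */
    /* Each chunk handed out to callers is likewise a pool chunk. */
    VALGRIND_MEMPOOL_ALLOC(a, p, size);
    return p;
}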

            regards, tom lane



Re: Why our Valgrind reports suck

From
Álvaro Herrera
Date:
On 2025-May-11, Tom Lane wrote:

> Okay, here is a patch series that updates the
> 0001-Make-memory-contexts-themselves-more-visible-to-valg.patch
> patch you posted in that thread, and makes various follow-up
> fixes that either fix or paper over various leaks.

Wow, that's a lot of extra fixes.  I did a quick test run with all
patches save the last, then ran the tests "test_setup plpgsql tablespace"
to see if I'd get a leak report about plpgsql (to verify the setup was
correct), and I did.  Rerunning after applying that last patch silences
most of them, it seems.

One source of leaks still present is LLVM.  The amounts aren't large:

140303.log:==00:00:12:42.244 140303==      possibly lost: 18,336 bytes in 207 blocks
140442.log:==00:00:13:43.070 140442==      possibly lost: 18,456 bytes in 210 blocks
140729.log:==00:00:16:09.802 140729==      possibly lost: 64,120 bytes in 840 blocks
140892.log:==00:00:17:39.300 140892==      possibly lost: 18,128 bytes in 202 blocks
141001.log:==00:00:18:31.631 141001==      possibly lost: 18,128 bytes in 202 blocks
141031.log:==00:00:19:03.528 141031==      possibly lost: 64,232 bytes in 717 blocks
141055.log:==00:00:20:11.536 141055==      possibly lost: 18,128 bytes in 202 blocks
141836.log:==00:00:29:10.344 141836==    definitely lost: 29,666 bytes in 3,175 blocks
141836.log:==00:00:29:10.344 141836==    indirectly lost: 13,977 bytes in 1,138 blocks
142061.log:==00:00:32:26.264 142061==      possibly lost: 18,128 bytes in 202 blocks

(The installcheck run is still going, but 90 tests in, those are the
only reports of ~10kB or more).


> In particular, I had not realized that autovacuum
> leaks a nontrivial amount of memory per relation processed (cf 0009),
> and apparently has done for a few releases now.  This is horrid in
> databases with many tables, and I'm surprised that we've not gotten
> complaints about it.

In PGConf Germany we got a lightning talk[1] that reported a problem
that might be explained by this: with 10 million relations, the OOM
killer gets invoked on autovacuum workers in the reported case, so
essentially autovacuum doesn't work at all.  So clearly there is somebody
who would appreciate this problem being fixed.

He actually blames it on the relcache, but who knows how correct that is.
There's not much to see on the slides other than that they turned to
vacuumdb instead, but here they are:

[1]
https://www.postgresql.eu/events/pgconfde2025/sessions/session/6625/slides/681/06%20-%20LT%20-%20Jonas%20Marasus%20-%20War%20Story%20How%20Big%20Is%20Too%20Big%20For%20a%20Schema.pdf

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Computing is too important to be left to men." (Karen Spärck Jones)



Re: Why our Valgrind reports suck

From
Tom Lane
Date:
Álvaro Herrera <alvherre@kurilemu.de> writes:
> On 2025-May-11, Tom Lane wrote:
>> In particular, I had not realized that autovacuum
>> leaks a nontrivial amount of memory per relation processed (cf 0009),
>> and apparently has done for a few releases now.  This is horrid in
>> databases with many tables, and I'm surprised that we've not gotten
>> complaints about it.

> In PGConf Germany we got a lightning talk[1] that reported a problem
> that might be explained by this: with 10 million of relations, the OOM
> killer gets invoked on autovacuum workers on the reported case, so
> essentially autovacuum doesn't work at all.  So clearly there is somebody
> that would appreciate that this problem is fixed.

Interesting.

> He actually blames it on relcache, but who knows how correct that is.

I would not be surprised if the relcache is bloated too, but Valgrind
wouldn't think that's a leak.  I wonder if it'd be worth setting up
a mechanism for autovacuum to drop the relcache entry for a table
once it's done with it.

            regards, tom lane



Re: Why our Valgrind reports suck

From
Yasir
Date:


On Mon, May 12, 2025 at 12:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> And, since there's nothing new under the sun around here,
> we already had a discussion about that back in 2021:
> https://www.postgresql.org/message-id/flat/3471359.1615937770%40sss.pgh.pa.us
> That thread seems to have led to fixing some specific bugs,
> but we never committed any of the discussed valgrind infrastructure
> improvements.  I'll have a go at resurrecting that...

Okay, here is a patch series that updates the
0001-Make-memory-contexts-themselves-more-visible-to-valg.patch
patch you posted in that thread, and makes various follow-up
fixes that either fix or paper over various leaks.  Some of it
is committable I think, but other parts are just WIP.  Anyway,
as of the 0010 patch we can run through the core regression tests
and see no more than a couple of kilobytes total reported leakage
in any process, except for two tests that expose leaks in TS
dictionary building.  (That's fixable but I ran out of time,
and I wanted to get this posted before Montreal.)  There is
work left to do before we can remove the suppressions added in
0002, but this is already huge progress compared to where we were.

A couple of these patches are bug fixes that need to be applied and
even back-patched.  In particular, I had not realized that autovacuum
leaks a nontrivial amount of memory per relation processed (cf 0009),
and apparently has done for a few releases now.  This is horrid in
databases with many tables, and I'm surprised that we've not gotten
complaints about it.

                        regards, tom lane


Thanks for sharing the patch series. I've applied the patches on my end and rerun the tests. Valgrind now reports only 8 bytes of leakage, and the previously noisy output is almost entirely gone.
Here's the valgrind output:

==00:00:01:50.385 90463== LEAK SUMMARY:
==00:00:01:50.385 90463==    definitely lost: 8 bytes in 1 blocks
==00:00:01:50.385 90463==    indirectly lost: 0 bytes in 0 blocks
==00:00:01:50.385 90463==      possibly lost: 0 bytes in 0 blocks
==00:00:01:50.385 90463==    still reachable: 1,182,132 bytes in 2,989 blocks
==00:00:01:50.385 90463==         suppressed: 0 bytes in 0 blocks
==00:00:01:50.385 90463== Rerun with --leak-check=full to see details of leaked memory
==00:00:01:50.385 90463==
==00:00:01:50.385 90463== For lists of detected and suppressed errors, rerun with: -s
==00:00:01:50.385 90463== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 34 from 3)

 Regards, 

Yasir Hussain
Data Bene

Re: Why our Valgrind reports suck

From
Tom Lane
Date:
I wrote:
> Here's a v2 patchset that reaches the goal of zero reported leaks
> in the core regression tests, with some caveats:

Oh, another caveat is that I ran this with a fairly minimalistic set
of build options.  In a more complete build, I observed a small leak
in xml.c, which I posted a fix for in another thread [1].

I also see the leakage Alvaro mentioned with --with-llvm.  I'm not
sure how real that is though, because AFAICS all of it is reported
as "possibly lost", not "definitely lost" or "indirectly lost".
In Valgrind-speak that means there are pointers leading to the
chunk, just not pointing right at its start.  So this could be the
result of allocation tricks being played by the C++ compiler.
The Valgrind manual talks about some heuristics they use to handle
C++ coding patterns, but they don't seem to help in my environment.
In any case, the allocation points are mostly far enough down into
LLVM functions that if the leaks are real, I'd tend to call them
LLVM's bugs not ours.
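
To illustrate the distinction, here's a trivial standalone example
(nothing to do with our code) of the pattern that gets classified as
"possibly lost" rather than "definitely lost":

#include <stdlib.h>

static char *interior;          /* still live at exit, but points mid-block */

int
main(void)
{
    char       *block = malloc(100);

    interior = block + 16;      /* e.g. stepping over some hidden header */
    block = NULL;               /* no pointer to the block's start remains */

    /* Memcheck reports the 100 bytes as "possibly lost", not "definitely". */
    return (interior != NULL) ? 0 : 1;
}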

I haven't really tried our non-core test suites yet.  Out of curiosity
I ran the plperl, plpython, and pltcl suites.  All of them show pretty
enormous amounts of "possibly lost" data, which again seems likely to
be an artifact of allocation games within those libraries rather than
real leaks.  I wonder if they have "valgrind friendly" build options
that we'd need to use to get sane results?
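
For whatever it's worth, the knobs I'm aware of are below --- this is
from memory and untested here, so treat it as an assumption to verify
rather than a recipe:

# CPython has a --with-valgrind configure option that makes pymalloc
# get out of the way when running under Valgrind (and its tree ships a
# suppressions file, Misc/valgrind-python.supp, for the rest):
./configure --with-valgrind

# Perl built with -DPURIFY uses plain malloc/free for SV bodies instead
# of its arena allocator:
sh ./Configure -des -Accflags=-DPURIFY

# I don't know of an equivalent switch for Tcl offhand.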

            regards, tom lane

[1] https://www.postgresql.org/message-id/1358967.1747858817%40sss.pgh.pa.us



Re: Why our Valgrind reports suck

From
Andres Freund
Date:
Hi,

0001:

This is rather wild; I really have only the dimmest memory of that other
thread, even though I apparently resorted to reading valgrind's source code...

I think the vchunk/vpool naming, while not necessarily elegant, is helpful.


0002: Makes sense.


0003:
0004:
0005:

Ugh, that's rather gross. Not that I have a better idea. So it's probably
worth doing ...


0006: LGTM


0007:

+    /* Run the rest in xact context to avoid Valgrind leak complaints */
+    MemoryContextSwitchTo(TopTransactionContext);
 
It seems like it also protects at least somewhat against actual leaks?


0008: LGTM


0009:

Seems like we really ought to do more here. But it's a substantial
improvement, so let's not let perfect be the enemy of good.


0010:
0011:

Not great, but better than not doing anything.


0012:

Hm. Kinda feels like we should just always take the USE_VALGRIND
approach. ISTM that if domain typecache entry building is a bottleneck, there
are way bigger problems than a copyObject()...


0014:

I guess I personally would try to put the freeing code into a helper, but
it's a close call.


0015:

I'd probably put the list_free() after the
LWLockRelease(LogicalRepWorkerLock)?


0016:

Agreed with the concern stated in the commit message, but addressing that
would obviously be a bigger project.


0017:

A tad weird to leave the comments above the removed = NILs in place, even
though it's obviously still correct.


0018: LGTM.


> But this is the last step to get to zero reported leaks in a run of the core
> regression tests, so let's do it.

I assume that's just about the core tests, not more? I.e. I can't make skink
enable leak checking?

Greetings,

Andres Freund



Re: Why our Valgrind reports suck

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> [ review ]

Thanks for the comments!  I'll go through them and post an updated
version tomorrow.  The cfbot is already nagging me for a rebase
now that 0013 is moot.

>> But this is the last step to get to zero reported leaks in a run of the core
>> regression tests, so let's do it.

> I assume that's just about the core tests, not more? I.e. I can't make skink
> enable leak checking?

No, we're not there yet.  I've identified some other backend issues (in
postgres_fdw in particular), and I've not looked at frontend programs
at all.  For most frontend programs, I'm dubious how much we care.

Actually the big problem is I don't know what to do about
plperl/plpython/pltcl.  I suppose the big-hammer approach
would be to put in suppression patterns covering those,
at least till such time as someone has a better idea.
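
Something along the lines of the entries we already carry in
src/tools/valgrind.supp, for example --- just a sketch, since the frame
to anchor each rule on would have to come from the actual reports, and
plperl_call_handler is only a guess at a convenient anchor for the
plperl case:

{
   hide_plperl_leaks
   Memcheck:Leak
   match-leak-kinds: definite,possible,indirect

   ...
   fun:plperl_call_handler
}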

I'm envisioning this patch series as v19 work, were you
thinking we should be more aggressive?

            regards, tom lane



Re: Why our Valgrind reports suck

From
Andres Freund
Date:
Hi,

On 2025-05-22 21:48:24 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> >> But this is the last step to get to zero reported leaks in a run of the core
> >> regression tests, so let's do it.
> 
> > I assume that's just about the core tests, not more? I.e. I can't make skink
> > enable leak checking?
> 
> No, we're not there yet.  I've identified some other backend issues (in
> postgres_fdw in particular), and I've not looked at frontend programs
> at all.  For most frontend programs, I'm dubious how much we care.

Skink only tests backend stuff anyway, but the other backend issues make it a
no-go for now...


> I'm envisioning this patch series as v19 work, were you
> thinking we should be more aggressive?

Mostly agreed - but I am wondering if the AV fix should be backpatched?

Greetings,

Andres Freund



Re: Why our Valgrind reports suck

From
Peter Geoghegan
Date:
On Fri, May 23, 2025 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:
> > I'm envisioning this patch series as v19 work, were you
> > thinking we should be more aggressive?
>
> Mostly agreed - but I am wondering if the AV fix should be backpatched?

I think that it probably should be.

--
Peter Geoghegan



Re: Why our Valgrind reports suck

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, May 23, 2025 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:
>>> I'm envisioning this patch series as v19 work, were you
>>> thinking we should be more aggressive?

>> Mostly agreed - but I am wondering if the AV fix should be backpatched?

> I think that it probably should be.

Yeah, I agree, but didn't get to that yet.  I've also gone ahead with
committing a couple of clear bugs that I located while doing this.

            regards, tom lane