Обсуждение: Memory leak of SMgrRelation object on standby

Поиск
Список
Период
Сортировка

Memory leak of SMgrRelation object on standby

От
Jingtang Zhang
Дата:
Hi~ hackers

Back to v17, commit 21d9c3ee gave SMgrRelation a well-defined lifetime, and
smgrclose nolonger removes SMgrRelation object from the hashtable, leaving
the work to smgrdestroyall. But I find a place that relies on the removing
behavior previously, but is still calling smgrclose.

Startup process of standby will redo table dropping with DropRelationFiles,
using smgrdounlinkall to drop buffers and unlink physical files, and then
uses smgrclose to destroy the SMgrRelation object. I think it should use
smgrdestroy here, or the object memory will be leaked.

With concurrent clients, the following pgbench script will produce the
memory leak of a standby startup process easily. Entries will be entered
into the hashtable but never removed.

pgbench -f bench.sql -n -c 32 -j 32 -T 600

```sql
DROP TABLE IF EXISTS tbl:client_id;
CREATE TABLE tbl:client_id (id int);
```

The attached patch export smgrdestroy as a public function, and use it in
DropRelationFiles.

—
Regards, Jingtang





Вложения

Re: Memory leak of SMgrRelation object on standby

От
Thomas Munro
Дата:
On Sat, Aug 16, 2025 at 12:50 AM Jingtang Zhang <mrdrivingduck@gmail.com> wrote:
> Back to v17, commit 21d9c3ee gave SMgrRelation a well-defined lifetime, and
> smgrclose nolonger removes SMgrRelation object from the hashtable, leaving
> the work to smgrdestroyall. But I find a place that relies on the removing
> behavior previously, but is still calling smgrclose.

Thanks for the report!  Replying here rather than the pgsql-bugs
thread because your patch is here.

> Startup process of standby will redo table dropping with DropRelationFiles,
> using smgrdounlinkall to drop buffers and unlink physical files, and then
> uses smgrclose to destroy the SMgrRelation object. I think it should use
> smgrdestroy here, or the object memory will be leaked.

DropRelationFiles() is also called by FinishPreparedTransaction().  At
first I thought that might be a problem too, but looking a bit more
closely and trying it out... if a prepared transaction dropped a
table, then it called RelationDropStorage(), RelationCloseSmgr(),
smgrunpin(), and I can't immediately think of any way to repin it
while the relation is locked, so you can't break the assertion about
that in smgrdestroy(), where we make sure there are no Relation
objects with dangling references.  It's already left unpinned or never
pinned and later destroyed in both DROP; PREPARE TRANSACTION; ...
COMMIT PREPARED; and CREATE; PREPARE TRANSACTION; ... ROLLBACK
PREPARED sequences, with different details.

Now I'm left wondering if two-phase commit should do this explicitly
or not.  For the isRedo case it seems clear, it was the intention of
21d9c3ee to destroy it on commit/abort, which must be here I think.
The following description from the commit message *probably* also fits
the two-phase case, even though it already doesn't leak without it.
It would be good to get a comment explaining the new smgrdestroy()
call, and point to where our tests exercise all relevant cases.  Hmm,
it's not immediately obvious how to introspect the cached state to
verify that the memory isn't leaked.

   Guarantee that the object won't be destroyed until the end of the
   current transaction, or in recovery, the commit/abort record that
   destroys the underlying storage.



Re: Memory leak of SMgrRelation object on standby

От
Jingtang Zhang
Дата:
Hi~

Thanks for looking.

> DropRelationFiles() is also called by FinishPreparedTransaction().  At
> first I thought that might be a problem too, but looking a bit more
> closely and trying it out... if a prepared transaction dropped a
> table, then it called RelationDropStorage(), RelationCloseSmgr(),
> smgrunpin(), and I can't immediately think of any way to repin it
> while the relation is locked, so you can't break the assertion about
> that in smgrdestroy(), where we make sure there are no Relation
> objects with dangling references.  It's already left unpinned or never
> pinned and later destroyed in both DROP; PREPARE TRANSACTION; ...
> COMMIT PREPARED; and CREATE; PREPARE TRANSACTION; ... ROLLBACK
> PREPARED sequences, with different details.

Another call of DropRelationFiles seems to have been handled by AtEOXact_SMgr,
so there is no leak w/o patch. I have pgbench'ed in one connection for 10 min
with CREATE; PREPARE TRANSACTION; ... ROLLBACK PREPARED; sequence and there
is no rising of RES.

> Now I'm left wondering if two-phase commit should do this explicitly
> or not.  For the isRedo case it seems clear, it was the intention of
> 21d9c3ee to destroy it on commit/abort, which must be here I think.
> The following description from the commit message *probably* also fits
> the two-phase case, even though it already doesn't leak without it.
> It would be good to get a comment explaining the new smgrdestroy()
> call, and point to where our tests exercise all relevant cases.

Explicit call seems to be a duplication if AtEOXact_SMgr can already handle?
Should be do it only for redo?

Some comment has been added to smgrdestroy() to define when it should be
called inside v2 patch.

> it's not immediately obvious how to introspect the cached state to
> verify that the memory isn't leaked.

It sure is...The way I'm using is benching for a long time to see if RES is
rising…

—
Regards, Jingtang




Вложения

Re: Memory leak of SMgrRelation object on standby

От
Jingtang Zhang
Дата:
Hi~

> On Sat, Aug 16, 2025 at 12:50 AM Jingtang Zhang <mrdrivingduck@gmail.com> wrote:
>> Back to v17, commit 21d9c3ee gave SMgrRelation a well-defined lifetime, and
>> smgrclose nolonger removes SMgrRelation object from the hashtable, leaving
>> the work to smgrdestroyall. But I find a place that relies on the removing
>> behavior previously, but is still calling smgrclose.

Also, in this situation, should startup process be treated as a background
worker similar to bgwriter/checkpointer and call smgrdestroyall in some
period? Even if startup process begins to call smgrdestroy inside
DropRelationFiles, suppose, there are a lot of transactions keep creating
tables on primary, the startup process of standby will open and create but
do not have any chance to destroy a SMgrRelation object, so the memory
will always grow. It seems to be true even if smgrclose is responsible for
destroying the object previously, because I can't find any smgrclose during
WAL recovery, except for DROP DATABASE which is rarely used in production.

—
Regards, Jingtang




Re: Memory leak of SMgrRelation object on standby

От
邱宇航
Дата:
> 2025年8月21日 23:07,Jingtang Zhang <mrdrivingduck@gmail.com> 写道:
>
> Also, in this situation, should startup process be treated as a background
> worker similar to bgwriter/checkpointer and call smgrdestroyall in some
> period?

Agree with that. Maybe we can call smgrdestroyall in startup process when
replaying CHECKPOINT records, just like bgwriter/checkpointer, which free
all smgr objects after any checkpoint.

Best regards,
Yuhang Qiu




Re: Memory leak of SMgrRelation object on standby

От
Jingtang Zhang
Дата:
Hi~

> Agree with that. Maybe we can call smgrdestroyall in startup process when
> replaying CHECKPOINT records, just like bgwriter/checkpointer, which free
> all smgr objects after any checkpoint.

That seems reasonable, in that case a startup process would behave just the
same as bgwriter or checkpointer.

I purpose a patch which calls smgrdestroyall() when redo each
XLOG_CHECKPOINT_ONLINE, so that it can keep the same frequency of calling
smgrdestroyall() as background processes on primary. I don't call it for
XLOG_CHECKPOINT_SHUTDOWN because the process is about to exit so that the
memory will go soon, and don't call it for XLOG_CHECKPOINT_REDO because it
seems to be a place holder only.

—
Regards, Jingtang


Вложения

Re: Memory leak of SMgrRelation object on standby

От
Jingtang Zhang
Дата:
Hi~

> I purpose a patch which calls smgrdestroyall() when redo each
> XLOG_CHECKPOINT_ONLINE, so that it can keep the same frequency of calling
> smgrdestroyall() as background processes on primary. I don't call it for
> XLOG_CHECKPOINT_SHUTDOWN because the process is about to exit so that the
> memory will go soon, and don't call it for XLOG_CHECKPOINT_REDO because it
> seems to be a place holder only.


Oops. When redo XLOG_CHECKPOINT_SHUTDOWN, smgrdestroyall should also be
called, since the startup may not exit on standby.

The patch is updated.

—
Regards, Jingtang


Вложения

Re: Memory leak of SMgrRelation object on standby

От
邱宇航
Дата:
> 2025年8月26日 11:59,Jingtang Zhang <mrdrivingduck@gmail.com> 写道:
>
> Hi~
>
>> I purpose a patch which calls smgrdestroyall() when redo each
>> XLOG_CHECKPOINT_ONLINE, so that it can keep the same frequency of calling
>> smgrdestroyall() as background processes on primary. I don't call it for
>> XLOG_CHECKPOINT_SHUTDOWN because the process is about to exit so that the
>> memory will go soon, and don't call it for XLOG_CHECKPOINT_REDO because it
>> seems to be a place holder only.
>
>
> Oops. When redo XLOG_CHECKPOINT_SHUTDOWN, smgrdestroyall should also be
> called, since the startup may not exit on standby.
>
> The patch is updated.
>
> —
> Regards, Jingtang
>
> <v4-0001-Fix-SMgrRelation-object-memory-leak-during-startup-r.patch>

LGTM.

Best regards,
Yuhang Qiu




Re: Memory leak of SMgrRelation object on standby

От
Michael Paquier
Дата:
On Tue, Sep 09, 2025 at 11:58:51AM +0800, 邱宇航 wrote:
>> Oops. When redo XLOG_CHECKPOINT_SHUTDOWN, smgrdestroyall should also be
>> called, since the startup may not exit on standby.
>>
>> The patch is updated.

True that the situation sucks for the startup process, bloating its
memory.  That's hard to reach, still for long-running startup
processes, which is a common thing, that's rather bad.

> LGTM.

Hmm.  I was playing a bit with the startup process and, after planting
a few calls to hash_get_num_entries(SMgrRelationHash) the bloat is
measurable.  On wraparound, it would mean that the hash table could
point to past entries in this context.

I can get behind the patch and the proposal of forcing a cleanup each
time a checkpoint record is replayed, outside of
RecoveryRestartPoint(), so I'll see about applying and backpatching
that.  Thanks for the report.
--
Michael

Вложения