Обсуждение: Memory leak of SMgrRelation object on standby
Hi~ hackers Back to v17, commit 21d9c3ee gave SMgrRelation a well-defined lifetime, and smgrclose nolonger removes SMgrRelation object from the hashtable, leaving the work to smgrdestroyall. But I find a place that relies on the removing behavior previously, but is still calling smgrclose. Startup process of standby will redo table dropping with DropRelationFiles, using smgrdounlinkall to drop buffers and unlink physical files, and then uses smgrclose to destroy the SMgrRelation object. I think it should use smgrdestroy here, or the object memory will be leaked. With concurrent clients, the following pgbench script will produce the memory leak of a standby startup process easily. Entries will be entered into the hashtable but never removed. pgbench -f bench.sql -n -c 32 -j 32 -T 600 ```sql DROP TABLE IF EXISTS tbl:client_id; CREATE TABLE tbl:client_id (id int); ``` The attached patch export smgrdestroy as a public function, and use it in DropRelationFiles. — Regards, Jingtang
Вложения
On Sat, Aug 16, 2025 at 12:50 AM Jingtang Zhang <mrdrivingduck@gmail.com> wrote: > Back to v17, commit 21d9c3ee gave SMgrRelation a well-defined lifetime, and > smgrclose nolonger removes SMgrRelation object from the hashtable, leaving > the work to smgrdestroyall. But I find a place that relies on the removing > behavior previously, but is still calling smgrclose. Thanks for the report! Replying here rather than the pgsql-bugs thread because your patch is here. > Startup process of standby will redo table dropping with DropRelationFiles, > using smgrdounlinkall to drop buffers and unlink physical files, and then > uses smgrclose to destroy the SMgrRelation object. I think it should use > smgrdestroy here, or the object memory will be leaked. DropRelationFiles() is also called by FinishPreparedTransaction(). At first I thought that might be a problem too, but looking a bit more closely and trying it out... if a prepared transaction dropped a table, then it called RelationDropStorage(), RelationCloseSmgr(), smgrunpin(), and I can't immediately think of any way to repin it while the relation is locked, so you can't break the assertion about that in smgrdestroy(), where we make sure there are no Relation objects with dangling references. It's already left unpinned or never pinned and later destroyed in both DROP; PREPARE TRANSACTION; ... COMMIT PREPARED; and CREATE; PREPARE TRANSACTION; ... ROLLBACK PREPARED sequences, with different details. Now I'm left wondering if two-phase commit should do this explicitly or not. For the isRedo case it seems clear, it was the intention of 21d9c3ee to destroy it on commit/abort, which must be here I think. The following description from the commit message *probably* also fits the two-phase case, even though it already doesn't leak without it. It would be good to get a comment explaining the new smgrdestroy() call, and point to where our tests exercise all relevant cases. Hmm, it's not immediately obvious how to introspect the cached state to verify that the memory isn't leaked. Guarantee that the object won't be destroyed until the end of the current transaction, or in recovery, the commit/abort record that destroys the underlying storage.
Hi~ Thanks for looking. > DropRelationFiles() is also called by FinishPreparedTransaction(). At > first I thought that might be a problem too, but looking a bit more > closely and trying it out... if a prepared transaction dropped a > table, then it called RelationDropStorage(), RelationCloseSmgr(), > smgrunpin(), and I can't immediately think of any way to repin it > while the relation is locked, so you can't break the assertion about > that in smgrdestroy(), where we make sure there are no Relation > objects with dangling references. It's already left unpinned or never > pinned and later destroyed in both DROP; PREPARE TRANSACTION; ... > COMMIT PREPARED; and CREATE; PREPARE TRANSACTION; ... ROLLBACK > PREPARED sequences, with different details. Another call of DropRelationFiles seems to have been handled by AtEOXact_SMgr, so there is no leak w/o patch. I have pgbench'ed in one connection for 10 min with CREATE; PREPARE TRANSACTION; ... ROLLBACK PREPARED; sequence and there is no rising of RES. > Now I'm left wondering if two-phase commit should do this explicitly > or not. For the isRedo case it seems clear, it was the intention of > 21d9c3ee to destroy it on commit/abort, which must be here I think. > The following description from the commit message *probably* also fits > the two-phase case, even though it already doesn't leak without it. > It would be good to get a comment explaining the new smgrdestroy() > call, and point to where our tests exercise all relevant cases. Explicit call seems to be a duplication if AtEOXact_SMgr can already handle? Should be do it only for redo? Some comment has been added to smgrdestroy() to define when it should be called inside v2 patch. > it's not immediately obvious how to introspect the cached state to > verify that the memory isn't leaked. It sure is...The way I'm using is benching for a long time to see if RES is rising… — Regards, Jingtang
Вложения
Hi~ > On Sat, Aug 16, 2025 at 12:50 AM Jingtang Zhang <mrdrivingduck@gmail.com> wrote: >> Back to v17, commit 21d9c3ee gave SMgrRelation a well-defined lifetime, and >> smgrclose nolonger removes SMgrRelation object from the hashtable, leaving >> the work to smgrdestroyall. But I find a place that relies on the removing >> behavior previously, but is still calling smgrclose. Also, in this situation, should startup process be treated as a background worker similar to bgwriter/checkpointer and call smgrdestroyall in some period? Even if startup process begins to call smgrdestroy inside DropRelationFiles, suppose, there are a lot of transactions keep creating tables on primary, the startup process of standby will open and create but do not have any chance to destroy a SMgrRelation object, so the memory will always grow. It seems to be true even if smgrclose is responsible for destroying the object previously, because I can't find any smgrclose during WAL recovery, except for DROP DATABASE which is rarely used in production. — Regards, Jingtang
> 2025年8月21日 23:07,Jingtang Zhang <mrdrivingduck@gmail.com> 写道: > > Also, in this situation, should startup process be treated as a background > worker similar to bgwriter/checkpointer and call smgrdestroyall in some > period? Agree with that. Maybe we can call smgrdestroyall in startup process when replaying CHECKPOINT records, just like bgwriter/checkpointer, which free all smgr objects after any checkpoint. Best regards, Yuhang Qiu
Hi~ > Agree with that. Maybe we can call smgrdestroyall in startup process when > replaying CHECKPOINT records, just like bgwriter/checkpointer, which free > all smgr objects after any checkpoint. That seems reasonable, in that case a startup process would behave just the same as bgwriter or checkpointer. I purpose a patch which calls smgrdestroyall() when redo each XLOG_CHECKPOINT_ONLINE, so that it can keep the same frequency of calling smgrdestroyall() as background processes on primary. I don't call it for XLOG_CHECKPOINT_SHUTDOWN because the process is about to exit so that the memory will go soon, and don't call it for XLOG_CHECKPOINT_REDO because it seems to be a place holder only. — Regards, Jingtang
Вложения
Hi~ > I purpose a patch which calls smgrdestroyall() when redo each > XLOG_CHECKPOINT_ONLINE, so that it can keep the same frequency of calling > smgrdestroyall() as background processes on primary. I don't call it for > XLOG_CHECKPOINT_SHUTDOWN because the process is about to exit so that the > memory will go soon, and don't call it for XLOG_CHECKPOINT_REDO because it > seems to be a place holder only. Oops. When redo XLOG_CHECKPOINT_SHUTDOWN, smgrdestroyall should also be called, since the startup may not exit on standby. The patch is updated. — Regards, Jingtang
Вложения
> 2025年8月26日 11:59,Jingtang Zhang <mrdrivingduck@gmail.com> 写道: > > Hi~ > >> I purpose a patch which calls smgrdestroyall() when redo each >> XLOG_CHECKPOINT_ONLINE, so that it can keep the same frequency of calling >> smgrdestroyall() as background processes on primary. I don't call it for >> XLOG_CHECKPOINT_SHUTDOWN because the process is about to exit so that the >> memory will go soon, and don't call it for XLOG_CHECKPOINT_REDO because it >> seems to be a place holder only. > > > Oops. When redo XLOG_CHECKPOINT_SHUTDOWN, smgrdestroyall should also be > called, since the startup may not exit on standby. > > The patch is updated. > > — > Regards, Jingtang > > <v4-0001-Fix-SMgrRelation-object-memory-leak-during-startup-r.patch> LGTM. Best regards, Yuhang Qiu
On Tue, Sep 09, 2025 at 11:58:51AM +0800, 邱宇航 wrote: >> Oops. When redo XLOG_CHECKPOINT_SHUTDOWN, smgrdestroyall should also be >> called, since the startup may not exit on standby. >> >> The patch is updated. True that the situation sucks for the startup process, bloating its memory. That's hard to reach, still for long-running startup processes, which is a common thing, that's rather bad. > LGTM. Hmm. I was playing a bit with the startup process and, after planting a few calls to hash_get_num_entries(SMgrRelationHash) the bloat is measurable. On wraparound, it would mean that the hash table could point to past entries in this context. I can get behind the patch and the proposal of forcing a cleanup each time a checkpoint record is replayed, outside of RecoveryRestartPoint(), so I'll see about applying and backpatching that. Thanks for the report. -- Michael