Обсуждение: ReplicationSlotRelease() crashes when the instance is in the single user mode
ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear hackers, I found $SUBJECT when I'm playing with the single user mode. How to reproduce =========== You can reproduce the failure with below steps. ``` # Initialize an instance $ initdb -D data -U postgres # Start it as single user mode $ postgres --single -D data/ postgres PostgreSQL stand-alone backend 18devel backend> SELECT pg_create_physical_replication_slot(slot_name := 'physical_slot', immediately_reserve := true); ... backend> SELECT pg_replication_slot_advance('physical_slot', pg_current_wal_lsn()); 1: pg_replication_slot_advance (typeid = 2249, len = -1, typmod = -1, byval = f) ---- TRAP: failed Assert("slot != NULL && (slot->active_pid != 0)"), File: "../postgres/src/backend/replication/slot.c", Line:674, PID: 430860 postgres(ExceptionalCondition+0xab)[0xb86a2a] postgres(ReplicationSlotRelease+0x5a)[0x8df10b] postgres(pg_replication_slot_advance+0x330)[0x8e46ed] ... ``` Analysis ===== We trapped at below assertion in ReplicationSlotRelease(). IIUC, `slot->active_pid` is set only when the process is under the postmaster, but ReplicationSlotRelease() always requires it. ``` Assert(slot != NULL && slot->active_pid != 0); ``` Possible fix ======= Naively considered, there are two approaches to fix this. 1) set active_pid when even in the single user mode [1], or 2) ease the condition to accept the situation [2]. I'm not familiar with the mode, but [1] seems better if we want to unify codes. Thought? [1]: ``` --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -599,7 +599,7 @@ retry: SpinLockRelease(&s->mutex); } else - active_pid = MyProcPid; + s->active_pid = active_pid = MyProcPid; LWLockRelease(ReplicationSlotControlLock); /* ``` [2]: ``` --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -671,7 +671,8 @@ ReplicationSlotRelease(void) bool is_logical = false; /* keep compiler quiet */ TimestampTz now = 0; - Assert(slot != NULL && slot->active_pid != 0); + Assert(slot != NULL && + (slot->active_pid != 0 || !IsUnderPostmaster)); if (am_walsender) { ``` Best regards, Hayato Kuroda FUJITSU LIMITED
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"David G. Johnston"
Дата:
On Monday, February 17, 2025, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
backend> SELECT pg_create_physical_replication_slot(slot_name := 'physical_slot', immediately_reserve := true);
Since this function releases the slot when it returns, re-acquisition, even by the same backend, must always re-associate MyProcPid to the named slot.
[1]:
```
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -599,7 +599,7 @@ retry:
SpinLockRelease(&s->mutex);
}
else
- active_pid = MyProcPid;
+ s->active_pid = active_pid = MyProcPid;
LWLockRelease(ReplicationSlotControlLock);
/*
```
This, but you cannot modify the slot without holding the spinlock.
I’d probably add an assert that the existing state of s->active_pid is either 0 or MyProcPid already. In single-user mode it mustn’t, really cannot, be anything else. But the failure here is because the SQL function does a slot release; there are probably other reasonable paths where the assignment of MyProcPid during slot creation is retained and encountered during a subsequent slot acquire.
David J.
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Michael, > Perhaps a very naive question, but is there any point in authorizing > manipulations of MyReplicationSlot in single-user mode, to begin with? > With this remark, I would mean to apply a rule to > ReplicationSlotAcquire(), so as all its callers would know about that. According to the original thread [1], there was a wide consensus replication-related operations can be rejected, except the slot removal. I feel this is reasonable. Currently pg_drop_replication_slot() requires the droping slot can be acquired, so we cannot reject it in single user mode as-is. Maybe we should revive the 0002 patch in [1] then try to do that. Thought? [1]: https://www.postgresql.org/message-id/3b2f809f-326c-38dd-7a9e-897f957a4eb1%40enterprisedb.com Best regards, Hayato Kuroda FUJITSU LIMITED
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Michael, > Ah, good point for the slot drop. So 0ce5cf2ef24f is claiming that > some of these manipulations are OK. I didn't suspect this one. Yeah, I think so. > Slot advancing is a very different beast, unfortunately, that may > depend on many other subsystems. For example with logical slots we > would finish by calling rm_decode, which could be outside of core. > Justifying that this operation is supported in single-user mode is > larger than what you are suggesting here.. OK. Actually, I do not have any use-cases in my mind. I played because I found a special path for single user mode in ReplicationSlotAcquire() so that I wanted to see the behavior. Best regards, Hayato Kuroda FUJITSU LIMITED
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Michael Paquier
Дата:
On Wed, Feb 19, 2025 at 02:57:34AM +0000, Hayato Kuroda (Fujitsu) wrote: > Based on the discussion, I feel it is enough to add quick error out > for SQL functions. PSA attached. I did not check how these call behave individually, just a few comments while putting my eyes on the patch. + if (!IsUnderPostmaster) + elog(ERROR, + "slot operation is prohibited in the single user mode"); elog() should not be used for failures that can be user-facing as this would not provide any translation. I'd suggest rewording the error message to provide some more context, as well, say: "cannot use %s in single-user mode", "function_name" -- Michael
Вложения
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Michael, > I did not check how these call behave individually, just a few > comments while putting my eyes on the patch. > > + if (!IsUnderPostmaster) > + elog(ERROR, > + "slot operation is prohibited in the single user mode"); > > elog() should not be used for failures that can be user-facing as this > would not provide any translation. I intentionally used elog() because I thought single user mode is not user-facing. But it is OK for me to use ereport() instead. > I'd suggest rewording the error message to provide some more context, > as well, say: > "cannot use %s in single-user mode", "function_name" Fixed. PSA new version Best regards, Hayato Kuroda FUJITSU LIMITED
Вложения
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"David G. Johnston"
Дата:
On Wed, Feb 19, 2025 at 7:23 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
I intentionally used elog() because I thought single user mode is not user-facing.
But it is OK for me to use ereport() instead.
Single-user mode is also known as "Oh crap!" mode, something used when starting the server in multi-user model fails for non-trivial reasons.
It is also what at least one PostgreSQL Online Service (a.k.a. fiddle) uses as an implementation choice.
David J.
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Bertrand Drouvot
Дата:
Hi, On Thu, Feb 20, 2025 at 02:22:41AM +0000, Hayato Kuroda (Fujitsu) wrote: > Dear Michael, > Thanks for the report and the patch! > > I did not check how these call behave individually, just a few > > comments while putting my eyes on the patch. > > > > + if (!IsUnderPostmaster) > > + elog(ERROR, > > + "slot operation is prohibited in the single user mode"); > > > > elog() should not be used for failures that can be user-facing as this > > would not provide any translation. > > I intentionally used elog() because I thought single user mode is not user-facing. > But it is OK for me to use ereport() instead. Yeah, I think it's also about using ereport for cases that we think can happen ( and so needs to be translated). In this case, it clearly can happen so ereport() is better. > > I'd suggest rewording the error message to provide some more context, > > as well, say: > > "cannot use %s in single-user mode", "function_name" > > Fixed. PSA new version === 1 Nit: "cannot use function %s in single-user mode", "function_name"" maybe? (that would somehow match the existing ""user-defined types cannot use subscripting function %s") === 2 + "pg_copy_replication_slot"))); s/pg_copy_replication_slot/copy_replication_slot/ maybe? As it is the function that would report the error actually (could be called from pg_copy_logical_replication_slot_[a|b|c] and pg_copy_physical_replication_slot_[a|b] though). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Zhijie Hou (Fujitsu)"
Дата:
On Thursday, February 20, 2025 10:23 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Michael, > > > I did not check how these call behave individually, just a few > > comments while putting my eyes on the patch. > > > > + if (!IsUnderPostmaster) > > + elog(ERROR, > > + "slot operation is prohibited in the single user > mode"); > > > > elog() should not be used for failures that can be user-facing as this > > would not provide any translation. > > I intentionally used elog() because I thought single user mode is not > user-facing. > But it is OK for me to use ereport() instead. > > > I'd suggest rewording the error message to provide some more context, > > as well, say: > > "cannot use %s in single-user mode", "function_name" > > Fixed. PSA new version I'm curious about the scope of the restrictions we plan to add. For example, the current patch does not include checks in the functions used for consuming changes (such as pg_logical_slot_get_changes). Was this omission intentional? Best Regards, Hou zj
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Amit Kapila
Дата:
On Thu, Feb 20, 2025 at 4:26 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, February 20, 2025 10:23 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > > > Dear Michael, > > > > > I did not check how these call behave individually, just a few > > > comments while putting my eyes on the patch. > > > > > > + if (!IsUnderPostmaster) > > > + elog(ERROR, > > > + "slot operation is prohibited in the single user > > mode"); > > > > > > elog() should not be used for failures that can be user-facing as this > > > would not provide any translation. > > > > I intentionally used elog() because I thought single user mode is not > > user-facing. > > But it is OK for me to use ereport() instead. > > > > > I'd suggest rewording the error message to provide some more context, > > > as well, say: > > > "cannot use %s in single-user mode", "function_name" > > > > Fixed. PSA new version > > I'm curious about the scope of the restrictions we plan to add. For example, > the current patch does not include checks in the functions used for consuming > changes (such as pg_logical_slot_get_changes). Was this omission intentional? > Also, what about pg_replication_origin_* APIs? Do we want to restrict those as well if we are restricting slot operations? I don't see any solid theory presented in this thread on why we should add new checks in multiple APIs restricting those in single-user mode. -- With Regards, Amit Kapila.
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit, > Also, what about pg_replication_origin_* APIs? Do we want to restrict > those as well if we are restricting slot operations? I don't see any > solid theory presented in this thread on why we should add new checks > in multiple APIs restricting those in single-user mode. As David [1] and documentation [2] described, single user mode is typically used for initialization, debugging and recovery purpose - not for normal purposes. I think doing replication stuffs is not intended. Tom also considered like that [4]. The error I reported seemed to be introduced seven years ago (0ce5cf2), and no one has reported till now. This also implies that there are no reasons to support it. Ideally, functions described in [5] can be banned in the single-user mode, except the pg_drop_replication_slot(). Thought? [1]: https://www.postgresql.org/message-id/CAKFQuwbnBkGZAM%2B5b1DWmbqU5W7b1r-nQsw87BukrUC5WLrJXg%40mail.gmail.com [2]: https://www.postgresql.org/docs/devel/app-postgres.html [3]: https://www.postgresql.org/message-id/3b2f809f-326c-38dd-7a9e-897f957a4eb1%40enterprisedb.com [4]: https://www.postgresql.org/message-id/25024.1525789919%40sss.pgh.pa.us [5]: https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-REPLICATION Best regards, Hayato Kuroda FUJITSU LIMITED
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear hackers, Thanks everyone for giving comments! PSA new version. What's new: - Message format was modified to {"cannot use function %s in single-user mode", "function_name"} - Reporting funcname was adjusted based on the parameters. ternary operator was used. - Guard was added for functions in logicalfunction.c. For now, functions for replication origin and replication messages were retained. I can handle them after the discussion. Best regards, Hayato Kuroda FUJITSU LIMITED
Вложения
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Amit Kapila
Дата:
On Thu, Feb 20, 2025 at 6:21 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear hackers, > > Thanks everyone for giving comments! PSA new version. > What's new: > > - Message format was modified to {"cannot use function %s in single-user mode", "function_name"} > - Reporting funcname was adjusted based on the parameters. ternary operator was used. > - Guard was added for functions in logicalfunction.c. > Shouldn't such a check be present in the CheckSlotPermissions() kind of function to perform it in the central place? > For now, functions for replication origin and replication messages were retained. > I can handle them after the discussion. > Which other functions do we see similar restrictions? I checked "sequence manipulation functions" (1), and "Transaction ID and Snapshot Information Functions" (2) but couldn't see similar restrictions. (1) - https://www.postgresql.org/docs/current/functions-sequence.html (2) - https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-SNAPSHOT -- With Regards, Amit Kapila.
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit, > Shouldn't such a check be present in the CheckSlotPermissions() kind > of function to perform it in the central place? OK. I checked whether we can reuse pre-existing functions, but it seems not appropriate. CheckSlotPermissions() is called even by pg_drop_replication_slot(), and CheckSlotRequirements() is not called by pg_sync_replication_slots(). I defined new function CheckSlotIsInSingleUserMode() and put the check there. I removed function name from the ereport(), but I feel it is enough - CheckSlotPermissions() and CheckSlotRequirements() do not have. > > For now, functions for replication origin and replication messages were > retained. > > I can handle them after the discussion. > > > > Which other functions do we see similar restrictions? I checked > "sequence manipulation functions" (1), and "Transaction ID and > Snapshot Information Functions" (2) but couldn't see similar > restrictions. > > (1) - https://www.postgresql.org/docs/current/functions-sequence.html > (2) - > https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INF > O-SNAPSHOT I grepped sources and could not find explicit limitations neither. So...this might be overkill. But I still think the restriction is OK for the slot - no need to do some efforts for accepting single-user mode, just close the cover. Best regards, Hayato Kuroda FUJITSU LIMITED
Вложения
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Amit Kapila
Дата:
On Thu, Feb 27, 2025 at 1:29 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > > > > Which other functions do we see similar restrictions? I checked > > "sequence manipulation functions" (1), and "Transaction ID and > > Snapshot Information Functions" (2) but couldn't see similar > > restrictions. > > > > (1) - https://www.postgresql.org/docs/current/functions-sequence.html > > (2) - > > https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INF > > O-SNAPSHOT > > I grepped sources and could not find explicit limitations neither. So...this might > be overkill. > Yes, this is my worry. > But I still think the restriction is OK for the slot - no need to do > some efforts for accepting single-user mode, just close the cover. > I understand that we may not have a clear use case for this to work in single-user mode. But how will we define the boundary in similar cases? I mean, we should have some rule for such exposed functions, and it should be followed uniformly. Now, if one needs a bigger or complex change to make the function work in single-user mode, then it is easy to take an exception from what is currently being followed in the code. However, if the change is as trivial as you proposed in the first email, why not go with that and make this function work in single-user mode? -- With Regards, Amit Kapila.
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit, > I understand that we may not have a clear use case for this to work in > single-user mode. But how will we define the boundary in similar > cases? I mean, we should have some rule for such exposed functions, > and it should be followed uniformly. Now, if one needs a bigger or > complex change to make the function work in single-user mode, then it > is easy to take an exception from what is currently being followed in > the code. However, if the change is as trivial as you proposed in the > first email, why not go with that and make this function work in > single-user mode? Hmm, the opinion about this is completely opposite with other reviewers. I want to ask them again how you feel. I also added Tom who pointed out in the initial thread. Question: how you feel to restrict SQL functions for slot during the single-user mode? Nobody has considered use cases for it; we do not have concrete theory and similar cases. And needed band-aid to work seems very small. Best regards, Hayato Kuroda FUJITSU LIMITED
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Paul A Jungwirth
Дата:
Mutaamba (cc'd) and I reviewed this patch together. To summarize the patch and thread so far: The patch adds a new function, CheckSlotIsInSingleUserMode. If true then we raise an error. Otherwise we would trip an assert in ReplicationSlotRelease requiring the slot to have an active_pid, which is never set in single-user mode. We do want to support pg_drop_replication_slot in single-user mode, but not other replication slot functions. In particular, advancing the slot may call non-core code, which seems risky in single-user mode. The patch does not apply. The attached v5 fixes it with a small update to the context for the slot.h hunk. The commit message needs some explanation. Why are we doing this? What does the patch do? What alternatives did we consider? The current error message seems reasonable. The patch has no docs. If we plan to forbid some functions in single-user mode, we should document which ones (e.g. in func.sgml). There are no tests. Do we have any tests at all for single-user mode? The only one I see is in recovery/t/017_shm.pl. What if we had tests that ran the regress suite in single-user mode? Basically instead of psql we run postgres --single? Do we expect a lot of it would fail? Is it small enough the test could maintain a diff that expects those failures? I'm not saying we should do that as part of this patch, but is there any interest in that? Since single-user mode is most often used when things are broken, it's a harsh place to hit a bug. The patch lacks source comments. Something on CheckSlotIsInSingleUserMode explaining why would be good. In ReplicationSlotRelease, why only assert that `slot->active_pid != 0`? Why not assert that it's MyProcPid (even outside single-user mode)? Can one backend really release a slot while another is using it? Can you drop it? Maybe you can copy it? Are we calling CheckSlotIsInSingleUserMode from everywhere that needs it? We tried to check all the functions that call ReplicationSlotCreate, ReplicationSlotRelease, and update_synced_slots_inactive_since (since they all have these asserts). To call out a few: The pg_replication_origin_* functions don't call the Assert-ing functions. binary_upgrade_logical_slot_has_caught_up - Not available from the command line. WalSndErrorCleanup - Probably not used in single-user mode? We also see that PostgresMain calls ReplicationSlotRelease from its error handler. Presumably single-user mode would be executing PostgresSingleUserMain instead, but still perhaps we should call CheckSlotIsInSingleUserMode here just as a sanity-check? On Thu, Feb 27, 2025 at 9:42 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > I understand that we may not have a clear use case for this to work in > > single-user mode. But how will we define the boundary in similar > > cases? I mean, we should have some rule for such exposed functions, > > and it should be followed uniformly. Now, if one needs a bigger or > > complex change to make the function work in single-user mode, then it > > is easy to take an exception from what is currently being followed in > > the code. However, if the change is as trivial as you proposed in the > > first email, why not go with that and make this function work in > > single-user mode? > > Hmm, the opinion about this is completely opposite with other reviewers. I want > to ask them again how you feel. I also added Tom who pointed out in the initial > thread. > > Question: how you feel to restrict SQL functions for slot during the single-user > mode? Nobody has considered use cases for it; we do not have concrete theory and > similar cases. And needed band-aid to work seems very small. No one has replied yet, but I vote for forbidding these functions. I can't articulate a full theory for which functions we restrict in single-user mode, and I think we should permit as much as possible. But any theory would weigh usefulness against risk. Here no one has found a use-case for several years, and executing user-supplied code in an unusual, higher-risk scenario seems like a good reason to be cautious. Also without tests for single-user mode, I'm extra wary. If single-user support is desired by someone, they could submit a patch. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Вложения
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Robert Haas
Дата:
On Tue, Aug 5, 2025 at 12:51 PM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote: > No one has replied yet, but I vote for forbidding these functions. I > can't articulate a full theory for which functions we restrict in > single-user mode, and I think we should permit as much as possible. > But any theory would weigh usefulness against risk. Here no one has > found a use-case for several years, and executing user-supplied code > in an unusual, higher-risk scenario seems like a good reason to be > cautious. Also without tests for single-user mode, I'm extra wary. If > single-user support is desired by someone, they could submit a patch. I don't feel good about the direction from which this patch is attacking the problem. The original stack trace looks like this: postgres(ExceptionalCondition+0xab)[0xb86a2a] postgres(ReplicationSlotRelease+0x5a)[0x8df10b] postgres(pg_replication_slot_advance+0x330)[0x8e46ed] ReplicationSlotRelease() is called near the end of pg_replication_slot_advance. IIUC, this means that we successfully acquired the slot and did all the work, and then failed a sanity check afterward. So, I see two possibilities: either it's not OK to acquire a replication slot in single-user mode, in which case this should have failed earlier, or it's also OK to release it, in which case this should not have failed at all. Interestingly, I see that ReplicationSlotAcquire() features a special case whose specific purpose is to allow it to work in single-user mode, which makes me rather suspect that the latter is correct. It is possible that it was intentional that acquiring works in single-user mode and dropping (other than via ReplicationSlotDropAcquired) does not, but so far I see nothing in the comments suggesting that. In any event, that inconsistency between how ReplicationSlotAcquire() works and how ReplicationSlotRelease() works seems to me to be the core thing that should be fixed here. If the selected fix also requires error checks in higher-level functions, as added by this patch, then well and good. But right now, the proposed patch isn't trying to fix the definitional inconsistency in the underlying layer at all, and is instead just installing guard rails to try to keep anyone from bumping into it. That seems like a bad direction to go -- it feels like we're blocking access to potentially-useful functionality because of an assertion failure that might just be slightly sloppy coding. -- Robert Haas EDB: http://www.enterprisedb.com
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Robert, Paul, Mutaamba, Sorry for the late reply. I was in the business trip. > I don't feel good about the direction from which this patch is > attacking the problem. The original stack trace looks like this: > > postgres(ExceptionalCondition+0xab)[0xb86a2a] > postgres(ReplicationSlotRelease+0x5a)[0x8df10b] > postgres(pg_replication_slot_advance+0x330)[0x8e46ed] > > ReplicationSlotRelease() is called near the end of > pg_replication_slot_advance. IIUC, this means that we successfully > acquired the slot and did all the work, and then failed a sanity check > afterward. Exactly. The slot is trying to be released after most of tasks are done. ``` #6 0x00000000008fd20b in pg_replication_slot_advance (fcinfo=0x23b4428) at ../postgres/src/backend/replication/slotfuncs.c:588 588 ReplicationSlotRelease(); (gdb) list 583 * advancing potentially done. 584 */ 585 ReplicationSlotsComputeRequiredXmin(false); 586 ReplicationSlotsComputeRequiredLSN(); 587 588 ReplicationSlotRelease(); ``` > So, I see two possibilities: either it's not OK to acquire > a replication slot in single-user mode, in which case this should have > failed earlier, or it's also OK to release it, in which case this > should not have failed at all. Interestingly, I see that > ReplicationSlotAcquire() features a special case whose specific > purpose is to allow it to work in single-user mode, which makes me > rather suspect that the latter is correct. Agreed. > It is possible that it was > intentional that acquiring works in single-user mode and dropping > (other than via ReplicationSlotDropAcquired) does not, but so far I > see nothing in the comments suggesting that. > > In any event, that inconsistency between how ReplicationSlotAcquire() > works and how ReplicationSlotRelease() works seems to me to be the > core thing that should be fixed here. If the selected fix also > requires error checks in higher-level functions, as added by this > patch, then well and good. But right now, the proposed patch isn't > trying to fix the definitional inconsistency in the underlying layer > at all, and is instead just installing guard rails to try to keep > anyone from bumping into it. That seems like a bad direction to go -- To confirm; your point is that we should firstly fix to allow acquiring/releasing slots in the mode, then consider additional guards, is it right? Valid point. We've not deeply considered but ideas described in [1] could be used. > it feels like we're blocking access to potentially-useful > functionality because of an assertion failure that might just be > slightly sloppy coding. I still cannot find enough use-cases to allow manipulating slots, though. [1]: https://www.postgresql.org/message-id/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2%40OSCPR01MB14966.jpnprd01.prod.outlook.com Best regards, Hayato Kuroda FUJITSU LIMITED
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Robert Haas
Дата:
On Wed, Aug 13, 2025 at 3:21 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > To confirm; your point is that we should firstly fix to allow acquiring/releasing > slots in the mode, then consider additional guards, is it right? Valid point. Yes. > I still cannot find enough use-cases to allow manipulating slots, though. The use case for single-user mode is quite limited in general, but most things work in single-user mode unless they are something that intrinsically can't. For example, parallel query cannot be used in single-user mode, because there is only one process. Replication cannot work, for the same reason. But manual slot operations can work, so I do not think it is good to arbitrarily prohibit them. We do not need a reason to specifically allow them; it is enough that there is no good reason for them to be blocked. -- Robert Haas EDB: http://www.enterprisedb.com
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Paul, Mutaamba, Here are updated patches. Based on the Robert's suggestion, I separated into two parts. 0001 fixed the original issue and 0002 prohibited the slot manipulation in single-user mode. I want to focus on 0001 first because on one would argue it. All comments from you were included in 0002. > The patch does not apply. The attached v5 fixes it with a small update > to the context for the slot.h hunk. Oh, I didn't realize because cfbot said OK. Anyway, new patch could be applied atop HEAD. > The commit message needs some explanation. Why are we doing this? What > does the patch do? What alternatives did we consider? Updated. How do you feel? > The patch has no docs. If we plan to forbid some functions in > single-user mode, we should document which ones (e.g. in func.sgml). A statement was added. > There are no tests. Do we have any tests at all for single-user mode? > The only one I see is in recovery/t/017_shm.pl. What if we had tests > that ran the regress suite in single-user mode? Basically instead of > psql we run postgres --single? Do we expect a lot of it would fail? Is > it small enough the test could maintain a diff that expects those > failures? I'm not saying we should do that as part of this patch, but > is there any interest in that? Since single-user mode is most often > used when things are broken, it's a harsh place to hit a bug. > > The patch lacks source comments. Something on > CheckSlotIsInSingleUserMode explaining why would be good. Comments were added in all caller functions. > In ReplicationSlotRelease, why only assert that `slot->active_pid != > 0`? Why not assert that it's MyProcPid (even outside single-user > mode)? Can one backend really release a slot while another is using > it? Can you drop it? Maybe you can copy it? You meant we should assert `slot->active_pid == MyProcPid`, right? Naively considered it can be fix, but it is out-of-scope of the thread. It is existing code, should be discussed verified in another place. > Are we calling CheckSlotIsInSingleUserMode from everywhere that needs > it? We tried to check all the functions that call > ReplicationSlotCreate, ReplicationSlotRelease, and > update_synced_slots_inactive_since (since they all have these > asserts). To call out a few: > > The pg_replication_origin_* functions don't call the Assert-ing functions. You asked that whether we should call CheckSlotIsInSingleUserMode in pg_replication_origin_* APIs, right? It depends on the policy. If we want to prohibit all replication-related command, it should be. It is still under discussion, so I did not touch. > binary_upgrade_logical_slot_has_caught_up - Not available from the command > line. I found that we can in both single-user and binary-upgrade mode. At that time the function could be called: ``` $ postgres --single -D data/ postgres -b PostgreSQL stand-alone backend 19devel backend> SELECT binary_upgrade_logical_slot_has_caught_up('test'); 1: binary_upgrade_logical_slot_has_caught_up (typeid = 16, len = 1, typmod = -1, byval = t) ---- LOG: starting logical decoding for slot "test" DETAIL: Streaming transactions committing after 0/0182C640, reading WAL from 0/0182C608. STATEMENT: SELECT binary_upgrade_logical_slot_has_caught_up('test'); LOG: logical decoding found consistent point at 0/0182C608 DETAIL: There are no running transactions. STATEMENT: SELECT binary_upgrade_logical_slot_has_caught_up('test'); 1: binary_upgrade_logical_slot_has_caught_up = "f" (typeid = 16, len = 1, typmod = -1, byval = t) ---- ``` I don't think this is a realistic case, but the check was added. > WalSndErrorCleanup - Probably not used in single-user mode? IIUC we won't reach in the single-user mode. Hence it was not called intentionally. > We also see that PostgresMain calls ReplicationSlotRelease from its > error handler. Presumably single-user mode would be executing > PostgresSingleUserMain instead, ISTM, PostgresMain() would be called from the PostgresSingleUserMain(). > but still perhaps we should call > CheckSlotIsInSingleUserMode here just as a sanity-check? I feel this code is to release the acquired slot when ERROR was raised. Since we have already covered the entrance, it is not needed. [1]: https://www.postgresql.org/message-id/CA%2BTgmobspWhoMysNHL9b3C9xi%3DOpHdBYhtAgDH4N_A2foyjN-w%40mail.gmail.com Best regards, Hayato Kuroda FUJITSU LIMITED
Вложения
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Robert, > The use case for single-user mode is quite limited in general, but > most things work in single-user mode unless they are something that > intrinsically can't. For example, parallel query cannot be used in > single-user mode, because there is only one process. Replication > cannot work, for the same reason. But manual slot operations can work, > so I do not think it is good to arbitrarily prohibit them. We do not > need a reason to specifically allow them; it is enough that there is > no good reason for them to be blocked. Thanks for the confirmation. Till now, it is mostly fifty-fifty in the community whether we should allow the manipulation in the single-user mode. I did split a patch to fix the initial issue [1]. Let's focus on the 0001 patch firstly. [1]: https://www.postgresql.org/message-id/OSCPR01MB14966B39B3ED21229EAEFA10FF531A%40OSCPR01MB14966.jpnprd01.prod.outlook.com Best regards, Hayato Kuroda FUJITSU LIMITED
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Michael Paquier
Дата:
On Mon, Aug 18, 2025 at 08:47:27AM +0000, Hayato Kuroda (Fujitsu) wrote: > Here are updated patches. Based on the Robert's suggestion, I separated into two parts. > 0001 fixed the original issue and 0002 prohibited the slot manipulation in > single-user mode. I want to focus on 0001 first because on one would argue it. FWIW, I had my doubts at the beginning on the thread about the use-case, but after re-reading the whole thing I am going to side with Robert's opinion that if a fix to authorize some of the operations is simple, then let's just authorize the case(s) and have the fix. Echoing with Paul, what I find as critically lacking from the proposed patches are regression tests to check and validate the behaviors you are looking for in the patches. Please add some in the shape of perl commands that use direct --single commands of postgres. We have cases that do so currently in the tree: sepgsql and shm. I don't see a reason why we could not do that as well here with run_log(). It is also possible to pass a -c exit_on_error=true to force failures. Not having a test for the single-user case of a slot drop is of course something that we are lacking now. As we are playing with this area of the code, let's add something for this case as well. -- Michael
Вложения
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Michael, Paul, Mutaamba, > FWIW, I had my doubts at the beginning on the thread about the > use-case, but after re-reading the whole thing I am going to side with > Robert's opinion that if a fix to authorize some of the operations is > simple, then let's just authorize the case(s) and have the fix. OK, thanks for the clarification. Let's focus on 0001 now. > Echoing with Paul, what I find as critically lacking from the proposed > patches are regression tests to check and validate the behaviors you > are looking for in the patches. Please add some in the shape of perl > commands that use direct --single commands of postgres. We have cases > that do so currently in the tree: sepgsql and shm. I don't see a > reason why we could not do that as well here with run_log(). It is > also possible to pass a -c exit_on_error=true to force failures. > > Not having a test for the single-user case of a slot drop is of course > something that we are lacking now. As we are playing with this area > of the code, let's add something for this case as well. I added a first version of the test. It could work on my env (Linux). Since I cannot come up the appropriate place, I introduced new test file in recovery test. creating/dropping/advancing a slot, and doing a logical decoding is tested. Since standby mode is not supported by the single-user instance, I did not test the slot synchronization. I have not known that. 0002 was updated accordingly just in case. In the patch some functions were expected to fail. Also, a check for pg_sync_replication_slots() was removed because we cannot call the function in the first place. IIUC, 0001 can be backpatched for all supported branches. I will create for them after patch would be a good shape. Best regards, Hayato Kuroda FUJITSU LIMITED
Вложения
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
> I added a first version of the test. It could work on my env (Linux). Best regards, Hayato Kuroda FUJITSU LIMITED
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Michael Paquier
Дата:
On Tue, Aug 19, 2025 at 10:26:22AM +0000, Hayato Kuroda (Fujitsu) wrote: > I added a first version of the test. It could work on my env (Linux). You could also validate that using the CI, assuming that you have a cloned repo on github with the CI enabled. See src/tools/ci/README. I use that a lot with local branches for pre-commit validations where things could break across the board. > Since I cannot come up the appropriate place, I introduced new test file in > recovery test. creating/dropping/advancing a slot, and doing a logical decoding > is tested. Since standby mode is not supported by the single-user instance, > I did not test the slot synchronization. I have not known that. Hmm. src/test/modules/test_misc/ makes more sense to me here. (No need to send an updated patch just for that.) > IIUC, 0001 can be backpatched for all supported branches. I will create for them > after patch would be a good shape. Thanks, I'll check all that tomorrow. -- Michael
Вложения
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Michael, > You could also validate that using the CI, assuming that you have a > cloned repo on github with the CI enabled. See src/tools/ci/README. > I use that a lot with local branches for pre-commit validations where > things could break across the board. I have run the tests on my CI and found that windows cannot accept the test. Per attached output, starting the postgres with the single-user mode was failed because the user had admin permissions [1]. However, pg_ctl start command could be done by the same user. I'm not familiar with Windows, but according to your blogpost [2], Windows seems to allows running some server command by generating the restricted context for running postgres commands. In case of single-user mode, however, we directly run the given command with the current user. So, there is a possibility that only instance can boot only by pg_ctl. Based on that, I want to skip the test on windows platform rather than modifying the ci environment. IIUC, other tests which uses --single cannot work on windows as well. > Hmm. src/test/modules/test_misc/ makes more sense to me here. (No > need to send an updated patch just for that.) I did not notice the directory and looked nice. Updated. [1]: ``` # Running: postgres --single -F -c exit_on_error=true -D C:\cirrus\build/testrun/recovery/049_slots_in_single_user_mode\data/t_049_slots_in_single_user_mode_node_data/pgdata postgres Execution of PostgreSQL by a user with administrative permissions is not permitted. The server must be started under an unprivileged user ID to prevent possible system security compromises. See the documentation for more information on how to properly start the server. ``` [2]: https://paquier.xyz/postgresql-2/postgres-utilities-restricted-token/ Best regards, Hayato Kuroda FUJITSU LIMITED
Вложения
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Michael Paquier
Дата:
On Tue, Aug 19, 2025 at 12:52:24PM +0000, Hayato Kuroda (Fujitsu) wrote: > I have run the tests on my CI and found that windows cannot accept the test. > Per attached output, starting the postgres with the single-user mode was failed > because the user had admin permissions [1]. However, pg_ctl start command could be > done by the same user. > I'm not familiar with Windows, but according to your blogpost [2], Windows seems > to allows running some server command by generating the restricted context for > running postgres commands. In case of single-user mode, however, we directly run > the given command with the current user. So, there is a possibility that only > instance can boot only by pg_ctl. > Based on that, I want to skip the test on windows platform rather than modifying > the ci environment. > > IIUC, other tests which uses --single cannot work on windows as well. This has reminded me of 1a9d80282811, where I have used a trick with pg_ctl to allow single-user mode executions to bypass the privileged account permission failure, but I don't see what we could do here as we should pass down commands to postgres for execution via stdin for IPC::Run. We don't really have anything WIN32-specific, so skipping the tests on Windows sounds fine by me. One thing that you have forgotten is to update EXTRA_INSTALL to add test_decoding, or a make check in the module fails. I have expanded a bit more the tests, as for example we have paths based on active_pid for temporary slots, which could matter at some point. I have added a few more things with physical slots. With all that in mind, I don't really see the point of 0002. There was also a mention of replication origins upthread, but we don't really have anything specific to a shared state or a validation in this area AFAIK, so I don't think that test additions are worth it in this case. -- Michael
Вложения
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Michael, I found you've pushed 0001. Thanks! I've considered to create patches for other branches after you said OK, but you seemed to push all of them directly. > This has reminded me of 1a9d80282811, where I have used a trick with > pg_ctl to allow single-user mode executions to bypass the privileged > account permission failure, but I don't see what we could do here as > we should pass down commands to postgres for execution via stdin for > IPC::Run. We don't really have anything WIN32-specific, so skipping > the tests on Windows sounds fine by me. OK, thanks for the confirmation. > One thing that you have forgotten is to update EXTRA_INSTALL to add > test_decoding, or a make check in the module fails. I have expanded a > bit more the tests, as for example we have paths based on active_pid > for temporary slots, which could matter at some point. I have added a > few more things with physical slots. Oh, I didn't recognize because I usually use meson, and my CI said OK. Good update! > With all that in mind, I don't really see the point of 0002. There > was also a mention of replication origins upthread, but we don't > really have anything specific to a shared state or a validation in > this area AFAIK, so I don't think that test additions are worth it in > this case. Let me clarify your point. For now, there are no decisions to prohibit origin manipulations. 0002 only restricts to handle slots in the single-user mode. Did you say that we do not have to do tests to ensure these SQL functions are prohibited in the mode? (Since senior members in this thread are against to restrict, I'm planning to stop working 0002 after creating a final patch for the reference purpose) Best regards, Hayato Kuroda FUJITSU LIMITED
Re: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
Michael Paquier
Дата:
On Wed, Aug 20, 2025 at 07:42:11AM +0000, Hayato Kuroda (Fujitsu) wrote: > Let me clarify your point. For now, there are no decisions to prohibit origin > manipulations. 0002 only restricts to handle slots in the single-user mode. Did > you say that we do not have to do tests to ensure these SQL functions are > prohibited in the mode? I am saying that there is little point in having tests for the origin functions in single-user mode as these don't do anything really fancy with global states (there's a acquired_by of course, and no specific IsUnderPostmaster patch), contrary to the replication slots. I am not convinced that there is any need to restrict them, either. If somebody shows a reason to justify such a move, we could argue about it, of course. -- Michael
Вложения
RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Michael and hackers, This is a wrap-up of this thread. I'm planning to close the thread once. > I am saying that there is little point in having tests for the origin > functions in single-user mode as these don't do anything really > fancy with global states (there's a acquired_by of course, and no > specific IsUnderPostmaster patch), contrary to the replication slots. > I am not convinced that there is any need to restrict them, either. > If somebody shows a reason to justify such a move, we could argue > about it, of course. OK, thanks for the clarification. Yes, there are no special path for single-user mode for origins. Now, senior members recently appeared here are not so motivated to restrict some operations in single-user mode unless it cannot be done in principle. Based on that, I decided to close the thread once until someone raised another point. For the reference purpose, I attached a patch which prohibits slot manipulation in single-user mode. Anyone can update and use the patch for any purposes. Best regards, Hayato Kuroda FUJITSU LIMITED