Обсуждение: Checks in RegisterBackgroundWorker.()
The first patch on my "Refactoring backend fork+exec code" thread [0] changes the allocations of BackgroundWorkerList from plain malloc() to MemoryContextAlloc(PostmasterContext). However, that actually caused a segfault in worker_spi tests in EXEC_BACKEND mode. BackgroundWorkerList is a postmaster-private data structure and should not be accessed in backends. That assumption failed in RegisterBackgroundWorker(). When you put worker_spi in shared_preload_libraries, its _PG_init() function calls RegisterBackgroundWorker(), as expected. But in EXEC_BACKEND mode, the library is loaded *again* in each backend process, and each of those loads also call RegisterBackgroundWorker(). It's too late to correctly register any static background workers at that stage, but RegisterBackgroundWorker() still goes through the motions and adds the element to BackgroundWorkerList. If you change the malloc() to MemoryContextAlloc(PostmasterContext), it segfaults because PostmasterContext == NULL in a backend process. In summary, RegisterBackgroundWorker() is doing some questionable and useless work, when a shared preload library is loaded to a backend process in EXEC_BACKEND mode. Attached patches: 1. Tighten/clarify those checks. See also commit message for details. 2. The patch from the other thread to switch to MemoryContextAlloc(PostmasterContext) 3. A fix for a highly misleading comment in the same file. Any comments? [0] https://www.postgresql.org/message-id/flat/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef%40iki.fi P.S. In addition to those, I also considered these changes but didn't implement them yet: - Change RegisterBackgroundWorker() to return true/false to indicate whether the registration succeeded. Currently, the caller has no way of knowing. In many cases, I think even an ERROR and refusing to start up the server would be appropriate. But at least we should let the caller know and decide. - Add "Assert(am_postmaster)" assertions to the functions in bgworker.c that are only supposed to be called in postmaster process. The functions have good explicit comments on that, but wouldn't hurt to also assert. (There is no 'am_postmaster' flag, the equivalent is actually (IsPostmasterEnvironment && !IsUnderPostmaster), but perhaps we should define a macro or flag for that) -- Heikki Linnakangas Neon (https://neon.tech)
Вложения
On Fri, Aug 25, 2023 at 3:15 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > In summary, RegisterBackgroundWorker() is doing some questionable and > useless work, when a shared preload library is loaded to a backend > process in EXEC_BACKEND mode. Yeah. When I was working on 7389aad6 ("Use WaitEventSet API for postmaster's event loop."), I also tried to move all of the postmaster's state variables into PostmasterContext (since the only reason for that scope was the signal handler code that is now gone), and I hit a variant of this design problem. I wonder if that would be unblocked by this... https://www.postgresql.org/message-id/CA+hUKGKH_RPAo=NgPfHKj--565aL1qiVpUGdWt1_pmJehY+dmw@mail.gmail.com
On 25/08/2023 00:00, Thomas Munro wrote: > On Fri, Aug 25, 2023 at 3:15 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> In summary, RegisterBackgroundWorker() is doing some questionable and >> useless work, when a shared preload library is loaded to a backend >> process in EXEC_BACKEND mode. > > Yeah. When I was working on 7389aad6 ("Use WaitEventSet API for > postmaster's event loop."), I also tried to move all of the > postmaster's state variables into PostmasterContext (since the only > reason for that scope was the signal handler code that is now gone), > and I hit a variant of this design problem. I wonder if that would be > unblocked by this... > > https://www.postgresql.org/message-id/CA+hUKGKH_RPAo=NgPfHKj--565aL1qiVpUGdWt1_pmJehY+dmw@mail.gmail.com A-ha, yes I believe this patch will unblock that. RegisterBackgroundWorker() has no legit reason to access BackgroundWorkerList in child processes, and with these patches, it no longer does. -- Heikki Linnakangas Neon (https://neon.tech)
Here's a new version of these patches. I fixed one comment and ran pgindent, no other changes. -- Heikki Linnakangas Neon (https://neon.tech)
Вложения
On Thu, Sep 28, 2023 at 9:46 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Here's a new version of these patches. I fixed one comment and ran > pgindent, no other changes. > Subject: [PATCH v2 1/3] Clarify the checks in RegisterBackgroundWorker. LGTM. I see it passes on CI, and I also tested locally with EXEC_BACKEND, with shared_preload_libraries=path/to/pg_prewarm.so which works fine. > Subject: [PATCH v2 2/3] Allocate Backend structs in PostmasterContext. LGTM. I checked that you preserve the behaviour on OOM (LOG), and you converted free() to pfree() in code that runs in the postmaster, but dropped it in the code that runs in the child because all children should delete PostmasterContext, making per-object pfree redundant. Good. > Subject: [PATCH v2 3/3] Fix misleading comment on StartBackgroundWorker(). LGTM. Hmm, maybe I would have called that function "BackgroundWorkerMain()" like several other similar things, but that's not important. This doesn't quite fix the problem I was complaining about earlier, but it de-confuses things. (Namely that if BackgroundWorkerList weren't a global variable, RegisterWorkerMain() wouldn't be able to find it, and if it took some kind of context pointer as an argument, _PG_init() functions wouldn't be able to provide it, unless we changed _PG_init() to take an argument, which we can't really do. Oh well.)
On 06/10/2023 13:13, Thomas Munro wrote: > On Thu, Sep 28, 2023 at 9:46 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> Subject: [PATCH v2 3/3] Fix misleading comment on StartBackgroundWorker(). > > LGTM. Hmm, maybe I would have called that function > "BackgroundWorkerMain()" like several other similar things, but that's > not important. That's a good idea. I renamed it to BackgroundWorkerMain(). Pushed with that change, thanks for the review! -- Heikki Linnakangas Neon (https://neon.tech)