Обсуждение: Refactoring backend fork+exec code

Поиск
Список
Период
Сортировка

Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
I started to look at the code in postmaster.c related to launching child 
processes. I tried to reduce the difference between EXEC_BACKEND and 
!EXEC_BACKEND code paths, and put the code that needs to differ behind a 
better abstraction. I started doing this to help with implementing 
multi-threading, but it doesn't introduce anything thread-related yet 
and I think this improves readability anyway.

This is still work-inprogress, especially the last, big, patch in the 
patch set. Mainly, I need to clean up the comments in the new 
launch_backend.c file. But the other patches are in pretty good shape, 
and if you ignore launch_backend.c, you can see the effect on the other 
source files.

With these patches, there is a new function for launching a postmaster 
child process:

pid_t postmaster_child_launch(PostmasterChildType child_type, char 
*startup_data, size_t startup_data_len, ClientSocket *client_sock);

This function hides the differences between EXEC_BACKEND and 
!EXEC_BACKEND cases.

In 'startup_data', the caller can pass a blob of data to the child 
process, with different meaning for different kinds of child processes. 
For a backend process, for example, it's used to pass the CAC_state, 
which indicates whether the backend accepts the connection or just sends 
"too many clients" error. And for background workers, it's used to pass 
the BackgroundWorker struct. The startup data is passed to the child 
process in the

ClientSocket is a new struct holds a socket FD, and the local and remote 
address info. Before this patch set, postmaster initializes the Port 
structs but only fills in those fields in it. With this patch set, we 
have a new ClientSocket struct just for those fields, which makes it 
more clear which fields are initialized where.

I haven't done much testing yet, and no testing at all on Windows, so 
that's probably still broken.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
Вложения

Re: Refactoring backend fork+exec code

От
"Tristan Partin"
Дата:
> From 1d89eec53c7fefa7a4a8c011c9f19e3df64dc436 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Mon, 12 Jun 2023 16:33:20 +0300
> Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets

> @@ -831,7 +834,8 @@ StreamConnection(pgsocket server_fd, Port *port)
>  void
>  StreamClose(pgsocket sock)
>  {
> -       closesocket(sock);
> +       if (closesocket(sock) != 0)
> +               elog(LOG, "closesocket failed: %m");
>  }
>
>  /*

Do you think WARNING would be a more appropriate log level?

> From 2f518be9e96cfed1a1a49b4af8f7cb4a837aa784 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Mon, 12 Jun 2023 18:07:54 +0300
> Subject: [PATCH 5/9] Move "too many clients already" et al. checks from
>  ProcessStartupPacket.

This seems like a change you could push already (assuming another
maintainer agrees with you), which makes reviews for this patchset even
easier.

> From c25b67c045018a2bf05e6ff53819d26e561fc83f Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Sun, 18 Jun 2023 14:11:16 +0300
> Subject: [PATCH 6/9] Pass CAC as argument to backend process.

Could you expand a bit more on why it is better to pass it as a separate
argument? Does it not fit well conceptually in struct Port?

> @@ -4498,15 +4510,19 @@ postmaster_forkexec(int argc, char *argv[])
>   * returns the pid of the fork/exec'd process, or -1 on failure
>   */
>  static pid_t
> -backend_forkexec(Port *port)
> +backend_forkexec(Port *port, CAC_state cac)
>  {
> -       char       *av[4];
> +       char       *av[5];
>         int                     ac = 0;
> +       char            cacbuf[10];
>
>         av[ac++] = "postgres";
>         av[ac++] = "--forkbackend";
>         av[ac++] = NULL;                        /* filled in by internal_forkexec */
>
> +       snprintf(cacbuf, sizeof(cacbuf), "%d", (int) cac);
> +       av[ac++] = cacbuf;

Might be worth a sanity check that there wasn't any truncation into
cacbuf, which is an impossibility as the code is written, but still
useful for catching a future developer error.

Is it worth adding a command line option at all instead of having the
naked positional argument? It would help anybody who might read the
command line what the seemingly random integer stands for.

> @@ -4910,7 +4926,10 @@ SubPostmasterMain(int argc, char *argv[])
>         /* Run backend or appropriate child */
>         if (strcmp(argv[1], "--forkbackend") == 0)
>         {
> -               Assert(argc == 3);              /* shouldn't be any more args */
> +               CAC_state       cac;
> +
> +               Assert(argc == 4);
> +               cac = (CAC_state) atoi(argv[3]);

Perhaps an assert or full error checking that atoi succeeds would be
useful for similar reasons to my previous comment.

> From 658cba5cdb2e5c45faff84566906d2fcaa8a3674 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Mon, 12 Jun 2023 18:03:03 +0300
> Subject: [PATCH 7/9] Remove ConnCreate and ConnFree, and allocate Port in
>  stack.

Again, seems like another patch that could be pushed separately assuming
others don't have any comments.

> From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Sun, 18 Jun 2023 13:56:44 +0300
> Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs

> @@ -1499,7 +1499,7 @@ CloseServerPorts(int status, Datum arg)
>         {
>                 if (ListenSocket[i] != PGINVALID_SOCKET)
>                 {
> -                       StreamClose(ListenSocket[i]);
> +                       closesocket(ListenSocket[i]);
>                         ListenSocket[i] = PGINVALID_SOCKET;
>                 }
>         }

I see you have been adding log messages in the case of closesocket()
failing. Do you think it is worth doing here as well?

One strange part about this patch is that in patch 4, you edit
StreamClose() to emit a log message in the case of closesocket()
failure, but then this patch just completely removes it.

> @@ -4407,11 +4420,11 @@ BackendInitialize(Port *port, CAC_state cac)
>   *             Doesn't return at all.
>   */
>  static void
> -BackendRun(Port *port)
> +BackendRun(void)
>  {
>         /*
> -        * Create a per-backend PGPROC struct in shared memory. We must do
> -        * this before we can use LWLocks (in AttachSharedMemoryAndSemaphores).
> +        * Create a per-backend PGPROC struct in shared memory. We must do this
> +        * before we can use LWLocks (in AttachSharedMemoryAndSemaphores).
>          */
>         InitProcess();

This comment reflow probably fits better in the patch that added the
AttachSharedMemoryAndSemaphores function.

> From b33cfeb28a5419045acb659a01410b2b463bea3e Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Sun, 18 Jun 2023 13:59:48 +0300
> Subject: [PATCH 9/9] Refactor postmaster child process launching

> - Move code related to launching backend processes to new source file,
>   process_start.c

Since this seems pretty self-contained, my be easier to review if this
was its own commit.

> - Refactor the mechanism of passing informaton from the parent to
>   child process. Instead of using different command-line arguments
>   when launching the child process in EXEC_BACKEND mode, pass a
>   variable-length blob of data along with all the global
>   variables. The contents of that blob depends on the kind of child
>   process being launched. In !EXEC_BACKEND mode, we use the same blob,
>   but it's simply inherited from the parent to child process.

Same with this. Perhaps others would disagree.

> +const          PMChildEntry entry_kinds[] = {
> +       {"backend", BackendMain, true},
> +
> +       {"autovacuum launcher", AutoVacLauncherMain, true},
> +       {"autovacuum worker", AutoVacWorkerMain, true},
> +       {"bgworker", BackgroundWorkerMain, true},
> +       {"syslogger", SysLoggerMain, false},
> +
> +       {"startup", StartupProcessMain, true},
> +       {"bgwriter", BackgroundWriterMain, true},
> +       {"archiver", PgArchiverMain, true},
> +       {"checkpointer", CheckpointerMain, true},
> +       {"wal_writer", WalWriterMain, true},
> +       {"wal_receiver", WalReceiverMain, true},
> +};

It seems like this could be made static. I didn't see it getting exposed
in a header file anywhere, but I also admit that I can be blind at
times.

I need to spend more time looking at this last patch.

Nice work so far!

--
Tristan Partin
Neon (https://neon.tech)



Re: Refactoring backend fork+exec code

От
Andres Freund
Дата:
Hi,

On 2023-06-18 14:22:33 +0300, Heikki Linnakangas wrote:
> I started to look at the code in postmaster.c related to launching child
> processes. I tried to reduce the difference between EXEC_BACKEND and
> !EXEC_BACKEND code paths, and put the code that needs to differ behind a
> better abstraction. I started doing this to help with implementing
> multi-threading, but it doesn't introduce anything thread-related yet and I
> think this improves readability anyway.

Yes please!  This code is absolutely awful.


> From 0cb6f8d665980d30a5d2a29013000744f16bf813 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Sun, 18 Jun 2023 11:00:21 +0300
> Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores.
> 
> Moves InitProcess calls a little later in EXEC_BACKEND case.

What's the reason for this part? ISTM that we'd really want to get away from
plastering duplicated InitProcess() etc everywhere.

I think this might be easier to understand if you just changed did the
CreateSharedMemoryAndSemaphores() -> AttachSharedMemoryAndSemaphores() piece
in this commit, and the rest later.


> +void
> +AttachSharedMemoryAndSemaphores(void)
> +{
> +    /* InitProcess must've been called already */

Perhaps worth an assertion to make it easier to see that the order is wrong?


> From 1d89eec53c7fefa7a4a8c011c9f19e3df64dc436 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Mon, 12 Jun 2023 16:33:20 +0300
> Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets
> 
> We went through some effort to close them in the child process. Better to
> not hand them down to the child process in the first place.

I think Thomas has a larger version of this patch:
https://postgr.es/m/CA%2BhUKGKPNFcfBQduqof4-7C%3DavjcSfdkKBGvQoRuAvfocnvY0A%40mail.gmail.com



> From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Sun, 18 Jun 2023 13:56:44 +0300
> Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs
> 
> - Move more of the work on a client socket to the child process.
> 
> - Reduce the amount of data that needs to be passed from postmaster to
>   child. (Used to pass a full Port struct, although most of the fields were
>   empty. Now we pass the much slimmer ClientSocket.)

I think there might be extensions accessing Port. Not sure if it's worth
worrying about, but ...


> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -476,8 +476,8 @@ AutoVacLauncherMain(int argc, char *argv[])
>      pqsignal(SIGCHLD, SIG_DFL);
>  
>      /*
> -     * Create a per-backend PGPROC struct in shared memory. We must do
> -     * this before we can use LWLocks.
> +     * Create a per-backend PGPROC struct in shared memory. We must do this
> +     * before we can use LWLocks.
>       */
>      InitProcess();
>

Don't think this was intentional?


> From b33cfeb28a5419045acb659a01410b2b463bea3e Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Sun, 18 Jun 2023 13:59:48 +0300
> Subject: [PATCH 9/9] Refactor postmaster child process launching
> 
> - Move code related to launching backend processes to new source file,
>   process_start.c

I think you might have renamed this to launch_backend.c?


> - Introduce new postmaster_child_launch() function that deals with the
>   differences between EXEC_BACKEND and fork mode.
> 
> - Refactor the mechanism of passing informaton from the parent to
>   child process. Instead of using different command-line arguments
>   when launching the child process in EXEC_BACKEND mode, pass a
>   variable-length blob of data along with all the global
>   variables. The contents of that blob depends on the kind of child
>   process being launched. In !EXEC_BACKEND mode, we use the same blob,
>   but it's simply inherited from the parent to child process.


> +const        PMChildEntry entry_kinds[] = {
> +    {"backend", BackendMain, true},
> +
> +    {"autovacuum launcher", AutoVacLauncherMain, true},
> +    {"autovacuum worker", AutoVacWorkerMain, true},
> +    {"bgworker", BackgroundWorkerMain, true},
> +    {"syslogger", SysLoggerMain, false},
> +
> +    {"startup", StartupProcessMain, true},
> +    {"bgwriter", BackgroundWriterMain, true},
> +    {"archiver", PgArchiverMain, true},
> +    {"checkpointer", CheckpointerMain, true},
> +    {"wal_writer", WalWriterMain, true},
> +    {"wal_receiver", WalReceiverMain, true},
> +};

I'd assign them with the PostmasterChildType as index, so there's no danger of
getting out of order.

const                PMChildEntry entry_kinds = {
  [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
  ...
}

or such should work.


I'd also use designated initializers for the fields, it's otherwise hard to
know what true means etc.

I think it might be good to put more into array. If we e.g. knew whether a
particular child type is a backend-like, and aux process or syslogger, we
could avoid the duplicated InitAuxiliaryProcess(),
MemoryContextDelete(PostmasterContext) etc calls everywhere.


> +/*
> + * SubPostmasterMain -- Get the fork/exec'd process into a state equivalent
> + *            to what it would be if we'd simply forked on Unix, and then
> + *            dispatch to the appropriate place.
> + *
> + * The first two command line arguments are expected to be "--forkFOO"
> + * (where FOO indicates which postmaster child we are to become), and
> + * the name of a variables file that we can read to load data that would
> + * have been inherited by fork() on Unix.  Remaining arguments go to the
> + * subprocess FooMain() routine. XXX
> + */
> +void
> +SubPostmasterMain(int argc, char *argv[])
> +{
> +    PostmasterChildType child_type;
> +    char       *startup_data;
> +    size_t        startup_data_len;
> +
> +    /* In EXEC_BACKEND case we will not have inherited these settings */
> +    IsPostmasterEnvironment = true;
> +    whereToSendOutput = DestNone;
> +
> +    /* Setup essential subsystems (to ensure elog() behaves sanely) */
> +    InitializeGUCOptions();
> +
> +    /* Check we got appropriate args */
> +    if (argc < 3)
> +        elog(FATAL, "invalid subpostmaster invocation");
> +
> +    if (strncmp(argv[1], "--forkchild=", 12) == 0)
> +    {
> +        char       *entry_name = argv[1] + 12;
> +        bool        found = false;
> +
> +        for (int idx = 0; idx < lengthof(entry_kinds); idx++)
> +        {
> +            if (strcmp(entry_kinds[idx].name, entry_name) == 0)
> +            {
> +                child_type = idx;
> +                found = true;
> +                break;
> +            }
> +        }
> +        if (!found)
> +            elog(ERROR, "unknown child kind %s", entry_name);
> +    }


Hm, shouldn't we error out when called without --forkchild?


> +/* Save critical backend variables into the BackendParameters struct */
> +#ifndef WIN32
> +static bool
> +save_backend_variables(BackendParameters *param, ClientSocket *client_sock)
> +#else

There's so much of this kind of thing. Could we hide it in a struct or such
instead of needing ifdefs everywhere?



> --- a/src/backend/storage/ipc/shmem.c
> +++ b/src/backend/storage/ipc/shmem.c
> @@ -144,6 +144,8 @@ InitShmemAllocation(void)
>      /*
>       * Initialize ShmemVariableCache for transaction manager. (This doesn't
>       * really belong here, but not worth moving.)
> +     *
> +     * XXX: we really should move this
>       */
>      ShmemVariableCache = (VariableCache)
>          ShmemAlloc(sizeof(*ShmemVariableCache));

Heh. Indeed. And probably just rename it to something less insane.


Greetings,

Andres Freund



Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Heikki Linnakangas
Дата:
Focusing on this one patch in this series:

On 11/07/2023 01:50, Andres Freund wrote:
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Mon, 12 Jun 2023 16:33:20 +0300
>> Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets
>>
>> We went through some effort to close them in the child process. Better to
>> not hand them down to the child process in the first place.
> 
> I think Thomas has a larger version of this patch:
> https://postgr.es/m/CA%2BhUKGKPNFcfBQduqof4-7C%3DavjcSfdkKBGvQoRuAvfocnvY0A%40mail.gmail.com

Hmm, no, that's a little different. Thomas added the FD_CLOEXEC option 
to the *accepted* socket in commit 1da569ca1f. That was part of that 
thread. This patch adds the option to the *listen* sockets. That was not 
discussed in that thread, but it's certainly in the same vein.

Thomas: What do you think of the attached?

On 11/07/2023 00:07, Tristan Partin wrote:
>> @@ -831,7 +834,8 @@ StreamConnection(pgsocket server_fd, Port *port)
>>  void
>>  StreamClose(pgsocket sock)
>>  {
>> -       closesocket(sock);
>> +       if (closesocket(sock) != 0)
>> +               elog(LOG, "closesocket failed: %m");
>>  }
>>
>>  /*
> 
> Do you think WARNING would be a more appropriate log level?

No, WARNING is for messages that you expect the client to receive. This 
failure is unexpected at the system level, the message is for the 
administrator. The distinction isn't always very clear, but LOG seems 
more appropriate in this case.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Thomas Munro
Дата:
On Thu, Aug 24, 2023 at 11:41 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 11/07/2023 01:50, Andres Freund wrote:
> >> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> >> Date: Mon, 12 Jun 2023 16:33:20 +0300
> >> Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets
> >>
> >> We went through some effort to close them in the child process. Better to
> >> not hand them down to the child process in the first place.
> >
> > I think Thomas has a larger version of this patch:
> > https://postgr.es/m/CA%2BhUKGKPNFcfBQduqof4-7C%3DavjcSfdkKBGvQoRuAvfocnvY0A%40mail.gmail.com
>
> Hmm, no, that's a little different. Thomas added the FD_CLOEXEC option
> to the *accepted* socket in commit 1da569ca1f. That was part of that
> thread. This patch adds the option to the *listen* sockets. That was not
> discussed in that thread, but it's certainly in the same vein.
>
> Thomas: What do you think of the attached?

LGTM.  I vaguely recall thinking that it might be better to keep
EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
didn't try this one, but it looks fine with the comment to explain, as
you have it.  (It's a shame we can't use O_CLOFORK.)

There was some question in the other thread about whether doing that
to the server socket might affect accepted sockets too on some OS, but
I can at least confirm that your patch works fine on FreeBSD in an
EXEC_BACKEND build.  I think there were some historical disagreements
about which socket properties were inherited, but not that.



Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Heikki Linnakangas
Дата:
On 24/08/2023 15:48, Thomas Munro wrote:
> LGTM.  I vaguely recall thinking that it might be better to keep
> EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
> didn't try this one, but it looks fine with the comment to explain, as
> you have it.  (It's a shame we can't use O_CLOFORK.)

Yeah, O_CLOFORK would be nice..

Committed, thanks!

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Jeff Janes
Дата:
On Thu, Aug 24, 2023 at 10:05 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 24/08/2023 15:48, Thomas Munro wrote:
> LGTM.  I vaguely recall thinking that it might be better to keep
> EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
> didn't try this one, but it looks fine with the comment to explain, as
> you have it.  (It's a shame we can't use O_CLOFORK.)

Yeah, O_CLOFORK would be nice..

Committed, thanks!


Since this commit, I'm getting a lot (63 per restart) of messages:

 LOG:  could not close client or listen socket: Bad file descriptor
 
All I have to do to get the message is turn logging_collector = on and restart.

The close failure condition existed before the commit, it just wasn't logged before.  So, did the extra logging added here just uncover a  pre-existing bug?

The LOG message is sent to the terminal, not to the log file.

Cheers,

Jeff

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Heikki Linnakangas
Дата:
On 28/08/2023 18:55, Jeff Janes wrote:
> Since this commit, I'm getting a lot (63 per restart) of messages:
> 
>   LOG:  could not close client or listen socket: Bad file descriptor
> All I have to do to get the message is turn logging_collector = on and 
> restart.
> 
> The close failure condition existed before the commit, it just wasn't 
> logged before.  So, did the extra logging added here just uncover a  
> pre-existing bug?

Yes, so it seems. Syslogger is started before the ListenSockets array is 
initialized, so its still all zeros. When syslogger is forked, the child 
process tries to close all the listen sockets, which are all zeros. So 
syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the 
array initialization earlier.

The first close(0) actually does have an effect: it closes stdin, which 
is fd 0. That is surely accidental, but I wonder if we should indeed 
close stdin in child processes? The postmaster process doesn't do 
anything with stdin either, although I guess a library might try to read 
a passphrase from stdin before starting up, for example.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Michael Paquier
Дата:
On Mon, Aug 28, 2023 at 11:52:15PM +0300, Heikki Linnakangas wrote:
> On 28/08/2023 18:55, Jeff Janes wrote:
>> Since this commit, I'm getting a lot (63 per restart) of messages:
>>
>>   LOG:  could not close client or listen socket: Bad file descriptor
>> All I have to do to get the message is turn logging_collector = on and
>> restart.
>>
>> The close failure condition existed before the commit, it just wasn't
>> logged before.  So, did the extra logging added here just uncover a
>> pre-existing bug?

In case you've not noticed:
https://www.postgresql.org/message-id/ZOvvuQe0rdj2slA9@paquier.xyz
But it does not really matter now ;)

> Yes, so it seems. Syslogger is started before the ListenSockets array is
> initialized, so its still all zeros. When syslogger is forked, the child
> process tries to close all the listen sockets, which are all zeros. So
> syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the
> array initialization earlier.

Yep, I've reached the same conclusion.  Wouldn't it be cleaner to move
the callback registration of CloseServerPorts() closer to the array
initialization, though?

> The first close(0) actually does have an effect: it closes stdin, which is
> fd 0. That is surely accidental, but I wonder if we should indeed close
> stdin in child processes? The postmaster process doesn't do anything with
> stdin either, although I guess a library might try to read a passphrase from
> stdin before starting up, for example.

We would have heard about that, wouldn't we?  I may be missing
something of course, but on HEAD, the array initialization is done
before starting any child processes, and the syslogger is the first
one forked.
--
Michael

Вложения

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Heikki Linnakangas
Дата:
On 29/08/2023 01:28, Michael Paquier wrote:
> 
> In case you've not noticed:
> https://www.postgresql.org/message-id/ZOvvuQe0rdj2slA9@paquier.xyz
> But it does not really matter now ;)

Ah sorry, missed that thread.

>> Yes, so it seems. Syslogger is started before the ListenSockets array is
>> initialized, so its still all zeros. When syslogger is forked, the child
>> process tries to close all the listen sockets, which are all zeros. So
>> syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the
>> array initialization earlier.
> 
> Yep, I've reached the same conclusion.  Wouldn't it be cleaner to move
> the callback registration of CloseServerPorts() closer to the array
> initialization, though?

Ok, pushed that way.

I checked the history of this: it goes back to commit 9a86f03b4e in 
version 13. The SysLogger_Start() call used to be later, after setting p 
ListenSockets, but that commit moved it. So I backpatched this to v13.

>> The first close(0) actually does have an effect: it closes stdin, which is
>> fd 0. That is surely accidental, but I wonder if we should indeed close
>> stdin in child processes? The postmaster process doesn't do anything with
>> stdin either, although I guess a library might try to read a passphrase from
>> stdin before starting up, for example.
> 
> We would have heard about that, wouldn't we?  I may be missing
> something of course, but on HEAD, the array initialization is done
> before starting any child processes, and the syslogger is the first
> one forked.

Yes, syslogger is the only process that closes stdin. After moving the 
initialization, it doesn't close it either.

Thinking about this some more, the ListenSockets array is a bit silly 
anyway. We fill the array starting from index 0, always append to the 
end, and never remove entries from it. It would seem more 
straightforward to keep track of the used size of the array. Currently 
we always loop through the unused parts too, and e.g. 
ConfigurePostmasterWaitSet() needs to walk the array to count how many 
elements are in use.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Heikki Linnakangas
Дата:
On 29/08/2023 09:21, Heikki Linnakangas wrote:
> Thinking about this some more, the ListenSockets array is a bit silly
> anyway. We fill the array starting from index 0, always append to the
> end, and never remove entries from it. It would seem more
> straightforward to keep track of the used size of the array. Currently
> we always loop through the unused parts too, and e.g.
> ConfigurePostmasterWaitSet() needs to walk the array to count how many
> elements are in use.

Like this.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Heikki Linnakangas
Дата:
On 29/08/2023 09:58, Heikki Linnakangas wrote:
> On 29/08/2023 09:21, Heikki Linnakangas wrote:
>> Thinking about this some more, the ListenSockets array is a bit silly
>> anyway. We fill the array starting from index 0, always append to the
>> end, and never remove entries from it. It would seem more
>> straightforward to keep track of the used size of the array. Currently
>> we always loop through the unused parts too, and e.g.
>> ConfigurePostmasterWaitSet() needs to walk the array to count how many
>> elements are in use.
> 
> Like this.

This seems pretty uncontroversial, and I heard no objections, so I went 
ahead and committed that.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Michael Paquier
Дата:
On Thu, Oct 05, 2023 at 03:08:37PM +0300, Heikki Linnakangas wrote:
> This seems pretty uncontroversial, and I heard no objections, so I went
> ahead and committed that.

It looks like e29c4643951 is causing issues here.  While doing
benchmarking on a cluster compiled with -O2, I got a crash:
LOG:  system logger process (PID 27924) was terminated by signal 11: Segmentation fault

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055ef3b9aed20 in pfree ()
(gdb) bt
#0  0x000055ef3b9aed20 in pfree ()
#1  0x000055ef3b7e0e41 in ClosePostmasterPorts ()
#2  0x000055ef3b7e6649 in SysLogger_Start ()
#3  0x000055ef3b7e4413 in PostmasterMain ()

Okay, the backtrace is not that useful.  I'll see if I can get
something better, still it seems like this has broken the way the
syslogger closes these ports.
--
Michael

Вложения

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Michael Paquier
Дата:
On Fri, Oct 06, 2023 at 02:30:16PM +0900, Michael Paquier wrote:
> Okay, the backtrace is not that useful.  I'll see if I can get
> something better, still it seems like this has broken the way the
> syslogger closes these ports.

And here you go:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196 196 header =
*((const uint64 *) ((const char *) pointer - sizeof(uint64)));
(gdb) bt
#0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196
#1  0x0000557d04176d59 in pfree (pointer=0x0) at mcxt.c:1463
#2  0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at postmaster.c:2571
#3  0x0000557d03e93ac2 in SysLogger_Start () at syslogger.c:686
#4  0x0000557d03e8c5b7 in PostmasterMain (argc=3, argv=0x557d0471ed00)
at postmaster.c:1148
#5  0x0000557d03d48e34 in main (argc=3, argv=0x557d0471ed00) at main.c:198
(gdb) up 2
#2  0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at
postmaster.c:2571
2571 pfree(ListenSockets);
(gdb) p ListenSockets $1 = (pgsocket *) 0x0
--
Michael

Вложения

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Heikki Linnakangas
Дата:
On 06/10/2023 09:50, Michael Paquier wrote:
> On Fri, Oct 06, 2023 at 02:30:16PM +0900, Michael Paquier wrote:
>> Okay, the backtrace is not that useful.  I'll see if I can get
>> something better, still it seems like this has broken the way the
>> syslogger closes these ports.
> 
> And here you go:
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196 196 header =
> *((const uint64 *) ((const char *) pointer - sizeof(uint64)));
> (gdb) bt
> #0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196
> #1  0x0000557d04176d59 in pfree (pointer=0x0) at mcxt.c:1463
> #2  0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at postmaster.c:2571
> #3  0x0000557d03e93ac2 in SysLogger_Start () at syslogger.c:686
> #4  0x0000557d03e8c5b7 in PostmasterMain (argc=3, argv=0x557d0471ed00)
> at postmaster.c:1148
> #5  0x0000557d03d48e34 in main (argc=3, argv=0x557d0471ed00) at main.c:198
> (gdb) up 2
> #2  0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at
> postmaster.c:2571
> 2571 pfree(ListenSockets);
> (gdb) p ListenSockets $1 = (pgsocket *) 0x0

Fixed, thanks!

I did a quick test with syslogger enabled before committing, but didn't 
notice the segfault. I missed it because syslogger gets restarted and 
then it worked.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

От
Michael Paquier
Дата:
On Fri, Oct 06, 2023 at 10:27:22AM +0300, Heikki Linnakangas wrote:
> I did a quick test with syslogger enabled before committing, but didn't
> notice the segfault. I missed it because syslogger gets restarted and then
> it worked.

Thanks, Heikki.
--
Michael

Вложения

Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
I updated this patch set, addressing some of the straightforward 
comments from Tristan and Andres, and did some more cleanups, commenting 
etc. Works on Windows now.

Replies to some of the individual comments below:

On 11/07/2023 00:07, Tristan Partin wrote:
>> @@ -4498,15 +4510,19 @@ postmaster_forkexec(int argc, char *argv[])
>>   * returns the pid of the fork/exec'd process, or -1 on failure
>>   */
>>  static pid_t
>> -backend_forkexec(Port *port)
>> +backend_forkexec(Port *port, CAC_state cac)
>>  {
>> -       char       *av[4];
>> +       char       *av[5];
>>         int                     ac = 0;
>> +       char            cacbuf[10];
>>
>>         av[ac++] = "postgres";
>>         av[ac++] = "--forkbackend";
>>         av[ac++] = NULL;                        /* filled in by internal_forkexec */
>>
>> +       snprintf(cacbuf, sizeof(cacbuf), "%d", (int) cac);
>> +       av[ac++] = cacbuf;
> 
> Might be worth a sanity check that there wasn't any truncation into
> cacbuf, which is an impossibility as the code is written, but still
> useful for catching a future developer error.
> 
> Is it worth adding a command line option at all instead of having the
> naked positional argument? It would help anybody who might read the
> command line what the seemingly random integer stands for.

+1. This gets refactored away in the last patch though. In the last 
patch, I used a child process name instead of an integer precisely 
because it looks nicer in "ps".

I wonder if we should add more command line arguments, just for 
informational purposes. Autovacuum worker process could display the 
database name it's connected to, for example. I don't know how important 
the command line is on Windows, is it displayed by tools that people 
care about?

On 11/07/2023 01:50, Andres Freund wrote:
> On 2023-06-18 14:22:33 +0300, Heikki Linnakangas wrote:
>>  From 0cb6f8d665980d30a5d2a29013000744f16bf813 Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Sun, 18 Jun 2023 11:00:21 +0300
>> Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores.
>>
>> Moves InitProcess calls a little later in EXEC_BACKEND case.
> 
> What's the reason for this part? 

The point is that with this commit, InitProcess() is called at same 
place in EXEC_BACKEND mode and !EXEC_BACKEND. It feels more consistent 
that way.

> ISTM that we'd really want to get away from plastering duplicated
> InitProcess() etc everywhere.

Sure, we could do more to reduce the duplication. I think this is a step 
in the right direction, though.

>>  From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Sun, 18 Jun 2023 13:56:44 +0300
>> Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs
>>
>> - Move more of the work on a client socket to the child process.
>>
>> - Reduce the amount of data that needs to be passed from postmaster to
>>    child. (Used to pass a full Port struct, although most of the fields were
>>    empty. Now we pass the much slimmer ClientSocket.)
> 
> I think there might be extensions accessing Port. Not sure if it's worth
> worrying about, but ...

That's OK. Port still exists, it's just created a little later. It will 
be initialized by the time extensions might look at it.

>> +const        PMChildEntry entry_kinds[] = {
>> +    {"backend", BackendMain, true},
>> +
>> +    {"autovacuum launcher", AutoVacLauncherMain, true},
>> +    {"autovacuum worker", AutoVacWorkerMain, true},
>> +    {"bgworker", BackgroundWorkerMain, true},
>> +    {"syslogger", SysLoggerMain, false},
>> +
>> +    {"startup", StartupProcessMain, true},
>> +    {"bgwriter", BackgroundWriterMain, true},
>> +    {"archiver", PgArchiverMain, true},
>> +    {"checkpointer", CheckpointerMain, true},
>> +    {"wal_writer", WalWriterMain, true},
>> +    {"wal_receiver", WalReceiverMain, true},
>> +};
> 
> I'd assign them with the PostmasterChildType as index, so there's no danger of
> getting out of order.
> 
> const                PMChildEntry entry_kinds = {
>    [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
>    ...
> }
> 
> or such should work.

Nice, I didn't know about that syntax! Changed it that way.

> I'd also use designated initializers for the fields, it's otherwise hard to
> know what true means etc.

I think with one boolean and the struct declaration nearby, it's fine. 
If this becomes more complex in the future, with more fields, I agree.

> I think it might be good to put more into array. If we e.g. knew whether a
> particular child type is a backend-like, and aux process or syslogger, we
> could avoid the duplicated InitAuxiliaryProcess(),
> MemoryContextDelete(PostmasterContext) etc calls everywhere.

I agree we could do more refactoring here. I don't agree with adding 
more to this struct though. I'm trying to limit the code in 
launch_backend.c to hiding the differences between EXEC_BACKEND and 
!EXEC_BACKEND. In EXEC_BACKEND mode, it restores the child process to 
the same state as it is after fork() in !EXEC_BACKEND mode. Any other 
initialization steps belong elsewhere.

Some of the steps between InitPostmasterChild() and the *Main() 
functions could probably be moved around and refactored. I didn't think 
hard about that. I think that can be done separately as follow-up patch.

>> +/* Save critical backend variables into the BackendParameters struct */
>> +#ifndef WIN32
>> +static bool
>> +save_backend_variables(BackendParameters *param, ClientSocket *client_sock)
>> +#else
> 
> There's so much of this kind of thing. Could we hide it in a struct or such
> instead of needing ifdefs everywhere?

A lot of #ifdefs you mean? I agree launch_backend.c has a lot of those. 
I haven't come up with any good ideas on reducing them, unfortunately.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
Вложения

Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 11/10/2023 14:12, Heikki Linnakangas wrote:
> On 11/07/2023 01:50, Andres Freund wrote:
>>> Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores.
>>>
>>> Moves InitProcess calls a little later in EXEC_BACKEND case.
>>
>> What's the reason for this part?
> 
> The point is that with this commit, InitProcess() is called at same
> place in EXEC_BACKEND mode and !EXEC_BACKEND. It feels more consistent
> that way.
> 
>> ISTM that we'd really want to get away from plastering duplicated
>> InitProcess() etc everywhere.
> 
> Sure, we could do more to reduce the duplication. I think this is a step
> in the right direction, though.

Here's another rebased patch set. Compared to previous version, I did a 
little more refactoring around CreateSharedMemoryAndSemaphores and 
InitProcess:

- patch 1 splits CreateSharedMemoryAndSemaphores into two functions: 
CreateSharedMemoryAndSemaphores is now only called at postmaster 
startup, and a new function called AttachSharedMemoryStructs() is called 
in backends in EXEC_BACKEND mode. I extracted the common parts of those 
functions to a new static function. (Some of this refactoring used to be 
part of the 3rd patch in the series, but it seems useful on its own, so 
I split it out.)

- patch 3 moves the call to AttachSharedMemoryStructs() to 
InitProcess(), reducing the boilerplate code a little.


The patches are incrementally useful, so if you don't have time to 
review all of them, a review on a subset would be useful too.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Refactoring backend fork+exec code

От
"Tristan Partin"
Дата:
On Wed Nov 29, 2023 at 5:36 PM CST, Heikki Linnakangas wrote:
> On 11/10/2023 14:12, Heikki Linnakangas wrote:
> Here's another rebased patch set. Compared to previous version, I did a
> little more refactoring around CreateSharedMemoryAndSemaphores and
> InitProcess:
>
> - patch 1 splits CreateSharedMemoryAndSemaphores into two functions:
> CreateSharedMemoryAndSemaphores is now only called at postmaster
> startup, and a new function called AttachSharedMemoryStructs() is called
> in backends in EXEC_BACKEND mode. I extracted the common parts of those
> functions to a new static function. (Some of this refactoring used to be
> part of the 3rd patch in the series, but it seems useful on its own, so
> I split it out.)
>
> - patch 3 moves the call to AttachSharedMemoryStructs() to
> InitProcess(), reducing the boilerplate code a little.
>
>
> The patches are incrementally useful, so if you don't have time to
> review all of them, a review on a subset would be useful too.

>  From 8886db1ed6bae21bf6d77c9bb1230edbb55e24f9 Mon Sep 17 00:00:00 2001
>  From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>  Date: Thu, 30 Nov 2023 00:04:22 +0200
>  Subject: [PATCH v3 4/7] Pass CAC as argument to backend process

For me, being new to the code, it would be nice to have more of an
explanation as to why this is "better." I don't doubt it; it would just
help me and future readers of this commit in the future. More of an
explanation in the commit message would suffice.

My other comment on this commit is that we now seem to have lost the
context on what CAC stands for. Before we had the member variable to
explain it. A comment on the enum would be great or changing cac named
variables to canAcceptConnections. I did notice in patch 7 that there
are still some variables named canAcceptConnections around, so I'll
leave this comment up to you.

>   From 98f8397b32a0b36e221475b32697c9c5bbca86a0 Mon Sep 17 00:00:00 2001
>   From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>   Date: Wed, 11 Oct 2023 13:38:06 +0300
>   Subject: [PATCH v3 5/7] Remove ConnCreate and ConnFree, and allocate Port in
>    stack

I like it separate.

>  From 79aab42705a8cb0e16e61c33052fc56fdd4fca76 Mon Sep 17 00:00:00 2001
>  From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>  Date: Wed, 11 Oct 2023 13:38:10 +0300
>  Subject: [PATCH v3 6/7] Introduce ClientSocket, rename some funcs

>  +static int        BackendStartup(ClientSocket *port);

s/port/client_sock

>  -                port->remote_hostname = strdup(remote_host);
>  +                port->remote_hostname = pstrdup(remote_host);
>  +        MemoryContextSwitchTo(oldcontext);

Something funky with the whitespace here, but my eyes might also be
playing tricks on me. Mixing spaces in tabs like what do in this
codebase makes it difficult to review :).

>  From ce51876f87f1e4317e25baf64184749448fcd033 Mon Sep 17 00:00:00 2001
>  From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>  Date: Thu, 30 Nov 2023 00:07:34 +0200
>  Subject: [PATCH v3 7/7] Refactor postmaster child process launching

>  +                entry_kinds[child_type].main_fn(startup_data, startup_data_len);
>  +                Assert(false);

Seems like you want the pg_unreachable() macro here instead of
Assert(false). Similar comment at the end of SubPostmasterMain().

>  +        if (fwrite(param, paramsz, 1, fp) != 1)
>  +        {
>  +                ereport(LOG,
>  +                                (errcode_for_file_access(),
>  +                                 errmsg("could not write to file \"%s\": %m", tmpfilename)));
>  +                FreeFile(fp);
>  +                return -1;
>  +        }
>  +
>  +        /* Release file */
>  +        if (FreeFile(fp))
>  +        {
>  +                ereport(LOG,
>  +                                (errcode_for_file_access(),
>  +                                 errmsg("could not write to file \"%s\": %m", tmpfilename)));
>  +                return -1;
>  +        }

Two pieces of feedback here. I generally find write(2) more useful than
fwrite(3) because write(2) will report a useful errno, whereas fwrite(2)
just uses ferror(3). The additional errno information might be valuable
context in the log message. Up to you if you think it is also valuable.

The log message if FreeFile() fails doesn't seem to make sense to me.
I didn't see any file writing in that code path, but it is possible that
I missed something.

>  +        /*
>  +         * Set reference point for stack-depth checking.  This might seem
>  +         * redundant in !EXEC_BACKEND builds; but it's not because the postmaster
>  +         * launches its children from signal handlers, so we might be running on
>  +         * an alternative stack. XXX still true?
>  +         */
>  +        (void) set_stack_base();

Looks like there is still this XXX left. Can't say I completely
understand the second sentence either.

>  +        /*
>  +         * make sure stderr is in binary mode before anything can possibly be
>  +         * written to it, in case it's actually the syslogger pipe, so the pipe
>  +         * chunking protocol isn't disturbed. Non-logpipe data gets translated on
>  +         * redirection (e.g. via pg_ctl -l) anyway.
>  +         */

Nit: The 'm' in the first "make" should be capitalized.

>  +        if (fread(¶m, sizeof(param), 1, fp) != 1)
>  +        {
>  +                write_stderr("could not read from backend variables file \"%s\": %s\n",
>  +                                         id, strerror(errno));
>  +                exit(1);
>  +        }
>  +
>  +        /* read startup data */
>  +        *startup_data_len = param.startup_data_len;
>  +        if (param.startup_data_len > 0)
>  +        {
>  +                *startup_data = palloc(*startup_data_len);
>  +                if (fread(*startup_data, *startup_data_len, 1, fp) != 1)
>  +                {
>  +                        write_stderr("could not read startup data from backend variables file \"%s\": %s\n",
>  +                                                 id, strerror(errno));
>  +                        exit(1);
>  +                }
>  +        }

fread(3) doesn't set errno. I would probably switch these to read(2) for
the reason I wrote in a previous comment.

>  +        /*
>  +         * Need to reinitialize the SSL library in the backend, since the context
>  +         * structures contain function pointers and cannot be passed through the
>  +         * parameter file.
>  +         *
>  +         * If for some reason reload fails (maybe the user installed broken key
>  +         * files), soldier on without SSL; that's better than all connections
>  +         * becoming impossible.
>  +         *
>  +         * XXX should we do this in all child processes?  For the moment it's
>  +         * enough to do it in backend children.
>  +         */
>  +#ifdef USE_SSL
>  +        if (EnableSSL)
>  +        {
>  +                if (secure_initialize(false) == 0)
>  +                        LoadedSSL = true;
>  +                else
>  +                        ereport(LOG,
>  +                                        (errmsg("SSL configuration could not be loaded in child process")));
>  +        }
>  +#endif

Do other child process types do any non-local communication?

>  -typedef struct ClientSocket {
>  +struct ClientSocket
>  +{
>           pgsocket        sock;                        /* File descriptor */
>           SockAddr        laddr;                        /* local addr (postmaster) */
>           SockAddr        raddr;                        /* remote addr (client) */
>  -} ClientSocket;
>  +};
>  +typedef struct ClientSocket ClientSocket;

Can't say I completely understand the reason for this change given it
was added in your series.

I didn't look too hard at the Windows-specific code, so maybe someone
who knows Windows will have something to say, but it also might've just
been copy-paste that I didn't realize.

There were a few more XXXs that probably should be figured out before
committing. Though perhaps some of them were already there.

Patches 1-3 seem committable as-is. I only had minor comments on
everything but 7, so after taking a look at those, they could be
committed.

Overall, this seems liked a marked improvement :).

--
Tristan Partin
Neon (https://neon.tech)



Re: Refactoring backend fork+exec code

От
Andres Freund
Дата:
Hi,

On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote:
> - patch 1 splits CreateSharedMemoryAndSemaphores into two functions:
> CreateSharedMemoryAndSemaphores is now only called at postmaster startup,
> and a new function called AttachSharedMemoryStructs() is called in backends
> in EXEC_BACKEND mode. I extracted the common parts of those functions to a
> new static function. (Some of this refactoring used to be part of the 3rd
> patch in the series, but it seems useful on its own, so I split it out.)

I like that idea.



> From a96b6e92fdeaa947bf32774c425419b8f987b8e2 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Thu, 30 Nov 2023 00:01:25 +0200
> Subject: [PATCH v3 1/7] Refactor CreateSharedMemoryAndSemaphores
>
> For clarity, have separate functions for *creating* the shared memory
> and semaphores at postmaster or single-user backend startup, and
> for *attaching* to existing shared memory structures in EXEC_BACKEND
> case. CreateSharedMemoryAndSemaphores() is now called only at
> postmaster startup, and a new AttachSharedMemoryStructs() function is
> called at backend startup in EXEC_BACKEND mode.

I assume CreateSharedMemoryAndSemaphores() is still called during crash
restart?  I wonder if it shouldn't split three ways:
1) create
2) initialize
3) attach


> From 3478cafcf74a5c8d649e0287e6c72669a29c0e70 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Thu, 30 Nov 2023 00:02:03 +0200
> Subject: [PATCH v3 2/7] Pass BackgroundWorker entry in the parameter file in
>  EXEC_BACKEND mode
>
> This makes it possible to move InitProcess later in SubPostmasterMain
> (in next commit), as we no longer need to access shared memory to read
> background worker entry.

>  static void read_backend_variables(char *id, Port *port);
> @@ -4831,7 +4833,7 @@ SubPostmasterMain(int argc, char *argv[])
>          strcmp(argv[1], "--forkavlauncher") == 0 ||
>          strcmp(argv[1], "--forkavworker") == 0 ||
>          strcmp(argv[1], "--forkaux") == 0 ||
> -        strncmp(argv[1], "--forkbgworker=", 15) == 0)
> +        strncmp(argv[1], "--forkbgworker", 14) == 0)
>          PGSharedMemoryReAttach();
>      else
>          PGSharedMemoryNoReAttach();
> @@ -4962,10 +4964,8 @@ SubPostmasterMain(int argc, char *argv[])
>
>          AutoVacWorkerMain(argc - 2, argv + 2);    /* does not return */
>      }
> -    if (strncmp(argv[1], "--forkbgworker=", 15) == 0)
> +    if (strncmp(argv[1], "--forkbgworker", 14) == 0)


Now that we don't need to look at parameters anymore, these should probably be
just a strcmp(), like the other cases?


> From 0d071474e12a70ff8113c7b0731c5b97fec45007 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Wed, 29 Nov 2023 23:47:25 +0200
> Subject: [PATCH v3 3/7] Refactor how InitProcess is called
>
> The order of process initialization steps is now more consistent
> between !EXEC_BACKEND and EXEC_BACKEND modes. InitProcess() is called
> at the same place in either mode. We can now also move the
> AttachSharedMemoryStructs() call into InitProcess() itself. This
> reduces the number of "#ifdef EXEC_BACKEND" blocks.

Yay.


> diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
> index cdfdd6fbe1d..6c708777dde 100644
> --- a/src/backend/storage/lmgr/proc.c
> +++ b/src/backend/storage/lmgr/proc.c
> @@ -461,6 +461,12 @@ InitProcess(void)
>       */
>      InitLWLockAccess();
>      InitDeadLockChecking();
> +
> +#ifdef EXEC_BACKEND
> +    /* Attach process to shared data structures */
> +    if (IsUnderPostmaster)
> +        AttachSharedMemoryStructs();
> +#endif
>  }
>
>  /*
> @@ -614,6 +620,12 @@ InitAuxiliaryProcess(void)
>       * Arrange to clean up at process exit.
>       */
>      on_shmem_exit(AuxiliaryProcKill, Int32GetDatum(proctype));
> +
> +#ifdef EXEC_BACKEND
> +    /* Attach process to shared data structures */
> +    if (IsUnderPostmaster)
> +        AttachSharedMemoryStructs();
> +#endif
>  }

Aside: Somewhat odd that InitAuxiliaryProcess() doesn't call
InitLWLockAccess().


I think a short comment explaining why we can attach to shmem structs after
already accessing shared memory earlier in the function would be worthwhile.


> From ce51876f87f1e4317e25baf64184749448fcd033 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Thu, 30 Nov 2023 00:07:34 +0200
> Subject: [PATCH v3 7/7] Refactor postmaster child process launching
>
> - Move code related to launching backend processes to new source file,
>   launch_backend.c
>
> - Introduce new postmaster_child_launch() function that deals with the
>   differences between EXEC_BACKEND and fork mode.
>
> - Refactor the mechanism of passing information from the parent to
>   child process. Instead of using different command-line arguments when
>   launching the child process in EXEC_BACKEND mode, pass a
>   variable-length blob of data along with all the global variables. The
>   contents of that blob depend on the kind of child process being
>   launched. In !EXEC_BACKEND mode, we use the same blob, but it's simply
>   inherited from the parent to child process.
>
>  [...]
>  33 files changed, 1787 insertions(+), 2002 deletions(-)

Well, that's not small...

I think it may be worth splitting some of the file renaming out into a
separate commit, makes it harder to see what changed here.


> +AutoVacLauncherMain(char *startup_data, size_t startup_data_len)
>  {
> -    pid_t        AutoVacPID;
> +    sigjmp_buf    local_sigjmp_buf;
>
> -#ifdef EXEC_BACKEND
> -    switch ((AutoVacPID = avlauncher_forkexec()))
> -#else
> -    switch ((AutoVacPID = fork_process()))
> -#endif
> +    /* Release postmaster's working memory context */
> +    if (PostmasterContext)
>      {
> -        case -1:
> -            ereport(LOG,
> -                    (errmsg("could not fork autovacuum launcher process: %m")));
> -            return 0;
> -
> -#ifndef EXEC_BACKEND
> -        case 0:
> -            /* in postmaster child ... */
> -            InitPostmasterChild();
> -
> -            /* Close the postmaster's sockets */
> -            ClosePostmasterPorts(false);
> -
> -            AutoVacLauncherMain(0, NULL);
> -            break;
> -#endif
> -        default:
> -            return (int) AutoVacPID;
> +        MemoryContextDelete(PostmasterContext);
> +        PostmasterContext = NULL;
>      }
>
> -    /* shouldn't get here */
> -    return 0;
> -}

This if (PostmasterContext) ... else "shouldn't get here" business seems
pretty silly, more likely to hide problems than to help.


> +/*
> + * Information needed to launch different kinds of child processes.
> + */
> +static const struct
> +{
> +    const char *name;
> +    void        (*main_fn) (char *startup_data, size_t startup_data_len);
> +    bool        shmem_attach;
> +}            entry_kinds[] = {
> +    [PMC_BACKEND] = {"backend", BackendMain, true},

Personally I'd give the struct an actual name - makes the debugging experience
a bit nicer than anonymous structs that you can't even reference by a typedef.


> +    [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
> +    [PMC_AV_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
> +    [PMC_BGWORKER] = {"bgworker", BackgroundWorkerMain, true},
> +    [PMC_SYSLOGGER] = {"syslogger", SysLoggerMain, false},
> +
> +    [PMC_STARTUP] = {"startup", StartupProcessMain, true},
> +    [PMC_BGWRITER] = {"bgwriter", BackgroundWriterMain, true},
> +    [PMC_ARCHIVER] = {"archiver", PgArchiverMain, true},
> +    [PMC_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true},
> +    [PMC_WAL_WRITER] = {"wal_writer", WalWriterMain, true},
> +    [PMC_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true},
> +};


It feels like we have too many different ways of documenting the type of a
process. This new PMC_ stuff, enum AuxProcType, enum BackendType. Which then
leads to code like this:


> -CheckpointerMain(void)
> +CheckpointerMain(char *startup_data, size_t startup_data_len)
>  {
>      sigjmp_buf    local_sigjmp_buf;
>      MemoryContext checkpointer_context;
>
> +    Assert(startup_data_len == 0);
> +
> +    MyAuxProcType = CheckpointerProcess;
> +    MyBackendType = B_CHECKPOINTER;
> +    AuxiliaryProcessInit();
> +

For each type of child process. That seems a bit too redundant.  Can't we
unify this at least somewhat? Can't we just reuse BackendType here? Sure,
there'd be pointless entry for B_INVALID, but that doesn't seem like a
problem, could even be useful, by pointing it to a function raising an error.

At the very least this shouldn't deviate from the naming pattern of
BackendType.


> +/*
> + * SubPostmasterMain -- Get the fork/exec'd process into a state equivalent
> + *            to what it would be if we'd simply forked on Unix, and then
> + *            dispatch to the appropriate place.
> + *
> + * The first two command line arguments are expected to be "--forkchild=<name>",
> + * where <name> indicates which postmaster child we are to become, and
> + * the name of a variables file that we can read to load data that would
> + * have been inherited by fork() on Unix.
> + */
> +void
> +SubPostmasterMain(int argc, char *argv[])
> +{
> +    PostmasterChildType child_type;
> +    char       *startup_data;
> +    size_t        startup_data_len;
> +    char       *entry_name;
> +    bool        found = false;
> +
> +    /* In EXEC_BACKEND case we will not have inherited these settings */
> +    IsPostmasterEnvironment = true;
> +    whereToSendOutput = DestNone;
> +
> +    /* Setup essential subsystems (to ensure elog() behaves sanely) */
> +    InitializeGUCOptions();
> +
> +    /* Check we got appropriate args */
> +    if (argc != 3)
> +        elog(FATAL, "invalid subpostmaster invocation");
> +
> +    if (strncmp(argv[1], "--forkchild=", 12) != 0)
> +        elog(FATAL, "invalid subpostmaster invocation (--forkchild argument missing)");
> +    entry_name = argv[1] + 12;
> +    found = false;
> +    for (int idx = 0; idx < lengthof(entry_kinds); idx++)
> +    {
> +        if (strcmp(entry_kinds[idx].name, entry_name) == 0)
> +        {
> +            child_type = idx;
> +            found = true;
> +            break;
> +        }
> +    }
> +    if (!found)
> +        elog(ERROR, "unknown child kind %s", entry_name);

If we then have to search linearly, why don't we just pass the index into the
array?

>
> -#define StartupDataBase()        StartChildProcess(StartupProcess)
> -#define StartArchiver()            StartChildProcess(ArchiverProcess)
> -#define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
> -#define StartCheckpointer()        StartChildProcess(CheckpointerProcess)
> -#define StartWalWriter()        StartChildProcess(WalWriterProcess)
> -#define StartWalReceiver()        StartChildProcess(WalReceiverProcess)
> +#define StartupDataBase()        StartChildProcess(PMC_STARTUP)
> +#define StartArchiver()            StartChildProcess(PMC_ARCHIVER)
> +#define StartBackgroundWriter() StartChildProcess(PMC_BGWRITER)
> +#define StartCheckpointer()        StartChildProcess(PMC_CHECKPOINTER)
> +#define StartWalWriter()        StartChildProcess(PMC_WAL_WRITER)
> +#define StartWalReceiver()        StartChildProcess(PMC_WAL_RECEIVER)
> +
> +#define StartAutoVacLauncher()    StartChildProcess(PMC_AV_LAUNCHER);
> +#define StartAutoVacWorker()    StartChildProcess(PMC_AV_WORKER);

Obviously not your fault, but these macros are so pointless... Making it
harder to find where we start child processes, all to save a a few characters
in one place, while adding considerably more in others.


> +void
> +BackendMain(char *startup_data, size_t startup_data_len)
> +{

Is there any remaining reason for this to live in postmaster.c? Given that
other backend types don't, that seems oddly assymmetrical.

Greetings,

Andres Freund



Re: Refactoring backend fork+exec code

От
Andres Freund
Дата:
Hi,

On 2023-11-30 12:44:33 -0600, Tristan Partin wrote:
> >  +        /*
> >  +         * Set reference point for stack-depth checking.  This might seem
> >  +         * redundant in !EXEC_BACKEND builds; but it's not because the postmaster
> >  +         * launches its children from signal handlers, so we might be running on
> >  +         * an alternative stack. XXX still true?
> >  +         */
> >  +        (void) set_stack_base();
> 
> Looks like there is still this XXX left. Can't say I completely understand
> the second sentence either.

We used to start some child processes of postmaster in signal handlers. That
was fixed in

commit 7389aad6366
Author: Thomas Munro <tmunro@postgresql.org>
Date:   2023-01-12 12:34:23 +1300
 
    Use WaitEventSet API for postmaster's event loop.


In some cases signal handlers run on a separate stack, which meant that the
set_stack_base() we did in postmaster would yield a completely bogus stack
depth estimation.  So this comment should likely have been removed. Thomas?

Greetings,

Andres Freund



Re: Refactoring backend fork+exec code

От
Thomas Munro
Дата:
On Fri, Dec 1, 2023 at 9:31 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-11-30 12:44:33 -0600, Tristan Partin wrote:
> > >  +        /*
> > >  +         * Set reference point for stack-depth checking.  This might seem
> > >  +         * redundant in !EXEC_BACKEND builds; but it's not because the postmaster
> > >  +         * launches its children from signal handlers, so we might be running on
> > >  +         * an alternative stack. XXX still true?
> > >  +         */
> > >  +        (void) set_stack_base();
> >
> > Looks like there is still this XXX left. Can't say I completely understand
> > the second sentence either.
>
> We used to start some child processes of postmaster in signal handlers. That
> was fixed in
>
> commit 7389aad6366
> Author: Thomas Munro <tmunro@postgresql.org>
> Date:   2023-01-12 12:34:23 +1300
>
>     Use WaitEventSet API for postmaster's event loop.
>
>
> In some cases signal handlers run on a separate stack, which meant that the
> set_stack_base() we did in postmaster would yield a completely bogus stack
> depth estimation.  So this comment should likely have been removed. Thomas?

Right, I should delete that comment in master and 16.  While wondering
what to write instead, my first thought is that it is better to leave
the actual call there though, because otherwise there is a small
difference in stack reference point between EXEC_BACKEND and
!EXEC_BACKEND builds, consumed by a few postmaster stack frames.  So
the new comment would just say that.

(I did idly wonder if there is a longjmp trick to zap those frames
post-fork, not looked into and probably doesn't really improve
anything we care about...)



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 30/11/2023 22:26, Andres Freund wrote:
> Aside: Somewhat odd that InitAuxiliaryProcess() doesn't call
> InitLWLockAccess().

Yeah that caught my eye too.

It seems to have been an oversight in commit 1c6821be31f. Before that, 
in 9.4, the lwlock stats were printed for aux processes too, on shutdown.

Committed a fix for that to master.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 30/11/2023 22:26, Andres Freund wrote:
> On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote:
>>  From a96b6e92fdeaa947bf32774c425419b8f987b8e2 Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Thu, 30 Nov 2023 00:01:25 +0200
>> Subject: [PATCH v3 1/7] Refactor CreateSharedMemoryAndSemaphores
>>
>> For clarity, have separate functions for *creating* the shared memory
>> and semaphores at postmaster or single-user backend startup, and
>> for *attaching* to existing shared memory structures in EXEC_BACKEND
>> case. CreateSharedMemoryAndSemaphores() is now called only at
>> postmaster startup, and a new AttachSharedMemoryStructs() function is
>> called at backend startup in EXEC_BACKEND mode.
> 
> I assume CreateSharedMemoryAndSemaphores() is still called during crash
> restart?

Yes.

>  I wonder if it shouldn't split three ways:
> 1) create
> 2) initialize
> 3) attach

Why? What would be the difference between create and initialize phases?

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

От
Andres Freund
Дата:
Hi,

On 2023-12-01 01:36:13 +0200, Heikki Linnakangas wrote:
> On 30/11/2023 22:26, Andres Freund wrote:
> > On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote:
> > >  From a96b6e92fdeaa947bf32774c425419b8f987b8e2 Mon Sep 17 00:00:00 2001
> > > From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> > > Date: Thu, 30 Nov 2023 00:01:25 +0200
> > > Subject: [PATCH v3 1/7] Refactor CreateSharedMemoryAndSemaphores
> > >
> > > For clarity, have separate functions for *creating* the shared memory
> > > and semaphores at postmaster or single-user backend startup, and
> > > for *attaching* to existing shared memory structures in EXEC_BACKEND
> > > case. CreateSharedMemoryAndSemaphores() is now called only at
> > > postmaster startup, and a new AttachSharedMemoryStructs() function is
> > > called at backend startup in EXEC_BACKEND mode.
> >
> > I assume CreateSharedMemoryAndSemaphores() is still called during crash
> > restart?
>
> Yes.
>
> >  I wonder if it shouldn't split three ways:
> > 1) create
> > 2) initialize
> > 3) attach
>
> Why? What would be the difference between create and initialize phases?

Mainly because I somehow mis-remembered how we deal with the shared memory
allocation when crashing. I somehow had remembered that we reused the same
allocation across restarts, but reinitialized it from scratch. There's a
kernel of truth to that, because we can end up re-attaching to an existing
sysv shared memory segment. But not more. Perhaps I was confusing things with
the listen sockets?

Andres



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 30/11/2023 20:44, Tristan Partin wrote:
> Patches 1-3 seem committable as-is.

Thanks for the review! I'm focusing on patches 1-3 now, and will come 
back to the rest after committing 1-3.

There was one test failure with EXEC_BACKEND from patch 2, in 
'test_shm_mq'. In restore_backend_variables() I checked if 'bgw_name' is 
empty to decide if the BackgroundWorker struct is filled in or not, but 
it turns out that 'test_shm_mq' doesn't fill in bgw_name. It probably 
should, I think that's an oversight in 'test_shm_mq', but that's a 
separate issue.

I did some more refactoring of patch 2, to fix that and to improve it in 
general. The BackgroundWorker struct is now passed through the 
fork-related functions similarly to the Port struct. That seems more 
consistent.

Attached is new version of these patches. For easier review, I made the 
new refactorings compared in a new commit 0003. I will squash that 
before pushing, but this makes it easier to see what changed.

Barring any new feedback or issues, I will commit these.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Refactoring backend fork+exec code

От
Alexander Lakhin
Дата:
Hello Heikki,

01.12.2023 15:10, Heikki Linnakangas wrote:
> Attached is new version of these patches. For easier review, I made the new refactorings compared in a new commit 
> 0003. I will squash that before pushing, but this makes it easier to see what changed.
>
> Barring any new feedback or issues, I will commit these.
>

Maybe you could look at issues with running `make check` under Valgrind
when server built with CPPFLAGS="-DUSE_VALGRIND -DEXEC_BACKEND":
# +++ regress check in src/test/regress +++
# initializing database system by copying initdb template
# postmaster failed, examine ".../src/test/regress/log/postmaster.log" for the reason
Bail out!make[1]: ***

...
2023-12-01 16:48:39.136 MSK postmaster[1307988] LOG:  listening on Unix socket "/tmp/pg_regress-pPFNk0/.s.PGSQL.55312"
==00:00:00:01.692 1259396== Syscall param write(buf) points to uninitialised byte(s)
==00:00:00:01.692 1259396==    at 0x5245A37: write (write.c:26)
==00:00:00:01.692 1259396==    by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1180)
==00:00:00:01.692 1259396==    by 0x51BC84F: new_do_write (fileops.c:448)
==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_new_file_xsputn (fileops.c:1254)
==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1196)
==00:00:00:01.692 1259396==    by 0x51B1056: fwrite (iofwrite.c:39)
==00:00:00:01.692 1259396==    by 0x552E21: internal_forkexec (postmaster.c:4518)
==00:00:00:01.692 1259396==    by 0x5546A1: postmaster_forkexec (postmaster.c:4444)
==00:00:00:01.692 1259396==    by 0x55471C: StartChildProcess (postmaster.c:5275)
==00:00:00:01.692 1259396==    by 0x557B61: PostmasterMain (postmaster.c:1454)
==00:00:00:01.692 1259396==    by 0x472136: main (main.c:198)
==00:00:00:01.692 1259396==  Address 0x1ffeffdc11 is on thread 1's stack
==00:00:00:01.692 1259396==  in frame #4, created by internal_forkexec (postmaster.c:4482)
==00:00:00:01.692 1259396==

With memset(param, 0, sizeof(*param)); added at the beginning of
save_backend_variables(), server starts successfully, but then
`make check` fails with other Valgrind error.

Best regards,
Alexander



Re: Refactoring backend fork+exec code

От
"Tristan Partin"
Дата:
On Fri Dec 1, 2023 at 6:10 AM CST, Heikki Linnakangas wrote:
> On 30/11/2023 20:44, Tristan Partin wrote:
> > Patches 1-3 seem committable as-is.
>
> Thanks for the review! I'm focusing on patches 1-3 now, and will come
> back to the rest after committing 1-3.
>
> There was one test failure with EXEC_BACKEND from patch 2, in
> 'test_shm_mq'. In restore_backend_variables() I checked if 'bgw_name' is
> empty to decide if the BackgroundWorker struct is filled in or not, but
> it turns out that 'test_shm_mq' doesn't fill in bgw_name. It probably
> should, I think that's an oversight in 'test_shm_mq', but that's a
> separate issue.
>
> I did some more refactoring of patch 2, to fix that and to improve it in
> general. The BackgroundWorker struct is now passed through the
> fork-related functions similarly to the Port struct. That seems more
> consistent.
>
> Attached is new version of these patches. For easier review, I made the
> new refactorings compared in a new commit 0003. I will squash that
> before pushing, but this makes it easier to see what changed.
>
> Barring any new feedback or issues, I will commit these.

My only thought is that perhaps has_bg_worker is a better name than
has_worker, but I agree that having a flag is better than checking
bgw_name.

--
Tristan Partin
Neon (https://neon.tech)



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 01/12/2023 16:00, Alexander Lakhin wrote:
> Maybe you could look at issues with running `make check` under Valgrind
> when server built with CPPFLAGS="-DUSE_VALGRIND -DEXEC_BACKEND":
> # +++ regress check in src/test/regress +++
> # initializing database system by copying initdb template
> # postmaster failed, examine ".../src/test/regress/log/postmaster.log" for the reason
> Bail out!make[1]: ***
> 
> ...
> 2023-12-01 16:48:39.136 MSK postmaster[1307988] LOG:  listening on Unix socket
"/tmp/pg_regress-pPFNk0/.s.PGSQL.55312"
> ==00:00:00:01.692 1259396== Syscall param write(buf) points to uninitialised byte(s)
> ==00:00:00:01.692 1259396==    at 0x5245A37: write (write.c:26)
> ==00:00:00:01.692 1259396==    by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1180)
> ==00:00:00:01.692 1259396==    by 0x51BC84F: new_do_write (fileops.c:448)
> ==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_new_file_xsputn (fileops.c:1254)
> ==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1196)
> ==00:00:00:01.692 1259396==    by 0x51B1056: fwrite (iofwrite.c:39)
> ==00:00:00:01.692 1259396==    by 0x552E21: internal_forkexec (postmaster.c:4518)
> ==00:00:00:01.692 1259396==    by 0x5546A1: postmaster_forkexec (postmaster.c:4444)
> ==00:00:00:01.692 1259396==    by 0x55471C: StartChildProcess (postmaster.c:5275)
> ==00:00:00:01.692 1259396==    by 0x557B61: PostmasterMain (postmaster.c:1454)
> ==00:00:00:01.692 1259396==    by 0x472136: main (main.c:198)
> ==00:00:00:01.692 1259396==  Address 0x1ffeffdc11 is on thread 1's stack
> ==00:00:00:01.692 1259396==  in frame #4, created by internal_forkexec (postmaster.c:4482)
> ==00:00:00:01.692 1259396==
> 
> With memset(param, 0, sizeof(*param)); added at the beginning of
> save_backend_variables(), server starts successfully, but then
> `make check` fails with other Valgrind error.

That's actually a pre-existing issue, I'm seeing that even on unpatched 
'master'.

In a nutshell, the problem is that the uninitialized padding bytes in 
BackendParameters are written to the file. When we read the file back, 
we don't access the padding bytes, so that's harmless. But Valgrind 
doesn't know that.

On Windows, the file is created with 
CreateFileMapping(INVALID_HANDLE_VALUE, ...) and we write the variables 
directly to the mapping. If I understand the Windows API docs correctly, 
it is guaranteed to be initialized to zeros, so we don't have this 
problem on Windows, only on other platforms with EXEC_BACKEND. I think 
it makes sense to clear the memory on other platforms too, since that's 
what we do on Windows.

Committed a fix with memset(). I'm not sure what our policy with 
backpatching this kind of issues is. This goes back to all supported 
versions, but given the lack of complaints, I chose to not backpatch.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

От
"Tristan Partin"
Дата:
On Fri Dec 1, 2023 at 2:44 PM CST, Heikki Linnakangas wrote:
> On 01/12/2023 16:00, Alexander Lakhin wrote:
> > Maybe you could look at issues with running `make check` under Valgrind
> > when server built with CPPFLAGS="-DUSE_VALGRIND -DEXEC_BACKEND":
> > # +++ regress check in src/test/regress +++
> > # initializing database system by copying initdb template
> > # postmaster failed, examine ".../src/test/regress/log/postmaster.log" for the reason
> > Bail out!make[1]: ***
> >
> > ...
> > 2023-12-01 16:48:39.136 MSK postmaster[1307988] LOG:  listening on Unix socket
"/tmp/pg_regress-pPFNk0/.s.PGSQL.55312"
> > ==00:00:00:01.692 1259396== Syscall param write(buf) points to uninitialised byte(s)
> > ==00:00:00:01.692 1259396==    at 0x5245A37: write (write.c:26)
> > ==00:00:00:01.692 1259396==    by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1180)
> > ==00:00:00:01.692 1259396==    by 0x51BC84F: new_do_write (fileops.c:448)
> > ==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_new_file_xsputn (fileops.c:1254)
> > ==00:00:00:01.692 1259396==    by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1196)
> > ==00:00:00:01.692 1259396==    by 0x51B1056: fwrite (iofwrite.c:39)
> > ==00:00:00:01.692 1259396==    by 0x552E21: internal_forkexec (postmaster.c:4518)
> > ==00:00:00:01.692 1259396==    by 0x5546A1: postmaster_forkexec (postmaster.c:4444)
> > ==00:00:00:01.692 1259396==    by 0x55471C: StartChildProcess (postmaster.c:5275)
> > ==00:00:00:01.692 1259396==    by 0x557B61: PostmasterMain (postmaster.c:1454)
> > ==00:00:00:01.692 1259396==    by 0x472136: main (main.c:198)
> > ==00:00:00:01.692 1259396==  Address 0x1ffeffdc11 is on thread 1's stack
> > ==00:00:00:01.692 1259396==  in frame #4, created by internal_forkexec (postmaster.c:4482)
> > ==00:00:00:01.692 1259396==
> >
> > With memset(param, 0, sizeof(*param)); added at the beginning of
> > save_backend_variables(), server starts successfully, but then
> > `make check` fails with other Valgrind error.
>
> That's actually a pre-existing issue, I'm seeing that even on unpatched
> 'master'.
>
> In a nutshell, the problem is that the uninitialized padding bytes in
> BackendParameters are written to the file. When we read the file back,
> we don't access the padding bytes, so that's harmless. But Valgrind
> doesn't know that.
>
> On Windows, the file is created with
> CreateFileMapping(INVALID_HANDLE_VALUE, ...) and we write the variables
> directly to the mapping. If I understand the Windows API docs correctly,
> it is guaranteed to be initialized to zeros, so we don't have this
> problem on Windows, only on other platforms with EXEC_BACKEND. I think
> it makes sense to clear the memory on other platforms too, since that's
> what we do on Windows.
>
> Committed a fix with memset(). I'm not sure what our policy with
> backpatching this kind of issues is. This goes back to all supported
> versions, but given the lack of complaints, I chose to not backpatch.

Seems like a harmless think to backpatch. It is conceivable that someone
might want to run Valgrind on something other than HEAD.

--
Tristan Partin
Neon (https://neon.tech)



Re: Refactoring backend fork+exec code

От
Tom Lane
Дата:
"Tristan Partin" <tristan@neon.tech> writes:
> On Fri Dec 1, 2023 at 2:44 PM CST, Heikki Linnakangas wrote:
>> Committed a fix with memset(). I'm not sure what our policy with 
>> backpatching this kind of issues is. This goes back to all supported 
>> versions, but given the lack of complaints, I chose to not backpatch.

> Seems like a harmless think to backpatch. It is conceivable that someone 
> might want to run Valgrind on something other than HEAD.

FWIW, I agree with Heikki's conclusion.  EXEC_BACKEND on non-Windows
is already a niche developer-only setup, and given the lack of complaints,
nobody's that interested in running Valgrind with it.  Fixing it on HEAD
seems like plenty.

            regards, tom lane



Re: Refactoring backend fork+exec code

От
Alexander Lakhin
Дата:
Hello Heikki,

01.12.2023 23:44, Heikki Linnakangas wrote:
>
>> With memset(param, 0, sizeof(*param)); added at the beginning of
>> save_backend_variables(), server starts successfully, but then
>> `make check` fails with other Valgrind error.
>
> That's actually a pre-existing issue, I'm seeing that even on unpatched 'master'.

Thank you for fixing that!

Yes, I had discovered it before, but yesterday I decided to check whether
your patches improve the situation...

What bothered me additionally, is an error detected after server start. I
couldn't see it without the patches applied. I mean, on HEAD I now see
`make check` passing, but with the patches it fails:
...
# parallel group (20 tests):  interval date numerology polygon box macaddr8 macaddr multirangetypes line timestamp 
timetz timestamptz time circle strings lseg inet md5 path point
not ok 22    + strings                                  1048 ms
# (test process exited with exit code 2)
not ok 23    + md5                                      1052 ms
# (test process exited with exit code 2)
...
src/test/regress/log/postmaster.log contains:
==00:00:00:30.730 1713480== Syscall param write(buf) points to uninitialised byte(s)
==00:00:00:30.730 1713480==    at 0x5245A37: write (write.c:26)
==00:00:00:30.730 1713480==    by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1180)
==00:00:00:30.730 1713480==    by 0x51BC84F: new_do_write (fileops.c:448)
==00:00:00:30.730 1713480==    by 0x51BC84F: _IO_new_file_xsputn (fileops.c:1254)
==00:00:00:30.730 1713480==    by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1196)
==00:00:00:30.730 1713480==    by 0x51B1056: fwrite (iofwrite.c:39)
==00:00:00:30.730 1713480==    by 0x5540CF: internal_forkexec (postmaster.c:4526)
==00:00:00:30.730 1713480==    by 0x5543C0: bgworker_forkexec (postmaster.c:5624)
==00:00:00:30.730 1713480==    by 0x555477: do_start_bgworker (postmaster.c:5665)
==00:00:00:30.730 1713480==    by 0x555738: maybe_start_bgworkers (postmaster.c:5928)
==00:00:00:30.730 1713480==    by 0x556072: process_pm_pmsignal (postmaster.c:5080)
==00:00:00:30.730 1713480==    by 0x556610: ServerLoop (postmaster.c:1761)
==00:00:00:30.730 1713480==    by 0x557BE2: PostmasterMain (postmaster.c:1469)
==00:00:00:30.730 1713480==    by 0x47216B: main (main.c:198)
==00:00:00:30.730 1713480==  Address 0x1ffeffd8c0 is on thread 1's stack
==00:00:00:30.730 1713480==  in frame #4, created by internal_forkexec (postmaster.c:4482)
==00:00:00:30.730 1713480==
...
2023-12-02 05:14:30.751 MSK client backend[1713740] pg_regress/rangetypes FATAL:  terminating connection due to 
unexpected postmaster exit
2023-12-02 05:14:31.033 MSK client backend[1713734] pg_regress/numeric FATAL:  postmaster exited during a parallel 
transaction
TRAP: failed Assert("!IsTransactionOrTransactionBlock()"), File: "pgstat.c", Line: 591, PID: 1713734

I haven't looked deeper yet, but it seems that we see two issues here (and
Assert is not directly caused by the patches set.)

Best regards,
Alexander



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 02/12/2023 05:00, Alexander Lakhin wrote:
> What bothered me additionally, is an error detected after server start. I
> couldn't see it without the patches applied. I mean, on HEAD I now see
> `make check` passing, but with the patches it fails:
> ...
> # parallel group (20 tests):  interval date numerology polygon box macaddr8 macaddr multirangetypes line timestamp
> timetz timestamptz time circle strings lseg inet md5 path point
> not ok 22    + strings                                  1048 ms
> # (test process exited with exit code 2)
> not ok 23    + md5                                      1052 ms
> # (test process exited with exit code 2)
> ...
> src/test/regress/log/postmaster.log contains:
> ==00:00:00:30.730 1713480== Syscall param write(buf) points to uninitialised byte(s)
> ==00:00:00:30.730 1713480==    at 0x5245A37: write (write.c:26)
> ==00:00:00:30.730 1713480==    by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1180)
> ==00:00:00:30.730 1713480==    by 0x51BC84F: new_do_write (fileops.c:448)
> ==00:00:00:30.730 1713480==    by 0x51BC84F: _IO_new_file_xsputn (fileops.c:1254)
> ==00:00:00:30.730 1713480==    by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1196)
> ==00:00:00:30.730 1713480==    by 0x51B1056: fwrite (iofwrite.c:39)
> ==00:00:00:30.730 1713480==    by 0x5540CF: internal_forkexec (postmaster.c:4526)
> ==00:00:00:30.730 1713480==    by 0x5543C0: bgworker_forkexec (postmaster.c:5624)
> ==00:00:00:30.730 1713480==    by 0x555477: do_start_bgworker (postmaster.c:5665)
> ==00:00:00:30.730 1713480==    by 0x555738: maybe_start_bgworkers (postmaster.c:5928)
> ==00:00:00:30.730 1713480==    by 0x556072: process_pm_pmsignal (postmaster.c:5080)
> ==00:00:00:30.730 1713480==    by 0x556610: ServerLoop (postmaster.c:1761)
> ==00:00:00:30.730 1713480==    by 0x557BE2: PostmasterMain (postmaster.c:1469)
> ==00:00:00:30.730 1713480==    by 0x47216B: main (main.c:198)
> ==00:00:00:30.730 1713480==  Address 0x1ffeffd8c0 is on thread 1's stack
> ==00:00:00:30.730 1713480==  in frame #4, created by internal_forkexec (postmaster.c:4482)
> ==00:00:00:30.730 1713480==
> ...

Ack, I see this too. I fixed it by adding MCXT_ALLOC_ZERO to the 
allocation of the BackendWorker struct. That's a little heavy-handed, 
like with the previous failures the uninitialized padding bytes are 
written to the file and read back, but not accessed after that. But it 
seems like the simplest fix. This isn't performance critical after all.

I also renamed the 'has_worker' field to 'has_bgworker', per Tristan's 
suggestion. Pushed with those changes.

Thanks for the reviews!

> 2023-12-02 05:14:30.751 MSK client backend[1713740] pg_regress/rangetypes FATAL:  terminating connection due to
> unexpected postmaster exit
> 2023-12-02 05:14:31.033 MSK client backend[1713734] pg_regress/numeric FATAL:  postmaster exited during a parallel
> transaction
> TRAP: failed Assert("!IsTransactionOrTransactionBlock()"), File: "pgstat.c", Line: 591, PID: 1713734
> 
> I haven't looked deeper yet, but it seems that we see two issues here (and
> Assert is not directly caused by the patches set.)

I have not been able to reproduce this one.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 30/11/2023 22:26, Andres Freund wrote:
> On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote:
>>   [...]
>>   33 files changed, 1787 insertions(+), 2002 deletions(-)
> 
> Well, that's not small...
> 
> I think it may be worth splitting some of the file renaming out into a
> separate commit, makes it harder to see what change here.

Here you are (details at end of this email)

>> +    [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
>> +    [PMC_AV_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
>> +    [PMC_BGWORKER] = {"bgworker", BackgroundWorkerMain, true},
>> +    [PMC_SYSLOGGER] = {"syslogger", SysLoggerMain, false},
>> +
>> +    [PMC_STARTUP] = {"startup", StartupProcessMain, true},
>> +    [PMC_BGWRITER] = {"bgwriter", BackgroundWriterMain, true},
>> +    [PMC_ARCHIVER] = {"archiver", PgArchiverMain, true},
>> +    [PMC_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true},
>> +    [PMC_WAL_WRITER] = {"wal_writer", WalWriterMain, true},
>> +    [PMC_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true},
>> +};
> 
> 
> It feels like we have too many different ways of documenting the type of a
> process. This new PMC_ stuff, enum AuxProcType, enum BackendType.

Agreed. And "am_walsender" and such variables.

> Which then leads to code like this:
> 
>> -CheckpointerMain(void)
>> +CheckpointerMain(char *startup_data, size_t startup_data_len)
>>   {
>>       sigjmp_buf    local_sigjmp_buf;
>>       MemoryContext checkpointer_context;
>>
>> +    Assert(startup_data_len == 0);
>> +
>> +    MyAuxProcType = CheckpointerProcess;
>> +    MyBackendType = B_CHECKPOINTER;
>> +    AuxiliaryProcessInit();
>> +
> 
> For each type of child process. That seems a bit too redundant.  Can't we
> unify this at least somewhat? Can't we just reuse BackendType here? Sure,
> there'd be pointless entry for B_INVALID, but that doesn't seem like a
> problem, could even be useful, by pointing it to a function raising an error.

There are a few differences: B_INVALID (and B_STANDALONE_BACKEND) are 
pointless for this array as you noted. But also, we don't know if the 
backend is a regular backend or WAL sender until authentication, so for 
a WAL sender, we'd need to change MyBackendType from B_BACKEND to 
B_WAL_SENDER after forking. Maybe that's ok.

I didn't do anything about this yet, but I'll give it some more thought.

>> +    if (strncmp(argv[1], "--forkchild=", 12) != 0)
>> +        elog(FATAL, "invalid subpostmaster invocation (--forkchild argument missing)");
>> +    entry_name = argv[1] + 12;
>> +    found = false;
>> +    for (int idx = 0; idx < lengthof(entry_kinds); idx++)
>> +    {
>> +        if (strcmp(entry_kinds[idx].name, entry_name) == 0)
>> +        {
>> +            child_type = idx;
>> +            found = true;
>> +            break;
>> +        }
>> +    }
>> +    if (!found)
>> +        elog(ERROR, "unknown child kind %s", entry_name);
> 
> If we then have to search linearly, why don't we just pass the index into the
> array?

We could. I like the idea of a human-readable name on the command line, 
although I'm not sure if it's really visible anywhere.

>> +void
>> +BackendMain(char *startup_data, size_t startup_data_len)
>> +{
> 
> Is there any remaining reason for this to live in postmaster.c? Given that
> other backend types don't, that seems oddly assymmetrical.

Gee, another yak to shave, thanks ;-). You're right, that makes a lot of 
sense. I added another patch that moves that to a new file, 
src/backend/tcop/backend_startup.c. ProcessStartupPacket() and friends 
go there too. It might make sense to do this before the other patches, 
but it's the last patch in the series now.

I kept processCancelRequest() in postmaster.c because it looks at 
BackendList/ShmemBackendArray, which are static in postmaster.c. Some 
more refactoring might be in order there, perhaps moving those to a 
different file too. But that can be done separately, this split is 
pretty OK as is.

On 30/11/2023 20:44, Tristan Partin wrote:
>>  From 8886db1ed6bae21bf6d77c9bb1230edbb55e24f9 Mon Sep 17 00:00:00 2001
>>  From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>>  Date: Thu, 30 Nov 2023 00:04:22 +0200
>>  Subject: [PATCH v3 4/7] Pass CAC as argument to backend process
> 
> For me, being new to the code, it would be nice to have more of an 
> explanation as to why this is "better." I don't doubt it; it would just 
> help me and future readers of this commit in the future. More of an 
> explanation in the commit message would suffice.

Updated the commit message. It's mainly to pave the way for the next 
patches, which move the initialization of Port to the backend process, 
after forking. And that in turn paves the way for the patches after 
that. But also, very subjectively, it feels more natural to me.

> My other comment on this commit is that we now seem to have lost the 
> context on what CAC stands for. Before we had the member variable to 
> explain it. A comment on the enum would be great or changing cac named 
> variables to canAcceptConnections. I did notice in patch 7 that there 
> are still some variables named canAcceptConnections around, so I'll 
> leave this comment up to you.

Good point. The last patch in this series - which is new compared to 
previous patch version - moves CAC_state to a different header file 
again. I added a comment there.

>>  +        if (fwrite(param, paramsz, 1, fp) != 1)
>>  +        {
>>  +                ereport(LOG,
>>  +                                (errcode_for_file_access(),
>>  +                                 errmsg("could not write to file \"%s\": %m", tmpfilename)));
>>  +                FreeFile(fp);
>>  +                return -1;
>>  +        }
>>  +
>>  +        /* Release file */
>>  +        if (FreeFile(fp))
>>  +        {
>>  +                ereport(LOG,
>>  +                                (errcode_for_file_access(),
>>  +                                 errmsg("could not write to file \"%s\": %m", tmpfilename)));
>>  +                return -1;
>>  +        }
> 
> Two pieces of feedback here. I generally find write(2) more useful than 
> fwrite(3) because write(2) will report a useful errno, whereas fwrite(2) 
> just uses ferror(3). The additional errno information might be valuable 
> context in the log message. Up to you if you think it is also valuable.

In general I agree. This patch just moves existing code though, so I 
left it as is.

> The log message if FreeFile() fails doesn't seem to make sense to me. 
> I didn't see any file writing in that code path, but it is possible that 
> I missed something.

FreeFile() calls fclose(), which flushes the buffer. If fclose() fails, 
it's most likely because the write() to flush the buffer failed, so 
"could not write" is usually appropriate. (It feels ugly to me too, 
error handling with the buffered i/o functions is a bit messy. As you 
said, plain open()/write() is more clear.)

>>  +        /*
>>  +         * Need to reinitialize the SSL library in the backend, since the context
>>  +         * structures contain function pointers and cannot be passed through the
>>  +         * parameter file.
>>  +         *
>>  +         * If for some reason reload fails (maybe the user installed broken key
>>  +         * files), soldier on without SSL; that's better than all connections
>>  +         * becoming impossible.
>>  +         *
>>  +         * XXX should we do this in all child processes?  For the moment it's
>>  +         * enough to do it in backend children.
>>  +         */
>>  +#ifdef USE_SSL
>>  +        if (EnableSSL)
>>  +        {
>>  +                if (secure_initialize(false) == 0)
>>  +                        LoadedSSL = true;
>>  +                else
>>  +                        ereport(LOG,
>>  +                                        (errmsg("SSL configuration could not be loaded in child process")));
>>  +        }
>>  +#endif
> 
> Do other child process types do any non-local communication?

No. Although in theory an extension-defined background worker could do 
whatever, including opening TLS connections. It's not clear if such a
background worker would want the same initialization that we do in 
secure_initialize(), or something else.


Here is a new patch set:

> v5-0001-Pass-CAC-as-argument-to-backend-process.patch
> v5-0002-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patch
> v5-0003-Move-initialization-of-Port-struct-to-child-proce.patch

These patches form a pretty well-contained unit. The gist is to move the 
initialization of the Port struct to after forking the backend process 
(in patch 3).

I plan to polish and commit these next, so any final reviews on these 
are welcome.

> v5-0004-Extract-registration-of-Win32-deadchild-callback-.patch
> v5-0005-Move-some-functions-from-postmaster.c-to-new-sour.patch
> v5-0006-Refactor-AuxProcess-startup.patch
> v5-0007-Refactor-postmaster-child-process-launching.patch

Patches 4-6 are refactorings that don't do much good on their own, but 
they help to make patch 7 much smaller and easier to review.

I left out some of the code-moving that I had in previous patch versions:

- Previously I moved fork_process() function from fork_process.c to the 
new launch_backend.c file. That might still make sense, there is nothing 
else in fork_process.c and the only caller is in launch_backend.c. But 
I'm not sure, and it can be done separately.

- Previously I moved InitPostmasterChild from miscinit.c to the new 
launch_backend.c file. That might also still make sense, but I'm not 
100% sure it's an improvement, and it can be done later if we want to.

> v5-0008-Move-code-for-backend-startup-to-separate-file.patch

This moves BackendMain() and friends from postmaster.c to a new file, 
per Andres's suggestion.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 08/12/2023 14:33, Heikki Linnakangas wrote:
>>> +    [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
>>> +    [PMC_AV_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
>>> +    [PMC_BGWORKER] = {"bgworker", BackgroundWorkerMain, true},
>>> +    [PMC_SYSLOGGER] = {"syslogger", SysLoggerMain, false},
>>> +
>>> +    [PMC_STARTUP] = {"startup", StartupProcessMain, true},
>>> +    [PMC_BGWRITER] = {"bgwriter", BackgroundWriterMain, true},
>>> +    [PMC_ARCHIVER] = {"archiver", PgArchiverMain, true},
>>> +    [PMC_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true},
>>> +    [PMC_WAL_WRITER] = {"wal_writer", WalWriterMain, true},
>>> +    [PMC_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true},
>>> +};
>>
>> It feels like we have too many different ways of documenting the type of a
>> process. This new PMC_ stuff, enum AuxProcType, enum BackendType.
> Agreed. And "am_walsender" and such variables.

Here's a patch that gets rid of AuxProcType. It's independent of the 
other patches in this thread; if this is committed, I'll rebase the rest 
of the patches over this and get rid of the new PMC_* enum.

Three patches, actually. The first one fixes an existing comment that I 
noticed to be incorrect while working on this. I'll push that soon, 
barring objections. The second one gets rid of AuxProcType, and the 
third one replaces IsBackgroundWorker, IsAutoVacuumLauncherProcess() and 
IsAutoVacuumWorkerProcess() with checks on MyBackendType. So 
MyBackendType is now the primary way to check what kind of a process the 
current process is.

'am_walsender' would also be fairly straightforward to remove and 
replace with AmWalSenderProcess(). I didn't do that because we also have 
am_db_walsender and am_cascading_walsender which cannot be directly 
replaced with MyBackendType. Given that, it might be best to keep 
am_walsender for symmetry.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Refactoring backend fork+exec code

От
Andres Freund
Дата:
Hi,

On 2024-01-10 14:35:52 +0200, Heikki Linnakangas wrote:
> Here's a patch that gets rid of AuxProcType. It's independent of the other
> patches in this thread; if this is committed, I'll rebase the rest of the
> patches over this and get rid of the new PMC_* enum.
> 
> Three patches, actually. The first one fixes an existing comment that I
> noticed to be incorrect while working on this. I'll push that soon, barring
> objections. The second one gets rid of AuxProcType, and the third one
> replaces IsBackgroundWorker, IsAutoVacuumLauncherProcess() and
> IsAutoVacuumWorkerProcess() with checks on MyBackendType. So MyBackendType
> is now the primary way to check what kind of a process the current process
> is.
> 
> 'am_walsender' would also be fairly straightforward to remove and replace
> with AmWalSenderProcess(). I didn't do that because we also have
> am_db_walsender and am_cascading_walsender which cannot be directly replaced
> with MyBackendType. Given that, it might be best to keep am_walsender for
> symmetry.


> @@ -561,13 +561,13 @@ static void ShmemBackendArrayAdd(Backend *bn);
>  static void ShmemBackendArrayRemove(Backend *bn);
>  #endif                            /* EXEC_BACKEND */
>  
> -#define StartupDataBase()        StartChildProcess(StartupProcess)
> -#define StartArchiver()            StartChildProcess(ArchiverProcess)
> -#define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
> -#define StartCheckpointer()        StartChildProcess(CheckpointerProcess)
> -#define StartWalWriter()        StartChildProcess(WalWriterProcess)
> -#define StartWalReceiver()        StartChildProcess(WalReceiverProcess)
> -#define StartWalSummarizer()    StartChildProcess(WalSummarizerProcess)
> +#define StartupDataBase()        StartChildProcess(B_STARTUP)
> +#define StartArchiver()            StartChildProcess(B_ARCHIVER)
> +#define StartBackgroundWriter() StartChildProcess(B_BG_WRITER)
> +#define StartCheckpointer()        StartChildProcess(B_CHECKPOINTER)
> +#define StartWalWriter()        StartChildProcess(B_WAL_WRITER)
> +#define StartWalReceiver()        StartChildProcess(B_WAL_RECEIVER)
> +#define StartWalSummarizer()    StartChildProcess(B_WAL_SUMMARIZER)

Not for this commit, but we ought to rip out these macros - all they do is to
make it harder to understand the code.




> @@ -5344,31 +5344,31 @@ StartChildProcess(AuxProcType type)
>          errno = save_errno;
>          switch (type)
>          {
> -            case StartupProcess:
> +            case B_STARTUP:
>                  ereport(LOG,
>                          (errmsg("could not fork startup process: %m")));
>                  break;
> -            case ArchiverProcess:
> +            case B_ARCHIVER:
>                  ereport(LOG,
>                          (errmsg("could not fork archiver process: %m")));
>                  break;
> -            case BgWriterProcess:
> +            case B_BG_WRITER:
>                  ereport(LOG,
>                          (errmsg("could not fork background writer process: %m")));
>                  break;
> -            case CheckpointerProcess:
> +            case B_CHECKPOINTER:
>                  ereport(LOG,
>                          (errmsg("could not fork checkpointer process: %m")));
>                  break;
> -            case WalWriterProcess:
> +            case B_WAL_WRITER:
>                  ereport(LOG,
>                          (errmsg("could not fork WAL writer process: %m")));
>                  break;
> -            case WalReceiverProcess:
> +            case B_WAL_RECEIVER:
>                  ereport(LOG,
>                          (errmsg("could not fork WAL receiver process: %m")));
>                  break;
> -            case WalSummarizerProcess:
> +            case B_WAL_SUMMARIZER:
>                  ereport(LOG,
>                          (errmsg("could not fork WAL summarizer process: %m")));
>                  break;

Seems we should replace this with something slightly more generic one of these
days...


> diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
> index 1a1050c8da1..92f24db4e18 100644
> --- a/src/backend/utils/activity/backend_status.c
> +++ b/src/backend/utils/activity/backend_status.c
> @@ -257,17 +257,16 @@ pgstat_beinit(void)
>      else
>      {
>          /* Must be an auxiliary process */
> -        Assert(MyAuxProcType != NotAnAuxProcess);
> +        Assert(IsAuxProcess(MyBackendType));
>  
>          /*
>           * Assign the MyBEEntry for an auxiliary process.  Since it doesn't
>           * have a BackendId, the slot is statically allocated based on the
> -         * auxiliary process type (MyAuxProcType).  Backends use slots indexed
> -         * in the range from 0 to MaxBackends (exclusive), so we use
> -         * MaxBackends + AuxProcType as the index of the slot for an auxiliary
> -         * process.
> +         * auxiliary process type.  Backends use slots indexed in the range
> +         * from 0 to MaxBackends (exclusive), and aux processes use the slots
> +         * after that.
>           */
> -        MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
> +        MyBEEntry = &BackendStatusArray[MaxBackends + MyBackendType - FIRST_AUX_PROC];
>      }

Hm, this seems less than pretty. It's probably ok for now, but it seems like a
better fix might be to just start assigning backend ids to aux procs or switch
to indexing by pgprocno?


> From 795929a5f5a5d6ea4fa8a46bb15c68d2ff46ad3d Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Wed, 10 Jan 2024 12:59:48 +0200
> Subject: [PATCH v6 3/3] Use MyBackendType in more places to check what process
>  this is
> 
> Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess() and
> IsAutoVacuumWorkerProcess() in favor of new Am*Process() macros that
> use MyBackendType. For consistency with the existing Am*Process()
> macros.

The Am*Process() macros aren't realy new, they are just implemented
differently, right? I guess there are a few more of them now.

Given that we are probably going to have more process types in the future, it
seems like a better direction would be a AmProcessType(proctype) style
macro/inline function. That we we don't have to mirror the list of process
types in the enum and a set of macros.

Greetings,

Andres Freund



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 22/01/2024 23:07, Andres Freund wrote:
> On 2024-01-10 14:35:52 +0200, Heikki Linnakangas wrote:
>> @@ -5344,31 +5344,31 @@ StartChildProcess(AuxProcType type)
>>           errno = save_errno;
>>           switch (type)
>>           {
>> -            case StartupProcess:
>> +            case B_STARTUP:
>>                   ereport(LOG,
>>                           (errmsg("could not fork startup process: %m")));
>>                   break;
>> -            case ArchiverProcess:
>> +            case B_ARCHIVER:
>>                   ereport(LOG,
>>                           (errmsg("could not fork archiver process: %m")));
>>                   break;
>> -            case BgWriterProcess:
>> +            case B_BG_WRITER:
>>                   ereport(LOG,
>>                           (errmsg("could not fork background writer process: %m")));
>>                   break;
>> -            case CheckpointerProcess:
>> +            case B_CHECKPOINTER:
>>                   ereport(LOG,
>>                           (errmsg("could not fork checkpointer process: %m")));
>>                   break;
>> -            case WalWriterProcess:
>> +            case B_WAL_WRITER:
>>                   ereport(LOG,
>>                           (errmsg("could not fork WAL writer process: %m")));
>>                   break;
>> -            case WalReceiverProcess:
>> +            case B_WAL_RECEIVER:
>>                   ereport(LOG,
>>                           (errmsg("could not fork WAL receiver process: %m")));
>>                   break;
>> -            case WalSummarizerProcess:
>> +            case B_WAL_SUMMARIZER:
>>                   ereport(LOG,
>>                           (errmsg("could not fork WAL summarizer process: %m")));
>>                   break;
> 
> Seems we should replace this with something slightly more generic one of these
> days...

The later patches in this thread will turn these into

ereport(LOG,
         (errmsg("could not fork %s process: %m", 
PostmasterChildName(type))));

>> diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
>> index 1a1050c8da1..92f24db4e18 100644
>> --- a/src/backend/utils/activity/backend_status.c
>> +++ b/src/backend/utils/activity/backend_status.c
>> @@ -257,17 +257,16 @@ pgstat_beinit(void)
>>       else
>>       {
>>           /* Must be an auxiliary process */
>> -        Assert(MyAuxProcType != NotAnAuxProcess);
>> +        Assert(IsAuxProcess(MyBackendType));
>>   
>>           /*
>>            * Assign the MyBEEntry for an auxiliary process.  Since it doesn't
>>            * have a BackendId, the slot is statically allocated based on the
>> -         * auxiliary process type (MyAuxProcType).  Backends use slots indexed
>> -         * in the range from 0 to MaxBackends (exclusive), so we use
>> -         * MaxBackends + AuxProcType as the index of the slot for an auxiliary
>> -         * process.
>> +         * auxiliary process type.  Backends use slots indexed in the range
>> +         * from 0 to MaxBackends (exclusive), and aux processes use the slots
>> +         * after that.
>>            */
>> -        MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
>> +        MyBEEntry = &BackendStatusArray[MaxBackends + MyBackendType - FIRST_AUX_PROC];
>>       }
> 
> Hm, this seems less than pretty. It's probably ok for now, but it seems like a
> better fix might be to just start assigning backend ids to aux procs or switch
> to indexing by pgprocno?

Using pgprocno is a good idea. Come to think of it, why do we even have 
a concept of backend ID that's separate from pgprocno? backend ID is 
used to index the ProcState array, but AFAICS we could use pgprocno as 
the index to that, too.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

От
Andres Freund
Дата:
Hi,

On 2024-01-23 21:07:08 +0200, Heikki Linnakangas wrote:
> On 22/01/2024 23:07, Andres Freund wrote:
> > > diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
> > > index 1a1050c8da1..92f24db4e18 100644
> > > --- a/src/backend/utils/activity/backend_status.c
> > > +++ b/src/backend/utils/activity/backend_status.c
> > > @@ -257,17 +257,16 @@ pgstat_beinit(void)
> > >       else
> > >       {
> > >           /* Must be an auxiliary process */
> > > -        Assert(MyAuxProcType != NotAnAuxProcess);
> > > +        Assert(IsAuxProcess(MyBackendType));
> > >           /*
> > >            * Assign the MyBEEntry for an auxiliary process.  Since it doesn't
> > >            * have a BackendId, the slot is statically allocated based on the
> > > -         * auxiliary process type (MyAuxProcType).  Backends use slots indexed
> > > -         * in the range from 0 to MaxBackends (exclusive), so we use
> > > -         * MaxBackends + AuxProcType as the index of the slot for an auxiliary
> > > -         * process.
> > > +         * auxiliary process type.  Backends use slots indexed in the range
> > > +         * from 0 to MaxBackends (exclusive), and aux processes use the slots
> > > +         * after that.
> > >            */
> > > -        MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
> > > +        MyBEEntry = &BackendStatusArray[MaxBackends + MyBackendType - FIRST_AUX_PROC];
> > >       }
> > 
> > Hm, this seems less than pretty. It's probably ok for now, but it seems like a
> > better fix might be to just start assigning backend ids to aux procs or switch
> > to indexing by pgprocno?
> 
> Using pgprocno is a good idea. Come to think of it, why do we even have a
> concept of backend ID that's separate from pgprocno? backend ID is used to
> index the ProcState array, but AFAICS we could use pgprocno as the index to
> that, too.

I think we should do that. There are a few processes not participating in
sinval, but it doesn't make enough of a difference to make sinval slower. And
I think there'd be far bigger efficiency improvements to sinvaladt than not
having a handful more entries.

Greetings,

Andres Freund



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 23/01/2024 21:50, Andres Freund wrote:
> On 2024-01-23 21:07:08 +0200, Heikki Linnakangas wrote:
>> On 22/01/2024 23:07, Andres Freund wrote:
>>>> diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
>>>> index 1a1050c8da1..92f24db4e18 100644
>>>> --- a/src/backend/utils/activity/backend_status.c
>>>> +++ b/src/backend/utils/activity/backend_status.c
>>>> @@ -257,17 +257,16 @@ pgstat_beinit(void)
>>>>        else
>>>>        {
>>>>            /* Must be an auxiliary process */
>>>> -        Assert(MyAuxProcType != NotAnAuxProcess);
>>>> +        Assert(IsAuxProcess(MyBackendType));
>>>>            /*
>>>>             * Assign the MyBEEntry for an auxiliary process.  Since it doesn't
>>>>             * have a BackendId, the slot is statically allocated based on the
>>>> -         * auxiliary process type (MyAuxProcType).  Backends use slots indexed
>>>> -         * in the range from 0 to MaxBackends (exclusive), so we use
>>>> -         * MaxBackends + AuxProcType as the index of the slot for an auxiliary
>>>> -         * process.
>>>> +         * auxiliary process type.  Backends use slots indexed in the range
>>>> +         * from 0 to MaxBackends (exclusive), and aux processes use the slots
>>>> +         * after that.
>>>>             */
>>>> -        MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
>>>> +        MyBEEntry = &BackendStatusArray[MaxBackends + MyBackendType - FIRST_AUX_PROC];
>>>>        }
>>>
>>> Hm, this seems less than pretty. It's probably ok for now, but it seems like a
>>> better fix might be to just start assigning backend ids to aux procs or switch
>>> to indexing by pgprocno?
>>
>> Using pgprocno is a good idea. Come to think of it, why do we even have a
>> concept of backend ID that's separate from pgprocno? backend ID is used to
>> index the ProcState array, but AFAICS we could use pgprocno as the index to
>> that, too.
> 
> I think we should do that. There are a few processes not participating in
> sinval, but it doesn't make enough of a difference to make sinval slower. And
> I think there'd be far bigger efficiency improvements to sinvaladt than not
> having a handful more entries.

And here we go. BackendID is now a 1-based index directly into the 
PGPROC array.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Refactoring backend fork+exec code

От
reid.thompson@crunchydata.com
Дата:
On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote:
>
> And here we go. BackendID is now a 1-based index directly into the
> PGPROC array.
>

Would it be worthwhile to also note in this comment FIRST_AUX_PROC's
and IsAuxProcess()'s dependency on B_ARCHIVER and it's location in the
enum table?

      /*
              
      ¦* Auxiliary processes. These have PGPROC entries, but they are not
              
      ¦* attached to any particular database. There can be only one of each of
              
      ¦* these running at a time.
              
      ¦*
              
      ¦* If you modify these, make sure to update NUM_AUXILIARY_PROCS and the
              
      ¦* glossary in the docs.
              
      ¦*/
              
      B_ARCHIVER,
              
      B_BG_WRITER,
              
      B_CHECKPOINTER,
              
      B_STARTUP,
              
      B_WAL_RECEIVER,
              
      B_WAL_SUMMARIZER,
              
      B_WAL_WRITER,
              
  } BackendType;
              

            
  #define BACKEND_NUM_TYPES (B_WAL_WRITER + 1)

              
  extern PGDLLIMPORT BackendType MyBackendType;

              
  #define FIRST_AUX_PROC B_ARCHIVER
  #define IsAuxProcess(type)          (MyBackendType >= FIRST_AUX_PROC)



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 29/01/2024 17:54, reid.thompson@crunchydata.com wrote:
> On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote:
>>
>> And here we go. BackendID is now a 1-based index directly into the
>> PGPROC array.
> 
> Would it be worthwhile to also note in this comment FIRST_AUX_PROC's
> and IsAuxProcess()'s dependency on B_ARCHIVER and it's location in the
> enum table?

Yeah, that might be in order. Although looking closer, it's only used in 
IsAuxProcess, which is only used in one sanity check in 
AuxProcessMain(). And even that gets refactored away by the later 
patches in this thread. So on second thoughts, I'll just remove it 
altogether.

I spent some more time on the 'lastBackend' optimization in sinvaladt.c. 
I realized that it became very useless with these patches, because aux 
processes are allocated pgprocno's after all the slots for regular 
backends. There are always aux processes running, so lastBackend would 
always have a value close to the max anyway. I replaced that with a 
dense 'pgprocnos' array that keeps track of the exact slots that are in 
use. I'm not 100% sure this is worth the effort; does any real world 
workload send shared invalidations so frequently that this matters? In 
any case, this should avoid the regression if such a workload exists.

New patch set attached. I think these are ready to be committed, but 
would appreciate a final review.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 30/01/2024 02:08, Heikki Linnakangas wrote:
> On 29/01/2024 17:54, reid.thompson@crunchydata.com wrote:
>> On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote:
>>>
>>> And here we go. BackendID is now a 1-based index directly into the
>>> PGPROC array.
>>
>> Would it be worthwhile to also note in this comment FIRST_AUX_PROC's
>> and IsAuxProcess()'s dependency on B_ARCHIVER and it's location in the
>> enum table?
> 
> Yeah, that might be in order. Although looking closer, it's only used in
> IsAuxProcess, which is only used in one sanity check in
> AuxProcessMain(). And even that gets refactored away by the later
> patches in this thread. So on second thoughts, I'll just remove it
> altogether.
> 
> I spent some more time on the 'lastBackend' optimization in sinvaladt.c.
> I realized that it became very useless with these patches, because aux
> processes are allocated pgprocno's after all the slots for regular
> backends. There are always aux processes running, so lastBackend would
> always have a value close to the max anyway. I replaced that with a
> dense 'pgprocnos' array that keeps track of the exact slots that are in
> use. I'm not 100% sure this is worth the effort; does any real world
> workload send shared invalidations so frequently that this matters? In
> any case, this should avoid the regression if such a workload exists.
> 
> New patch set attached. I think these are ready to be committed, but
> would appreciate a final review.

contrib/amcheck 003_cic_2pc.pl test failures revealed a bug that 
required some reworking:

In a PGPROC entry for a prepared xact, the PGPROC's backendID needs to 
be the original backend's ID, because the prepared xact is holding the 
lock on the original virtual transaction id. When a transaction's 
ownership is moved from the original backend's PGPROC entry to the 
prepared xact PGPROC entry, the backendID needs to be copied over. My 
patch removed the field altogether, so it was not copied over, which 
made it look like it the original VXID lock was released at prepare.

I fixed that by adding back the backendID field. For regular backends, 
it's always equal to pgprocno + 1, but for prepared xacts, it's the 
original backend's ID. To make that less confusing, I moved the 
backendID and lxid fields together under a 'vxid' struct. The two fields 
together form the virtual transaction ID, and that's the only context 
where the 'backendID' field should now be looked at.

I also squashed the 'lastBackend' changes in sinvaladt.c to the main patch.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Refactoring backend fork+exec code

От
Andres Freund
Дата:
Hi,

On 2024-01-30 02:08:36 +0200, Heikki Linnakangas wrote:
> I spent some more time on the 'lastBackend' optimization in sinvaladt.c. I
> realized that it became very useless with these patches, because aux
> processes are allocated pgprocno's after all the slots for regular backends.
> There are always aux processes running, so lastBackend would always have a
> value close to the max anyway. I replaced that with a dense 'pgprocnos'
> array that keeps track of the exact slots that are in use. I'm not 100% sure
> this is worth the effort; does any real world workload send shared
> invalidations so frequently that this matters? In any case, this should
> avoid the regression if such a workload exists.
>
> New patch set attached. I think these are ready to be committed, but would
> appreciate a final review.


> From 54f22231bb2540fc5957c14005956161e6fc9dac Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Wed, 24 Jan 2024 23:15:55 +0200
> Subject: [PATCH v8 1/5] Remove superfluous 'pgprocno' field from PGPROC
>
> It was always just the index of the PGPROC entry from the beginning of
> the proc array. Introduce a macro to compute it from the pointer
> instead.

Hm. The pointer math here is bit more expensive than in some other cases, as
the struct is fairly large and sizeof(PGPROC) isn't a power of two. Adding
more math into loops like in TransactionGroupUpdateXidStatus() might end up
showing up.

I've been thinking that we likely should pad PGPROC to some more convenient
boundary, but...


Is this really related to the rest of the series?


> From 4e0121e064804b73ef8a5dc10be27b85968ea1af Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Mon, 29 Jan 2024 23:50:34 +0200
> Subject: [PATCH v8 2/5] Redefine backend ID to be an index into the proc
>  array.
>
> Previously, backend ID was an index into the ProcState array, in the
> shared cache invalidation manager (sinvaladt.c). The entry in the
> ProcState array was reserved at backend startup by scanning the array
> for a free entry, and that was also when the backend got its backend
> ID. Things becomes slightly simpler if we redefine backend ID to be
> the index into the PGPROC array, and directly use it also as an index
> to the ProcState array. This uses a little more memory, as we reserve
> a few extra slots in the ProcState array for aux processes that don't
> need them, but the simplicity is worth it.

> Aux processes now also have a backend ID. This simplifies the
> reservation of BackendStatusArray and ProcSignal slots.
>
> You can now convert a backend ID into an index into the PGPROC array
> simply by subtracting 1. We still use 0-based "pgprocnos" in various
> places, for indexes into the PGPROC array, but the only difference now
> is that backend IDs start at 1 while pgprocnos start at 0.

Why aren't we using 0-based indexing for both? InvalidBackendId is -1, so
there'd not be a conflict, right?


> One potential downside of this patch is that the ProcState array might
> get less densely packed, as we we don't try so hard to assign
> low-numbered backend ID anymore. If it's less densely packed,
> lastBackend will stay at a higher value, and SIInsertDataEntries() and
> SICleanupQueue() need to scan over more unused entries. I think that's
> fine. They are performance critical enough to matter, and there was no
> guarantee on dense packing before either: If you launched a lot of
> backends concurrently, and kept the last one open, lastBackend would
> also stay at a high value.

It's perhaps worth noting here that there's a future patch that also addresses
this to some degree?


> @@ -457,7 +442,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
>      Assert(LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE));
>
>      Assert(gxact != NULL);
> -    proc = &ProcGlobal->allProcs[gxact->pgprocno];
> +    proc = GetPGProcByNumber(gxact->pgprocno);
>
>      /* Initialize the PGPROC entry */
>      MemSet(proc, 0, sizeof(PGPROC));

This set of changes is independent of this commit, isn't it?


> diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
> index ab86e802f21..39171fea06b 100644
> --- a/src/backend/postmaster/auxprocess.c
> +++ b/src/backend/postmaster/auxprocess.c
> @@ -107,17 +107,7 @@ AuxiliaryProcessMain(AuxProcType auxtype)
>
>      BaseInit();
>
> -    /*
> -     * Assign the ProcSignalSlot for an auxiliary process.  Since it doesn't
> -     * have a BackendId, the slot is statically allocated based on the
> -     * auxiliary process type (MyAuxProcType).  Backends use slots indexed in
> -     * the range from 1 to MaxBackends (inclusive), so we use MaxBackends +
> -     * AuxProcType + 1 as the index of the slot for an auxiliary process.
> -     *
> -     * This will need rethinking if we ever want more than one of a particular
> -     * auxiliary process type.
> -     */
> -    ProcSignalInit(MaxBackends + MyAuxProcType + 1);
> +    ProcSignalInit();

Now that we don't need the offset here, we could move ProcSignalInit() into
BsaeInit() I think?



> +/*
> + * BackendIdGetProc -- get a backend's PGPROC given its backend ID
> + *
> + * The result may be out of date arbitrarily quickly, so the caller
> + * must be careful about how this information is used.  NULL is
> + * returned if the backend is not active.
> + */
> +PGPROC *
> +BackendIdGetProc(int backendID)
> +{
> +    PGPROC       *result;
> +
> +    if (backendID < 1 || backendID > ProcGlobal->allProcCount)
> +        return NULL;

Hm, doesn't calling BackendIdGetProc() with these values a bug? That's not
about being out of date or such.


> +/*
> + * BackendIdGetTransactionIds -- get a backend's transaction status
> + *
> + * Get the xid, xmin, nsubxid and overflow status of the backend.  The
> + * result may be out of date arbitrarily quickly, so the caller must be
> + * careful about how this information is used.
> + */
> +void
> +BackendIdGetTransactionIds(int backendID, TransactionId *xid,
> +                           TransactionId *xmin, int *nsubxid, bool *overflowed)
> +{
> +    PGPROC       *proc;
> +
> +    *xid = InvalidTransactionId;
> +    *xmin = InvalidTransactionId;
> +    *nsubxid = 0;
> +    *overflowed = false;
> +
> +    if (backendID < 1 || backendID > ProcGlobal->allProcCount)
> +        return;
> +    proc = GetPGProcByBackendId(backendID);
> +
> +    /* Need to lock out additions/removals of backends */
> +    LWLockAcquire(ProcArrayLock, LW_SHARED);
> +
> +    if (proc->pid != 0)
> +    {
> +        *xid = proc->xid;
> +        *xmin = proc->xmin;
> +        *nsubxid = proc->subxidStatus.count;
> +        *overflowed = proc->subxidStatus.overflowed;
> +    }
> +
> +    LWLockRelease(ProcArrayLock);
> +}

Hm, I'm not sure about the locking here. For one, previously we weren't
holding ProcArrayLock. For another, holding ProcArrayLock guarantees that the
backend doesn't end its transaction, but it can still assign xids etc.  And,
for that matter, the backendid could have been recycled between the caller
acquiring the backendId and calling BackendIdGetTransactionIds().


> --- a/src/backend/utils/error/elog.c
> +++ b/src/backend/utils/error/elog.c
> @@ -3074,18 +3074,18 @@ log_status_format(StringInfo buf, const char *format, ErrorData *edata)
>                  break;
>              case 'v':
>                  /* keep VXID format in sync with lockfuncs.c */
> -                if (MyProc != NULL && MyProc->backendId != InvalidBackendId)
> +                if (MyProc != NULL)

Doesn't this mean we'll include a vxid in more cases now, particularly
including aux processes? That might be ok, but I also suspect that it'll never
have meaningful values...


> From 94fd46c9ef30ba5e8ac1a8873fce577a4be425f4 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Mon, 29 Jan 2024 22:57:49 +0200
> Subject: [PATCH v8 3/5] Replace 'lastBackend' with an array of in-use slots
>
> Now that the procState array is indexed by pgprocno, the 'lastBackend'
> optimization is useless, because aux processes are assigned PGPROC
> slots and hence have numbers higher than max_connection. So
> 'lastBackend' was always set to almost the end of the array.
>
> To replace that optimization, mantain a dense array of in-use
> indexes. This's redundant with ProgGlobal->procarray, but I was afraid
> of adding any more contention to ProcArrayLock, and this keeps the
> code isolated to sinvaladt.c too.

I think it'd be good to include that explanation and justification in the code
as well.

I suspect we'll need to split out "procarray membership" locking from
ProcArrayLock at some point in some form (vagueness alert). To reduce
contention we already have to hold both ProcArrayLock and XidGenLock when
changing membership, so that holding either of the locks prevents the set of
members to change. This, kinda and differently, adds yet another lock to that.



> It's not clear if we need that optimization at all. I was able to
> write a test case that become slower without this: set max_connections
> to a very high number (> 5000), and create+truncate a table in the
> same transaction thousands of times to send invalidation messages,
> with fsync=off. That became about 20% slower on my laptop.  Arguably
> that's so unrealistic that it doesn't matter, but nevertheless, this
> commit restores the performance of that.

I think it's unfortunately not that uncommon to be bottlenecked by sinval
performance, so I think it's good that you're addressing it.

Greetings,

Andres Freund



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 07/02/2024 20:25, Andres Freund wrote:
> On 2024-01-30 02:08:36 +0200, Heikki Linnakangas wrote:
>>  From 54f22231bb2540fc5957c14005956161e6fc9dac Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Wed, 24 Jan 2024 23:15:55 +0200
>> Subject: [PATCH v8 1/5] Remove superfluous 'pgprocno' field from PGPROC
>>
>> It was always just the index of the PGPROC entry from the beginning of
>> the proc array. Introduce a macro to compute it from the pointer
>> instead.
> 
> Hm. The pointer math here is bit more expensive than in some other cases, as
> the struct is fairly large and sizeof(PGPROC) isn't a power of two. Adding
> more math into loops like in TransactionGroupUpdateXidStatus() might end up
> showing up.

I added a MyProcNumber global variable that is set to 
GetNumberFromPGProc(MyProc). I'm not really concerned about the extra 
math, but with MyProcNumber it should definitely not be an issue. The 
few GetNumberFromPGProc() invocations that remain are in less 
performance-critical paths.

(In later patch, I switch backend ids to 0-based indexing, which 
replaces MyProcNumber references with MyBackendId)

> Is this really related to the rest of the series?

It's not strictly necessary, but it felt prudent to remove it now, since 
I'm removing the backendID field too.

>> You can now convert a backend ID into an index into the PGPROC array
>> simply by subtracting 1. We still use 0-based "pgprocnos" in various
>> places, for indexes into the PGPROC array, but the only difference now
>> is that backend IDs start at 1 while pgprocnos start at 0.
> 
> Why aren't we using 0-based indexing for both? InvalidBackendId is -1, so
> there'd not be a conflict, right?

Correct. I was being conservative and didn't dare to change the old 
convention. The backend ids are visible in a few places like "pg_temp_0" 
schema names, and pg_stat_get_*() functions.

One alternative would be to reserve and waste allProcs[0]. Then pgprocno 
and backend ID could both be direct indexes to the array, but 0 would 
not be used.

If we switch to 0-based indexing, it begs the question: why don't we 
merge the concepts of "pgprocno" and "BackendId" completely and call it 
the same thing everywhere? It probably would be best in the long run. It 
feels like a lot of churn though.

Anyway, I switched to 0-based indexing in the attached new version, to 
see what it looks like.

>> @@ -457,7 +442,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
>>       Assert(LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE));
>>
>>       Assert(gxact != NULL);
>> -    proc = &ProcGlobal->allProcs[gxact->pgprocno];
>> +    proc = GetPGProcByNumber(gxact->pgprocno);
>>
>>       /* Initialize the PGPROC entry */
>>       MemSet(proc, 0, sizeof(PGPROC));
> 
> This set of changes is independent of this commit, isn't it?

Yes. It's just for symmetry, now that we use GetNumberFromPGProc() to 
get the pgprocno.

>> diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
>> index ab86e802f21..39171fea06b 100644
>> --- a/src/backend/postmaster/auxprocess.c
>> +++ b/src/backend/postmaster/auxprocess.c
>> @@ -107,17 +107,7 @@ AuxiliaryProcessMain(AuxProcType auxtype)
>>
>>       BaseInit();
>>
>> -    /*
>> -     * Assign the ProcSignalSlot for an auxiliary process.  Since it doesn't
>> -     * have a BackendId, the slot is statically allocated based on the
>> -     * auxiliary process type (MyAuxProcType).  Backends use slots indexed in
>> -     * the range from 1 to MaxBackends (inclusive), so we use MaxBackends +
>> -     * AuxProcType + 1 as the index of the slot for an auxiliary process.
>> -     *
>> -     * This will need rethinking if we ever want more than one of a particular
>> -     * auxiliary process type.
>> -     */
>> -    ProcSignalInit(MaxBackends + MyAuxProcType + 1);
>> +    ProcSignalInit();
> 
> Now that we don't need the offset here, we could move ProcSignalInit() into
> BsaeInit() I think?

Hmm, doesn't feel right to me. BaseInit() is mostly concerned with 
setting up backend-private structures, and it's also called for a 
standalone backend.

I feel the process initialization codepaths could use some cleanup in 
general. Not sure what exactly.

>> +/*
>> + * BackendIdGetProc -- get a backend's PGPROC given its backend ID
>> + *
>> + * The result may be out of date arbitrarily quickly, so the caller
>> + * must be careful about how this information is used.  NULL is
>> + * returned if the backend is not active.
>> + */
>> +PGPROC *
>> +BackendIdGetProc(int backendID)
>> +{
>> +    PGPROC       *result;
>> +
>> +    if (backendID < 1 || backendID > ProcGlobal->allProcCount)
>> +        return NULL;
> 
> Hm, doesn't calling BackendIdGetProc() with these values a bug? That's not
> about being out of date or such.

Perhaps. I just followed the example of the old implementation, which 
also returns NULL on bogus inputs.

>> +/*
>> + * BackendIdGetTransactionIds -- get a backend's transaction status
>> + *
>> + * Get the xid, xmin, nsubxid and overflow status of the backend.  The
>> + * result may be out of date arbitrarily quickly, so the caller must be
>> + * careful about how this information is used.
>> + */
>> +void
>> +BackendIdGetTransactionIds(int backendID, TransactionId *xid,
>> +                           TransactionId *xmin, int *nsubxid, bool *overflowed)
>> +{
>> +    PGPROC       *proc;
>> +
>> +    *xid = InvalidTransactionId;
>> +    *xmin = InvalidTransactionId;
>> +    *nsubxid = 0;
>> +    *overflowed = false;
>> +
>> +    if (backendID < 1 || backendID > ProcGlobal->allProcCount)
>> +        return;
>> +    proc = GetPGProcByBackendId(backendID);
>> +
>> +    /* Need to lock out additions/removals of backends */
>> +    LWLockAcquire(ProcArrayLock, LW_SHARED);
>> +
>> +    if (proc->pid != 0)
>> +    {
>> +        *xid = proc->xid;
>> +        *xmin = proc->xmin;
>> +        *nsubxid = proc->subxidStatus.count;
>> +        *overflowed = proc->subxidStatus.overflowed;
>> +    }
>> +
>> +    LWLockRelease(ProcArrayLock);
>> +}
> 
> Hm, I'm not sure about the locking here. For one, previously we weren't
> holding ProcArrayLock. For another, holding ProcArrayLock guarantees that the
> backend doesn't end its transaction, but it can still assign xids etc.  And,
> for that matter, the backendid could have been recycled between the caller
> acquiring the backendId and calling BackendIdGetTransactionIds().

Yeah, the returned values could be out-of-date and even inconsistent 
with each other. I just faithfully copied the old implementation.

Perhaps this should just skip the ProcArrayLock altogether.

>> --- a/src/backend/utils/error/elog.c
>> +++ b/src/backend/utils/error/elog.c
>> @@ -3074,18 +3074,18 @@ log_status_format(StringInfo buf, const char *format, ErrorData *edata)
>>                   break;
>>               case 'v':
>>                   /* keep VXID format in sync with lockfuncs.c */
>> -                if (MyProc != NULL && MyProc->backendId != InvalidBackendId)
>> +                if (MyProc != NULL)
> 
> Doesn't this mean we'll include a vxid in more cases now, particularly
> including aux processes? That might be ok, but I also suspect that it'll never
> have meaningful values...

Fixed. (I thought I changed that back already in the last patch version, 
but apparently I only did it in jsonlog.c)

>>  From 94fd46c9ef30ba5e8ac1a8873fce577a4be425f4 Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Mon, 29 Jan 2024 22:57:49 +0200
>> Subject: [PATCH v8 3/5] Replace 'lastBackend' with an array of in-use slots
>>
>> Now that the procState array is indexed by pgprocno, the 'lastBackend'
>> optimization is useless, because aux processes are assigned PGPROC
>> slots and hence have numbers higher than max_connection. So
>> 'lastBackend' was always set to almost the end of the array.
>>
>> To replace that optimization, mantain a dense array of in-use
>> indexes. This's redundant with ProgGlobal->procarray, but I was afraid
>> of adding any more contention to ProcArrayLock, and this keeps the
>> code isolated to sinvaladt.c too.
> 
> I think it'd be good to include that explanation and justification in the code
> as well.

Added a comment.


Attached is a new version of these BackendId changes. I kept it as three 
separate patches to highlight the changes from switching to 0-based 
indexing, but I think they should be squashed together before pushing.

I think the last remaining question here is about the 0- vs 1-based 
indexing of BackendIds. Is it a good idea to switch to 0-based indexing? 
And if we do it, should we reserve PGPROC 0. I'm on the fence on this one.

And if we switch to 0-based indexing, should we do a more comprehensive 
search & replace of "pgprocno" to "backendId", or something like that. 
My vote is no, the code churn doesn't feel worth it. And it can also be 
done separately later if we want to.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Refactoring backend fork+exec code

От
Andres Freund
Дата:
Hi,

On 2024-02-08 13:19:53 +0200, Heikki Linnakangas wrote:
> > > -    /*
> > > -     * Assign the ProcSignalSlot for an auxiliary process.  Since it doesn't
> > > -     * have a BackendId, the slot is statically allocated based on the
> > > -     * auxiliary process type (MyAuxProcType).  Backends use slots indexed in
> > > -     * the range from 1 to MaxBackends (inclusive), so we use MaxBackends +
> > > -     * AuxProcType + 1 as the index of the slot for an auxiliary process.
> > > -     *
> > > -     * This will need rethinking if we ever want more than one of a particular
> > > -     * auxiliary process type.
> > > -     */
> > > -    ProcSignalInit(MaxBackends + MyAuxProcType + 1);
> > > +    ProcSignalInit();
> >
> > Now that we don't need the offset here, we could move ProcSignalInit() into
> > BsaeInit() I think?
>
> Hmm, doesn't feel right to me. BaseInit() is mostly concerned with setting
> up backend-private structures, and it's also called for a standalone
> backend.

It already initializes a lot of shared subsystems (pgstat, replication slots
and arguable things like the buffer pool, temporary file access and WAL). And
note that it already requires that MyProc is already set (but it's not yet
"added" to the procarray, i.e. doesn't do visibility stuff at that stage).

I don't think that BaseInit() being called by standalone backends really poses
a problem? So is InitPostgres(), which does call ProcSignalInit() in
standalone processes.

My mental model is that BaseInit() is for stuff that's shared between
processes that do attach to databases and those that don't. Right now the
initialization flow is something like this ascii diagram:

standalone:     \
    /-> StartupXLOG() \
 
                 -> InitProcess()      -\                 /-> ProcArrayAdd() -> SharedInvalBackendInit() ->
ProcSignalInit()-                  -> pgstat_beinit() -> attach to db -> pgstat_bestart()
 
normal backend: /                        \               /
                                          -> BaseInit() -
aux process:    InitAuxiliaryProcess()  -/               \--                                             ->
ProcSignalInit()                   -> pgstat_beinit()                 -> pgstat_bestart()
 


The only reason ProcSignalInit() happens kinda late is that historically we
used BackendIds as the index, which were only assigned in
SharedInvalBackendInit() for normal processes. But that doesn't make sense
anymore after your changes.

Similarly, we do pgstat_beinit() quite late, but that's again only because it
uses MyBackendId, which today is only assigned during
SharedInvalBackendInit().  I don't think we can do pgstat_bestart() earlier
though, which is a shame, given the four calls to it inside InitPostgres().


> I feel the process initialization codepaths could use some cleanup in
> general. Not sure what exactly.

Very much agreed.


> > > +/*
> > > + * BackendIdGetProc -- get a backend's PGPROC given its backend ID
> > > + *
> > > + * The result may be out of date arbitrarily quickly, so the caller
> > > + * must be careful about how this information is used.  NULL is
> > > + * returned if the backend is not active.
> > > + */
> > > +PGPROC *
> > > +BackendIdGetProc(int backendID)
> > > +{
> > > +    PGPROC       *result;
> > > +
> > > +    if (backendID < 1 || backendID > ProcGlobal->allProcCount)
> > > +        return NULL;
> >
> > Hm, doesn't calling BackendIdGetProc() with these values a bug? That's not
> > about being out of date or such.
>
> Perhaps. I just followed the example of the old implementation, which also
> returns NULL on bogus inputs.

Fair enough. Makes it harder to not notice bugs, but that's not on this patchset to fix...



> I think the last remaining question here is about the 0- vs 1-based indexing
> of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do
> it, should we reserve PGPROC 0. I'm on the fence on this one.

I lean towards it being a good idea. Having two internal indexing schemes was
bad enough so far, but at least one would fairly quickly notice if one used
the wrong one. If they're just offset by 1, it might end up taking longer,
because that'll often also be a valid id.  But I think you have the author's
prerogative on this one.

If we do so, I think it might be better to standardize on MyProcNumber instead
of MyBackendId. That'll force looking at code where indexing shifts by 1 - and
it also seems more descriptive, as inside postgres it's imo clearer what a
"proc number" is than what a "backend id" is. Particularly because the latter
is also used for things that aren't backends...


The only exception are SQL level users, for those I think it might make sense
to keep the current 1 based indexing, there's just a few functions where we'd
need to translate.



> @@ -791,6 +792,7 @@ ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId latestXid)
>  static void
>  ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
>  {
> +    int            pgprocno = GetNumberFromPGProc(proc);
>      PROC_HDR   *procglobal = ProcGlobal;
>      uint32        nextidx;
>      uint32        wakeidx;

This one is the only one where I could see the additional math done in
GetNumberFromPGProc() hurting.  Which is somewhat silly, because the proc
passed in is always MyProc.  In the most unrealistic workload imaginable (many
backends doing nothing but assigning xids and committing, server-side), it
indeed seems to make a tiny difference. But not enough to worry about, I think.

FWIW, if I use GetNumberFromPGProc(MyProc) instead of MyProcNumber in
LWLockQueueSelf(), that does show up a bit more noticeable.


>  void
> -ProcSignalInit(int pss_idx)
> +ProcSignalInit(void)
>  {
>      ProcSignalSlot *slot;
>      uint64        barrier_generation;
>
> -    Assert(pss_idx >= 1 && pss_idx <= NumProcSignalSlots);
> -
> -    slot = &ProcSignal->psh_slot[pss_idx - 1];
> +    if (MyBackendId <= 0)
> +        elog(ERROR, "MyBackendId not set");
> +    if (MyBackendId > NumProcSignalSlots)
> +        elog(ERROR, "unexpected MyBackendId %d in ProcSignalInit (max %d)", MyBackendId, NumProcSignalSlots);
> +    slot = &ProcSignal->psh_slot[MyBackendId - 1];
>
>      /* sanity check */
>      if (slot->pss_pid != 0)
>          elog(LOG, "process %d taking over ProcSignal slot %d, but it's not empty",
> -             MyProcPid, pss_idx);
> +             MyProcPid, (int) (slot - ProcSignal->psh_slot));

Hm, why not use MyBackendId - 1 as above? Am I missing something?


>  /*
> @@ -212,11 +211,7 @@ ProcSignalInit(int pss_idx)
>  static void
>  CleanupProcSignalState(int status, Datum arg)
>  {
> -    int            pss_idx = DatumGetInt32(arg);
> -    ProcSignalSlot *slot;
> -
> -    slot = &ProcSignal->psh_slot[pss_idx - 1];
> -    Assert(slot == MyProcSignalSlot);
> +    ProcSignalSlot *slot = MyProcSignalSlot;

Maybe worth asserting that MyProcSignalSlot isn't NULL? Previously that was
checked via the assertion above.


> +            if (i != segP->numProcs - 1)
> +                segP->pgprocnos[i] = segP->pgprocnos[segP->numProcs - 1];
> +            break;

Hm. This means the list will be out-of-order more and more over time, leading
to less cache efficient access patterns. Perhaps we should keep this sorted,
like we do for ProcGlobal->xids etc?


> @@ -148,19 +148,11 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
>      PGPROC       *proc;
>      BackendId    backendId = InvalidBackendId;
>
> -    proc = BackendPidGetProc(pid);
> -
>      /*
>       * See if the process with given pid is a backend or an auxiliary process.
> -     *
> -     * If the given process is a backend, use its backend id in
> -     * SendProcSignal() later to speed up the operation. Otherwise, don't do
> -     * that because auxiliary processes (except the startup process) don't
> -     * have a valid backend id.
>       */
> -    if (proc != NULL)
> -        backendId = proc->backendId;
> -    else
> +    proc = BackendPidGetProc(pid);
> +    if (proc == NULL)
>          proc = AuxiliaryPidGetProc(pid);
>
>      /*
> @@ -183,6 +175,8 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
>          PG_RETURN_BOOL(false);
>      }
>
> +    if (proc != NULL)
> +        backendId = GetBackendIdFromPGProc(proc);

How can proc be NULL here?



Greetings,

Andres Freund



Re: Refactoring backend fork+exec code

От
Robert Haas
Дата:
On Thu, Feb 15, 2024 at 3:07 AM Andres Freund <andres@anarazel.de> wrote:
> > I think the last remaining question here is about the 0- vs 1-based indexing
> > of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do
> > it, should we reserve PGPROC 0. I'm on the fence on this one.
>
> I lean towards it being a good idea. Having two internal indexing schemes was
> bad enough so far, but at least one would fairly quickly notice if one used
> the wrong one. If they're just offset by 1, it might end up taking longer,
> because that'll often also be a valid id.

Yeah, I think making everything 0-based is probably the best way
forward long term. It might require more cleanup work to get there,
but it's just a lot simpler in the end, IMHO.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 15/02/2024 07:09, Robert Haas wrote:
> On Thu, Feb 15, 2024 at 3:07 AM Andres Freund <andres@anarazel.de> wrote:
>>> I think the last remaining question here is about the 0- vs 1-based indexing
>>> of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do
>>> it, should we reserve PGPROC 0. I'm on the fence on this one.
>>
>> I lean towards it being a good idea. Having two internal indexing schemes was
>> bad enough so far, but at least one would fairly quickly notice if one used
>> the wrong one. If they're just offset by 1, it might end up taking longer,
>> because that'll often also be a valid id.
> 
> Yeah, I think making everything 0-based is probably the best way
> forward long term. It might require more cleanup work to get there,
> but it's just a lot simpler in the end, IMHO.

Here's another patch version that does that. Yeah, I agree it's nicer in 
the end.

I'm pretty happy with this now. I'll read through these patches myself 
again after sleeping over it and try to get this committed by the end of 
the week, but another pair of eyes wouldn't hurt.

On 14/02/2024 23:37, Andres Freund wrote:
>>   void
>> -ProcSignalInit(int pss_idx)
>> +ProcSignalInit(void)
>>   {
>>       ProcSignalSlot *slot;
>>       uint64        barrier_generation;
>>
>> -    Assert(pss_idx >= 1 && pss_idx <= NumProcSignalSlots);
>> -
>> -    slot = &ProcSignal->psh_slot[pss_idx - 1];
>> +    if (MyBackendId <= 0)
>> +        elog(ERROR, "MyBackendId not set");
>> +    if (MyBackendId > NumProcSignalSlots)
>> +        elog(ERROR, "unexpected MyBackendId %d in ProcSignalInit (max %d)", MyBackendId, NumProcSignalSlots);
>> +    slot = &ProcSignal->psh_slot[MyBackendId - 1];
>>
>>       /* sanity check */
>>       if (slot->pss_pid != 0)
>>           elog(LOG, "process %d taking over ProcSignal slot %d, but it's not empty",
>> -             MyProcPid, pss_idx);
>> +             MyProcPid, (int) (slot - ProcSignal->psh_slot));
> 
> Hm, why not use MyBackendId - 1 as above? Am I missing something?

You're right, fixed.

>>   /*
>> @@ -212,11 +211,7 @@ ProcSignalInit(int pss_idx)
>>   static void
>>   CleanupProcSignalState(int status, Datum arg)
>>   {
>> -    int            pss_idx = DatumGetInt32(arg);
>> -    ProcSignalSlot *slot;
>> -
>> -    slot = &ProcSignal->psh_slot[pss_idx - 1];
>> -    Assert(slot == MyProcSignalSlot);
>> +    ProcSignalSlot *slot = MyProcSignalSlot;
> 
> Maybe worth asserting that MyProcSignalSlot isn't NULL? Previously that was
> checked via the assertion above.

Added.

>> +            if (i != segP->numProcs - 1)
>> +                segP->pgprocnos[i] = segP->pgprocnos[segP->numProcs - 1];
>> +            break;
> 
> Hm. This means the list will be out-of-order more and more over time, leading
> to less cache efficient access patterns. Perhaps we should keep this sorted,
> like we do for ProcGlobal->xids etc?

Perhaps, although these are accessed much less frequently so I'm not 
convinced it's worth the trouble.

I haven't found a good performance test case that where the shared cache 
invalidation would show up. Would you happen to have one?

>> @@ -183,6 +175,8 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
>>           PG_RETURN_BOOL(false);
>>       }
>>
>> +    if (proc != NULL)
>> +        backendId = GetBackendIdFromPGProc(proc);
> 
> How can proc be NULL here?

Fixed.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
Вложения

Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 22/02/2024 02:37, Heikki Linnakangas wrote:
> On 15/02/2024 07:09, Robert Haas wrote:
>> On Thu, Feb 15, 2024 at 3:07 AM Andres Freund <andres@anarazel.de> wrote:
>>>> I think the last remaining question here is about the 0- vs 1-based indexing
>>>> of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do
>>>> it, should we reserve PGPROC 0. I'm on the fence on this one.
>>>
>>> I lean towards it being a good idea. Having two internal indexing schemes was
>>> bad enough so far, but at least one would fairly quickly notice if one used
>>> the wrong one. If they're just offset by 1, it might end up taking longer,
>>> because that'll often also be a valid id.
>>
>> Yeah, I think making everything 0-based is probably the best way
>> forward long term. It might require more cleanup work to get there,
>> but it's just a lot simpler in the end, IMHO.
> 
> Here's another patch version that does that. Yeah, I agree it's nicer in
> the end.
> 
> I'm pretty happy with this now. I'll read through these patches myself
> again after sleeping over it and try to get this committed by the end of
> the week, but another pair of eyes wouldn't hurt.

And pushed. Thanks for the reviews!

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
I've now completed many of the side-quests, here are the patches that 
remain.

The first three patches form a logical unit. They move the 
initialization of the Port struct from postmaster to the backend 
process. Currently, that work is split between the postmaster and the 
backend process so that postmaster fills in the socket and some other 
fields, and the backend process fills the rest after reading the startup 
packet. With these patches, there is a new much smaller ClientSocket 
struct that is passed from the postmaster to the child process, which 
contains just the fields that postmaster initializes. The Port struct is 
allocated in the child process. That makes the backend startup easier to 
understand. I plan to commit those three patches next if there are no 
objections.

That leaves the rest of the patches. I think they're in pretty good 
shape too, and I've gotten some review on those earlier and have 
addressed the comments I got so far, but would still appreciate another 
round of review.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Refactoring backend fork+exec code

От
Richard Guo
Дата:

On Mon, Mar 4, 2024 at 1:40 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 22/02/2024 02:37, Heikki Linnakangas wrote:
> Here's another patch version that does that. Yeah, I agree it's nicer in
> the end.
>
> I'm pretty happy with this now. I'll read through these patches myself
> again after sleeping over it and try to get this committed by the end of
> the week, but another pair of eyes wouldn't hurt.

And pushed. Thanks for the reviews!

I noticed that there are still three places in backend_status.c where
pgstat_get_beentry_by_backend_id() is referenced.  I think we should
replace them with pgstat_get_beentry_by_proc_number().

Thanks
Richard

Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 05/03/2024 11:44, Richard Guo wrote:
> I noticed that there are still three places in backend_status.c where
> pgstat_get_beentry_by_backend_id() is referenced.  I think we should
> replace them with pgstat_get_beentry_by_proc_number().

Fixed, thanks!

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

От
"Tristan Partin"
Дата:
On Mon Mar 4, 2024 at 3:05 AM CST, Heikki Linnakangas wrote:
> I've now completed many of the side-quests, here are the patches that
> remain.
>
> The first three patches form a logical unit. They move the
> initialization of the Port struct from postmaster to the backend
> process. Currently, that work is split between the postmaster and the
> backend process so that postmaster fills in the socket and some other
> fields, and the backend process fills the rest after reading the startup
> packet. With these patches, there is a new much smaller ClientSocket
> struct that is passed from the postmaster to the child process, which
> contains just the fields that postmaster initializes. The Port struct is
> allocated in the child process. That makes the backend startup easier to
> understand. I plan to commit those three patches next if there are no
> objections.
>
> That leaves the rest of the patches. I think they're in pretty good
> shape too, and I've gotten some review on those earlier and have
> addressed the comments I got so far, but would still appreciate another
> round of review.

> -         * *MyProcPort, because ConnCreate() allocated that space with malloc()
> -         * ... else we'd need to copy the Port data first.  Also, subsidiary data
> -         * such as the username isn't lost either; see ProcessStartupPacket().
> +         * *MyProcPort, because that space is allocated in stack ... else we'd
> +         * need to copy the Port data first.  Also, subsidiary data such as the
> +         * username isn't lost either; see ProcessStartupPacket().

s/allocated in/allocated on the

The first 3 patches seem good to go, in my opinion.

> @@ -225,14 +331,13 @@ internal_forkexec(int argc, char *argv[], ClientSocket *client_sock, BackgroundW
>                  return -1;
>          }
>
> -        /* Make sure caller set up argv properly */
> -        Assert(argc >= 3);
> -        Assert(argv[argc] == NULL);
> -        Assert(strncmp(argv[1], "--fork", 6) == 0);
> -        Assert(argv[2] == NULL);
> -
> -        /* Insert temp file name after --fork argument */
> +        /* set up argv properly */
> +        argv[0] = "postgres";
> +        snprintf(forkav, MAXPGPATH, "--forkchild=%s", child_kind);
> +        argv[1] = forkav;
> +        /* Insert temp file name after --forkchild argument */
>          argv[2] = tmpfilename;
> +        argv[3] = NULL;

Should we use postgres_exec_path instead of the naked "postgres" here?

> +                /* in postmaster, fork failed ... */
> +                ereport(LOG,
> +                                (errmsg("could not fork worker process: %m")));
> +                /* undo what assign_backendlist_entry did */
> +                ReleasePostmasterChildSlot(rw->rw_child_slot);
> +                rw->rw_child_slot = 0;
> +                pfree(rw->rw_backend);
> +                rw->rw_backend = NULL;
> +                /* mark entry as crashed, so we'll try again later */
> +                rw->rw_crashed_at = GetCurrentTimestamp();
> +                return false;

I think the error message should include the word "background." It would
be more consistent with the log message above it.

> +typedef struct
> +{
> +        int                        syslogFile;
> +        int                        csvlogFile;
> +        int                        jsonlogFile;
> +} syslogger_startup_data;

It would be nice if all of these startup data structs were named
similarly. For instance, a previous one was BackendStartupInfo. It would
help with greppability.

I noticed there were a few XXX comments left that you created. I'll
highlight them here for more visibility.

> +/* XXX: where does this belong? */
> +extern bool LoadedSSL;

Perhaps near the My* variables or maybe in the Port struct?

> +#ifdef EXEC_BACKEND
> +
> +        /*
> +         * Need to reinitialize the SSL library in the backend, since the context
> +         * structures contain function pointers and cannot be passed through the
> +         * parameter file.
> +         *
> +         * If for some reason reload fails (maybe the user installed broken key
> +         * files), soldier on without SSL; that's better than all connections
> +         * becoming impossible.
> +         *
> +         * XXX should we do this in all child processes?  For the moment it's
> +         * enough to do it in backend children. XXX good question indeed
> +         */
> +#ifdef USE_SSL
> +        if (EnableSSL)
> +        {
> +                if (secure_initialize(false) == 0)
> +                        LoadedSSL = true;
> +                else
> +                        ereport(LOG,
> +                                        (errmsg("SSL configuration could not be loaded in child process")));
> +        }
> +#endif
> +#endif

Here you added the "good question indeed." I am not sure what the best
answer is either! :)

> +                /* XXX: translation? */
> +                ereport(LOG,
> +                                (errmsg("could not fork %s process: %m", PostmasterChildName(type))));

I assume you are referring to the child name here?

> XXX: We now have functions called AuxiliaryProcessInit() and
> InitAuxiliaryProcess(). Confusing.

Based on my analysis, the *Init() is called in the Main functions, while
Init*() is called before the Main functions. Maybe
AuxiliaryProcessInit() could be renamed to AuxiliaryProcessStartup()?
Rename the other to AuxiliaryProcessInit().

--
Tristan Partin
Neon (https://neon.tech)



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 06/03/2024 01:02, Tristan Partin wrote:
> The first 3 patches seem good to go, in my opinion.

Committed these first patches, with a few more changes. Notably, I 
realized that we should move the logic that I originally put in the new 
InitClientConnection function to the existing pq_init() function. It 
servers the same purpose, initialization of the socket in the child 
process. Thanks for the review!

>> @@ -225,14 +331,13 @@ internal_forkexec(int argc, char *argv[], ClientSocket *client_sock, BackgroundW
>>                   return -1;
>>           }
>>
>> -        /* Make sure caller set up argv properly */
>> -        Assert(argc >= 3);
>> -        Assert(argv[argc] == NULL);
>> -        Assert(strncmp(argv[1], "--fork", 6) == 0);
>> -        Assert(argv[2] == NULL);
>> -
>> -        /* Insert temp file name after --fork argument */
>> +        /* set up argv properly */
>> +        argv[0] = "postgres";
>> +        snprintf(forkav, MAXPGPATH, "--forkchild=%s", child_kind);
>> +        argv[1] = forkav;
>> +        /* Insert temp file name after --forkchild argument */
>>           argv[2] = tmpfilename;
>> +        argv[3] = NULL;
> 
> Should we use postgres_exec_path instead of the naked "postgres" here?

I don't know, but it's the same as on 'master' currently. The code just 
got moved around.

>> +                /* in postmaster, fork failed ... */
>> +                ereport(LOG,
>> +                                (errmsg("could not fork worker process: %m")));
>> +                /* undo what assign_backendlist_entry did */
>> +                ReleasePostmasterChildSlot(rw->rw_child_slot);
>> +                rw->rw_child_slot = 0;
>> +                pfree(rw->rw_backend);
>> +                rw->rw_backend = NULL;
>> +                /* mark entry as crashed, so we'll try again later */
>> +                rw->rw_crashed_at = GetCurrentTimestamp();
>> +                return false;
> 
> I think the error message should include the word "background." It would
> be more consistent with the log message above it.

This is also a pre-existing message I just moved around. But yeah, I 
agree, so changed.

>> +typedef struct
>> +{
>> +        int                        syslogFile;
>> +        int                        csvlogFile;
>> +        int                        jsonlogFile;
>> +} syslogger_startup_data;
> 
> It would be nice if all of these startup data structs were named
> similarly. For instance, a previous one was BackendStartupInfo. It would
> help with greppability.

Renamed them to SysloggerStartupData and BackendStartupData. Background 
worker startup still passes a struct called BackgroundWorker, however. I 
left that as it is, because the struct is used for other purposes too.

> I noticed there were a few XXX comments left that you created. I'll
> highlight them here for more visibility.
> 
>> +/* XXX: where does this belong? */
>> +extern bool LoadedSSL;
> 
> Perhaps near the My* variables or maybe in the Port struct?

It is valid in the postmaster, too, though. The My* variables and Port 
struct only make sense in the child process.

I think this is the best place after all, so I just removed the XXX comment.

>> +#ifdef EXEC_BACKEND
>> +
>> +        /*
>> +         * Need to reinitialize the SSL library in the backend, since the context
>> +         * structures contain function pointers and cannot be passed through the
>> +         * parameter file.
>> +         *
>> +         * If for some reason reload fails (maybe the user installed broken key
>> +         * files), soldier on without SSL; that's better than all connections
>> +         * becoming impossible.
>> +         *
>> +         * XXX should we do this in all child processes?  For the moment it's
>> +         * enough to do it in backend children. XXX good question indeed
>> +         */
>> +#ifdef USE_SSL
>> +        if (EnableSSL)
>> +        {
>> +                if (secure_initialize(false) == 0)
>> +                        LoadedSSL = true;
>> +                else
>> +                        ereport(LOG,
>> +                                        (errmsg("SSL configuration could not be loaded in child process")));
>> +        }
>> +#endif
>> +#endif
> 
> Here you added the "good question indeed." I am not sure what the best
> answer is either! :)

I just removed the extra XXX comment. It's still a valid question, but 
this patch just moves it around, we don't need to answer it here.

>> +                /* XXX: translation? */
>> +                ereport(LOG,
>> +                                (errmsg("could not fork %s process: %m", PostmasterChildName(type))));
> 
> I assume you are referring to the child name here?

Correct. Does the process name need to be translated? And this way of 
constructing sentences is not translation-friendly anyway. In some 
languages, the word 'process' might need to be inflected differently 
depending on the child name, for example.

I put the process name in quotes, and didn't mark the process name for 
translation.

>> XXX: We now have functions called AuxiliaryProcessInit() and
>> InitAuxiliaryProcess(). Confusing.
> 
> Based on my analysis, the *Init() is called in the Main functions, while
> Init*() is called before the Main functions. Maybe
> AuxiliaryProcessInit() could be renamed to AuxiliaryProcessStartup()?
> Rename the other to AuxiliaryProcessInit().

Hmm. There's also BackendStartup() function in postmaster.c, which is 
very different: it runs in the postmaster process and launches the 
backend process. So the Startup suffix is not great either.

I renamed AuxiliaryProcessInit() to AuxiliaryProcessMainCommon(). As in 
"the common parts of the main functions of all the aux processes".

(We should perhaps merge InitProcess() and InitAuxiliaryProcess() into 
one function. There's a lot of duplicated code between them. And the 
parts that differ should perhaps be refactored to be more similar 
anyway. I don't want to take on that refactoring right now though.)

Attached is a new version of the remaining patches.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 13/03/2024 09:30, Heikki Linnakangas wrote:
> Attached is a new version of the remaining patches.

Committed, with some final cosmetic cleanups. Thanks everyone!

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

От
Tom Lane
Дата:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Committed, with some final cosmetic cleanups. Thanks everyone!

A couple of buildfarm animals don't like these tests:

    Assert(child_type >= 0 && child_type < lengthof(child_process_kinds));

for example

 ayu           | 2024-03-19 13:08:05 | launch_backend.c:211:39: warning: comparison of constant 16 with expression of
type'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare] 
 ayu           | 2024-03-19 13:08:05 | launch_backend.c:233:39: warning: comparison of constant 16 with expression of
type'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare] 

I'm not real sure why it's moaning about the "<" check but not the
">= 0" check, which ought to be equally tautological given the
assumption that the input is a valid member of the enum.  But
in any case, exactly how much value do these assertions carry?
If you're intent on keeping them, perhaps casting child_type to
int here would suppress the warnings.  But personally I think
I'd lose the Asserts.

            regards, tom lane



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 20/03/2024 07:37, Tom Lane wrote:
> A couple of buildfarm animals don't like these tests:
> 
>     Assert(child_type >= 0 && child_type < lengthof(child_process_kinds));
> 
> for example
> 
>   ayu           | 2024-03-19 13:08:05 | launch_backend.c:211:39: warning: comparison of constant 16 with expression
oftype 'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare]
 
>   ayu           | 2024-03-19 13:08:05 | launch_backend.c:233:39: warning: comparison of constant 16 with expression
oftype 'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare]
 
> 
> I'm not real sure why it's moaning about the "<" check but not the
> ">= 0" check, which ought to be equally tautological given the
> assumption that the input is a valid member of the enum.  But
> in any case, exactly how much value do these assertions carry?
> If you're intent on keeping them, perhaps casting child_type to
> int here would suppress the warnings.  But personally I think
> I'd lose the Asserts.

Yeah, it's not a very valuable assertion. Removed, thanks!

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

От
Jelte Fennema-Nio
Дата:
On Wed, 20 Mar 2024 at 08:16, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Yeah, it's not a very valuable assertion. Removed, thanks!

How about we add it as a static assert instead of removing it, like we
have for many other similar arrays.

Вложения

Re: Refactoring backend fork+exec code

От
"Anton A. Melnikov"
Дата:
Hello!

Maybe add PGDLLIMPORT to
extern bool LoadedSSL;
and
extern struct ClientSocket *MyClientSocket;
definitions in the src/include/postmaster/postmaster.h ?

With the best regards,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Refactoring backend fork+exec code

От
Heikki Linnakangas
Дата:
On 27/04/2024 11:27, Anton A. Melnikov wrote:
> Hello!
> 
> Maybe add PGDLLIMPORT to
> extern bool LoadedSSL;
> and
> extern struct ClientSocket *MyClientSocket;
> definitions in the src/include/postmaster/postmaster.h ?
Peter E noticed and Michael fixed them in commit 768ceeeaa1 already.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Refactoring backend fork+exec code

От
"Anton A. Melnikov"
Дата:
On 28.04.2024 22:36, Heikki Linnakangas wrote:
> Peter E noticed and Michael fixed them in commit 768ceeeaa1 already.

Didn't check that is already fixed in the current master. Sorry!
Thanks for pointing this out!

With the best wishes,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Refactoring backend fork+exec code

От
Thomas Munro
Дата:
On Mon, Mar 18, 2024 at 10:41 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Committed, with some final cosmetic cleanups. Thanks everyone!

Nitpicking from UBSan with EXEC_BACKEND on Linux (line numbers may be
a bit off, from a branch of mine):

../src/backend/postmaster/launch_backend.c:772:2: runtime error: null
pointer passed as argument 2, which is declared to never be null
==13303==Using libbacktrace symbolizer.
    #0 0x5555564b0202 in save_backend_variables
../src/backend/postmaster/launch_backend.c:772
    #1 0x5555564b0242 in internal_forkexec
../src/backend/postmaster/launch_backend.c:311
    #2 0x5555564b0bdd in postmaster_child_launch
../src/backend/postmaster/launch_backend.c:244
    #3 0x5555564b3121 in StartChildProcess
../src/backend/postmaster/postmaster.c:3928
    #4 0x5555564b933a in PostmasterMain
../src/backend/postmaster/postmaster.c:1357
    #5 0x5555562de4ad in main ../src/backend/main/main.c:197
    #6 0x7ffff667ad09 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
    #7 0x555555e34279 in _start
(/tmp/cirrus-ci-build/build/tmp_install/usr/local/pgsql/bin/postgres+0x8e0279)

This silences it:

-       memcpy(param->startup_data, startup_data, startup_data_len);
+       if (startup_data_len > 0)
+               memcpy(param->startup_data, startup_data, startup_data_len);

(I found that out by testing EXEC_BACKEND on CI.  I also learned that
the Mac and FreeBSD tasks fail with EXEC_BACKEND because of SysV shmem
bleating.  We probably should go and crank up the relevant sysctls in
the .cirrus.tasks.yml...)