Discussion: Re: Reorder shutdown sequence, to flush pgstats later
On 08/01/2025 04:10, Andres Freund wrote:
> I instead opted to somewhat rearrange the shutdown sequence:
>
> This commit changes the shutdown sequence so checkpointer is signalled to
> trigger writing the shutdown checkpoint without terminating it. Once
> checkpointer wrote the checkpoint it will wait for a termination
> signal.
>
> Postmaster now triggers the shutdown checkpoint where we previously did so by
> terminating checkpointer. Checkpointer is terminated after all other children
> have been terminated, tracked using the new PM_SHUTDOWN_CHECKPOINTER PMState.
>
> In addition, the existing PM_SHUTDOWN_2 state is renamed to
> PM_XLOG_IS_SHUTDOWN, that seems a lot more descriptive.

Sounds good.

> This patch is not fully polished, there are a few details I'm not sure about:
>
> - Previously the shutdown checkpoint and process exit were triggered from
>   HandleCheckpointerInterrupts(). I could have continued doing so in the
>   patch, but it seemed quite weird to have a wait loop in a function called
>   HandleCheckpointerInterrupts().
>
>   Instead I made the main loop in CheckpointerMain() terminate if checkpointer
>   was asked to write the shutdown checkpoint or exit
>
> - In process_pm_pmsignal() the code now triggers a PostmasterStateMachine() if
>   checkpointer signalled ShutdownXLOG having completed. We didn't really have
>   a need for that before. process_pm_child_exit() does call
>   PostmasterStateMachine(), but somehow calling it in process_pm_pmsignal()
>   feels slightly off

Looks good to me.

> - I don't love the naming of the various PMState values (existing and new),
>   but a larger renaming should probably be done separately?

We currently have PM_WAIT_BACKENDS and PM_WAIT_DEAD_END. Maybe follow that
example and name all the shutdown states after what you're waiting for:

	PM_WAIT_BACKENDS,		/* waiting for live backends to exit */
	PM_WAIT_XLOG_SHUTDOWN,	/* waiting for checkpointer to do shutdown ckpt */
	PM_WAIT_XLOG_ARCHIVAL,	/* waiting for archiver and walsenders to finish */
	PM_WAIT_DEAD_END,		/* waiting for dead-end children to exit */
	PM_WAIT_CHECKPOINTER,	/* waiting for checkpointer to shut down */
	PM_NO_CHILDREN,			/* all important children have exited */

> - There's logging in ShutdownXLOG() that's only triggered if called from
>   checkpointer. Seems kinda odd - shouldn't we just move that to
>   checkpointer.c?

What logging do you mean? The "LOG: shutting down" message? No objections to
moving it, although it doesn't bother me either way.

> The first two patches are just logging improvements that I found helpful to
> write this patch:
>
> 0001: Instead of modifying pmState directly, do so via a new function
>       UpdatePMState(), which logs the state transition at DEBUG2.
>
>       Requires a helper to stringify PMState.
>
>       I've written one-off versions of this patch quite a few times. Without
>       knowing in what state postmaster is in, it's quite hard to debug the
>       state machine.

+1

> elog(DEBUG1, "updating PMState from %d/%s to %d/%s",
>      pmState, pmstate_name(pmState), newState, pmstate_name(newState));

I think the state names are enough, no need to print the numerical values.

> 0002: Make logging of postmaster signalling child processes more consistent
>
>       I found it somewhat hard to understand what's happening during state
>       changes without being able to see the signals being sent. While we did
>       have logging in SignalChildren(), we didn't in signal_child(), and most
>       signals that are important for the shutdown sequence are sent via
>       signal_child().

+1

I don't think the cases where DEBUG2 or DEBUG4 are used currently are very
well thought out. To save a few lines of code, I'd be happy with merging
signal_child_ext() and signal_child() and just always use DEBUG2 or DEBUG4.
Or DEBUG3 as a compromise :-).

> The next is a small prerequisite:
>
> 0003: postmaster: Introduce variadic btmask_all_except()
>
>       I proposed this in another thread, where I also needed
>       btmask_all_except3() but then removed the only call to
>       btmask_all_except2(). Instead of adding/removing code, it seems better
>       to just use a variadic function.

Nice. A variadic btmask_add() might be handy too.

> And then 0004, the reason for this thread.

Overall, looks good to me.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
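[A stand-alone sketch of how 0001 and the naming suggestion above fit together. UpdatePMState(), pmstate_name() and the PM_WAIT_* values mirror what the thread discusses, but the enum is trimmed and elog() is replaced by fprintf() so the snippet compiles on its own; it is an illustration, not the patch itself.]

#include <stdio.h>

/* shutdown-related states, named after what postmaster is waiting for */
typedef enum
{
	PM_RUN,
	PM_WAIT_BACKENDS,		/* waiting for live backends to exit */
	PM_WAIT_XLOG_SHUTDOWN,	/* waiting for checkpointer to do shutdown ckpt */
	PM_WAIT_XLOG_ARCHIVAL,	/* waiting for archiver and walsenders to finish */
	PM_WAIT_DEAD_END,		/* waiting for dead-end children to exit */
	PM_WAIT_CHECKPOINTER,	/* waiting for checkpointer to shut down */
	PM_NO_CHILDREN,			/* all important children have exited */
} PMState;

static PMState pmState = PM_RUN;

/* stringify helper; with no default case, -Wswitch warns if a new PMState is missing here */
static const char *
pmstate_name(PMState state)
{
#define PM_TOSTR_CASE(sym) case sym: return #sym
	switch (state)
	{
		PM_TOSTR_CASE(PM_RUN);
		PM_TOSTR_CASE(PM_WAIT_BACKENDS);
		PM_TOSTR_CASE(PM_WAIT_XLOG_SHUTDOWN);
		PM_TOSTR_CASE(PM_WAIT_XLOG_ARCHIVAL);
		PM_TOSTR_CASE(PM_WAIT_DEAD_END);
		PM_TOSTR_CASE(PM_WAIT_CHECKPOINTER);
		PM_TOSTR_CASE(PM_NO_CHILDREN);
	}
#undef PM_TOSTR_CASE
	return "(unknown)";
}

/* log every state transition, names only, as suggested above */
static void
UpdatePMState(PMState newState)
{
	fprintf(stderr, "updating PMState from %s to %s\n",
			pmstate_name(pmState), pmstate_name(newState));
	pmState = newState;
}

int
main(void)
{
	UpdatePMState(PM_WAIT_BACKENDS);
	UpdatePMState(PM_WAIT_XLOG_SHUTDOWN);
	return 0;
}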
Hi,

On Wed, Jan 08, 2025 at 03:10:03PM +0200, Heikki Linnakangas wrote:
> On 08/01/2025 04:10, Andres Freund wrote:
>
> > elog(DEBUG1, "updating PMState from %d/%s to %d/%s",
> >      pmState, pmstate_name(pmState), newState, pmstate_name(newState));
>
> I think the state names are enough, no need to print the numerical values.

+1

> > 0002: Make logging of postmaster signalling child processes more consistent
> >
> >       I found it somewhat hard to understand what's happening during state
> >       changes without being able to see the signals being sent. While we did
> >       have logging in SignalChildren(), we didn't in signal_child(), and most
> >       signals that are important for the shutdown sequence are sent via
> >       signal_child().
>
> +1

Random comments:

=== 1

+static const char *
+pm_signame(int signal)
+{
+#define PM_CASE(state) case state: return #state
+	switch (signal)

What about?

"
#define PM_SIGNAL_CASE(sig) case sig: return #sig
"

to better reflect its context.

=== 2

+		default:
+			/* all signals sent by postmaster should be listed here */
+			Assert(false);
+			return "(unknown)";
+	}
+#undef PM_CASE
+	pg_unreachable();

Shouldn't we remove the "return "(unknown)"? (if not the pg_unreachable() looks
like dead code).

> I don't think the cases where DEBUG2 or DEBUG4 are used currently are very
> well thought out. To save a few lines of code, I'd be happy with merging
> signal_child_ext() and signal_child()

+1

> > The next is a small prerequisite:
> >
> > 0003: postmaster: Introduce variadic btmask_all_except()
> >
> >       I proposed this in another thread, where I also needed
> >       btmask_all_except3() but then removed the only call to
> >       btmask_all_except2(). Instead of adding/removing code, it seems better
> >       to just use a variadic function.

LGTM.

> Nice. A variadic btmask_add() might be handy too.
>
> > And then 0004, the reason for this thread.

Did not give it a detailed review, but IIUC it now provides this flow:

PM_SHUTDOWN->PM_XLOG_IS_SHUTDOWN->PM_WAIT_DEAD_END->PM_SHUTDOWN_CHECKPOINTER->PM_NO_CHILDREN

and that looks good to me to fix the issue.

A few random comments:

=== 3

+ postmaster.c will only
+ * signal checkpointer to exit after all processes that could emit stats
+ * have been shut down.

s/postmaster.c/PostmasterStateMachine()/?

=== 4

+	 * too. That allows checkpointer to perform some last bits of of

Typo "of of"

I'll give 0004 a closer look once it leaves the WIP state but want to share that
it looks like a good way to fix the issue.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
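[For 0003, a stand-alone illustration of what a variadic mask helper can look like. The BackendTypeMask/btmask_* names echo postmaster.c, but the explicit-count signature and the backend-type numbers used in main() are assumptions made for the sketch, not the committed interface.]

#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

typedef struct
{
	uint32_t	mask;
} BackendTypeMask;

#define BTYPE_MASK_ALL	((BackendTypeMask) {.mask = 0xFFFFFFFFu})

/* clear the bit for one backend type */
static BackendTypeMask
btmask_del(BackendTypeMask mask, int btype)
{
	mask.mask &= ~(UINT32_C(1) << btype);
	return mask;
}

/* start from the full set and remove 'nexcept' backend types passed as ints */
static BackendTypeMask
btmask_all_except_n(int nexcept, ...)
{
	BackendTypeMask mask = BTYPE_MASK_ALL;
	va_list		ap;

	va_start(ap, nexcept);
	for (int i = 0; i < nexcept; i++)
		mask = btmask_del(mask, va_arg(ap, int));
	va_end(ap);

	return mask;
}

int
main(void)
{
	/* pretend backend type 3 is the logger and 5 is a dead-end backend */
	BackendTypeMask m = btmask_all_except_n(2, 3, 5);

	printf("0x%08x\n", (unsigned) m.mask);	/* bits 3 and 5 cleared */
	return 0;
}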
Hi, On 2025-01-08 14:48:21 +0000, Bertrand Drouvot wrote: > On Wed, Jan 08, 2025 at 03:10:03PM +0200, Heikki Linnakangas wrote: > > On 08/01/2025 04:10, Andres Freund wrote: > > > 0002: Make logging of postmaster signalling child processes more consistent > > > > > > I found it somewhat hard to understand what's happening during state > > > changes without being able to see the signals being sent. While we did > > > have logging in SignalChildren(), we didn't in signal_child(), and most > > > signals that are important for the shutdown sequence are sent via > > > signal_child(). > > > > +1 > > Random comments: > > === 1 > > +static const char * > +pm_signame(int signal) > +{ > +#define PM_CASE(state) case state: return #state > + switch (signal) > > What about? > " > #define PM_SIGNAL_CASE(sig) case sig: return #sig > " > to better reflect its context. I went for PM_TOSTR_CASE - that way the same name can be used in pmstate_name(). > === 2 > > + default: > + /* all signals sent by postmaster should be listed here */ > + Assert(false); > + return "(unknown)"; > + } > +#undef PM_CASE > + pg_unreachable(); > > Shouldn't we remove the "return "(unknown)"? (if not the pg_unreachable() looks > like dead code). I don't think so - we're not listing all signals here, just ones that postmaster is currently using. They're also typically not defined in an enum allowing the compiler to warn about uncovered values. It seemed better to actually return something in a production build rather than aborting. Some compilers tend to be pretty annoying about missing return values, that's why I added the pg_unreachable(). Perhaps I should have done a return "" /* silence compilers */; or such. > > Nice. A variadic btmask_add() might be handy too. > > > > > And then 0004, the reason for this thread. > > Did not give it a detailled review, but IIUC it now provides this flow: > > PM_SHUTDOWN->PM_XLOG_IS_SHUTDOWN->PM_WAIT_DEAD_END->PM_SHUTDOWN_CHECKPOINTER->PM_NO_CHILDREN > > and that looks good to me to fix the issue. > > A few random comments: > > === 3 > > + postmaster.c will only > + * signal checkpointer to exit after all processes that could emit stats > + * have been shut down. > > s/postmaster.c/PostmasterStateMachine()/? I just went for 'postmaster' (without the .c) - I don't think it's really useful to specifically reference s/postmaster.c/PostmasterStateMachine() in checkpointer.c. But I could be swayed if you feel strongly. > === 4 > > + * too. That allows checkpointer to perform some last bits of of > > Typo "of of" Fixed. > I'll give 0004 a closer look once it leaves the WIP state but want to share that > it looks like a good way to fix the issue. Cool. Thanks for the review, Andres Freund
Hi, On Wed, Jan 08, 2025 at 02:32:24PM -0500, Andres Freund wrote: > On 2025-01-08 14:48:21 +0000, Bertrand Drouvot wrote: > > === 2 > > > > + default: > > + /* all signals sent by postmaster should be listed here */ > > + Assert(false); > > + return "(unknown)"; > > + } > > +#undef PM_CASE > > + pg_unreachable(); > > > > Shouldn't we remove the "return "(unknown)"? (if not the pg_unreachable() looks > > like dead code). > > I don't think so - we're not listing all signals here, just ones that > postmaster is currently using. They're also typically not defined in an enum > allowing the compiler to warn about uncovered values. Oh right, the parameter is an "int" not an "enum" (and anyway, as you said, we're not listing all the signals) (did not pay attention to that). So we may need to "help" some compilers regarding missing return values. > It seemed better to > actually return something in a production build rather than aborting. Yeah, fully agree. > Perhaps I should have done a > return "" /* silence compilers */; Using pg_unreachable() is fine but a comment (like the return value you're suggesting above) could help. Thanks for the explanation! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
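[For reference, a stand-alone reduction of the pattern being discussed: the default case keeps production builds going with "(unknown)" while assert-enabled builds still catch an unlisted signal. Plain assert() stands in for Assert(), only a few of the signals postmaster sends are listed, and pg_unreachable() is omitted because every path here already returns.]

#include <assert.h>
#include <signal.h>
#include <stdio.h>

static const char *
pm_signame(int sig)
{
#define PM_TOSTR_CASE(sym) case sym: return #sym
	switch (sig)
	{
		PM_TOSTR_CASE(SIGINT);
		PM_TOSTR_CASE(SIGQUIT);
		PM_TOSTR_CASE(SIGTERM);
		PM_TOSTR_CASE(SIGUSR2);
		default:
			/* catches a signal we forgot to list, in assert-enabled builds */
			assert(false);
			/* in production builds, keep going with a placeholder */
			return "(unknown)";
	}
#undef PM_TOSTR_CASE
}

int
main(void)
{
	printf("%s\n", pm_signame(SIGTERM));	/* prints "SIGTERM" */
	printf("%s\n", pm_signame(SIGHUP));		/* aborts here unless built with
											 * -DNDEBUG, where it prints
											 * "(unknown)" */
	return 0;
}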
Hi,

On Wed, Jan 08, 2025 at 02:26:15PM -0500, Andres Freund wrote:
> I'm currently to plan the four patches relatively soon, unless somebody speaks
> up, of course. They seem fairly uncontroversial. The renaming of the phases
> doesn't need to wait much longer, I think.

Thanks for the patches. A few comments:

0001 LGTM.

0002,

=== 1

+static const char *
+pm_signame(int signal)
+{
+#define PM_TOSTR_CASE(state) case state: return #state
+	switch (signal)

s/state/signal/? (seems better in the pm_signame() context)

0003 and 0004 LGTM.

0005,

=== 2

+	PM_WAIT_XLOG_ARCHIVAL,		/* waiting for archiver and walsenders to

> I don't love PM_WAIT_XLOG_ARCHIVAL, but I can't think of anything better.

PM_WAIT_ARCHIVER_WALSENDERS maybe? (that would follow the pattern of naming the
processes like PM_WAIT_BACKENDS, PM_WAIT_CHECKPOINTER for example).

That said, I'm not 100% convinced it makes it more clear though...

> The last two (0006: trigger checkpoints via SetLatch, 0007: change the
> shutdown sequence), probably can use a few more eyes.

0006,

=== 3

+ * when it does start, with or without getting a signal.

s/getting a signal/getting a latch set/ or "getting notified"?

=== 4

+	if (checkpointerProc == INVALID_PROC_NUMBER)
	{
		if (ntries >= MAX_SIGNAL_TRIES || !(flags & CHECKPOINT_WAIT))
		{
			elog((flags & CHECKPOINT_WAIT) ? ERROR : LOG,
-				 "could not signal for checkpoint: checkpointer is not running");
+				 "could not notify checkpoint: checkpointer is not running");

Worth renaming MAX_SIGNAL_TRIES to MAX_NOTIFY_TRIES then?

0007,

=== 5

+	pqsignal(SIGINT, ReqShutdownXLOG);

Worth a comment like:

	pqsignal(SIGINT, ReqShutdownXLOG);	/* trigger shutdown checkpoint */

=== 6

+	 * Wait until we're asked to shut down. By seperating the writing of the

Typo: s/seperating/separating/

I don't really have anything else in 0006 and 0007 but as you said it's probably
worth a few more eyes as the code change is pretty substantial.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
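[To show the shape 0006 moves to, here is a stand-alone analogy of the request-flag-plus-wakeup protocol, as I understand it: the requester records the request in shared state and wakes checkpointer instead of signalling its PID. It deliberately does not use PostgreSQL's latch API; the pthread mutex/condvar pair below only plays the role of SetLatch()/WaitLatch()/ResetLatch() so the sketch can run by itself.]

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wakeup = PTHREAD_COND_INITIALIZER;
static bool ckpt_requested = false;
static bool shutdown_requested = false;

/* requester side: set the flag, then wake the sleeping process ("SetLatch") */
static void
RequestCheckpoint(void)
{
	pthread_mutex_lock(&lock);
	ckpt_requested = true;
	pthread_cond_signal(&wakeup);
	pthread_mutex_unlock(&lock);
}

/* checkpointer side: sleep until woken, then act on whatever flags are set */
static void *
CheckpointerMain(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&lock);
	for (;;)
	{
		while (!ckpt_requested && !shutdown_requested)
			pthread_cond_wait(&wakeup, &lock);	/* "WaitLatch" */

		if (ckpt_requested)
		{
			ckpt_requested = false;				/* "ResetLatch" + clear flag */
			printf("performing checkpoint\n");
		}
		if (shutdown_requested)
			break;
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

int
main(void)
{
	pthread_t	ckpt;

	pthread_create(&ckpt, NULL, CheckpointerMain, NULL);
	RequestCheckpoint();

	pthread_mutex_lock(&lock);
	shutdown_requested = true;
	pthread_cond_signal(&wakeup);
	pthread_mutex_unlock(&lock);

	pthread_join(ckpt, NULL);
	return 0;
}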
Hi,

Thanks for working on this!

On Wed, 8 Jan 2025 at 22:26, Andres Freund <andres@anarazel.de> wrote:
>
> I'm currently to plan the four patches relatively soon, unless somebody speaks
> up, of course. They seem fairly uncontroversial. The renaming of the phases
> doesn't need to wait much longer, I think.
>
> The last two (0006: trigger checkpoints via SetLatch, 0007: change the
> shutdown sequence), probably can use a few more eyes.

Some minor comments:

=== 0001

LGTM.

=== 0002

+	}
+#undef PM_TOSTR_CASE
+	pg_unreachable();
+}

Maybe a blank line after '#undef PM_TOSTR_CASE' (or no blank line at the end of
the pmstate_name() at 0001)?

+		ereport(DEBUG3,
+				(errmsg_internal("sending signal %d/%s to %s process %d",

I am not sure if that makes sense but with the addition of the backend type
here, I think 'sending signal %d/%s to %s process with the pid of %d' would be
clearer.

=== 0003

LGTM.

=== 0004

LGTM.

=== 0005

> I think this is much better than before. I don't love PM_WAIT_XLOG_ARCHIVAL,
> but I can't think of anything better.

I liked this, I think it is good enough.

-	PM_SHUTDOWN,				/* waiting for checkpointer to do shutdown
+	PM_WAIT_XLOG_SHUTDOWN,		/* waiting for checkpointer to do shutdown
								 * ckpt */

There are a couple of variables and functions which include pm_shutdown in
their names:

pending_pm_shutdown_request
handle_pm_shutdown_request_signal()
process_pm_shutdown_request()

Do you think these need to be updated as well?

=== 0006

Please see below.

=== 0007

-	 * told to shut down. We expect that it wrote a shutdown
-	 * checkpoint. (If for some reason it didn't, recovery will
-	 * occur on next postmaster start.)
+	 * told to shut down. We know it wrote a shutdown checkpoint,
+	 * otherwise we'd still be in PM_SHUTDOWN.

s/PM_SHUTDOWN/PM_WAIT_XLOG_SHUTDOWN/

I have these comments for now. I need to study 0006 and 0007 more.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
Hi, On 2025-01-09 16:50:45 +0300, Nazir Bilal Yavuz wrote: > On Wed, 8 Jan 2025 at 22:26, Andres Freund <andres@anarazel.de> wrote: > === 0005 > > > I think this is much better than before. I don't love PM_WAIT_XLOG_ARCHIVAL, > > but I can't think of anything better. > > I liked this, I think it is good enough. > > - PM_SHUTDOWN, /* waiting for checkpointer to do shutdown > + PM_WAIT_XLOG_SHUTDOWN, /* waiting for checkpointer to do shutdown > * ckpt */ > > There are couple of variables and functions which include pm_shutdown > in their names: > pending_pm_shutdown_request > handle_pm_shutdown_request_signal() > process_pm_shutdown_request() > > Do you think these need to be updated as well? I don't think so - I think those are actually not specifically referencing the PM_SHUTDOWN symbol. They're referencing shutting down postmaster - which neither starts nor ends with PM_SHUTDOWN. Greetings, Andres Freund
Hi, Thanks for the reviews! On 2025-01-09 12:01:05 +0000, Bertrand Drouvot wrote: > On Wed, Jan 08, 2025 at 02:26:15PM -0500, Andres Freund wrote: > === 2 > > + PM_WAIT_XLOG_ARCHIVAL, /* waiting for archiver and walsenders to > > > I don't love PM_WAIT_XLOG_ARCHIVAL, but I can't think of anything better. > > PM_WAIT_ARCHIVER_WALSENDERS maybe? (that would follow the pattern of naming > the processes like PM_WAIT_BACKENDS, PM_WAIT_CHECKPOINTER for example). > > That said, I'm not 100% convinced it makes it more clear though... Yea, I just went with PM_WAIT_XLOG_ARCHIVAL, it's ok enough. > > The last two (0006: trigger checkpoints via SetLatch, 0007: change the > > shutdown sequence), probably can use a few more eyes. > > 0006, > > === 3 > > + * when it does start, with or without getting a signal. > > s/getting a signal/getting a latch set/ or "getting notified"? I went with "with or without the SetLatch()". > === 4 > > + if (checkpointerProc == INVALID_PROC_NUMBER) > { > if (ntries >= MAX_SIGNAL_TRIES || !(flags & CHECKPOINT_WAIT)) > { > elog((flags & CHECKPOINT_WAIT) ? ERROR : LOG, > - "could not signal for checkpoint: checkpointer is not running"); > + "could not notify checkpoint: checkpointer is not running"); > > Worth renaming MAX_SIGNAL_TRIES with MAX_NOTIFY_TRIES then? IDK, didn't seem worth it. SetLatch() is a form of signalling and I don't think "notify" really is better. And it requires changing more lines... > === 5 > > + pqsignal(SIGINT, ReqShutdownXLOG); > > Worth a comment like: > > pqsignal(SIGINT, ReqShutdownXLOG); /* trigger shutdown checkpoint */ That seems just a restatement of the function name, don't think we gain anything by that. Greetings, Andres Freund
Hi, On Fri, Jan 10, 2025 at 02:07:25PM -0500, Andres Freund wrote: > However, I think 0007 needs a bit more work. > > One thing that worries me a bit is that using SIGINT for triggering the > shutdown checkpoint could somehow be unintentionally triggered? Previously > checkpointer would effectively have ignored that, Right. > but now it'd bring the whole > cluster down (because postmaster would see an unexpected ShutdownXLOG()). But > I can't really see a legitimate reason for checkpointer to get spurious > SIGINTs. Yeah, appart from human error on the OS side, I don't see any reasons too. > This made me think about what happens if a spurious SIGINT is sent - and in > the last version the answer wasn't great, namely everything seemed to go on > normal, except that the cluster couldn't be shut down normally. Yeah, and I think that checkpoints would hang. > There wasn't > any code to treat receiving PMSIGNAL_XLOG_IS_SHUTDOWN in the wrong state as > fatal. I think this needs to be treated the same way we treat not being able > to fork checkpointer during a shutdown. Now receiving > PMSIGNAL_XLOG_IS_SHUTDOWN while not in PM_WAIT_XLOG_SHUTDOWN triggers > FatalError processing after logging "WAL was shut down unexpectedly". I wonder if we could first ask the postmaster a confirmation that the SIGINT is coming from it? (and if not, then simply ignore the SIGINT). I thought about a shared memory flag but the postmaster has to be keept away from shared memory operations, so that's not a valid idea. Another idea could be that the checkpointer sends a new "confirm" request to the postmater first. But then I'm not sure how the postmaster could reply back (given the signals that are already being used) and that might overcomplicate things. So yeah, might be better to be on the safe side of thing and "simply" enter Fatal. > We have > multiple copies of code to go to FatalError, that doesn't seem great. + * FIXME: This should probably not have a copy fo the code to + * transition into FatalError state. + */ + ereport(LOG, + (errmsg("WAL was shut down unexpectedly"))); Indeed, might be worth extracting this into a helper function or such. > Another thing I started worrying about is that the commit added > PM_WAIT_CHECKPOINTER *after* PM_WAIT_DEAD_END. However, we have a few places > where we directly jump to PM_WAIT_DEAD_END in fatal situations: > > /* > * Stop any dead-end children and stop creating new ones. > */ > UpdatePMState(PM_WAIT_DEAD_END); > ConfigurePostmasterWaitSet(false); > SignalChildren(SIGQUIT, btmask(B_DEAD_END_BACKEND)); > and > /* > * If we failed to fork a checkpointer, just shut down. > * Any required cleanup will happen at next restart. We > * set FatalError so that an "abnormal shutdown" message > * gets logged when we exit. > * > * We don't consult send_abort_for_crash here, as it's > * unlikely that dumping cores would illuminate the reason > * for checkpointer fork failure. > */ > FatalError = true; > UpdatePMState(PM_WAIT_DEAD_END); > ConfigurePostmasterWaitSet(false); > > > I don't think this actively causes problems today, For the first case, we'd ask the checkpointer to do work in the PM_WAIT_DEAD_END case while already SIGQUIT'd (through SignalHandlerForCrashExit()). But that's not an issue per say as we check for CheckpointerPMChild != NULL when pmState == PM_WAIT_DEAD_END. For the second case (where we failed to fork a checkpointer), we'd also check for CheckpointerPMChild != NULL when pmState == PM_WAIT_DEAD_END. 
So in both case we'd ask a non existing checkpointer to do work but we later check that it exists, so I also don't think that it causes any problem. > but it seems rather iffy. Yeah, we intend to ask a non existing checkpointer process to do work... > Maybe PM_WAIT_CHECKPOINTER should be before PM_WAIT_DEAD_END? I earlier > thought it'd be better to shut checkpointer down after even dead-end children > exited, in case we ever wanted to introduce stats for dead-end children - but > that seems rather unlikely? Yeah, given the above, it looks more clean to change that ordering. Indeed that seems unlikely to introduce dead-end children stats and even if we do , we could think back to put PM_WAIT_CHECKPOINTER after PM_WAIT_DEAD_END (or any oher way to ensure the stats are flushed after they are shut down). So it's probably better to focus on the current state and then put PM_WAIT_CHECKPOINTER before PM_WAIT_DEAD_END. Anyway, if we reach the 2 cases mentioned above we're not going to flush the stats anyway (CheckpointerPMChild is NULL) so better to reorder then. > Independent of this patch, we seem to deal with FatalError processing in a > somewhat inconsistently. We have this comment (in master): > /* > * PM_WAIT_DEAD_END state ends when all other children are gone except > * for the logger. During normal shutdown, all that remains are > * dead-end backends, but in FatalError processing we jump straight > * here with more processes remaining. Note that they have already > * been sent appropriate shutdown signals, either during a normal > * state transition leading up to PM_WAIT_DEAD_END, or during > * FatalError processing. > > However that doesn't actually appear to be true in the most common way to > reach FatalError, via HandleChildCrash(): > > if (Shutdown != ImmediateShutdown) > FatalError = true; > > /* We now transit into a state of waiting for children to die */ > if (pmState == PM_RECOVERY || > pmState == PM_HOT_STANDBY || > pmState == PM_RUN || > pmState == PM_STOP_BACKENDS || > pmState == PM_WAIT_XLOG_SHUTDOWN) > UpdatePMState(PM_WAIT_BACKENDS); > > Here we > a) don't directly go to PM_WAIT_DEAD_END, but PM_WAIT_BACKENDS > > From PM_WAIT_BACKENDS we'll switch to PM_WAIT_DEAD_END if all processes > other than walsender, archiver and dead-end children exited. > > b) in PM_WAIT_XLOG_SHUTDOWN (previously named PM_SHUTDOWN) we go *back* to > PM_WAIT_BACKENDS? > > This comment seems to have been added in > > commit a78af0427015449269fb7d9d8c6057cfcb740149 > Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: 2024-11-14 16:12:28 +0200 > > Assign a child slot to every postmaster child process > > ISTM that section of the comment is largely wrong and has never really been > the case if my git sleuthing is correct? I agree that does not look correct but I don't think it's coming from a78af0427015449269fb7d9d8c6057cfcb740149. ISTM that a78af0427015449269fb7d9d8c6057cfcb740149 re-worded this already wrong comment: " /* * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty * (ie, no dead_end children remain), and the archiver is gone too. * * The reason we wait for those two is to protect them against a new * postmaster starting conflicting subprocesses; this isn't an * ironclad protection, but it at least helps in the * shutdown-and-immediately-restart scenario. Note that they have * already been sent appropriate shutdown signals, either during a * normal state transition leading up to PM_WAIT_DEAD_END, or during * FatalError processing. 
*/ " while we could also see: " if (Shutdown != ImmediateShutdown) FatalError = true; /* We now transit into a state of waiting for children to die */ if (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY || pmState == PM_RUN || pmState == PM_STOP_BACKENDS || pmState == PM_SHUTDOWN) pmState = PM_WAIT_BACKENDS; " It seems to me that it has been introduced in e6a442c71b30f62e7b5eee6058afc961b1c7f29b where the above comment has been added and does not align with what HandleChildCrash() was already doing (was already setting PM_WAIT_BACKENDS). But anyway I do agree that this comment look wrong. > We do have one place that directly switches to PM_WAIT_DEAD_END: > /* > * If we failed to fork a checkpointer, just shut down. > * Any required cleanup will happen at next restart. We > * set FatalError so that an "abnormal shutdown" message > * gets logged when we exit. > * > * We don't consult send_abort_for_crash here, as it's > * unlikely that dumping cores would illuminate the reason > * for checkpointer fork failure. > */ > FatalError = true; > UpdatePMState(PM_WAIT_DEAD_END); > ConfigurePostmasterWaitSet(false); > > but I suspect that's just about unreachable these days. If checkpointer exits > outside of shutdown processing, it's treated as as a reason to crash-restart: > /* > * Any unexpected exit of the checkpointer (including FATAL > * exit) is treated as a crash. > */ > HandleChildCrash(pid, exitstatus, > _("checkpointer process")); > I do agree. I was trying to reach this code path while replying to one of the above comments (about PM_WAIT_DEAD_END ordering) but had to replace with another if condition to reach it during testing. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 2025-01-13 12:20:39 +0000, Bertrand Drouvot wrote: > On Fri, Jan 10, 2025 at 02:07:25PM -0500, Andres Freund wrote: > > There wasn't > > any code to treat receiving PMSIGNAL_XLOG_IS_SHUTDOWN in the wrong state as > > fatal. I think this needs to be treated the same way we treat not being able > > to fork checkpointer during a shutdown. Now receiving > > PMSIGNAL_XLOG_IS_SHUTDOWN while not in PM_WAIT_XLOG_SHUTDOWN triggers > > FatalError processing after logging "WAL was shut down unexpectedly". > > I wonder if we could first ask the postmaster a confirmation that the SIGINT is > coming from it? (and if not, then simply ignore the SIGINT). I thought about a > shared memory flag but the postmaster has to be keept away from shared memory > operations, so that's not a valid idea. Another idea could be that the checkpointer > sends a new "confirm" request to the postmater first. But then I'm not sure how > the postmaster could reply back (given the signals that are already being used) > and that might overcomplicate things. That sounds more complicated than it's worth it for a hypothetical risk. > > Maybe PM_WAIT_CHECKPOINTER should be before PM_WAIT_DEAD_END? I earlier > > thought it'd be better to shut checkpointer down after even dead-end children > > exited, in case we ever wanted to introduce stats for dead-end children - but > > that seems rather unlikely? > > Yeah, given the above, it looks more clean to change that ordering. I'll post a version that does that in a bit. > > Independent of this patch, we seem to deal with FatalError processing in a > > somewhat inconsistently. We have this comment (in master): > > /* > > * PM_WAIT_DEAD_END state ends when all other children are gone except > > * for the logger. During normal shutdown, all that remains are > > * dead-end backends, but in FatalError processing we jump straight > > * here with more processes remaining. Note that they have already > > * been sent appropriate shutdown signals, either during a normal > > * state transition leading up to PM_WAIT_DEAD_END, or during > > * FatalError processing. > > > > However that doesn't actually appear to be true in the most common way to > > reach FatalError, via HandleChildCrash(): > > > > if (Shutdown != ImmediateShutdown) > > FatalError = true; > > > > /* We now transit into a state of waiting for children to die */ > > if (pmState == PM_RECOVERY || > > pmState == PM_HOT_STANDBY || > > pmState == PM_RUN || > > pmState == PM_STOP_BACKENDS || > > pmState == PM_WAIT_XLOG_SHUTDOWN) > > UpdatePMState(PM_WAIT_BACKENDS); > > > > Here we > > a) don't directly go to PM_WAIT_DEAD_END, but PM_WAIT_BACKENDS > > > > From PM_WAIT_BACKENDS we'll switch to PM_WAIT_DEAD_END if all processes > > other than walsender, archiver and dead-end children exited. > > > > b) in PM_WAIT_XLOG_SHUTDOWN (previously named PM_SHUTDOWN) we go *back* to > > PM_WAIT_BACKENDS? > > > > This comment seems to have been added in > > > > commit a78af0427015449269fb7d9d8c6057cfcb740149 > > Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> > > Date: 2024-11-14 16:12:28 +0200 > > > > Assign a child slot to every postmaster child process > > > > ISTM that section of the comment is largely wrong and has never really been > > the case if my git sleuthing is correct? > > I agree that does not look correct but I don't think it's coming from a78af0427015449269fb7d9d8c6057cfcb740149. 
> ISTM that a78af0427015449269fb7d9d8c6057cfcb740149 re-worded this already wrong > comment: > > " > /* > * PM_WAIT_DEAD_END state ends when the BackendList is entirely empty > * (ie, no dead_end children remain), and the archiver is gone too. > * > * The reason we wait for those two is to protect them against a new > * postmaster starting conflicting subprocesses; this isn't an > * ironclad protection, but it at least helps in the > * shutdown-and-immediately-restart scenario. Note that they have > * already been sent appropriate shutdown signals, either during a > * normal state transition leading up to PM_WAIT_DEAD_END, or during > * FatalError processing. > */ > " Hm. This version doesn't really seem wrong in the same way / to the same degree. > while we could also see: > > " > if (Shutdown != ImmediateShutdown) > FatalError = true; > > /* We now transit into a state of waiting for children to die */ > if (pmState == PM_RECOVERY || > pmState == PM_HOT_STANDBY || > pmState == PM_RUN || > pmState == PM_STOP_BACKENDS || > pmState == PM_SHUTDOWN) > pmState = PM_WAIT_BACKENDS; > " I don't think this make the version of the comment you included above inaccurate? It's saying that the signal has been sent in the state *leading up" to PM_WAIT_DEAD_END, which does seem accurate? The code the comment is about just won't be reached until postmaster transition to PM_WAIT_DEAD_END. > > but I suspect that's just about unreachable these days. If checkpointer exits > > outside of shutdown processing, it's treated as as a reason to crash-restart: > > /* > > * Any unexpected exit of the checkpointer (including FATAL > > * exit) is treated as a crash. > > */ > > HandleChildCrash(pid, exitstatus, > > _("checkpointer process")); > > > > I do agree. I was trying to reach this code path while replying to one of > the above comments (about PM_WAIT_DEAD_END ordering) but had to replace with > another if condition to reach it during testing. I was able to reach it unfortunately 1) Repeated fork failures of checkpointer can lead to it 2) When crash-restarting we, don't start checkpointer immediately, but defer that until LaunchMissingBackgroundProcesses(). While not easy, it's possible to trigger a shutdown before that happens. Greetings, Andres Freund
Hi, On Tue, Jan 14, 2025 at 12:58:44AM -0500, Andres Freund wrote: > Hi, > > On 2025-01-13 12:20:39 +0000, Bertrand Drouvot wrote: > > > We have > > > multiple copies of code to go to FatalError, that doesn't seem great. > > > > + * FIXME: This should probably not have a copy fo the code to > > + * transition into FatalError state. > > + */ > > + ereport(LOG, > > + (errmsg("WAL was shut down unexpectedly"))); > > > > Indeed, might be worth extracting this into a helper function or such. > > That is quite a rats nest of issues :(. There are quite a few oddities around > the related code. The attached series contains a few commits trying to unify > the paths transitioning to FatalError = true into the new HandleFatalError(). Thanks! > The current code has the weird behaviour of going through PM_WAIT_BACKENDS. I > left it like that for now. In fact more paths do so now, because we did so for > PM_WAIT_XLOG_SHUTDOWN but *didn't* do so for PM_WAIT_XLOG_ARCHIVAL, which > seems rather weird. > > I suspect it'd be better to switch directly to PM_DEAD_END and have > HandleFatalError always do ConfigurePostmasterWaitSet(false), but that's left > for another time. That would probably make more sense, but would be more invasive and would need careful testing. Worth a XXX comment in the code? > After the earlier commit that just moves the code I turned the existing if () > into a switch so that whoever adds new states is told to look at that code by > the compiler. Good idea! > Thoughts? > ======== Patch 0001: LGTM ======== Patch 0002: It makes use of TerminateChildren() in HandleChildCrash(). TerminateChildren() follows the exact same logic as the code being replaced (exclude syslogger, StartupPMChild special case), so LGTM. ======== Patch 0003: It replaces the take_action boolean flag with an early return. One comment: === 1 + TerminateChildren(send_abort_for_crash ? SIGABRT : SIGQUIT); if (Shutdown != ImmediateShutdown) FatalError = true; I think we can avoid this remaining extra check and set FatalError to true unconditionally (as we already know that Shutdown != ImmediateShutdown as per the first check in the function). ======== Patch 0004: One comment: === 2 As per comment === 1 above, then we could remove: if (Shutdown != ImmediateShutdown) FatalError = true; from HandleFatalError(). And that would be up to the HandleFatalError() caller to set FatalError to true (prior calling HandleFatalError()). HandleFatalError becomes a pure "transition to error state" function then. Does that make sense to you? ======== Patch 0005: Comments: === 3 If so, then in this change: - FatalError = true; - UpdatePMState(PM_WAIT_DEAD_END); - ConfigurePostmasterWaitSet(false); - - /* Kill the walsenders and archiver too */ - SignalChildren(SIGQUIT, btmask_all_except(B_LOGGER)); + HandleFatalError(PMQUIT_FOR_CRASH, false); we should keep the "FatalError = true" part. === 4 + * XXX: Is it worth inventing a different PMQUIT value + * that signals that the cluster is in a bad state, + * without a process having crashed? */ That would be more "accurate". Something like PMQUIT_FORK_FAILURE or such. > I suspect it'd be better to switch directly to PM_DEAD_END and have > HandleFatalError always do ConfigurePostmasterWaitSet(false), but that's left > for another time. Before doing so, what do you think about: 1. keeping FatalError = true; as suggeted above 2. put back the UpdatePMState(PM_WAIT_DEAD_END) (prior to the HandleFatalError( PMQUIT_FOR_CRASH, false) call 3. 
put the ConfigurePostmasterWaitSet(false) in the PM_WAIT_DEAD_END switch in HandleFatalError()? That's not ideal but that way we would keep the "current" behavior (i.e failure to fork checkpointer transitions through PM_WAIT_DEAD_END) during the time the "switch directly to PM_DEAD_END" is not implemented? But that would also need to call TerminateChildren() (a second time) again after ConfigurePostmasterWaitSet() which is not ideal though (unless we move the TerminateChildren() calls in the switch or add a check on PM_WAIT_DEAD_END in the first call). ======== Patch 0006: The patch adds archiver and walsender to processes that should exit during crash/immediate shutdown. LGTM. ======== Patch 0007: Comments: === 5 commit message says "Change shutdown sequence to terminate checkpointer last", which is not 100% accurate. While this one is: " Checkpointer now is terminated after all children other than dead-end ones have been terminated, " === 6 (Nit) + * Close down the database. s/database/database system/? Was already there before the patch, just suggesting in passing... === 7 + * PM_WAIT_XLOG_ARCHIVAL is in proccess_pm_pmsignal(), in response to typo: s/proccess_pm_pmsignal/process_pm_pmsignal/ === 8 + * Now that everyone important is gone s/everyone important is/walsenders and archiver are also gone/? === 9 + * in proccess_pm_child_exit(). typo: s/proccess_pm_child_exit/process_pm_child_exit/ === 10 + /* + * Doesn't seem likely to help to take send_abort_for_crash into + * account here. + */ + HandleFatalError(PMQUIT_FOR_CRASH, false); If comments === 2 and === 3 made sense to you then we should set FatalError to true here prior the HandleFatalError call. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
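[A tiny stand-alone illustration of why the if-to-switch change mentioned above helps: with a switch over the enum and no default, -Wswitch (enabled by -Wall in gcc and clang) flags any PMState value that is not handled, while an if chain silently falls through. The enum is trimmed to three values for brevity.]

#include <stdio.h>

typedef enum
{
	PM_RUN,
	PM_WAIT_BACKENDS,
	PM_WAIT_DEAD_END,
	/* a state added here is silently ignored by the if () chain below ... */
} PMState;

static const char *
describe_if(PMState s)
{
	/* an if chain: a newly added PMState just falls through to "other" */
	if (s == PM_RUN)
		return "running";
	if (s == PM_WAIT_BACKENDS)
		return "waiting for backends";
	return "other";
}

static const char *
describe_switch(PMState s)
{
	/*
	 * ... whereas a switch with no default makes the compiler warn about any
	 * enum value that is not handled, forcing whoever adds a state to
	 * revisit this code.
	 */
	switch (s)
	{
		case PM_RUN:
			return "running";
		case PM_WAIT_BACKENDS:
			return "waiting for backends";
		case PM_WAIT_DEAD_END:
			return "waiting for dead-end children";
	}
	return "unreachable";
}

int
main(void)
{
	printf("%s / %s\n",
		   describe_if(PM_WAIT_DEAD_END), describe_switch(PM_WAIT_DEAD_END));
	return 0;
}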
Hi, On 2025-01-14 13:06:31 +0000, Bertrand Drouvot wrote: > On Tue, Jan 14, 2025 at 12:58:44AM -0500, Andres Freund wrote: > > The current code has the weird behaviour of going through PM_WAIT_BACKENDS. I > > left it like that for now. In fact more paths do so now, because we did so for > > PM_WAIT_XLOG_SHUTDOWN but *didn't* do so for PM_WAIT_XLOG_ARCHIVAL, which > > seems rather weird. > > > > I suspect it'd be better to switch directly to PM_DEAD_END and have > > HandleFatalError always do ConfigurePostmasterWaitSet(false), but that's left > > for another time. > > That would probably make more sense, but would be more invasive and would > need careful testing. Worth a XXX comment in the code? There is, although I guess it could be reformulated a bit. > ======== Patch 0003: > > It replaces the take_action boolean flag with an early return. > > One comment: > > === 1 > > + TerminateChildren(send_abort_for_crash ? SIGABRT : SIGQUIT); > > if (Shutdown != ImmediateShutdown) > FatalError = true; > > I think we can avoid this remaining extra check and set FatalError to true > unconditionally (as we already know that Shutdown != ImmediateShutdown as per > the first check in the function). > > ======== Patch 0004: > > One comment: > > === 2 > > As per comment === 1 above, then we could remove: > > if (Shutdown != ImmediateShutdown) > FatalError = true; > > from HandleFatalError(). And that would be up to the HandleFatalError() caller > to set FatalError to true (prior calling HandleFatalError()). > > HandleFatalError becomes a pure "transition to error state" function then. > Does that make sense to you? I don't see what we'd gain from moving FatalError = true separate from HandleFatalError? All it'll mean is more copied code? > === 4 > > + * XXX: Is it worth inventing a different PMQUIT value > + * that signals that the cluster is in a bad state, > + * without a process having crashed? > */ > > That would be more "accurate". Something like PMQUIT_FORK_FAILURE or such. For now I think I'll leave it as such, as we'd need to have a new message in quickdie(). While the primary message for PMQUIT_FOR_CRASH message would differ, the rest seems like it'd largely be duplicated: case PMQUIT_FOR_CRASH: /* A crash-and-restart cycle is in progress */ ereport(WARNING_CLIENT_ONLY, (errcode(ERRCODE_CRASH_SHUTDOWN), errmsg("terminating connection because of crash of another server process"), errdetail("The postmaster has commanded this server process to roll back" " the current transaction and exit, because another" " server process exited abnormally and possibly corrupted" " shared memory."), errhint("In a moment you should be able to reconnect to the" " database and repeat your command."))); break; That seems like a separate change. Particularly because I'm just working on this as part of some nasty three layer deep dependency chain - aio is pretty big on its own... > > I suspect it'd be better to switch directly to PM_DEAD_END and have > > HandleFatalError always do ConfigurePostmasterWaitSet(false), but that's left > > for another time. > > Before doing so, what do you think about: > > 1. keeping FatalError = true; as suggeted above > 2. put back the UpdatePMState(PM_WAIT_DEAD_END) (prior to the HandleFatalError( > PMQUIT_FOR_CRASH, false) call I don't like that, what's the point of having something like HandleFatalError if we duplicate more code than it saves to each place? I don't think we need to either, we can just do that in the relevant case statement in HandleFatalError(). 
That works, I went back and forth several times :) > 3. put the ConfigurePostmasterWaitSet(false) in the PM_WAIT_DEAD_END switch in > HandleFatalError()? That one would make sense to me. > But that would also need to call TerminateChildren() (a second time) again after > ConfigurePostmasterWaitSet() which is not ideal though (unless we move the > TerminateChildren() calls in the switch or add a check on PM_WAIT_DEAD_END in > the first call). Why would it need to? > === 8 > > + * Now that everyone important is gone > > s/everyone important is/walsenders and archiver are also gone/? I'd rather get away from these lists of specific processes - they end up being noisy in the git history because the lines get updated to add/remove items (or updates to them get missed). Greetings, Andres Freund
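[To visualize where this lands, a heavily simplified stand-alone sketch of a consolidated error-transition helper. The HandleFatalError()/PMQUIT_FOR_CRASH names, the FatalError-inside-the-helper choice, and putting ConfigurePostmasterWaitSet(false) in the PM_WAIT_DEAD_END branch come from the thread; the exact state grouping in the switch and the printf stubs are illustrative only, not the committed logic.]

#include <signal.h>
#include <stdbool.h>
#include <stdio.h>

typedef enum
{
	PM_RUN,
	PM_WAIT_BACKENDS,
	PM_WAIT_XLOG_SHUTDOWN,
	PM_WAIT_XLOG_ARCHIVAL,
	PM_WAIT_DEAD_END,
	PM_WAIT_CHECKPOINTER,
	PM_NO_CHILDREN,
} PMState;

typedef enum
{
	PMQUIT_NOT_SENT,
	PMQUIT_FOR_STOP,
	PMQUIT_FOR_CRASH,
} QuitSignalReason;

static PMState pmState = PM_RUN;
static bool FatalError = false;

/* printf stubs standing in for the real postmaster primitives */
static void UpdatePMState(PMState s) { pmState = s; }
static void SetQuitSignalReason(QuitSignalReason r) { (void) r; }
static void ConfigurePostmasterWaitSet(bool accept_conns) { printf("accepting connections: %d\n", accept_conns); }
static void TerminateChildren(int sig) { printf("terminating children with signal %d\n", sig); }

/*
 * Single place that transitions postmaster into FatalError processing:
 * record the reason, pick the next wait state based on where we are, and
 * signal the remaining children.
 */
static void
HandleFatalError(QuitSignalReason reason, bool consult_abort)
{
	FatalError = true;
	SetQuitSignalReason(reason);

	switch (pmState)
	{
		case PM_RUN:
		case PM_WAIT_BACKENDS:
		case PM_WAIT_XLOG_SHUTDOWN:
		case PM_WAIT_XLOG_ARCHIVAL:
			/* wait for the remaining children to exit */
			UpdatePMState(PM_WAIT_BACKENDS);
			break;

		case PM_WAIT_DEAD_END:
		case PM_WAIT_CHECKPOINTER:
		case PM_NO_CHILDREN:
			/* nothing useful left to wait for; stop accepting connections */
			UpdatePMState(PM_WAIT_DEAD_END);
			ConfigurePostmasterWaitSet(false);
			break;
	}

	/* consult_abort selects SIGABRT to get cores from the children */
	TerminateChildren(consult_abort ? SIGABRT : SIGQUIT);
}

int
main(void)
{
	HandleFatalError(PMQUIT_FOR_CRASH, false);
	printf("pmState=%d FatalError=%d\n", (int) pmState, (int) FatalError);
	return 0;
}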
Hi, On Tue, Jan 14, 2025 at 06:55:03PM -0500, Andres Freund wrote: > Hi, > > On 2025-01-14 13:06:31 +0000, Bertrand Drouvot wrote: > > On Tue, Jan 14, 2025 at 12:58:44AM -0500, Andres Freund wrote: > > > The current code has the weird behaviour of going through PM_WAIT_BACKENDS. I > > > left it like that for now. In fact more paths do so now, because we did so for > > > PM_WAIT_XLOG_SHUTDOWN but *didn't* do so for PM_WAIT_XLOG_ARCHIVAL, which > > > seems rather weird. > > > > > > I suspect it'd be better to switch directly to PM_DEAD_END and have > > > HandleFatalError always do ConfigurePostmasterWaitSet(false), but that's left > > > for another time. > > > > That would probably make more sense, but would be more invasive and would > > need careful testing. Worth a XXX comment in the code? > > There is, although I guess it could be reformulated a bit. Okay, thanks! > > I think we can avoid this remaining extra check and set FatalError to true > > unconditionally (as we already know that Shutdown != ImmediateShutdown as per > > the first check in the function). > > > > ======== Patch 0004: > > > > One comment: > > > > === 2 > > > > As per comment === 1 above, then we could remove: > > > > if (Shutdown != ImmediateShutdown) > > FatalError = true; > > > > from HandleFatalError(). And that would be up to the HandleFatalError() caller > > to set FatalError to true (prior calling HandleFatalError()). > > > > HandleFatalError becomes a pure "transition to error state" function then. > > Does that make sense to you? > > I don't see what we'd gain from moving FatalError = true separate from > HandleFatalError? All it'll mean is more copied code? The "Shutdown != ImmediateShutdown" test made me think that it might be cases where we don't want to set FatalError to true in HandleFatalError(), so the proposal. I can see that that's not the case (which makes fully sense) so I agree that it's better to set FatalError to true unconditionally in HandleFatalError(). > > > === 4 > > > > + * XXX: Is it worth inventing a different PMQUIT value > > + * that signals that the cluster is in a bad state, > > + * without a process having crashed? > > */ > > > > That would be more "accurate". Something like PMQUIT_FORK_FAILURE or such. > > For now I think I'll leave it as such, as we'd need to have a new message in > quickdie(). While the primary message for PMQUIT_FOR_CRASH message would > differ, the rest seems like it'd largely be duplicated: > > case PMQUIT_FOR_CRASH: > /* A crash-and-restart cycle is in progress */ > ereport(WARNING_CLIENT_ONLY, > (errcode(ERRCODE_CRASH_SHUTDOWN), > errmsg("terminating connection because of crash of another server process"), > errdetail("The postmaster has commanded this server process to roll back" > " the current transaction and exit, because another" > " server process exited abnormally and possibly corrupted" > " shared memory."), > errhint("In a moment you should be able to reconnect to the" > " database and repeat your command."))); > break; > > That seems like a separate change. Particularly because I'm just working on > this as part of some nasty three layer deep dependency chain - aio is pretty > big on its own... Fair enough. I'll look at the remaining pieces once this stuff goes in. > > > I suspect it'd be better to switch directly to PM_DEAD_END and have > > > HandleFatalError always do ConfigurePostmasterWaitSet(false), but that's left > > > for another time. > > > > Before doing so, what do you think about: > > > > 1. 
keeping FatalError = true; as suggeted above > > 2. put back the UpdatePMState(PM_WAIT_DEAD_END) (prior to the HandleFatalError( > > PMQUIT_FOR_CRASH, false) call > > I don't like that, what's the point of having something like HandleFatalError > if we duplicate more code than it saves to each place? > > I don't think we need to either, we can just do that in the relevant case > statement in HandleFatalError(). That works, I went back and forth several > times :) Okay ;-) > > 3. put the ConfigurePostmasterWaitSet(false) in the PM_WAIT_DEAD_END switch in > > HandleFatalError()? > > That one would make sense to me. > > But that would also need to call TerminateChildren() (a second time) again after > > ConfigurePostmasterWaitSet() which is not ideal though (unless we move the > > TerminateChildren() calls in the switch or add a check on PM_WAIT_DEAD_END in > > the first call). > > Why would it need to? Because I thought that TerminateChildren() needs to be called after ConfigurePostmasterWaitSet(false). If not, could new connections be accepted in the window between TerminateChildren() and ConfigurePostmasterWaitSet(false)? > > === 8 > > > > + * Now that everyone important is gone > > > > s/everyone important is/walsenders and archiver are also gone/? > > I'd rather get away from these lists of specific processes - they end up being > noisy in the git history because the lines get updated to add/remove items (or > updates to them get missed). Good point, that's more "difficult" to maintain. Though I'm not sure that "important" is the right wording. "Now that non dead processes have finished" maybe? Could be seen as a Nit too. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi,

On 2025-01-24 15:06:17 -0500, Andres Freund wrote:
> Unless somebody argues against, I'm planning to push all but the last later
> today, wait for the buildfarm to settle, and then push the last. This is a
> dependency of multiple other things, so it'd be good to get it in soon.

I did push all but the last yesterday and, not seeing any related issues in
the BF, pushed the last one just now. Let's hope the BF stays green(ish)...

Greetings,

Andres
Hi, On Sat, Jan 25, 2025 at 11:44:54AM -0500, Andres Freund wrote: > Hi, > > On 2025-01-24 15:06:17 -0500, Andres Freund wrote: > > Unless somebody argues against, I'm planning to push all but the last later > > today, wait for the buildfarm to settle, and then push the last. This is a > > dependency of multiple other things, so it'd be good to get it in soon. > > I did push all but the last yesterday and, not seeing any related issues in > the BF, pushed the last one just now. Let's hope the BF stays green(ish)... Just did a quick check and that still looks green(ish) (and when it's not, it does not seem related to this change). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Jan 24, 2025 at 03:06:17PM -0500, Andres Freund wrote: > On 2025-01-15 08:38:33 +0000, Bertrand Drouvot wrote: > > Fair enough. I'll look at the remaining pieces once this stuff goes in. > > Cool. I did try writing the change, but it does indeed seem better as a > separate patch. Besides the error message, it also seems we ought to invent a > different ERRCODE, as neither ERRCODE_ADMIN_SHUTDOWN nor > ERRCODE_CRASH_SHUTDOWN seem appropriate. Thanks for the feedback, will look at it. > Vaguely related and not at all crucial: > > quickdie(), in the PMQUIT_FOR_CRASH, does > errhint("In a moment you should be able to reconnect to the" > " database and repeat your command."))); > > but in case of an immediate *restart* (via pg_ctl restart -m immediate) we'll > not have such hint. postmaster.c doesn't know it's an immediate restart, so > that's not surprising, but it still seems a bit weird. One would hope that > immediate restarts are more common than crashes... Yeah, it does not make the distinction between an "immediate restart" and an "immediate stop". I agree that such hint "should" be added in case of "immediate restart" (and keep "immediate stop" as it is): will give it a look. > > > > > > But that would also need to call TerminateChildren() (a second time) again after > > > > ConfigurePostmasterWaitSet() which is not ideal though (unless we move the > > > > TerminateChildren() calls in the switch or add a check on PM_WAIT_DEAD_END in > > > > the first call). > > > > > > Why would it need to? > > > > Because I thought that TerminateChildren() needs to be called after > > ConfigurePostmasterWaitSet(false). If not, could new connections be accepted in > > the window between TerminateChildren() and ConfigurePostmasterWaitSet(false)? > > I don't think that can happen with the attached patch - the > ConfigurePostmasterWaitSet() is called before HandleFatalError() returns, so > no new connections can be accepted, as that happens only from > ServerLoop()->AcceptConnection(). Thanks for the explanation, that was the missing part in my reasoning. So yeah, all good as no new connections are accepted. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Mon, Jan 27, 2025 at 03:07:01PM +0000, Bertrand Drouvot wrote: > On Fri, Jan 24, 2025 at 03:06:17PM -0500, Andres Freund wrote: > > but in case of an immediate *restart* (via pg_ctl restart -m immediate) we'll > > not have such hint. postmaster.c doesn't know it's an immediate restart, so > > that's not surprising, but it still seems a bit weird. One would hope that > > immediate restarts are more common than crashes... > > Yeah, it does not make the distinction between an "immediate restart" and an > "immediate stop". I agree that such hint "should" be added in case of "immediate > restart" (and keep "immediate stop" as it is): will give it a look. I looked at it and learned that an immediate restart is nothing but a stop followed by a start (do_restart()): "pg_ctl" sends a SIGQUIT to postmaster to indicate an immediate stop and later asks for a start (do_start()). So when the postmaster receives the SIGQUIT it has no clue that a start will follow (as you said up-thread). I’m not familiar with this area of the code, and just discovered it behaves that way. To give quickdie() more context I can see 3 options: 1. Use another signal for an immediate restart (but looking at PostmasterMain() I don’t see any "good" remaining signal to use) 2. Use a shared memory flag to indicate an immediate restart (but this kind of "communication" looks to be new between "pg_ctl" and postmaster and might be over complicating things) 3. Use a new DB_SHUTDOWNING_FOR_RESTART or such in DBState and let pg_ctl modify the control file? (That way we could give more context to quickdie()) It looks to me that 3. makes the more sense (but as said I'm not familiar with this code so maybe there is another option one could think of or my proposal is not appropriate), thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com