Обсуждение: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck

Поиск
Список
Период
Сортировка

BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      16785
Logged by:          John Thompson
Email address:      jthompson@maxmind.com
PostgreSQL version: 12.5
Operating system:   Ubuntu Linux
Description:

We use pg_prewarm via shared_preload_libraries and we've noticed that if we
start Pg then stop it again before pg_prewarm finishes its initial
prewarming, Pg gets stuck in shutdown until we kill the 'autoprewarm master'
process with 'kill -9'. Removing pg_prewarm from shared_preload_libraries
allows Pg to stop without issue. Also, if we start Pg, wait for 'autoprewarm
successfully prewarmed XXXXXX of YYYYYY previously-loaded blocks' to appear
in the logs, then stop, it shuts down as expected.

This is happening on several different servers, but the one I've been
testing on has 855569 lines in
/var/lib/postgresql/12/main/autoprewarm.blocks, 32GB of RAM, and
shared_buffers is 8028MB.

We're using the 12.5 packages from apt.postgresql.org. Normally we use
systemd to start and stop services, but here's an example with just
pg_ctl:

$ sudo -u postgres /usr/lib/postgresql/12/bin/pg_ctl start -D
/var/lib/postgresql/12/main -l /var/log/postgresql/postgresql-12-main.log -o
'-c config_file="/etc/postgresql/12/main/postgresql.conf"'
waiting for server to start.... done
server started
$ sudo -u postgres /usr/lib/postgresql/12/bin/pg_ctl stop -s -w -D
/var/lib/postgresql/12/main -m fast
pg_ctl: server does not shut down
$ ps auxwww | grep ^postgres
postgres   448  0.7  0.7 8667168 240252 ?      Ss   21:10   0:01
/usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main -c
config_file=/etc/postgresql/12/main/postgresql.conf
postgres   455  0.1  0.0 182036  4252 ?        Ss   21:10   0:00 postgres:
logger   
postgres   458  0.0  0.0 8666132 4088 ?        Ss   21:10   0:00 postgres:
checkpointer   
postgres   462  0.0  0.0 185232  5928 ?        Ss   21:10   0:00 postgres:
stats collector   
postgres   463  0.3  0.0 8682976 22740 ?       Ss   21:10   0:00 postgres:
autoprewarm master   
postgres   507  0.0  0.0 8667996 12616 ?       Ss   21:10   0:00 postgres:
walsender repmgr 10.150.0.16(33718) streaming 34A65/4243A5C8
postgres   508  0.0  0.0 8667996 12612 ?       Ss   21:10   0:00 postgres:
walsender repmgr 10.150.0.15(52982) streaming 34A65/4243A5C8
$ sudo kill 463 -9
$ ps auxwww | grep ^postgres
$ 

Please let me know if any other information would be useful, I'm happy to
help. Thanks for any help you can offer.


Re: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck

От
Tom Lane
Дата:
PG Bug reporting form <noreply@postgresql.org> writes:
> We use pg_prewarm via shared_preload_libraries and we've noticed that if we
> start Pg then stop it again before pg_prewarm finishes its initial
> prewarming, Pg gets stuck in shutdown until we kill the 'autoprewarm master'
> process with 'kill -9'.

Yeah.  I assume you've got multiple large databases?  When I try this
here, the current autoprewarm worker process shuts down promptly, but
then the master process tries to launch another worker for the next DB,
and then just sits there because the postmaster is now ignoring worker
start requests.

So this is partly autoprewarm's fault, because it's taking no thought
for the possibility that a shutdown will be requested before it's
managed to preload everything.  But it's *mostly* the fault of the
incredibly brain-dead design for background workers, which has no
way to tell worker requestors that their request is never going to
be granted.  I've ranted about that before, with little effect on
the people responsible, so I guess at some point I'm going to have
to go fix it myself.

In the meantime, attached is a proposed patch that improves
autoprewarm's behavior.  It will only get stuck if you're unlucky
enough to issue "pg_ctl stop" just as it's trying to launch
another worker, which should be a pretty tight window.

I noticed while testing this that when autoprewarm did get out of
the situation promptly, it'd rewrite autoprewarm.blocks with only
as many blocks as it'd managed to prewarm before being canceled.
This doesn't seem like a good thing, because then the next start
might have next door to no prewarm behavior.  So the patch also
hacks it to skip the final autoprewarm.blocks dump if we were
told to terminate before the initial prewarming pass is done.

This is against HEAD but it seems to apply cleanly to v12.
If you're in a position to rebuild pg_prewarm with the patch,
please see if it resolves the problem for you.

            regards, tom lane

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index bfcea9bde3..8d6c81998c 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -153,6 +153,7 @@ void
 autoprewarm_main(Datum main_arg)
 {
     bool        first_time = true;
+    bool        final_dump_allowed = true;
     TimestampTz last_dump_time = 0;

     /* Establish signal handlers; once that's done, unblock signals. */
@@ -193,10 +194,15 @@ autoprewarm_main(Datum main_arg)
      * There's not much point in performing a dump immediately after we finish
      * preloading; so, if we do end up preloading, consider the last dump time
      * to be equal to the current time.
+     *
+     * If apw_load_buffers() is terminated early by a shutdown request,
+     * prevent dumping out our state below the loop, because we'd effectively
+     * just truncate the saved state to however much we'd managed to preload.
      */
     if (first_time)
     {
         apw_load_buffers();
+        final_dump_allowed = !ShutdownRequestPending;
         last_dump_time = GetCurrentTimestamp();
     }

@@ -254,7 +260,8 @@ autoprewarm_main(Datum main_arg)
      * Dump one last time.  We assume this is probably the result of a system
      * shutdown, although it's possible that we've merely been terminated.
      */
-    apw_dump_now(true, true);
+    if (final_dump_allowed)
+        apw_dump_now(true, true);
 }

 /*
@@ -387,6 +394,18 @@ apw_load_buffers(void)
         if (!have_free_buffer())
             break;

+        /*
+         * Likewise, don't launch if we've already been told to shut down.
+         *
+         * There is a race condition here: if the postmaster has received a
+         * fast-shutdown signal, but we've not heard about it yet, then the
+         * postmaster will ignore our worker start request and we'll wait
+         * forever.  However, that's a bug in the general background-worker
+         * logic, not the fault of this module.
+         */
+        if (ShutdownRequestPending)
+            break;
+
         /*
          * Start a per-database worker to load blocks for this database; this
          * function will return once the per-database worker exits.
@@ -404,10 +423,11 @@ apw_load_buffers(void)
     apw_state->pid_using_dumpfile = InvalidPid;
     LWLockRelease(&apw_state->lock);

-    /* Report our success. */
-    ereport(LOG,
-            (errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
-                    apw_state->prewarmed_blocks, num_elements)));
+    /* Report our success, if we were able to finish. */
+    if (!ShutdownRequestPending)
+        ereport(LOG,
+                (errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
+                        apw_state->prewarmed_blocks, num_elements)));
 }

 /*

Re: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck

От
Tom Lane
Дата:
I wrote:
> This is against HEAD but it seems to apply cleanly to v12.
> If you're in a position to rebuild pg_prewarm with the patch,
> please see if it resolves the problem for you.

Eh, sorry about that, patch(1) was willing to apply it but the
result didn't compile :-(.  If you want to try local testing,
the version that actually works in v12 is at

https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=f581e5383648913385d3e7789ad9323a3ebfb1d0

            regards, tom lane



Re: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck

От
John Thompson
Дата:
Hi Tom,

Thanks so much for taking a look at this. As you mentioned the patch didn't apply cleanly to 12, but I ended up making the same changes as are in the link you posted, and it fixes the problem in my testing. I appreciate your help.

Take care


On Tue, Dec 22, 2020 at 12:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> This is against HEAD but it seems to apply cleanly to v12.
> If you're in a position to rebuild pg_prewarm with the patch,
> please see if it resolves the problem for you.

Eh, sorry about that, patch(1) was willing to apply it but the
result didn't compile :-(.  If you want to try local testing,
the version that actually works in v12 is at

https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=f581e5383648913385d3e7789ad9323a3ebfb1d0

                        regards, tom lane

Re: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck

От
Tom Lane
Дата:
John Thompson <jthompson@maxmind.com> writes:
> Thanks so much for taking a look at this. As you mentioned the patch didn't
> apply cleanly to 12, but I ended up making the same changes as are in the
> link you posted, and it fixes the problem in my testing. I appreciate your
> help.

OK, thanks for confirming!

            regards, tom lane