Thread: Dubious assertion in RegisterDynamicBackgroundWorker

Dubious assertion in RegisterDynamicBackgroundWorker

From
Tom Lane
Date:
I noticed this recent crash on lorikeet:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2021-05-05%2009%3A19%3A29

The relevant bits of the log seem to be

2021-05-05 05:36:22.011 EDT [60926716.bb24:1] ERROR:  could not map dynamic shared memory segment
...
2021-05-05 05:36:22.013 EDT [609266c5.b793:4] LOG:  background worker "parallel worker" (PID 47908) exited with exit code 1
...
TRAP: FailedAssertion("BackgroundWorkerData->parallel_register_count - BackgroundWorkerData->parallel_terminate_count <= MAX_PARALLEL_WORKER_LIMIT", File: "/home/andrew/bf/root/REL_13_STABLE/pgsql.build/../pgsql/src/backend/postmaster/bgworker.c", Line: 1016)
*** starting debugger for pid 47743, tid 1264
2021-05-05 05:36:26.629 EDT [609266c5.b793:5] LOG:  server process (PID 47743) exited with exit code 127

So we had a parallel worker fail to start, whereupon its leader went down
with an assertion failure.  I know that the parallel-worker code is held
together with chewing gum and baling wire, but that's a bit much.

Looking at the indicated code, we find

    /*
     * If this is a parallel worker, check whether there are already too many
     * parallel workers; if so, don't register another one.  Our view of
     * parallel_terminate_count may be slightly stale, but that doesn't really
     * matter: we would have gotten the same result if we'd arrived here
     * slightly earlier anyway.  There's no help for it, either, since the
     * postmaster must not take locks; a memory barrier wouldn't guarantee
     * anything useful.
     */
    if (parallel && (BackgroundWorkerData->parallel_register_count -
                     BackgroundWorkerData->parallel_terminate_count) >=
        max_parallel_workers)
    {
        Assert(BackgroundWorkerData->parallel_register_count -
               BackgroundWorkerData->parallel_terminate_count <=
               MAX_PARALLEL_WORKER_LIMIT);
        LWLockRelease(BackgroundWorkerLock);
        return false;
    }

I would like to know on what grounds that Assert isn't insanity?
We just finished pointing out that we might see an old
parallel_terminate_count, which ISTM says specifically that
parallel_register_count minus parallel_terminate_count might
be larger than expected.

Admittedly, it seems unlikely that the difference could exceed
MAX_PARALLEL_WORKER_LIMIT = 1024 in a regression test run where
the limit on number of parallel workers is only 8.  What I think is
more likely, given that these counters are unsigned, is that the
difference was actually negative.  Which could be a bug, or it could
be an expectable race condition, or it could just be some flakiness
on lorikeet's part (that machine has had a lot of issues lately).
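
To see the wraparound concretely, here's a throwaway illustration
(plain C, not our source; the counter values are invented) of how a
logically negative difference sails right past a <= 1024 check:

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_PARALLEL_WORKER_LIMIT 1024

    int
    main(void)
    {
        uint32_t parallel_register_count = 5;
        uint32_t parallel_terminate_count = 6;  /* one bogus terminate */
        uint32_t diff;

        /* logically -1, but wraps to 4294967295 */
        diff = parallel_register_count - parallel_terminate_count;

        printf("diff = %u, trips the assertion: %s\n",
               (unsigned) diff,
               diff > MAX_PARALLEL_WORKER_LIMIT ? "yes" : "no");
        return 0;
    }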

I trawled the buildfarm logs going back 180 days, and found no
other instances of this assertion, which seems to be evidence
in favor of the "lorikeet got flaky" theory.  But it's not proof.

In any case, I see zero value in this assertion, so I propose
we remove it.  If we don't remove it, it needs serious revision,
because it seems absolutely obvious to me that it could trigger
when there is nothing wrong.  A system pushing the limit of
number of parallel workers would be at considerable risk.

            regards, tom lane



Re: Dubious assertion in RegisterDynamicBackgroundWorker

From
Robert Haas
Date:
On Wed, May 5, 2021 at 3:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Admittedly, it seems unlikely that the difference could exceed
> MAX_PARALLEL_WORKER_LIMIT = 1024 in a regression test run where
> the limit on number of parallel workers is only 8.  What I think is
> more likely, given that these counters are unsigned, is that the
> difference was actually negative.  Which could be a bug, or it could
> be an expectable race condition, or it could just be some flakiness
> on lorikeet's part (that machine has had a lot of issues lately).

I think that assertion was added by me, and I think the thought
process was that the value shouldn't go negative and that if it does
it's probably a bug which we might want to fix. But since the values
are unsigned I could hardly check for < 0, so I did it this way
instead.
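
In hindsight, a hypothetical variant like this would have expressed
the intent more directly (assuming a signed cast of the difference is
acceptable):

    /* Hypothetical: a wrapped (negative) difference trips this at once. */
    Assert((int32) (BackgroundWorkerData->parallel_register_count -
                    BackgroundWorkerData->parallel_terminate_count) >= 0);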

But since there's no memory barrier between the two loads, I guess
there's no guarantee that they have the expected relationship, even if
there is a memory barrier on the store side. I wonder if it's worth
trying to tighten that up so that the assertion is more meaningful, or
just give up and rip it out. I'm afraid that if we do have (or
develop) bugs in this area, someone will discover that the effective
max_parallel_workers value on their system slowly drifts up or down
from the configured value, and we'll have no clue where things are
going wrong. The assertion was intended to give us a chance of
noticing that sort of problem in the buildfarm or on a developer's
machine before the code gets out into the real world.
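
If we did tighten it up, I imagine something along these lines (just a
sketch of the ordering argument, not a tested patch): every terminate
is preceded by a register, so loading parallel_terminate_count first,
with a read barrier before the load of parallel_register_count, should
keep the difference from ever appearing negative:

    uint32      terminate_count;
    uint32      register_count;

    /*
     * Read the terminate count first; the read barrier pairs with the
     * store-side pg_memory_barrier(), so the register count we load
     * afterward is at least as fresh as the terminate count we saw.
     */
    terminate_count = BackgroundWorkerData->parallel_terminate_count;
    pg_read_barrier();
    register_count = BackgroundWorkerData->parallel_register_count;

    Assert(register_count - terminate_count <= MAX_PARALLEL_WORKER_LIMIT);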

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Dubious assertion in RegisterDynamicBackgroundWorker

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, May 5, 2021 at 3:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Admittedly, it seems unlikely that the difference could exceed
>> MAX_PARALLEL_WORKER_LIMIT = 1024 in a regression test run where
>> the limit on number of parallel workers is only 8.  What I think is
>> more likely, given that these counters are unsigned, is that the
>> difference was actually negative.  Which could be a bug, or it could
>> be an expectable race condition, or it could just be some flakiness
>> on lorikeet's part (that machine has had a lot of issues lately).

> I think that assertion was added by me, and I think the thought
> process was that the value shouldn't go negative and that if it does
> it's probably a bug which we might want to fix. But since the values
> are unsigned I could hardly check for < 0, so I did it this way
> instead.

> But since there's no memory barrier between the two loads, I guess
> there's no guarantee that they have the expected relationship, even if
> there is a memory barrier on the store side. I wonder if it's worth
> trying to tighten that up so that the assertion is more meaningful, or
> just give up and rip it out. I'm afraid that if we do have (or
> develop) bugs in this area, someone will discover that the effective
> max_parallel_workers value on their system slowly drifts up or down
> from the configured value, and we'll have no clue where things are
> going wrong. The assertion was intended to give us a chance of
> noticing that sort of problem in the buildfarm or on a developer's
> machine before the code gets out into the real world.

I follow your concern, but I'm not convinced that this assertion is
a useful aid; first because the asynchrony involved makes the edge
cases rather squishy, and second because allowing 1024 bogus
increments before complaining will likely mean that developer test
runs will not last long enough to trigger the assertion, and third
because if it does fire it's too far removed from the perpetrator
to be much help in figuring out what went wrong, or even if
anything *is* wrong.

I've not tried to trace the code, but I'm now a bit suspicious
that there is indeed a design bug here.  I gather from the
comments that parallel_register_count is incremented by the
worker processes, which of course implies that a worker that
fails to reattach to shared memory won't do that.  But
parallel_terminate_count is incremented by the postmaster.
If the postmaster will do that even in the case of a worker that
failed at startup, then lorikeet's symptoms are neatly explained.

I'd be more comfortable with this code if the increments and
decrements were handled by the same process.

            regards, tom lane



Re: Dubious assertion in RegisterDynamicBackgroundWorker

From
Robert Haas
Date:
On Wed, May 5, 2021 at 5:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I follow your concern, but I'm not convinced that this assertion is
> a useful aid; first because the asynchrony involved makes the edge
> cases rather squishy, and second because allowing 1024 bogus
> increments before complaining will likely mean that developer test
> runs will not last long enough to trigger the assertion, and third
> because if it does fire it's too far removed from the perpetrator
> to be much help in figuring out what went wrong, or even if
> anything *is* wrong.

Well, it allows 1024 bogus increments in one direction, but a negative
value will trip it PDQ.

> I've not tried to trace the code, but I'm now a bit suspicious
> that there is indeed a design bug here.  I gather from the
> comments that parallel_register_count is incremented by the
> worker processes, which of course implies that a worker that
> fails to reattach to shared memory won't do that.  But
> parallel_terminate_count is incremented by the postmaster.
> If the postmaster will do that even in the case of a worker that
> failed at startup, then lorikeet's symptoms are neatly explained.

parallel_register_count is incremented by
RegisterDynamicBackgroundWorker, i.e. when the worker process is
requested, not after it starts. To try to do it from the worker after
it's launched would, as you suppose, be completely busted. The more
general point here is that we should be adjusting these at the same
point we allocate and free the worker slots. Wherever that happens is
the right place to do this, because the goal of the mechanism is
precisely to limit the number of such slots that can be used by
parallel query.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Dubious assertion in RegisterDynamicBackgroundWorker

From
Tom Lane
Date:
I wrote:
> I've not tried to trace the code, but I'm now a bit suspicious
> that there is indeed a design bug here.  I gather from the
> comments that parallel_register_count is incremented by the
> worker processes, which of course implies that a worker that
> fails to reattach to shared memory won't do that.  But
> parallel_terminate_count is incremented by the postmaster.
> If the postmaster will do that even in the case of a worker that
> failed at startup, then lorikeet's symptoms are neatly explained.

That theory seems to be nonsense.  After a bit more study of the
code, I see that parallel_register_count is incremented by the *leader*
process, when it reserves a BackgroundWorkerSlot for the worker.
And parallel_terminate_count is incremented by the postmaster when
it releases the slot; so it's darn hard to see how
parallel_terminate_count could get ahead of parallel_register_count.

I noticed that lorikeet's worker didn't fail at shared memory reattach,
as I first thought, anyway.  It failed at
    ERROR:  could not map dynamic shared memory segment
which means we ought to be able to reproduce the symptoms by faking
failure of dsm_attach(), as I did in the quick hack attached.
What I get is a lot of "parallel worker failed to initialize" and
"lost connection to parallel worker" errors, but no assertion.
(I also tried this with an EXEC_BACKEND build, just in case that'd
change the behavior, but it didn't.)  So it seems like the "lorikeet
is flaky" theory is looking pretty plausible.

I do see what seems to be a bug-let in ForgetBackgroundWorker.
BackgroundWorkerStateChange is careful to do this when freeing
a slot:

            /*
             * We need a memory barrier here to make sure that the load of
             * bgw_notify_pid and the update of parallel_terminate_count
             * complete before the store to in_use.
             */
            notify_pid = slot->worker.bgw_notify_pid;
            if ((slot->worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
                BackgroundWorkerData->parallel_terminate_count++;
            pg_memory_barrier();
            slot->pid = 0;
            slot->in_use = false;

but the mainline case in ForgetBackgroundWorker is a lot less
paranoid:

    Assert(rw->rw_shmem_slot < max_worker_processes);
    slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
    if ((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
        BackgroundWorkerData->parallel_terminate_count++;

    slot->in_use = false;

One of these functions is mistaken.  However, I can't construct
a theory whereby that explains lorikeet's symptoms, mainly because
Intel chips don't do out-of-order stores so the messing with
parallel_terminate_count should be done before in_use is cleared,
even without an explicit memory barrier.
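
For the record, making ForgetBackgroundWorker equally paranoid would
presumably look about like this (untested, and assuming the barrier
really is wanted there):

    Assert(rw->rw_shmem_slot < max_worker_processes);
    slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
    if ((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
        BackgroundWorkerData->parallel_terminate_count++;

    /*
     * As in BackgroundWorkerStateChange: ensure the update of
     * parallel_terminate_count is visible before the slot is freed.
     */
    pg_memory_barrier();

    slot->in_use = false;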

            regards, tom lane

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 3550ef13ba..d3b00c2f9e 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1278,6 +1278,11 @@ ParallelWorkerMain(Datum main_arg)
                                                  "Parallel worker",
                                                  ALLOCSET_DEFAULT_SIZES);

+    if (random() < INT_MAX / 64)
+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                 errmsg("fake failure to map dynamic shared memory segment")));
+
     /*
      * Attach to the dynamic shared memory segment for the parallel query, and
      * find its table of contents.

Re: Dubious assertion in RegisterDynamicBackgroundWorker

From
Noah Misch
Date:
On Wed, May 05, 2021 at 03:46:43PM -0400, Tom Lane wrote:
> I know that the parallel-worker code is held
> together with chewing gum and baling wire

Having spent time in that code, I didn't notice such weakness.  I'm sure it
has bugs, but I doubt its bug density is unusual for PostgreSQL.