Re: pgsql: Add kqueue(2) support to the WaitEventSet API.

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: pgsql: Add kqueue(2) support to the WaitEventSet API.
Дата
Msg-id CA+hUKGKsTnvROYqx4hUjGpwvCTYi+=+da7sTPCw2Mf2Yxfdi4Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pgsql: Add kqueue(2) support to the WaitEventSet API.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Ответы Re: pgsql: Add kqueue(2) support to the WaitEventSet API.  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-committers
On Tue, Mar 17, 2020 at 9:30 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2020-Mar-17, Thomas Munro wrote:
> > Reproduced here.  The problem seems to be that macOS's getppid()
> > returns the debugger's PID, while the debugger is attached.  This
> > doesn't happen on FreeBSD (even though the debugger does internally
> > become the parent, getppid() is careful to return the "real" parent
> > PID so that user space doesn't notice this trickery; apparently Apple
> > made a different choice).
>
> Wow ...  Yeah, that was a known problem with FreeBSD, see
> https://postgr.es/m/1292851036-sup-5399@alvh.no-ip.org
> Evidently FreeBSD must have fixed it, but macOS has not caught up with
> that ...

Oh, interesting.  Sorry to bring a variant of this problem back.

> > The getppid() check is there to close a vanishingly rare race
> > condition: when creating a WaitEventSet, we ask the kernel to tell us
> > when the postmaster exits, but there is a possibility that the
> > postmaster has already exited; normally that causes an error with
> > errno == ESRCH (no such PID, it's already gone), but another unrelated
> > process might have started that has the same PID, so we check if our
> > ppid has changed after a successful return code.  That's not going to
> > work under a debugger on this OS.
>
> Irk.

I'm now far away from my home Mac so I can't test until later but I
think we can fix this by double checking with the pipe:

-       else if (event->events == WL_POSTMASTER_DEATH && PostmasterPid
!= getppid())
+       else if (event->events == WL_POSTMASTER_DEATH &&
+                        PostmasterPid != getppid() &&
+                        !PostmasterIsAliveInternal())
+       {
+               /*
+                * The extra PostmasterIsAliveInternal() check
prevents false alarms
+                * from systems where getppid() returns a debugger PID
while being
+                * traced.
+                */
                set->report_postmaster_not_running = true;
+       }

The fast getppid() check will prevent the slow and redundant
PostmasterIsAliveInternal() check from being reached on production
systems, until the postmaster really is gone in the race scenario
described.

(Note that all of this per-lock-wait work will go away with
https://commitfest.postgresql.org/27/2452/, so I'm glad Alexander
found this now).



В списке pgsql-committers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: pgsql: Add kqueue(2) support to the WaitEventSet API.
Следующее
От: Alvaro Herrera
Дата:
Сообщение: pgsql: Update comment