Обсуждение: Re: convert libpgport's pqsignal() to a void function

Поиск
Список
Период
Сортировка

Re: convert libpgport's pqsignal() to a void function

От
Tom Lane
Дата:
Nathan Bossart <nathandbossart@gmail.com> writes:
> Thanks to commit 9a45a89, legacy-pqsignal.c now has its own dedicated
> extern for pqsignal(), which decouples it enough that we can follow through
> with changing libpqport's pqsignal() to a void function.

> Thoughts?

LGTM, although I don't know enough about Windows to know if the
"== SIG_ERR" test in that path is correct.

            regards, tom lane



Re: convert libpgport's pqsignal() to a void function

От
Nathan Bossart
Дата:
On Tue, Jan 14, 2025 at 10:02:46PM -0500, Tom Lane wrote:
> LGTM, although I don't know enough about Windows to know if the
> "== SIG_ERR" test in that path is correct.

It's apparently not [0].  :(

My guess is that this has something to do with redefining SIG_ERR in
win32_port.h.  We might be able to use push_macro/pop_macro to keep the old
value around, but at the moment I'm leaning towards just removing the
assertion in that path.

[0] https://cirrus-ci.com/task/6237809813487616

-- 
nathan



Re: convert libpgport's pqsignal() to a void function

От
Tom Lane
Дата:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Tue, Jan 14, 2025 at 10:02:46PM -0500, Tom Lane wrote:
>> LGTM, although I don't know enough about Windows to know if the
>> "== SIG_ERR" test in that path is correct.

> It's apparently not [0].  :(

Bleah.

> My guess is that this has something to do with redefining SIG_ERR in
> win32_port.h.  We might be able to use push_macro/pop_macro to keep the old
> value around, but at the moment I'm leaning towards just removing the
> assertion in that path.

I wonder why we redefine those values?  But I tend to agree that just
removing the test is sufficient for now.  Given the lack of failure
checks in the existing code, and the lack of trouble reports
suggesting any problem, it's hard to muster enthusiasm for spending
a lot of effort on this.

            regards, tom lane



Re: convert libpgport's pqsignal() to a void function

От
Thomas Munro
Дата:
On Thu, Jan 16, 2025 at 8:15 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Tue, Jan 14, 2025 at 11:08:05PM -0500, Tom Lane wrote:
> > Nathan Bossart <nathandbossart@gmail.com> writes:
> >> My guess is that this has something to do with redefining SIG_ERR in
> >> win32_port.h.  We might be able to use push_macro/pop_macro to keep the old
> >> value around, but at the moment I'm leaning towards just removing the
> >> assertion in that path.
> >
> > I wonder why we redefine those values?
>
> I wondered the same.  Those redefines have been there since commit 5049196,
> but I haven't been able to find any real discussion in the archives about
> it.  Maybe I will bug Magnus about it sometime, in case he happens to
> remember the reason.

My guess would be: perhaps some ancient version of MinGW didn't define
them?  They're defined by MinGW and native signal.h now and they have
the same values, so we should remove them I think.

Assertion failed: 0, file ../src/port/pqsignal.c, line 147

Could be due to calling native signal() with a signal number other
than the 6 values required to work by the C standard?



Re: convert libpgport's pqsignal() to a void function

От
Nathan Bossart
Дата:
On Thu, Jan 16, 2025 at 08:21:08AM +1300, Thomas Munro wrote:
> On Thu, Jan 16, 2025 at 8:15 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Tue, Jan 14, 2025 at 11:08:05PM -0500, Tom Lane wrote:
>> > I wonder why we redefine those values?
>>
>> I wondered the same.  Those redefines have been there since commit 5049196,
>> but I haven't been able to find any real discussion in the archives about
>> it.  Maybe I will bug Magnus about it sometime, in case he happens to
>> remember the reason.
> 
> My guess would be: perhaps some ancient version of MinGW didn't define
> them?  They're defined by MinGW and native signal.h now and they have
> the same values, so we should remove them I think.

Okay.  If nothing else, it'd be interesting to see what the buildfarm
thinks.

> Assertion failed: 0, file ../src/port/pqsignal.c, line 147
> 
> Could be due to calling native signal() with a signal number other
> than the 6 values required to work by the C standard?

Looking closer, that probably makes more sense than my SIG_ERR redefinition
theory.  If that assertion is getting hit, that means signal() _is_
returning SIG_ERR (either the system one or our redefined version), and it
looks like it's pretty common to use -1 for SIG_ERR.  That'd only affect
Windows frontend programs, but it still sounds scary.  I'll try getting
more details about the error with some custom cfbot runs.

-- 
nathan



Re: convert libpgport's pqsignal() to a void function

От
Nathan Bossart
Дата:
On Wed, Jan 15, 2025 at 01:47:18PM -0600, Nathan Bossart wrote:
> On Thu, Jan 16, 2025 at 08:21:08AM +1300, Thomas Munro wrote:
>> Could be due to calling native signal() with a signal number other
>> than the 6 values required to work by the C standard?
> 
> Looking closer, that probably makes more sense than my SIG_ERR redefinition
> theory.  If that assertion is getting hit, that means signal() _is_
> returning SIG_ERR (either the system one or our redefined version), and it
> looks like it's pretty common to use -1 for SIG_ERR.  That'd only affect
> Windows frontend programs, but it still sounds scary.  I'll try getting
> more details about the error with some custom cfbot runs.

I think this is what's happening.  cfbot's test failures are caused by
initdb's setup_signals().  The call to pqsignal(SIGHUP, trapsig) seems to
fail because SIGHUP isn't a real signal number, just something that's made
up in win32_port.h.  This SIGHUP definition was added by commit 12c9423 in
May 2003, then the pqsignal(SIGHUP, ...) call was added in initdb by commit
279598b in November 2003, but it might not have been broken at that time
because it doesn't look like initdb.c included the SIGHUP #define.  In any
case, I think this has been broken since at least commit ed9b360 (November
2017), if not earlier.

Perhaps we should surround all those extra signal #defines in win32_port.h
with an #ifndef FRONTEND.  initdb seems to be good about avoiding
pqsignal() calls if the signal doesn't exist, but I wouldn't be surprised
if there are other frontend programs that are not so cautious.  I'll give
it a try on cfbot and see what breaks...

-- 
nathan



Re: convert libpgport's pqsignal() to a void function

От
Thomas Munro
Дата:
On Thu, Jan 16, 2025 at 8:47 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Thu, Jan 16, 2025 at 08:21:08AM +1300, Thomas Munro wrote:
> > Could be due to calling native signal() with a signal number other
> > than the 6 values required to work by the C standard?
>
> Looking closer, that probably makes more sense than my SIG_ERR redefinition
> theory.  If that assertion is getting hit, that means signal() _is_
> returning SIG_ERR (either the system one or our redefined version), and it
> looks like it's pretty common to use -1 for SIG_ERR.  That'd only affect
> Windows frontend programs, but it still sounds scary.  I'll try getting
> more details about the error with some custom cfbot runs.

Windows barely has signals, so I don't think it's too scary...  SIGINT
works but surprisingly runs the handler in another thread when you
press ^C, SIGABRT, SIGFPE, SIGSEGV presumably have the obvious
synchronous/exception-trapping implementations but we aren't even
trying to catch those*, and SIGTERM, SIGILL are pro forma, never
generated by the system.  We've basically just invented more pro forma
ones that will also never be generated except in the backend's
entirely separate fake signal system that I hope to remove.   +1 for
your idea of not defining them at all outside the backend, it's just
confusing noise.

The second surprising thing, unless you're an armchair Unix
archeologist, is that all but SIGFPE revert to SIG_DFL when they fire,
like POSIX's SA_RESETHAND mode.  I don't know if it also has
SA_NODEFER behaviour (a combination of behaviours seen in some old
Unixen of yestermillennium: your handler had to keep reinstalling
itself, cf initdb.c:trapsig, but for a brief window the default
process-terminating-core-dumping-nasal-daemon behaviour was installed
and there was nothing you could do about that race; the BSD crew fixed
that mistake a long time ago, everyone does it that way now, and POSIX
sigaction() made those policies explicit and is the recommended
replacement, but to this day the C standard has just signal() with
undefined semantics and no requirement that any of it even work).

*Even in the backend we don't catch Windows' native SIGFPE AFAICS, so
I guess those must exit instead of being converted to an ERROR by
FloatExceptionHandler.  I wonder if there is a way to reach that
condition from a SQL query.



Re: convert libpgport's pqsignal() to a void function

От
Thomas Munro
Дата:
On Thu, Jan 16, 2025 at 12:47 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> On Thu, Jan 16, 2025 at 11:07:41AM +1300, Thomas Munro wrote:
> > +1 for your idea of not defining them at all outside the backend, it's
> > just confusing noise.
>
> I tried that, but these extra signals are needed even in the frontend for
> pgkill(), etc.

Right, of course.

(In an unconf in Vancouver, a group of us were plotting that there
should be a more explicit control socket to talk to the postmaster, on
all OSes, to get current status, recovery progress, etc, and also
support shutdown commands more explicitly.  If you separately turn all
the backend IPC signals into interrupts (CF 5118 and further work like
that), and these cluster-level ones you just mentioned into commands
in the hypothetical new cluster control protocol, then you don't need
fake signals for Windows anymore, but apparently I got ahead of myself
with that comment :-))

>  My next thought was to simply ignore signal() errors for
> the extra signals, but I don't think that's very nice because it just masks
> broken code.  Finally, I settled on modifying initdb's setup_signals() to
> use WIN32 checks instead of checking for the signals themselves.  This is
> how it's done elsewhere, and this is apparently all that's needed to make
> cfbot's Windows run happy.

Cool.

> I've also attached a 0002 that removes the redefinitions of SIG_ERR and
> friends.  That looks good on cfbot, but we'll see what the buildfarm has to
> say...

Ditto.



Re: convert libpgport's pqsignal() to a void function

От
Nathan Bossart
Дата:
I've now committed all of this.  I ended up finding a couple other frontend
programs that called pqsignal() with an invalid signal number on Windows,
so I fixed those as well.  AFAICT the reason I didn't catch them in my
earlier testing is because they aren't tested!  I'll keep an eye on the
buildfarm for any problems with the SIG_* redefinition removal, too.

Thanks for reviewing and for sharing lots of context about this stuff.

-- 
nathan