Discussion: Converting pqsignal to void return
Nathan,
I just ran across this change when a bunch of CI's I run barfed.

https://github.com/postgres/postgres/commit/d4a43b283751b23d32bbfa1ecc2cad2d16e3dde9
The problem we are having in extension land is that we often run functions in external libraries that take a long time to return, and we would like to ensure that PgSQL users can cancel their queries, even when control has passed into those functions.
The way we have done it, historically, has been to take the return value of pqsignal(SIGINT, extension_signint_handler) and remember it, and then, inside extension_signint_handler, call the pgsql handler once we have done our own business.
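In code, the pattern has looked roughly like this (a simplified sketch; the flag name is just illustrative):

#include "postgres.h"
#include <signal.h>

static pqsigfunc pgsql_interrupt_handler = NULL;
static volatile sig_atomic_t extension_interrupt_requested = 0;

static void
extension_signint_handler(SIGNAL_ARGS)
{
	/* do our own business first... */
	extension_interrupt_requested = 1;
	/* ...then chain to the handler postgres had installed */
	if (pgsql_interrupt_handler)
		pgsql_interrupt_handler(postgres_signal_arg);
}

void
_PG_init(void)
{
	/* remember postgres' SIGINT handler, then install ours in front of it */
	pgsql_interrupt_handler = pqsignal(SIGINT, extension_signint_handler);
}

With pqsignal() now returning void, the assignment in _PG_init() no longer compiles, which is how the CI runs broke.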
It is possible we have been Doing It Wrong all this time, and would love some pointers on the right way to do this.
Alternatively, if we have a valid use case, it would be nice to have the return value from pqsignal() back.
Thanks!
Paul
On Wed, Jan 22, 2025 at 07:57:52AM -0800, Paul Ramsey wrote:
> I just ran across this change when a bunch of CI's I run barfed.

Oops, sorry.

> The problem we are having in extension land is that we often run
> functions in external libraries that take a long time to return, and we
> would like to ensure that PgSQL users can cancel their queries, even when
> control has passed into those functions.
>
> The way we have done it, historically, has been to take the return value
> of pqsignal(SIGINT, extension_signint_handler) and remember it, and then,
> inside extension_signint_handler, call the pgsql handler once we have
> done our own business.

I see a couple other projects that might be doing something similar [0].

> It is possible we have been Doing It Wrong all this time, and would love
> some pointers on the right way to do this.
>
> Alternatively, if we have a valid use case, it would be nice to have the
> return value from pqsignal() back.

I don't think you were doing it wrong before, but commit 3b00fdb did add
some weird race conditions to the return value, but only in rare cases that
I don't think apply to the case you shared.

I was tempted to suggest that you try using sigaction() with the second
"act" argument set to NULL to retrieve the original signal handler, but I
think there's a good chance you'd get our wrapper_handler() function (we
store the actual handlers in a separate array), which is no good.  So, I
should probably just revert d4a43b2.

An alternative approach could be to provide our own way to retrieve the
current handler, but that comes with its own race conditions, although
perhaps those are more obvious than the pqsignal() ones.  WDYT?

[0] https://codesearch.debian.net/search?q=%3D+pqsignal&literal=1

-- 
nathan
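For reference, the sigaction() query described above is the standard POSIX idiom of passing NULL for the new action, which only reads back the currently installed handler. A minimal sketch (in a backend this would typically report postgres' wrapper_handler() rather than an extension's original handler, as noted above):

#include <signal.h>
#include <stdio.h>

static void
show_current_sigint_handler(void)
{
	struct sigaction cur;

	/* act == NULL: change nothing, just read the currently installed action */
	if (sigaction(SIGINT, NULL, &cur) == 0)
		printf("current SIGINT handler: %p\n", (void *) cur.sa_handler);
}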
Hi,

On 2025-01-22 07:57:52 -0800, Paul Ramsey wrote:
> I just ran across this change when a bunch of CI's I run barfed.
>
> https://github.com/postgres/postgres/commit/d4a43b283751b23d32bbfa1ecc2cad2d16e3dde9
>
> The problem we are having in extension land is that we often run functions
> in external libraries that take a long time to return, and we would like to
> ensure that PgSQL users can cancel their queries, even when control has
> passed into those functions.
>
> The way we have done it, historically, has been to take the return value of
> pqsignal(SIGINT, extension_signint_handler) and remember it, and then,
> inside extension_signint_handler, call the pgsql handler once we have done
> our own business.

I think that's a really bad idea. For one, postgres might use signals in
different process types for different things. We are also working on not
using plain signals in a bunch of places.

It's also really hard to write correct signal handlers. E.g.:

> https://github.com/pramsey/pgsql-http/blob/master/http.c#L345

That signal handler is profoundly unsafe:

http_interrupt_handler(int sig)
{
	/* Handle the signal here */
	elog(DEBUG2, "http_interrupt_handler: sig=%d", sig);
	http_interrupt_requested = sig;
	pgsql_interrupt_handler(sig);
	return;
}

You're definitely not allowed to elog in a signal handler unless you take
*extraordinary* precautions (basically converting the normal execution path
to only perform async-signal-safe work). This signal handler can e.g.
corrupt the memory context used by elog.c, deadlock in the libc memory
allocator, break the FE/BE protocol if any client ever set
client_min_messages=debug2, jump out of a signal handler in case of an
allocation failure, and many other things.

> It is possible we have been Doing It Wrong all this time, and would love
> some pointers on the right way to do this.

All your interrupt handler is doing "for you" is setting
http_interrupt_requested, right? Why do you need that variable? Seems you
could just check postgres' QueryCancelPending? And perhaps ProcDiePending,
if you want to handle termination too.

Greetings,

Andres Freund
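A minimal sketch of what Andres suggests, checking postgres' own interrupt flags from a long-running extension loop rather than installing a SIGINT handler (process_chunk() is a made-up stand-in for the expensive library work):

#include "postgres.h"
#include "miscadmin.h"

/* stand-in for one unit of expensive library work */
static void
process_chunk(int i)
{
	(void) i;
}

static void
process_all_chunks(int nchunks)
{
	for (int i = 0; i < nchunks; i++)
	{
		/*
		 * Both flags are set by postgres' own signal handling: query
		 * cancellation sets QueryCancelPending, backend termination sets
		 * ProcDiePending.  Bail out and let the next CHECK_FOR_INTERRUPTS()
		 * in normal backend code raise the actual error.
		 */
		if (QueryCancelPending || ProcDiePending)
			break;

		process_chunk(i);
	}
}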
Hi,

On 2025-01-22 11:50:30 -0500, Andres Freund wrote:
> On 2025-01-22 07:57:52 -0800, Paul Ramsey wrote:
> > It is possible we have been Doing It Wrong all this time, and would love
> > some pointers on the right way to do this.
>
> All your interrupt handler is doing "for you" is setting
> http_interrupt_requested, right? Why do you need that variable? Seems you
> could just check postgres' QueryCancelPending? And perhaps ProcDiePending,
> if you want to handle termination too.

FWIW, the even better approach in this case probably would be to use curl in
a non-blocking manner. Then you can do all the blocking in a loop around
WaitLatchOrSocket (or, for slightly improved efficiency, create a wait event
and wait with WaitEventSetWait()) and react to interrupts with
CHECK_FOR_INTERRUPTS() like normal postgres code.

Greetings,

Andres Freund
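A sketch of that event-loop shape using curl's multi interface. For brevity it waits on the latch with a short timeout instead of waiting on curl's socket with WaitLatchOrSocket(), and it omits cleanup of the curl handles if CHECK_FOR_INTERRUPTS() errors out; the essential point is that interrupts are serviced between waits:

#include "postgres.h"
#include "miscadmin.h"
#include "storage/latch.h"
#include "utils/wait_event.h"

#include <curl/curl.h>

/* Drive an already-configured CURLM handle to completion, honoring interrupts. */
static CURLMcode
perform_with_interrupts(CURLM *multi)
{
	int			running = 1;

	while (running)
	{
		CURLMcode	rc = curl_multi_perform(multi, &running);

		if (rc != CURLM_OK || !running)
			return rc;

		/* Sleep until the latch is set or a short timeout expires. */
		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 100L,	/* milliseconds */
						 PG_WAIT_EXTENSION);
		ResetLatch(MyLatch);

		/* Query cancel and backend termination are serviced here. */
		CHECK_FOR_INTERRUPTS();
	}

	return CURLM_OK;
}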
> On Jan 22, 2025, at 8:50 AM, Andres Freund <andres@anarazel.de> wrote:
>
>> It is possible we have been Doing It Wrong all this time, and would love
>> some pointers on the right way to do this.
>
> All your interrupt handler is doing "for you" is setting
> http_interrupt_requested, right? Why do you need that variable? Seems you
> could just check postgres' QueryCancelPending? And perhaps ProcDiePending,
> if you want to handle termination too.

This is the Knowledge that was sitting right in front of me, but I could not
see.

From the road to Damascus,
Paul
Hi,

On 2025-01-22 10:22:45 -0600, Nathan Bossart wrote:
> On Wed, Jan 22, 2025 at 07:57:52AM -0800, Paul Ramsey wrote:
> > The problem we are having in extension land is that we often run
> > functions in external libraries that take a long time to return, and we
> > would like to ensure that PgSQL users can cancel their queries, even when
> > control has passed into those functions.
> >
> > The way we have done it, historically, has been to take the return value
> > of pqsignal(SIGINT, extension_signint_handler) and remember it, and then,
> > inside extension_signint_handler, call the pgsql handler once we have
> > done our own business.
>
> I see a couple other projects that might be doing something similar [0].
>
> [0] https://codesearch.debian.net/search?q=%3D+pqsignal&literal=1

Grouping by package I see three packages. All of these seem not to handle
backend termination, which certainly doesn't seem optimal - it e.g. means
that a fast shutdown won't really work.

1) postgis

For at least some of the cases it doesn't look trivial to convert to
checking QueryCancelPending/ProcDiePending because the signal handler calls
into GEOS and lwgeom.

2) pgsql-http

Also doesn't handle termination. Looks like it could trivially be converted
to checking QueryCancelPending/ProcDiePending.

3) timescaledb

This looks to be test-only code where the signal handler only prints out a
message. I'd assume it's not critical to log that message.

IOW, the only worrisome case here is postgis.

Given that we are working towards *not* relying on signals for a lot of
this, I wonder if we ought to support registering callbacks to be called on
receipt of query cancellation and backend termination. That then could e.g.
call GEOS_interruptRequest() as postgis does. That'd have a chance of
working in a threaded postgres too - unfortunately it looks like neither
lwgeom's nor GEOS's interrupt mechanisms are thread safe at this point.

It's worth noting that postgis ends up with a bunch of postgres-internals
knowledge due to its desire to handle signals:
https://github.com/postgis/postgis/blob/master/postgis/postgis_module.c#L51-L56

#ifdef WIN32
static void interruptCallback() {
	if (UNBLOCKED_SIGNAL_QUEUE())
		pgwin32_dispatch_queued_signals();
}
#endif

Which seems really rather undesirable.

But given how windows signals are currently delivered via the equivalent
code in CHECK_FOR_INTERRUPTS(), I don't really see an alternative at this
point :(.

Greetings,

Andres Freund
> On Jan 22, 2025, at 9:36 AM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2025-01-22 10:22:45 -0600, Nathan Bossart wrote:
>> On Wed, Jan 22, 2025 at 07:57:52AM -0800, Paul Ramsey wrote:
>>> The problem we are having in extension land is that we often run
>>> functions in external libraries that take a long time to return, and we
>>> would like to ensure that PgSQL users can cancel their queries, even when
>>> control has passed into those functions.
>>>
>>> The way we have done it, historically, has been to take the return value
>>> of pqsignal(SIGINT, extension_signint_handler) and remember it, and then,
>>> inside extension_signint_handler, call the pgsql handler once we have
>>> done our own business.
>>
>> I see a couple other projects that might be doing something similar [0].
>>
>> [0] https://codesearch.debian.net/search?q=%3D+pqsignal&literal=1
>
> Grouping by package I see three packages.

So many things for me to atone for! :)

> All of these seem not to handle backend termination, which certainly
> doesn't seem optimal - it e.g. means that a fast shutdown won't really
> work.
>
> 1) postgis
>
> For at least some of the cases it doesn't look trivial to convert to
> checking QueryCancelPending/ProcDiePending because the signal handler
> calls into GEOS and lwgeom.

We control both those libraries, so we just may need to change them up a
little to maybe pass in a callback for their internal check-for-interrupt
to call.

> 2) pgsql-http
>
> Also doesn't handle termination.
> Looks like it could trivially be converted to checking
> QueryCancelPending/ProcDiePending.

Going to fix this one first.

> 3) timescaledb

Not me!

> IOW, the only worrisome case here is postgis.
> Given that we are working towards *not* relying on signals for a lot of
> this, I wonder if we ought to support registering callbacks to be called
> on receipt of query cancellation and backend termination. That then could
> e.g. call GEOS_interruptRequest() as postgis does. That'd have a chance of
> working in a threaded postgres too - unfortunately it looks like neither
> lwgeom's nor GEOS's interrupt mechanisms are thread safe at this point.

I think callbacks are a good thing. Two of the libraries of interest to me
(curl, GDAL) I end up using the progress meter callback to wedge myself in
and force an interrupt as necessary. This seems like a common feature of
libraries with long-running work.

ATB,
P

> It's worth noting that postgis ends up with a bunch of postgres-internals
> knowledge due to its desire to handle signals:
> https://github.com/postgis/postgis/blob/master/postgis/postgis_module.c#L51-L56
>
> #ifdef WIN32
> static void interruptCallback() {
>	if (UNBLOCKED_SIGNAL_QUEUE())
>		pgwin32_dispatch_queued_signals();
> }
> #endif
>
> Which seems really rather undesirable.
>
> But given how windows signals are currently delivered via the equivalent
> code in CHECK_FOR_INTERRUPTS(), I don't really see an alternative at this
> point :(.
>
> Greetings,
>
> Andres Freund
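For curl, the progress-meter wedge Paul describes, reworked to check postgres' flags directly rather than a handler-set variable, might look roughly like this (illustrative names, not pgsql-http's actual code):

#include "postgres.h"
#include "miscadmin.h"

#include <curl/curl.h>

/* Returning non-zero makes curl abort with CURLE_ABORTED_BY_CALLBACK. */
static int
http_progress_callback(void *clientp, curl_off_t dltotal, curl_off_t dlnow,
					   curl_off_t ultotal, curl_off_t ulnow)
{
	if (QueryCancelPending || ProcDiePending)
		return 1;
	return 0;
}

static void
setup_interruptible_transfer(CURL *curl)
{
	/* progress callbacks only fire if NOPROGRESS is turned off */
	curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);
	curl_easy_setopt(curl, CURLOPT_XFERINFOFUNCTION, http_progress_callback);
}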
Hi,

On 2025-01-22 09:50:18 -0800, Paul Ramsey wrote:
> > On Jan 22, 2025, at 9:36 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2025-01-22 10:22:45 -0600, Nathan Bossart wrote:
> >> On Wed, Jan 22, 2025 at 07:57:52AM -0800, Paul Ramsey wrote:
> >>> The problem we are having in extension land is that we often run
> >>> functions in external libraries that take a long time to return, and we
> >>> would like to ensure that PgSQL users can cancel their queries, even when
> >>> control has passed into those functions.
> >>>
> >>> The way we have done it, historically, has been to take the return value
> >>> of pqsignal(SIGINT, extension_signint_handler) and remember it, and then,
> >>> inside extension_signint_handler, call the pgsql handler once we have
> >>> done our own business.
> >>
> >> I see a couple other projects that might be doing something similar [0].
> >>
> >> [0] https://codesearch.debian.net/search?q=%3D+pqsignal&literal=1
> >
> > Grouping by package I see three packages.
>
> So many things for me to atone for! :)

Turns out creating good stuff almost invariably leads to creating not so
great parts of those good things :)

> > All of these seem not to handle backend termination, which certainly
> > doesn't seem optimal - it e.g. means that a fast shutdown won't really
> > work.
> >
> > 1) postgis
> >
> > For at least some of the cases it doesn't look trivial to convert to
> > checking QueryCancelPending/ProcDiePending because the signal handler
> > calls into GEOS and lwgeom.
>
> We control both those libraries, so we just may need to change them up a
> little to maybe pass in a callback for their internal check-for-interrupt
> to call.

Yea, makes sense. Looking at the callsites for the interrupt checking, it
doesn't look like a function call during interrupt checking will have a
meaningful performance impact.

> > 2) pgsql-http
> >
> > Also doesn't handle termination.
> > Looks like it could trivially be converted to checking
> > QueryCancelPending/ProcDiePending.
>
> Going to fix this one first.

Makes sense.

> > 3) timescaledb
>
> Not me!

:)

> > IOW, the only worrisome case here is postgis.
> > Given that we are working towards *not* relying on signals for a lot of
> > this, I wonder if we ought to support registering callbacks to be called
> > on receipt of query cancellation and backend termination. That then could
> > e.g. call GEOS_interruptRequest() as postgis does. That'd have a chance of
> > working in a threaded postgres too - unfortunately it looks like neither
> > lwgeom's nor GEOS's interrupt mechanisms are thread safe at this point.
>
> I think callbacks are a good thing. Two of the libraries of interest to me
> (curl, GDAL) I end up using the progress meter callback to wedge myself in
> and force an interrupt as necessary. This seems like a common feature of
> libraries with long-running work.

As I mentioned earlier, for something like curl, where afaict you'd want to
accept interrupts while waiting for network IO, I think it's better to
convert to non-blocking use of such APIs and to check for interrupts in the
event loop. It's been a while, but I have done that with curl, and it
wasn't particularly hard.

That approach imo is already cleaner today, and I think it's what we'll
eventually require for at least a bunch of architectural improvements:

- faster query execution (so that execution can switch to another part of
  the query tree while blocked on network IO)

- connection scalability (if there's no 1:1 mapping between sessions and
  processes/threads, threads have to switch between executing queries for
  different sessions)

- to benefit from asynchronous completion-based FE/BE network handling.

But of course all of these are quite a bit away.

Obviously event-based loops are not a viable approach for wanting to check
for interrupts in what I assume to be CPU-bound cases like GDAL/lwgeom. If
you have a "progress meter callback", I don't think there's really a benefit
in postgres providing a way to register a callback when receiving a
cancellation / termination event?

Greetings,

Andres Freund