Обсуждение: Proper cleanup at backend exit

Поиск
Список
Период
Сортировка

Proper cleanup at backend exit

От
Tom Lane
Дата:
I have been looking into the problem reported by Massimo, and also seen
by me, that backends sometimes exit without cleaning up entries they
made in the pg_listener table.  I have fixed several contributing bugs
(having to do with the limited size of the on_shmem_exit table), but
I still see the problem occasionally.

I just discovered one fairly fundamental reason why there's still a
problem:

    test=> listen test;
    LISTEN
    test=> begin;
    BEGIN
    test=> \q
    $

    (start new psql)

    test=> select * from pg_listener;
    relname|listenerpid|notification
    -------+-----------+------------
    test   |      20024|           0
    (1 row)

Oops.  It would seem that async.c is *trying* to clean up, but the
changes are thrown away because they are considered part of a
never-committed transaction :-(.  More generally, I believe that
a backend killed in the middle of any individual SQL statement will
fail to remove its LISTEN entries, because it will think that any
database changes made just before exit ought to be discarded.

If we want a backend to clean out its pg_listen entries at shutdown,
it looks to me like backend exit will have to work like this:

1. Abort the current transaction, if any, and exit the current
   transaction block, if any.

2. Perform LISTEN cleanup within a new transaction.  Commit this
   transaction.

3. Continue with closing up shop.

I don't know whether there are any other things besides LISTEN cleanup
that really need to be done, not discarded, during backend exit.  If
there are, then we probably ought to restructure the exit sequence to
handle this cleanly.  But I'm worried about doing this much processing
during backend exit --- if you're trying to kill a backend that's gone
wrong, the last thing you want is for it to start doing new
transactions.

(BTW, that last thought also says that we *still* won't be able to get
rid of pg_listen entries 100% reliably.  In a panic exit scenario,
ie after one backend has crashed, none of the backends will clean up
their pg_listen entries.  We dare not tamper with that.  At least
I dare not.)

I'm half tempted to say that we should rip out the cleanup code and
not even try to remove pg_listener entries at backend exit.  They'll
go away anyway over time, because the next notifier to try to notify
that relation name will delete the entries when he discovers he can't
signal that backend PID anymore.  So cleanup-at-exit is not at all a
critical function, and it's hard to justify a bunch of new complexity
just for this.

Comments?  This depends a lot on whether you think LISTEN cleanup
is a unique case, or whether there will be other standard cleanup
activities that a backend ought to perform.  (Is there such a thing
as an ON EXIT hook in SQL?)

            regards, tom lane

Re: [HACKERS] Proper cleanup at backend exit

От
Massimo Dal Zotto
Дата:
>
> I have been looking into the problem reported by Massimo, and also seen
> by me, that backends sometimes exit without cleaning up entries they
> made in the pg_listener table.  I have fixed several contributing bugs
> (having to do with the limited size of the on_shmem_exit table), but

could you explain this to us ?

> I still see the problem occasionally.
>
> I just discovered one fairly fundamental reason why there's still a
> problem:
>
>     test=> listen test;
>     LISTEN
>     test=> begin;
>     BEGIN
>     test=> \q
>     $
>
>     (start new psql)
>
>     test=> select * from pg_listener;
>     relname|listenerpid|notification
>     -------+-----------+------------
>     test   |      20024|           0
>     (1 row)
>
> Oops.  It would seem that async.c is *trying* to clean up, but the
> changes are thrown away because they are considered part of a
> never-committed transaction :-(.  More generally, I believe that
> a backend killed in the middle of any individual SQL statement will
> fail to remove its LISTEN entries, because it will think that any
> database changes made just before exit ought to be discarded.

This is correct in some sense. Any change made but not commited should
never be seen in the table, so the tuple is not deleted.

> If we want a backend to clean out its pg_listen entries at shutdown,
> it looks to me like backend exit will have to work like this:
>
> 1. Abort the current transaction, if any, and exit the current
>    transaction block, if any.
>
> 2. Perform LISTEN cleanup within a new transaction.  Commit this
>    transaction.
>
> 3. Continue with closing up shop.
>
> I don't know whether there are any other things besides LISTEN cleanup
> that really need to be done, not discarded, during backend exit.  If
> there are, then we probably ought to restructure the exit sequence to
> handle this cleanly.  But I'm worried about doing this much processing
> during backend exit --- if you're trying to kill a backend that's gone
> wrong, the last thing you want is for it to start doing new
> transactions.
>
> (BTW, that last thought also says that we *still* won't be able to get
> rid of pg_listen entries 100% reliably.  In a panic exit scenario,
> ie after one backend has crashed, none of the backends will clean up
> their pg_listen entries.  We dare not tamper with that.  At least
> I dare not.)
>
> I'm half tempted to say that we should rip out the cleanup code and
> not even try to remove pg_listener entries at backend exit.  They'll
> go away anyway over time, because the next notifier to try to notify
> that relation name will delete the entries when he discovers he can't
> signal that backend PID anymore.  So cleanup-at-exit is not at all a
> critical function, and it's hard to justify a bunch of new complexity
> just for this.
>
> Comments?  This depends a lot on whether you think LISTEN cleanup
> is a unique case, or whether there will be other standard cleanup
> activities that a backend ought to perform.  (Is there such a thing
> as an ON EXIT hook in SQL?)

The code should be left if it works in the normal case.
I believe that some tuples left in pg_listener don't do any harm because
they refer to dead backends and the notify code can detect this situation
and delete the dead tuples. Anyway for safety I always truncate the file
to 0 in the shell loop that keeps the postmaster alive. Also a vacuum every
few hours doesn't hurt.

--
Massimo Dal Zotto

+----------------------------------------------------------------------+
|  Massimo Dal Zotto                email:  dz@cs.unitn.it             |
|  Via Marconi, 141                 phone:  ++39-461-534251            |
|  38057 Pergine Valsugana (TN)     www:  http://www.cs.unitn.it/~dz/  |
|  Italy                            pgp:  finger dz@tango.cs.unitn.it  |
+----------------------------------------------------------------------+

Re: [HACKERS] Proper cleanup at backend exit

От
Tom Lane
Дата:
Massimo Dal Zotto <dz@cs.unitn.it> writes:
>> I have fixed several contributing bugs
>> (having to do with the limited size of the on_shmem_exit table),

> could you explain this to us ?

async.c tries to arrange to clean up its entries in pg_listener by
scheduling an exit callback routine via on_shmem_exit.  (I think
Bruce or someone has been changing that stuff, and that on_shmem_exit
used to be called something else, but it's still basically an atexit-
like list of procedures to call.)

The trouble is that on_shmem_exit has a fixed-size, and not very large,
array for the callbacks.  Try to register too many exit callbacks, and
after a while they stop getting registered.

I saw two problems associated with that.  First, async.c was registering
a separate callback for each pg_listen entry it's ever made.  That was
easily dealt with, since we now have an "unlisten all" subroutine:
I got rid of the per-listen callback and now register only a single
callback over the life of the backend.  It just calls UnlistenAll.

Second, the main loop in postgres.c was miscoded so that every time you
had an error, it added a redundant FlushBufferPool callback to the list.
Make enough mistakes before you do your first listen, and you're still
out of luck.  A simple reordering of the code fixes that.  (While I was
at it I tried to improve the comments to make it clear where the main
loop *really* starts, so that future hackers won't misplace code in
PostgresMain the same way again...)

In the long term it might be a good idea to make on_shmem_exit construct
a linked list of registered callbacks (eg using Dllist) rather than
having a fixed maximum number of exit callbacks.  But just at the moment
I don't think it's necessary, given these bug fixes --- and if we did
have a linked list in there, we'd still want to make these changes
because they would be memory leaks otherwise.

            regards, tom lane

PS: I hope to check these changes in shortly.

Re: [HACKERS] Proper cleanup at backend exit

От
Bruce Momjian
Дата:
> I have been looking into the problem reported by Massimo, and also seen
> by me, that backends sometimes exit without cleaning up entries they
> made in the pg_listener table.  I have fixed several contributing bugs
> (having to do with the limited size of the on_shmem_exit table), but
> I still see the problem occasionally.
>
> I just discovered one fairly fundamental reason why there's still a
> problem:
>
>     test=> listen test;
>     LISTEN
>     test=> begin;
>     BEGIN
>     test=> \q
>     $
>
>     (start new psql)
>
>     test=> select * from pg_listener;
>     relname|listenerpid|notification
>     -------+-----------+------------
>     test   |      20024|           0
>     (1 row)
>
> Oops.  It would seem that async.c is *trying* to clean up, but the
> changes are thrown away because they are considered part of a
> never-committed transaction :-(.  More generally, I believe that
> a backend killed in the middle of any individual SQL statement will
> fail to remove its LISTEN entries, because it will think that any
> database changes made just before exit ought to be discarded.
>
> If we want a backend to clean out its pg_listen entries at shutdown,
> it looks to me like backend exit will have to work like this:
>
> 1. Abort the current transaction, if any, and exit the current
>    transaction block, if any.
>
> 2. Perform LISTEN cleanup within a new transaction.  Commit this
>    transaction.
>
> 3. Continue with closing up shop.
>
> I don't know whether there are any other things besides LISTEN cleanup
> that really need to be done, not discarded, during backend exit.  If
> there are, then we probably ought to restructure the exit sequence to
> handle this cleanly.  But I'm worried about doing this much processing
> during backend exit --- if you're trying to kill a backend that's gone
> wrong, the last thing you want is for it to start doing new
> transactions.
>
> (BTW, that last thought also says that we *still* won't be able to get
> rid of pg_listen entries 100% reliably.  In a panic exit scenario,
> ie after one backend has crashed, none of the backends will clean up
> their pg_listen entries.  We dare not tamper with that.  At least
> I dare not.)
>
> I'm half tempted to say that we should rip out the cleanup code and
> not even try to remove pg_listener entries at backend exit.  They'll
> go away anyway over time, because the next notifier to try to notify
> that relation name will delete the entries when he discovers he can't
> signal that backend PID anymore.  So cleanup-at-exit is not at all a
> critical function, and it's hard to justify a bunch of new complexity
> just for this.
>
> Comments?  This depends a lot on whether you think LISTEN cleanup
> is a unique case, or whether there will be other standard cleanup
> activities that a backend ought to perform.  (Is there such a thing
> as an ON EXIT hook in SQL?)

Seems like we should try and clean them out, especially when the backend
is doing a normal exit.  In a panic exit, we don't need to clean it out.
We can install something into proc_exit() and shmem_exit() to clean out
the table.  You can have a global variable that installed the pg_listen
cleaner the first time a notify() is done so as not to install a new
exit handler for each notify() call.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Proper cleanup at backend exit

От
Bruce Momjian
Дата:
> Massimo Dal Zotto <dz@cs.unitn.it> writes:
> >> I have fixed several contributing bugs
> >> (having to do with the limited size of the on_shmem_exit table),
>
> > could you explain this to us ?
>
> async.c tries to arrange to clean up its entries in pg_listener by
> scheduling an exit callback routine via on_shmem_exit.  (I think
> Bruce or someone has been changing that stuff, and that on_shmem_exit
> used to be called something else, but it's still basically an atexit-
> like list of procedures to call.)

Yes.  The old code atexit() was called for process exit and
shared-memory reset, and that caused problems as more and more cleanup
had to be done ONLY at shared memory reset time.  Having two separate
queues and proc names helped.

>
> The trouble is that on_shmem_exit has a fixed-size, and not very large,
> array for the callbacks.  Try to register too many exit callbacks, and
> after a while they stop getting registered.
>
> I saw two problems associated with that.  First, async.c was registering
> a separate callback for each pg_listen entry it's ever made.  That was
> easily dealt with, since we now have an "unlisten all" subroutine:
> I got rid of the per-listen callback and now register only a single
> callback over the life of the backend.  It just calls UnlistenAll.

That is why you need some type of global variable that sets the cleanup
the first time, and only once.  You could just call the proc_exit call,
and prevent it from scheduling two of the same exit functions.  That
would be cleaner.

>
> Second, the main loop in postgres.c was miscoded so that every time you
> had an error, it added a redundant FlushBufferPool callback to the list.
> Make enough mistakes before you do your first listen, and you're still
> out of luck.  A simple reordering of the code fixes that.  (While I was
> at it I tried to improve the comments to make it clear where the main
> loop *really* starts, so that future hackers won't misplace code in
> PostgresMain the same way again...)
>
> In the long term it might be a good idea to make on_shmem_exit construct
> a linked list of registered callbacks (eg using Dllist) rather than
> having a fixed maximum number of exit callbacks.  But just at the moment
> I don't think it's necessary, given these bug fixes --- and if we did
> have a linked list in there, we'd still want to make these changes
> because they would be memory leaks otherwise.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Proper cleanup at backend exit

От
Tom Lane
Дата:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> [ should we work harder to ensure pg_listener entries get cleaned out? ]

> Seems like we should try and clean them out, especially when the backend
> is doing a normal exit.  In a panic exit, we don't need to clean it out.

OK, I will add some more code to try to cope with the exit-during-
uncommitted-transaction case.

> We can install something into proc_exit() and shmem_exit() to clean out
> the table.

The old code inserted the cleanup action into shmem_exit().  Is that the
right list to put it in?  I'm not clear on the difference between
shmem_exit and proc_exit lists...

> You can have a global variable that installed the pg_listen
> cleaner the first time a notify() is done so as not to install a new
> exit handler for each notify() call.

That part's already done.

            regards, tom lane

Re: [HACKERS] Proper cleanup at backend exit

От
Bruce Momjian
Дата:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> [ should we work harder to ensure pg_listener entries get cleaned out? ]
>
> > Seems like we should try and clean them out, especially when the backend
> > is doing a normal exit.  In a panic exit, we don't need to clean it out.
>
> OK, I will add some more code to try to cope with the exit-during-
> uncommitted-transaction case.
>
> > We can install something into proc_exit() and shmem_exit() to clean out
> > the table.
>
> The old code inserted the cleanup action into shmem_exit().  Is that the
> right list to put it in?  I'm not clear on the difference between
> shmem_exit and proc_exit lists...

proc_exit is triggered on every backend exit.  shmem_exit is for
postmaster exit, or postmaster shared memory reset.

>
> > You can have a global variable that installed the pg_listen
> > cleaner the first time a notify() is done so as not to install a new
> > exit handler for each notify() call.
>
> That part's already done.

Cool.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026