Обсуждение: DROP DATABASE is interruptible

Поиск
Список
Период
Сортировка

DROP DATABASE is interruptible

От
Andres Freund
Дата:
Hi,


Unfortunately DROP DATABASE does not hold interrupt over its crucial steps. If
you e.g. set a breakpoint on DropDatabaseBuffers() and then do a signal
SIGINT, we'll process that interrupt before the transaction commits.

A later connect to that database ends with:
2023-03-14 10:22:24.443 PDT [3439153][client backend][3/2:0][[unknown]] PANIC:  could not open critical system index
2662


It's not entirely obvious how to fix this. We can't just hold interrupts for
the whole transaction - for one, we hang if we do so, because it prevents
ourselves from absorbing our own barrier:
    /* Close all smgr fds in all backends. */
    WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));


ISTM that at the very least dropdb() needs to internally commit *before*
dropping buffers - after that point the database is corrupt.

Greetings,

Andres Freund



Re: DROP DATABASE is interruptible

От
Thomas Munro
Дата:
I tried out the patch you posted over at [1].  For those wanting an
easy way to test it, or test the buggy behaviour in master without
this patch, you can simply kill -STOP the checkpointer, so that DROP
DATABASE hangs in RequestCheckpoint() (or you could SIGSTOP any other
backend so it hangs in the barrier thing instead), and then you can
just press ^C like this:

postgres=# create database db2;
CREATE DATABASE
postgres=# drop database db2;
^CCancel request sent
ERROR:  canceling statement due to user request

After that you get:

$ psql db2
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432"
failed: FATAL:  database "db2" is invalid
DETAIL:  Use DROP DATABASE to drop invalid databases

I suppose it should be a HINT?

+# FIXME: It'd be good to test the actual interruption path. But it's not
+# immediately obvious how.

I wonder if there is some way to incorporate something based on
SIGSTOP signals into the test, but I don't know how to do it on
Windows and maybe that's a bit weird anyway.  For a non-OS-specific
way to do it, I was wondering about having a test module function that
has a wait loop that accepts ^C but deliberately ignores
ProcSignalBarrier, and leaving that running in a background psql for a
similar effect?

Not sure why the test is under src/test/recovery.

While a database exists in this state, we get periodic autovacuum
noise, which I guess we should actually skip?  I suppose someone might
eventually wonder if autovacuum could complete the drop, but it seems
a bit of a sudden weird leap in duties and might be confusing (perhaps
it'd make more sense if 'invalid because creating' and 'invalid
because dropping' were distinguished).

2023-05-09 15:24:10.860 NZST [523191] FATAL:  database "db2" is invalid
2023-05-09 15:24:10.860 NZST [523191] DETAIL:  Use DROP DATABASE to
drop invalid databases
2023-05-09 15:25:10.883 NZST [523279] FATAL:  database "db2" is invalid
2023-05-09 15:25:10.883 NZST [523279] DETAIL:  Use DROP DATABASE to
drop invalid databases
2023-05-09 15:26:10.899 NZST [523361] FATAL:  database "db2" is invalid
2023-05-09 15:26:10.899 NZST [523361] DETAIL:  Use DROP DATABASE to
drop invalid databases
2023-05-09 15:27:10.919 NZST [523408] FATAL:  database "db2" is invalid
2023-05-09 15:27:10.919 NZST [523408] DETAIL:  Use DROP DATABASE to
drop invalid databases
2023-05-09 15:28:10.938 NZST [523456] FATAL:  database "db2" is invalid
2023-05-09 15:28:10.938 NZST [523456] DETAIL:  Use DROP DATABASE to
drop invalid databases

[1] https://www.postgresql.org/message-id/20230509013255.fjrlpitnj3ltur76%40awork3.anarazel.de



Re: DROP DATABASE is interruptible

От
Thomas Munro
Дата:
On Tue, May 9, 2023 at 3:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I tried out the patch you posted over at [1].

I forgot to add, +1, I think this is a good approach.

(I'm still a little embarrassed at how long we spent trying to debug
this in the other thread from the supplied clues, when you'd already
pointed this failure mechanism out including the exact error message a
couple of months ago.  One thing I've noticed is that new threads
posted in the middle of commitfests are hard to see :-D  We were
getting pretty close, though.)



Re: DROP DATABASE is interruptible

От
Andres Freund
Дата:
Hi,

On 2023-05-09 15:41:36 +1200, Thomas Munro wrote:
> I tried out the patch you posted over at [1].

Thanks!


> $ psql db2
> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432"
> failed: FATAL:  database "db2" is invalid
> DETAIL:  Use DROP DATABASE to drop invalid databases
> 
> I suppose it should be a HINT?

Yup.


> +# FIXME: It'd be good to test the actual interruption path. But it's not
> +# immediately obvious how.
> 
> I wonder if there is some way to incorporate something based on
> SIGSTOP signals into the test, but I don't know how to do it on
> Windows and maybe that's a bit weird anyway.  For a non-OS-specific
> way to do it, I was wondering about having a test module function that
> has a wait loop that accepts ^C but deliberately ignores
> ProcSignalBarrier, and leaving that running in a background psql for a
> similar effect?

Seems a bit too complicated.

We really need to work at a framework for this kind of thing.


> Not sure why the test is under src/test/recovery.

Where else? We don't really have a place to put backend specific tests that
aren't about logical replication or recovery right now...

It partially is about dealing with crashes etc in the middle of DROP DATABASE,
so it doesn't seem unreasonble to me anyway.


> While a database exists in this state, we get periodic autovacuum
> noise, which I guess we should actually skip?

Yes, good catch.

Also should either reset datfrozenxid et al when invalidating, or ignore it
when computing horizons.


> I suppose someone might eventually wonder if autovacuum could complete the
> drop, but it seems a bit of a sudden weird leap in duties and might be
> confusing (perhaps it'd make more sense if 'invalid because creating' and
> 'invalid because dropping' were distinguished).

I'm bit hesitant to do so for now. Once it's a bit more settled, maybe?
Although I wonder if there's something better suited to the task than
autovacuum.

Greetings,

Andres Freund



Re: DROP DATABASE is interruptible

От
Andres Freund
Дата:
Hi,

I'm hacking on this bugfix again, thanks to Evgeny's reminder on the other
thread [1].


I've been adding checks for partiall-dropped databases to the following places
so far:
- vac_truncate_clog(), as autovacuum can't process it anymore. Otherwise a
  partially dropped database could easily lead to shutdown-due-to-wraparound.
- get_database_list() - so autovacuum workers don't error out when connecting
- template database used by CREATE DATABASE
- pg_dumpall, so we don't try to connect to the database
- vacuumdb, clusterdb, reindexdb, same

It's somewhat annoying that there is no shared place for the relevant query
for the client-side cases.


I haven't yet added checks to pg_upgrade, even though that's clearly
needed. I'm waffling a bit between erroring out and just ignoring the
database? pg_upgrade already fails when datallowconn is set "wrongly", see
check_proper_datallowconn().  Any opinions?


I'm not sure what should be done for psql. It's probably not a good idea to
change tab completion, that'd just make it appear the database is gone. But \l
could probably show dropped databases more prominently?


We don't really have a good place to for database specific
code. dbcommands.[ch] are for commands (duh), but already contain a bunch of
functions that don't really belong there.  Seems we should add a
catalog/pg_database.c or catalog/database.c (tbh, I don't really know which we
use for what). But that's probably for HEAD only.


dbcommands.c's get_db_info() seems to have gone completely off the deep
end. It returns information in 14 separate out parameters, and the variables
for that need to all exhaustively be declared.  And of course that differs
heavily between releases, making it a pain to backpatch any change.  ISTM we
should just define a struct for the parameters - alternatively we could just
return a copy of the pg_database tuple, but it looks like the variable-width
attributes would make that *just* about a loss.

I guess that's once more something better dealt with on HEAD, but damn, I'm
not relishing having to deal with backpatching anything touching it - I think
it might be reasonable to just open-code fetching datconnlimit :/.


This patch is starting to be a bit big, particularly once adding tests for all
the checks mentioned above - but I haven't heard of or thought of a better
proposal :(.

Greetings,

Andres Freund

[1] https://postgr.es/m/01020188d31d0a86-16af92c0-4466-4cb6-a2e8-0e5898aab800-000000%40eu-west-1.amazonses.com



Re: DROP DATABASE is interruptible

От
Andres Freund
Дата:
Hi,

On 2023-05-09 15:41:36 +1200, Thomas Munro wrote:
> +# FIXME: It'd be good to test the actual interruption path. But it's not
> +# immediately obvious how.
> 
> I wonder if there is some way to incorporate something based on
> SIGSTOP signals into the test, but I don't know how to do it on
> Windows and maybe that's a bit weird anyway.  For a non-OS-specific
> way to do it, I was wondering about having a test module function that
> has a wait loop that accepts ^C but deliberately ignores
> ProcSignalBarrier, and leaving that running in a background psql for a
> similar effect?

I found a way to test it reliably, albeit partially. However, I'm not sure
where to do so / if it's worth doing so.

The problem occurs once remove_dbtablespaces() starts working. The fix does a
heap_inplace_update() before that. So to reproduce the problem one session can
lock pg_tablespace, another can drop a database. Then the second session can
be cancelled by the first.

Waiting for locks to be acquired etc is somewhat cumbersome in a tap
test. It'd be easier in an isolation test. But I don't think we want to do
this as part of the normal isolation schedule?

So just open coding it in a tap test seems to be the best way?

Is it worth doing?

Greetings,

Andres Freund



Re: DROP DATABASE is interruptible

От
Andres Freund
Дата:
Hi,

On 2023-06-21 12:02:04 -0700, Andres Freund wrote:
> I'm hacking on this bugfix again, thanks to Evgeny's reminder on the other
> thread [1].
> 
> 
> I've been adding checks for partiall-dropped databases to the following places
> so far:
> - vac_truncate_clog(), as autovacuum can't process it anymore. Otherwise a
>   partially dropped database could easily lead to shutdown-due-to-wraparound.
> - get_database_list() - so autovacuum workers don't error out when connecting
> - template database used by CREATE DATABASE
> - pg_dumpall, so we don't try to connect to the database
> - vacuumdb, clusterdb, reindexdb, same

Also pg_amcheck.


> It's somewhat annoying that there is no shared place for the relevant query
> for the client-side cases.

Still the case, I looked around, and it doesn't look we do anything smart
anywhere :/


> I haven't yet added checks to pg_upgrade, even though that's clearly
> needed. I'm waffling a bit between erroring out and just ignoring the
> database? pg_upgrade already fails when datallowconn is set "wrongly", see
> check_proper_datallowconn().  Any opinions?

There don't need to be explict checks, because pg_upgrade will fail, because
it connects to every database. Obviously the error could be nicer, but it
seems ok for something hopefully very rare. I did add a test ensuring that the
behaviour is caught.

It's somewhat odd that pg_upgrade prints errors on stdout...


> I'm not sure what should be done for psql. It's probably not a good idea to
> change tab completion, that'd just make it appear the database is gone. But \l
> could probably show dropped databases more prominently?

I have not done that. I wonder if this is something that should be done in the
back branches?

Greetings,

Andres Freund

Вложения

Re: DROP DATABASE is interruptible

От
Daniel Gustafsson
Дата:
> On 25 Jun 2023, at 19:03, Andres Freund <andres@anarazel.de> wrote:
> On 2023-06-21 12:02:04 -0700, Andres Freund wrote:
>> I'm hacking on this bugfix again,

This patch LGTM from reading through and testing (manually and with your
supplied tests in the patch), I think this is a sound approach to deal with
this.

>> I've been adding checks for partiall-dropped databases to the following places
>> so far:
>> - vac_truncate_clog(), as autovacuum can't process it anymore. Otherwise a
>>  partially dropped database could easily lead to shutdown-due-to-wraparound.
>> - get_database_list() - so autovacuum workers don't error out when connecting
>> - template database used by CREATE DATABASE
>> - pg_dumpall, so we don't try to connect to the database
>> - vacuumdb, clusterdb, reindexdb, same
>
> Also pg_amcheck.

That seems like an exhaustive list to me, I was unable to think of any other
place which would need the same treatment.  pg_checksums does come to mind but
it can clearly not see the required info so there doesn't seem like theres a
lot to do about that.

>> I haven't yet added checks to pg_upgrade, even though that's clearly
>> needed. I'm waffling a bit between erroring out and just ignoring the
>> database? pg_upgrade already fails when datallowconn is set "wrongly", see
>> check_proper_datallowconn().  Any opinions?
>
> There don't need to be explict checks, because pg_upgrade will fail, because
> it connects to every database. Obviously the error could be nicer, but it
> seems ok for something hopefully very rare. I did add a test ensuring that the
> behaviour is caught.

I don't see any pg_upgrade test in the patch?

> It's somewhat odd that pg_upgrade prints errors on stdout...

There are many odd things about pg_upgrade logging, updating it to use the
common logging framework of other utils would be nice.

>> I'm not sure what should be done for psql. It's probably not a good idea to
>> change tab completion, that'd just make it appear the database is gone. But \l
>> could probably show dropped databases more prominently?
>
> I have not done that. I wonder if this is something that should be done in the
> back branches?

Possibly, I'm not sure where we usually stand on changing the output format of
\ commands in psql in minor revisions.

A few small comments on the patch:

+ * Max connections allowed (-1=no limit, -2=invalid database). A database
+ * is set to invalid partway through eing dropped.  Using datconnlimit=-2
+ * for this purpose isn't particularly clean, but is backpatchable.
Typo: s/eing/being/.  A limit of -1 makes sense, but the meaning of -2 is less
intuitive IMO.  Would it make sense to add a #define with a more descriptive
name to save folks reading this having to grep around and figure out what -2
means?

+    errhint("Use DROP DATABASE to drop invalid databases"));
Should end with a period as a complete sentence?

+    errmsg("cannot alter invalid database \"%s\"", stmt->dbname),
+    errdetail("Use DROP DATABASE to drop invalid databases"));
Shouldn't this be an errhint() instead? Also ending with a period.

+    if (database_is_invalid_form((Form_pg_database) dbform))
+        continue;
Would it make sense to stick a DEBUG2 log entry in there to signal that such a
database exist? (The same would apply for the similar hunk in autovacuum.c.)

--
Daniel Gustafsson




Re: DROP DATABASE is interruptible

От
Andres Freund
Дата:
Hi,

On 2023-07-07 14:09:08 +0200, Daniel Gustafsson wrote:
> > On 25 Jun 2023, at 19:03, Andres Freund <andres@anarazel.de> wrote:
> > On 2023-06-21 12:02:04 -0700, Andres Freund wrote:
> >> I'm hacking on this bugfix again,
>
> This patch LGTM from reading through and testing (manually and with your
> supplied tests in the patch), I think this is a sound approach to deal with
> this.

Thanks!


> >> I haven't yet added checks to pg_upgrade, even though that's clearly
> >> needed. I'm waffling a bit between erroring out and just ignoring the
> >> database? pg_upgrade already fails when datallowconn is set "wrongly", see
> >> check_proper_datallowconn().  Any opinions?
> >
> > There don't need to be explict checks, because pg_upgrade will fail, because
> > it connects to every database. Obviously the error could be nicer, but it
> > seems ok for something hopefully very rare. I did add a test ensuring that the
> > behaviour is caught.
>
> I don't see any pg_upgrade test in the patch?

Oops, I stashed them alongside some unrelated changes... Included this time.



> > It's somewhat odd that pg_upgrade prints errors on stdout...
>
> There are many odd things about pg_upgrade logging, updating it to use the
> common logging framework of other utils would be nice.

Indeed.


> >> I'm not sure what should be done for psql. It's probably not a good idea to
> >> change tab completion, that'd just make it appear the database is gone. But \l
> >> could probably show dropped databases more prominently?
> >
> > I have not done that. I wonder if this is something that should be done in the
> > back branches?
>
> Possibly, I'm not sure where we usually stand on changing the output format of
> \ commands in psql in minor revisions.

I'd normally be quite careful, people do script psql.

While breaking things when encountering an invalid database doesn't actually
sound like a bad thing, I don't think it fits into any of the existing column
output by psql for \l.


> A few small comments on the patch:
>
> + * Max connections allowed (-1=no limit, -2=invalid database). A database
> + * is set to invalid partway through eing dropped.  Using datconnlimit=-2
> + * for this purpose isn't particularly clean, but is backpatchable.
> Typo: s/eing/being/.

Fixed.


> A limit of -1 makes sense, but the meaning of -2 is less intuitive IMO.
> Would it make sense to add a #define with a more descriptive name to save
> folks reading this having to grep around and figure out what -2 means?

I went back and forth about this one. We don't use defines for such things in
all the frontend code today, so the majority of places won't be improved by
adding this. I added them now, which required touching a few otherwise
untouched places, but not too bad.




> +    errhint("Use DROP DATABASE to drop invalid databases"));
> Should end with a period as a complete sentence?

I get confused about this every time. It's not helped by this example in
sources.sgml:

<programlisting>
Primary:    could not create shared memory segment: %m
Detail:     Failed syscall was shmget(key=%d, size=%u, 0%o).
Hint:       the addendum
</programlisting>

Which notably does not use punctuation for the hint. But indeed, later we say:
   <para>
    Detail and hint messages: Use complete sentences, and end each with
    a period.  Capitalize the first word of sentences.  Put two spaces after
    the period if another sentence follows (for English text; might be
    inappropriate in other languages).
   </para>


> +    errmsg("cannot alter invalid database \"%s\"", stmt->dbname),
> +    errdetail("Use DROP DATABASE to drop invalid databases"));
> Shouldn't this be an errhint() instead? Also ending with a period.

Yep.


> +    if (database_is_invalid_form((Form_pg_database) dbform))
> +        continue;
> Would it make sense to stick a DEBUG2 log entry in there to signal that such a
> database exist? (The same would apply for the similar hunk in autovacuum.c.)

I don't really have an opinion on it. Added.

            elog(DEBUG2,
                 "skipping invalid database \"%s\" while computing relfrozenxid",
                 NameStr(dbform->datname));
and
            elog(DEBUG2,
                 "autovacuum: skipping invalid database \"%s\"",
                 NameStr(pgdatabase->datname));


Updated patches attached.


Not looking forward to fixing all the conflicts.


Does anybody have an opinion about whether we should add a dedicated field to
pg_database for representing invalid databases in HEAD? I'm inclined to think
that it's not really worth the cross-version complexity at this point, and
it's not that bad a misuse to use pg_database.datconnlimit.

Greetings,

Andres Freund

Вложения

Re: DROP DATABASE is interruptible

От
Daniel Gustafsson
Дата:
> On 12 Jul 2023, at 03:59, Andres Freund <andres@anarazel.de> wrote:
> On 2023-07-07 14:09:08 +0200, Daniel Gustafsson wrote:
>>> On 25 Jun 2023, at 19:03, Andres Freund <andres@anarazel.de> wrote:
>>> On 2023-06-21 12:02:04 -0700, Andres Freund wrote:

>>> There don't need to be explict checks, because pg_upgrade will fail, because
>>> it connects to every database. Obviously the error could be nicer, but it
>>> seems ok for something hopefully very rare. I did add a test ensuring that the
>>> behaviour is caught.
>>
>> I don't see any pg_upgrade test in the patch?
>
> Oops, I stashed them alongside some unrelated changes... Included this time.

Looking more at this I wonder if we in HEAD should make this a bit nicer by
extending the --check phase to catch this?  I did a quick hack along these
lines in the 0003 commit attached here (0001 and 0002 are your unchanged
patches, just added for consistency and to be CFBot compatible).  If done it
could be a separate commit to make the 0002 patch backport cleaner of course.

>>>> I'm not sure what should be done for psql. It's probably not a good idea to
>>>> change tab completion, that'd just make it appear the database is gone. But \l
>>>> could probably show dropped databases more prominently?
>>>
>>> I have not done that. I wonder if this is something that should be done in the
>>> back branches?
>>
>> Possibly, I'm not sure where we usually stand on changing the output format of
>> \ commands in psql in minor revisions.
>
> I'd normally be quite careful, people do script psql.
>
> While breaking things when encountering an invalid database doesn't actually
> sound like a bad thing, I don't think it fits into any of the existing column
> output by psql for \l.

Agreed, it doesn't, it would have to be a new column.

>> +    errhint("Use DROP DATABASE to drop invalid databases"));
>> Should end with a period as a complete sentence?
>
> I get confused about this every time. It's not helped by this example in
> sources.sgml:
>
> <programlisting>
> Primary:    could not create shared memory segment: %m
> Detail:     Failed syscall was shmget(key=%d, size=%u, 0%o).
> Hint:       the addendum
> </programlisting>
>
> Which notably does not use punctuation for the hint. But indeed, later we say:
>   <para>
>    Detail and hint messages: Use complete sentences, and end each with
>    a period.  Capitalize the first word of sentences.  Put two spaces after
>    the period if another sentence follows (for English text; might be
>    inappropriate in other languages).
>   </para>

That's not a very helpful example, and one which may give the wrong impression
unless the entire page is read.  I've raised this with a small diff to improve
it on -docs.

> Updated patches attached.

This version of the patchset LGTM.

> Does anybody have an opinion about whether we should add a dedicated field to
> pg_database for representing invalid databases in HEAD? I'm inclined to think
> that it's not really worth the cross-version complexity at this point, and
> it's not that bad a misuse to use pg_database.datconnlimit.

FWIW I think we should use pg_database.datconnlimit for this, it doesn't seem
like a common enough problem to warrant the added complexity and cost.

--
Daniel Gustafsson


Вложения

Re: DROP DATABASE is interruptible

От
Andres Freund
Дата:
Hi,

On 2023-07-12 11:54:18 +0200, Daniel Gustafsson wrote:
> > On 12 Jul 2023, at 03:59, Andres Freund <andres@anarazel.de> wrote:
> > On 2023-07-07 14:09:08 +0200, Daniel Gustafsson wrote:
> >>> On 25 Jun 2023, at 19:03, Andres Freund <andres@anarazel.de> wrote:
> >>> On 2023-06-21 12:02:04 -0700, Andres Freund wrote:
> 
> >>> There don't need to be explict checks, because pg_upgrade will fail, because
> >>> it connects to every database. Obviously the error could be nicer, but it
> >>> seems ok for something hopefully very rare. I did add a test ensuring that the
> >>> behaviour is caught.
> >> 
> >> I don't see any pg_upgrade test in the patch?
> > 
> > Oops, I stashed them alongside some unrelated changes... Included this time.
> 
> Looking more at this I wonder if we in HEAD should make this a bit nicer by
> extending the --check phase to catch this?  I did a quick hack along these
> lines in the 0003 commit attached here (0001 and 0002 are your unchanged
> patches, just added for consistency and to be CFBot compatible).  If done it
> could be a separate commit to make the 0002 patch backport cleaner of course.

I don't really have an opinion on that, tbh...

> >> +    errhint("Use DROP DATABASE to drop invalid databases"));
> >> Should end with a period as a complete sentence?
> > 
> > I get confused about this every time. It's not helped by this example in
> > sources.sgml:
> > 
> > <programlisting>
> > Primary:    could not create shared memory segment: %m
> > Detail:     Failed syscall was shmget(key=%d, size=%u, 0%o).
> > Hint:       the addendum
> > </programlisting>
> > 
> > Which notably does not use punctuation for the hint. But indeed, later we say:
> >   <para>
> >    Detail and hint messages: Use complete sentences, and end each with
> >    a period.  Capitalize the first word of sentences.  Put two spaces after
> >    the period if another sentence follows (for English text; might be
> >    inappropriate in other languages).
> >   </para>
> 
> That's not a very helpful example, and one which may give the wrong impression
> unless the entire page is read.  I've raised this with a small diff to improve
> it on -docs.

Thanks for doing that!


> > Updated patches attached.
> 
> This version of the patchset LGTM.

Backpatching indeed was no fun. Not having BackgroundPsql.pm was the worst
part. But also a lot of other conflicts in tests... Took me 5-6 hours or
so.
But I now finally pushed the fixes. Hope the buildfarm agrees with it...

Thanks for the review!

Greetings,

Andres Freund



Re: DROP DATABASE is interruptible

От
Daniel Gustafsson
Дата:
> On 13 Jul 2023, at 22:52, Andres Freund <andres@anarazel.de> wrote:
> On 2023-07-12 11:54:18 +0200, Daniel Gustafsson wrote:

>> Looking more at this I wonder if we in HEAD should make this a bit nicer by
>> extending the --check phase to catch this?  I did a quick hack along these
>> lines in the 0003 commit attached here (0001 and 0002 are your unchanged
>> patches, just added for consistency and to be CFBot compatible).  If done it
>> could be a separate commit to make the 0002 patch backport cleaner of course.
>
> I don't really have an opinion on that, tbh...

Fair enough.  Thinking more on it I think it has merits, so I will submit that
patch in its own thread on -hackers.

--
Daniel Gustafsson




Re: DROP DATABASE is interruptible

От
Peter Eisentraut
Дата:
I noticed that this patch set introduced this pg_dump test:

On 12.07.23 03:59, Andres Freund wrote:
> +    'CREATE DATABASE invalid...' => {
> +        create_order => 1,
> +        create_sql => q(CREATE DATABASE invalid; UPDATE pg_database SET datconnlimit = -2 WHERE datname =
'invalid'),
> +        regexp => qr/^CREATE DATABASE invalid/m,
> +        not_like => {
> +            pg_dumpall_dbprivs => 1,
> +        },
> +    },

But the key "not_like" isn't used for anything by that test suite. 
Maybe "unlike" was meant?  But even then it would be useless because the 
"like" key is empty, so there is nothing that "unlike" can subtract 
from.  Was there something expected from the mention of 
"pg_dumpall_dbprivs"?

Perhaps it would be better to write out

     like => {},

explicitly, with a comment, like some other tests are doing.



Re: DROP DATABASE is interruptible

От
Andres Freund
Дата:
Hi,

On 2023-09-25 01:48:31 +0100, Peter Eisentraut wrote:
> I noticed that this patch set introduced this pg_dump test:
> 
> On 12.07.23 03:59, Andres Freund wrote:
> > +    'CREATE DATABASE invalid...' => {
> > +        create_order => 1,
> > +        create_sql => q(CREATE DATABASE invalid; UPDATE pg_database SET datconnlimit = -2 WHERE datname =
'invalid'),
> > +        regexp => qr/^CREATE DATABASE invalid/m,
> > +        not_like => {
> > +            pg_dumpall_dbprivs => 1,
> > +        },
> > +    },
> 
> But the key "not_like" isn't used for anything by that test suite. Maybe
> "unlike" was meant?

It's not clear to me either. Invalid databases shouldn't *ever* be dumped, so
explicitly listing pg_dumpall_dbprivs is odd.

TBH, I find this testsuite the most opaque in postgres...


> But even then it would be useless because the "like" key is empty, so there
> is nothing that "unlike" can subtract from.  Was there something expected
> from the mention of "pg_dumpall_dbprivs"?

Not that I can figure out...


> Perhaps it would be better to write out
> 
>     like => {},
> 
> explicitly, with a comment, like some other tests are doing.

Yea, that looks like the right direction.

I'll go and backpatch the adjustment.

Greetings,

Andres Freund



Re: DROP DATABASE is interruptible

От
Alexander Lakhin
Дата:
Hi,

13.07.2023 23:52, Andres Freund wrote:
>
> Backpatching indeed was no fun. Not having BackgroundPsql.pm was the worst
> part. But also a lot of other conflicts in tests... Took me 5-6 hours or
> so.
> But I now finally pushed the fixes. Hope the buildfarm agrees with it...
>
> Thanks for the review!

I've discovered that the test 037_invalid_database, introduced with
c66a7d75e, hangs when a server built with -DCLOBBER_CACHE_ALWAYS or with
debug_discard_caches = 1 set via TEMP_CONFIG:
echo "debug_discard_caches = 1" >/tmp/extra.config
TEMP_CONFIG=/tmp/extra.config make -s check -C src/test/recovery/ PROVE_TESTS="t/037*"
# +++ tap check in src/test/recovery +++
[09:05:48] t/037_invalid_database.pl .. 6/?

regress_log_037_invalid_database ends with:
[09:05:51.622](0.021s) # issuing query via background psql:
#   CREATE DATABASE regression_invalid_interrupt;
#   BEGIN;
#   LOCK pg_tablespace;
#   PREPARE TRANSACTION 'lock_tblspc';
[09:05:51.684](0.062s) ok 8 - blocked DROP DATABASE completion

I see two backends waiting:
law      2420132 2420108  0 09:05 ?        00:00:00 postgres: node: law postgres [local] DROP DATABASE waiting
law      2420135 2420108  0 09:05 ?        00:00:00 postgres: node: law postgres [local] startup waiting

and the latter's stack trace:
#0  0x00007f65c8fd3f9a in epoll_wait (epfd=9, events=0x563c40e15478, maxevents=1, timeout=-1) at 
../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x0000563c3fa9a9fa in WaitEventSetWaitBlock (set=0x563c40e15410, cur_timeout=-1, occurred_events=0x7fff579dda80, 
nevents=1) at latch.c:1570
#2  0x0000563c3fa9a8e4 in WaitEventSetWait (set=0x563c40e15410, timeout=-1, occurred_events=0x7fff579dda80, nevents=1,

wait_event_info=50331648) at latch.c:1516
#3  0x0000563c3fa99b14 in WaitLatch (latch=0x7f65c5e112e4, wakeEvents=33, timeout=0, wait_event_info=50331648) at 
latch.c:538
#4  0x0000563c3fac7dee in ProcSleep (locallock=0x563c40e41e80, lockMethodTable=0x563c4007cba0 <default_lockmethod>) at

proc.c:1339
#5  0x0000563c3fab4160 in WaitOnLock (locallock=0x563c40e41e80, owner=0x563c40ea5af8) at lock.c:1816
#6  0x0000563c3fab2c80 in LockAcquireExtended (locktag=0x7fff579dde30, lockmode=1, sessionLock=false, dontWait=false, 
reportMemoryError=true, locallockp=0x7fff579dde28) at lock.c:1080
#7  0x0000563c3faaf86d in LockRelationOid (relid=1213, lockmode=1) at lmgr.c:116
#8  0x0000563c3f537aff in relation_open (relationId=1213, lockmode=1) at relation.c:55
#9  0x0000563c3f5efde9 in table_open (relationId=1213, lockmode=1) at table.c:44
#10 0x0000563c3fca2227 in CatalogCacheInitializeCache (cache=0x563c40e8fe80) at catcache.c:980
#11 0x0000563c3fca255e in InitCatCachePhase2 (cache=0x563c40e8fe80, touch_index=true) at catcache.c:1083
#12 0x0000563c3fcc0556 in InitCatalogCachePhase2 () at syscache.c:184
#13 0x0000563c3fcb7db3 in RelationCacheInitializePhase3 () at relcache.c:4317
#14 0x0000563c3fce2748 in InitPostgres (in_dbname=0x563c40e54000 "postgres", dboid=5, username=0x563c40e53fe8 "law", 
useroid=0, flags=1, out_dbname=0x0) at postinit.c:1177
#15 0x0000563c3fad90a7 in PostgresMain (dbname=0x563c40e54000 "postgres", username=0x563c40e53fe8 "law") at
postgres.c:4229
#16 0x0000563c3f9f01e4 in BackendRun (port=0x563c40e45360) at postmaster.c:4475

It looks like no new backend can be started due to the pg_tablespace lock,
when a new relcache file is needed during the backend initialization.

Best regards,
Alexander



Re: DROP DATABASE is interruptible

От
Thomas Munro
Дата:
On Tue, Mar 12, 2024 at 9:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> I see two backends waiting:
> law      2420132 2420108  0 09:05 ?        00:00:00 postgres: node: law postgres [local] DROP DATABASE waiting
> law      2420135 2420108  0 09:05 ?        00:00:00 postgres: node: law postgres [local] startup waiting

Ugh.