Discussion: How to destroy your entire Postgres installation

How to destroy your entire Postgres installation

From:
Tom Lane
Date:
It's easy: just destroydb one database that has active backends
still running in it.  Then stand back and watch the carnage.
(Hint: back up all your *other* databases first, because you'll
be lucky if you can still access them.)

There really, really needs to be an interlock to prevent this mistake
:-(

            regards, tom lane

Re: [HACKERS] How to destroy your entire Postgres installation

From:
Nick Bastin
Date:
Tom Lane wrote:
>
> It's easy: just destroydb one database that has active backends
> still running in it.  Then stand back and watch the carnage.
> (Hint: back up all your *other* databases first, because you'll
> be lucky if you can still access them.)

So just for fun I tried this. ;-)  I did a destroydb on a database with 2
active backends and only 90 megs of data, and it managed to munge through the
other running database, which is fast approaching 2 gigs, in short order.  The
first time I did it, I was able to rescue the second database just using the
audit trails that I'd built into it, but the second time it completely trashed
it.  Now I'm no programming neophyte, but can somebody explain to me why this
happened?  What exactly is destroydb doing, or am I missing something obvious here?

--
Nick Bastin
RBB Systems, Inc.

Re: [HACKERS] How to destroy your entire Postgres installation

From:
Tom Lane
Date:
Nick Bastin <nbastin@rbbsystems.com> writes:
> Now I'm no programming neophyte, but can somebody explain to me
> why this happened?  What exactly is destroydb doing, or am I missing
> something obvious here?

Well, the backends for the destroyed database are *still running*;
you didn't kill them off by deleting their current working directory
from under them.  (In fact, the database's top level directory is still
there, because those processes still have it open ... it just has no
links left in the filesystem and will be deleted when the last process
holding it open exits.  Some of the member files of the dead database
are likely still on disk for the same reason.)
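The "no links left, but still open" state is ordinary POSIX behavior: unlink() removes a name, not the storage, which survives until the last open descriptor is closed. A minimal sketch demonstrating this on a regular file; the function name survives_unlink() and the /tmp path template are illustrative choices, not anything from the Postgres tree:

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() and pread() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Writes msg to a fresh temporary file, unlinks the file while it is
 * still open, then reads the data back through the surviving descriptor.
 * Returns 1 if the data survived, 0 if not, -1 on a setup error. */
int survives_unlink(const char *msg)
{
    char path[] = "/tmp/orphanXXXXXX";   /* illustrative template */
    char buf[256] = {0};
    size_t len;
    int fd, gone;
    ssize_t n;

    assert(msg != NULL);
    len = strlen(msg);
    if (len >= sizeof buf)
        return -1;

    fd = mkstemp(path);                  /* create and open a unique file */
    if (fd < 0)
        return -1;
    if (write(fd, msg, len) != (ssize_t) len) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Drop the file's only name: no links left in the filesystem. */
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }
    gone = (access(path, F_OK) != 0);    /* the name really is gone */

    /* The open descriptor still reaches the data. */
    n = pread(fd, buf, len, 0);
    close(fd);                           /* last reference: space reclaimed */

    return gone && n == (ssize_t) len && memcmp(buf, msg, len) == 0;
}
```

On a POSIX system survives_unlink("dirty buffer") should return 1: the name has vanished, yet a process that still holds the descriptor can read what it wrote, much like the backends still sitting in the destroyed database's directory.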

That means those backends are still participating in the shared memory
buffer arena used by all the backends.  And they may very well have dirty
buffers that should have been written out to files of the destroyed DB.

When I did this I got messages like
    "mdblindwrt: oid of db XYZ is not NNNNN"
which a quick 'glimpse' traces to a routine with this header comment:

/*
 *    mdblindwrt() -- Write a block to disk blind.
 *
 *        We have to be able to do this using only the name and OID of
 *        the database and relation in which the block belongs.  This
 *        is a synchronous write.
 */

The error message is fairly misleading, because it's actually used for
*any* failure to look up the database's info ... like, say, the database
having been deleted.

I got this even from backends that had nothing to do with the dead
database and had been started after it was destroyed.  Killing all the
backends belonging to the dead database didn't help.  I surmise that the
backends communally take responsibility for writing dirty buffers out to
the files where they belong, and thus any backend might try to write out
such an orphaned buffer --- and when it fails, it treats that as a fatal
error.
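The communal-flushing surmise can be illustrated with a toy model. Everything in it is hypothetical (BufDesc, PoolState, blind_write() and flush_dirty() are invented names, and a list of live database OIDs stands in for the real lookup); it only shows why a backend with no connection to the dead database can still trip over one of its dirty buffers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not PostgreSQL source. */
#define NBUFFERS 4
#define MAXDBS   4

typedef struct {
    unsigned db_oid;   /* database the cached block belongs to */
    bool     dirty;    /* must be written out before the slot is reused */
} BufDesc;

typedef struct {
    BufDesc  bufs[NBUFFERS];      /* one pool shared by every backend */
    unsigned live_dbs[MAXDBS];    /* OIDs of databases that still exist */
    size_t   nlive;
} PoolState;

/* Stand-in for a blind write: it has only the database OID to go on,
 * so it fails if that database can no longer be found. */
static bool blind_write(const PoolState *s, const BufDesc *b)
{
    for (size_t i = 0; i < s->nlive; i++)
        if (s->live_dbs[i] == b->db_oid)
            return true;          /* target found: write succeeds */
    return false;                 /* lookup failed */
}

/* Any backend, connected to any database, may run this.  Returns the
 * number of dirty buffers it could not write; per the surmise above,
 * the real backend treats such a failure as fatal. */
int flush_dirty(PoolState *s)
{
    int failures = 0;

    assert(s != NULL);
    for (size_t i = 0; i < NBUFFERS; i++) {
        BufDesc *b = &s->bufs[i];
        if (!b->dirty)
            continue;
        if (blind_write(s, b))
            b->dirty = false;
        else
            failures++;           /* orphaned buffer: nowhere to write it */
    }
    return failures;
}
```

With database 2 destroyed while one of its dirty buffers is still in the pool, every caller of flush_dirty() hits that failure, whichever database it is serving.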

Perhaps someone with a better understanding of the backend can say more.

Anyway, I felt very lucky that I was able to extract the data I needed
from my non-toy databases.  (BTW, a hint for anyone else who makes the
same mistake: try creating the dead database again.  That seems to be
enough to prevent mdblindwrt from deciding that it has a fatal error
on its hands.)

But, as I said, there ought to be some interlocks in there.  You should
not be able to destroy a database that has connected backends --- and
it'd be a good idea to scan the buffer pool and make darn sure it has
no associated buffers, either.
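The two interlocks proposed above might look roughly like this in outline. It is a sketch under stated assumptions: the backend table, the buffer descriptors and destroy_db() are all hypothetical, and the locking needed to stop a new backend from connecting between the check and the file removal is deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the proposed interlocks; all names invented. */
#define NBUFFERS  8
#define NBACKENDS 8

typedef struct { unsigned db_oid; bool valid; bool dirty; } BufDesc;
typedef struct { unsigned db_oid; bool active; } Backend;

typedef enum { DESTROY_OK, DESTROY_DB_IN_USE } DestroyResult;

typedef struct {
    Backend backends[NBACKENDS];   /* who is connected to what */
    BufDesc bufs[NBUFFERS];        /* shared buffer pool */
} ClusterState;

DestroyResult destroy_db(ClusterState *s, unsigned db_oid)
{
    assert(s != NULL);

    /* Interlock 1: refuse while any backend is connected to the database. */
    for (size_t i = 0; i < NBACKENDS; i++)
        if (s->backends[i].active && s->backends[i].db_oid == db_oid)
            return DESTROY_DB_IN_USE;

    /* Interlock 2: make sure no buffer still belongs to it.  Dropping
     * (rather than flushing) is fine because the data is being discarded. */
    for (size_t i = 0; i < NBUFFERS; i++)
        if (s->bufs[i].valid && s->bufs[i].db_oid == db_oid) {
            s->bufs[i].valid = false;
            s->bufs[i].dirty = false;
        }

    /* Only now would the database's files be unlinked (omitted here). */
    return DESTROY_OK;
}
```

The first check turns Tom's accident into a refusal; the second guarantees that nothing left in the shared pool can later send some unrelated backend looking for files that no longer exist.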

            regards, tom lane

Re: [HACKERS] How to destroy your entire Postgres installation

From:
Bruce Momjian
Date:
I have modified the destroydb code so it marks all buffers associated
with the database as clean _before_ removing the files.  The old code
marked the buffers as clean after removing the database files.  Any
backend trying to flush dirty buffers for that database after the files
were removed but before the buffers were marked clean would get errors.

This does not fix the problem of someone else having the database open
during the destroy, but should fix the other problem you mentioned.
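The effect of the reordering can be shown with a small model in which another backend flushes the pool in the window between destroydb's two steps. The names are hypothetical and file removal is reduced to a flag; like the change itself, the model does nothing about backends that still have the database open:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the ordering change; not the actual destroydb code. */
#define NBUFFERS 4

typedef struct { unsigned db_oid; bool dirty; } BufDesc;

typedef struct {
    BufDesc  bufs[NBUFFERS];
    bool     files_exist;   /* files of the database being destroyed */
    unsigned victim_oid;    /* OID of that database */
} DestroyState;

static void mark_buffers_clean(DestroyState *s)
{
    for (size_t i = 0; i < NBUFFERS; i++)
        if (s->bufs[i].db_oid == s->victim_oid)
            s->bufs[i].dirty = false;
}

static void remove_files(DestroyState *s) { s->files_exist = false; }

/* Some other backend flushing the pool; returns its write-error count. */
static int other_backend_flush(DestroyState *s)
{
    int errors = 0;
    for (size_t i = 0; i < NBUFFERS; i++) {
        BufDesc *b = &s->bufs[i];
        if (!b->dirty)
            continue;
        if (b->db_oid == s->victim_oid && !s->files_exist)
            errors++;              /* target files already gone */
        else
            b->dirty = false;      /* write succeeded */
    }
    return errors;
}

/* Runs the two steps in the chosen order, with a flush by another backend
 * in between.  Returns the errors that backend hits. */
int destroy_with_race(DestroyState *s, bool clean_first)
{
    int errors;

    assert(s != NULL);
    if (clean_first) {             /* new order */
        mark_buffers_clean(s);
        errors = other_backend_flush(s);
        remove_files(s);
    } else {                       /* old order */
        remove_files(s);
        errors = other_backend_flush(s);
        mark_buffers_clean(s);
    }
    return errors;
}
```

With two dirty buffers belonging to the victim database, the old order makes the interleaved flush fail twice, while the new order leaves it nothing of the victim's to write.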

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026