Re: pg_stop_backup does not complete

From: Simon Riggs
Subject: Re: pg_stop_backup does not complete
Msg-id 1266951502.3752.4294.camel@ebony
In reply to: pg_stop_backup does not complete  (Josh Berkus <josh@agliodbs.com>)
Responses: Re: pg_stop_backup does not complete  ("Joshua D. Drake" <jd@commandprompt.com>)
Re: pg_stop_backup does not complete  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: pg_stop_backup does not complete  (David Fetter <david@fetter.org>)
Re: pg_stop_backup does not complete  (Josh Berkus <josh@agliodbs.com>)
Re: pg_stop_backup does not complete  ("Joshua D. Drake" <jd@commandprompt.com>)
List: pgsql-hackers
On Tue, 2010-02-23 at 09:45 -0800, Josh Berkus wrote:

> 1) Set up a brand new master with an archive_command and archive_mode=on.
> 
> 2) Start the master
> 
> 3) Do a pg_start_backup()
> 
> 4) Realize, based on log error messages, that I've misconfigured the
> archive_command.

> 5) Attempt to shut down the master.  Master tells me that pg_stop_backup
> must be run in order to shut down.
> 
> 6) Execute pg_stop_backup.
> 
> 7) pg_stop_backup waits forever without ever stopping backup.  Every 60
> seconds, it gives me a helpful "still waiting" message, but at least in
> the amount of time I was willing to wait (5 minutes), it never completed.
> 
> 8) do an immediate shutdown, as it's the only way I can get the database
> unstuck.
> 
> With some experimentation, the problem seems to occur when you have a
> failing archive_command and a master which currently has no database
> traffic; for example, if I did some database write activity (a createdb)
> then pg_stop_backup would complete after about 60 seconds (which, btw,
> is extremely annoying, but at least tolerable).
> 
> This issue is 100% reproducible.

IMHO there is no problem with that behaviour. If somebody requests a
backup then we should wait for it to complete. Kevin's suggestion of
pg_fail_backup() is the only sensible conclusion there because it gives
an explicit way out of deadlock.

ISTM the problem is that you didn't test. Steps 3 and 4 should have been
reversed. Perhaps we should put something in the docs to say "and test".
The correct resolution is to put in an archive_command that works.
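
A minimal sketch of such an "and test" step, to be run before
pg_start_backup(). It relies only on the documented archive_command
contract: %p is replaced by the path of the file to archive, %f by its
file name, %% by a literal %, and a zero exit status means success. The
helper names and the dummy file name are hypothetical, and the dummy file
ends up wherever the command writes, so point it at a scratch location or
remove the file afterwards.

```python
import os
import re
import subprocess
import tempfile


def expand_archive_command(template, path, fname):
    """Substitute %p, %f and %% the way archive_command placeholders work."""
    replacements = {"%p": path, "%f": fname, "%%": "%"}
    return re.sub(r"%[pf%]", lambda m: replacements[m.group(0)], template)


def archive_command_works(template):
    """Run the command once against a dummy file; True if it exits with 0."""
    with tempfile.TemporaryDirectory() as scratch:
        fname = "archive_command_smoke_test"  # hypothetical dummy name
        path = os.path.join(scratch, fname)
        with open(path, "wb") as f:
            f.write(b"\0" * 16)
        command = expand_archive_command(template, path, fname)
        # archive_command is a shell command, so run it through the shell.
        return subprocess.run(command, shell=True).returncode == 0
```

For example, archive_command_works("cp %p /mnt/archive/%f") (the path is
an assumed example) returning False would expose the misconfiguration
from step 4 before step 3 is ever run.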

We can put in an extra step to prevent a pg_start_backup() if there are
a significant number of outstanding files to be archived. Doing that
seems like closing the door after the horse has bolted, since we just
introduced streaming replication that doesn't rely on archived files. In
any case, I don't see many people working on a production system hitting
a problem on an archive_command and then deciding to shut down. 
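
The extra step mentioned above could look roughly like the sketch below.
It assumes the archiver's usual bookkeeping, where each file still waiting
to be archived has a matching ".ready" file in the archive_status
directory under pg_xlog. The function names and the default threshold of
10 are hypothetical; this is an illustration, not server code.

```python
import os


def count_ready_files(archive_status_dir):
    """Count files still waiting for archive_command to succeed."""
    return sum(
        1 for name in os.listdir(archive_status_dir) if name.endswith(".ready")
    )


def ok_to_start_backup(archive_status_dir, max_outstanding=10):
    """Refuse a new backup while too many files remain unarchived."""
    return count_ready_files(archive_status_dir) <= max_outstanding
```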

So I don't see this as something that needs fixing for 9.0. There is
already too much non-essential code there, all of which needs to be
tested. I don't think adding in new corner cases to "help" people makes
any sense until we have automated testing that allows us to rerun the
regression tests to check all this stuff still works.

-- Simon Riggs           www.2ndQuadrant.com


