Discussion: BUG #6170: hot standby wedging on full-WAL disk


BUG #6170: hot standby wedging on full-WAL disk

From
"Daniel Farina"
Date:
The following bug has been logged online:

Bug reference:      6170
Logged by:          Daniel Farina
Email address:      daniel@heroku.com
PostgreSQL version: 9.0.4
Operating system:   GNU/Linux Ubuntu 10.04 x86_64
Description:        hot standby wedging on full-WAL disk
Details:

After seeing this a few times, I think I've found a reproducible way to
prevent Postgres from making progress with hot standby.

1) Set up a WAL disk that will run out of space in a reasonable amount of
time.

2) Run a hot standby with a restore_command and primary_conninfo set
in recovery.conf.  ***Configure it to disable query cancellation***.

3) Begin a transaction, or long-running statement that prevents the
application of WAL records.
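For reference, a minimal sketch of the configuration described in steps 2 and 3. The host name and archive path are illustrative; the `-1` values are what "disable query cancellation" means on 9.0:

```ini
# recovery.conf on the standby (9.0-era layout)
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'
restore_command = 'cp /mnt/wal_archive/%f "%p"'

# postgresql.conf: -1 disables the query cancellation that would
# otherwise let WAL replay proceed past conflicting queries
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
```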

When the hot standby falls behind the primary it'll eventually bump out of
streaming mode, and will accumulate WAL until the disk fills.

Eventually the WAL disk will fill, and the hot standby cannot make any
progress until one deletes some WAL segments or otherwise makes a tiny bit
more room to work with.  This state persists even after killing the offending
long-running-transaction backend, and even after a postgres restart.  In the latter
case, one cannot even become 'hot' again, getting the "database system is
starting up" message, as Postgres wants to run a restore_command
immediately.

Furthermore, it appears that WAL segments from the future part of the
timeline (beyond what is being recovered at the moment) are stored on-disk
at that time.  I also think I have identified some WAL segments that are
from before the prior checkpoint location via pg_controldata, so they
technically could be pruned.  My wal_keep_segments is set, but I am not sure
if this has an effect on a hot standby.
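The checkpoint locations mentioned above come from `pg_controldata $PGDATA` output, which can be picked apart mechanically. A small sketch, using a made-up excerpt (the LSN values are invented for illustration; `control_field` is not a real tool):

```python
import re

# Hypothetical excerpt of `pg_controldata $PGDATA` output from a
# 9.0 cluster; the LSN values are made up for illustration.
SAMPLE = """\
Latest checkpoint location:           0/6000058
Prior checkpoint location:            0/5000020
Latest checkpoint's REDO location:    0/6000020
"""

def control_field(text, field):
    """Extract one field's value from pg_controldata output."""
    m = re.search(r"^%s:\s*(\S+)" % re.escape(field), text, re.M)
    return m.group(1) if m else None

print(control_field(SAMPLE, "Prior checkpoint location"))  # 0/5000020
```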

Re: BUG #6170: hot standby wedging on full-WAL disk

From
Heikki Linnakangas
Date:
On 20.08.2011 03:39, Daniel Farina wrote:
>
> The following bug has been logged online:
>
> Bug reference:      6170
> Logged by:          Daniel Farina
> Email address:      daniel@heroku.com
> PostgreSQL version: 9.0.4
> Operating system:   GNU/Linux Ubuntu 10.04 x86_64
> Description:        hot standby wedging on full-WAL disk
> Details:
>
> After seeing this a few times, I think I've found a reproducible way to
> prevent Postgres from making progress with hot standby.
>
> 1) Set up a WAL disk that will run out of space in a reasonable amount of
> time.
>
> 2) Run a hot standby with a restore_command and primary_conninfo set
> in recovery.conf.  ***Configure it to disable query cancellation***.
>
> 3) Begin a transaction, or long-running statement that prevents the
> application of WAL records.
>
> When the hot standby falls behind the primary it'll eventually bump out of
> streaming mode, and will accumulate WAL until the disk fills.
>
> Eventually the WAL disk will fill, and the hot standby cannot make any
> progress until one deletes some WAL segments or otherwise makes a tiny bit
> more room to work with.  This state persists past killing the offensive
> long-running-transaction backend and even a postgres restart.  In the latter
> case, one cannot even become 'hot' again, getting the "database system is
> starting up" message, as Postgres wants to run a restore_command
> immediately.
>
> Furthermore, it appears that WAL segments from the future part of the
> timeline (beyond what is being recovered at the moment) are stored on-disk
> at that time.  I also think I have identified some WAL segments that are
> from before the prior checkpoint location via pg_controldata, so they
> technically could be pruned.  My wal_keep_segments is set, but I am not sure
> if this has an effect on a hot standby.

So the problem is that walreceiver merrily writes so much future WAL
that it runs out of disk space? A limit on the maximum number of future
WAL files to stream ahead would fix that, but I can't get very excited
about it. Usually you do want to stream as much ahead as you can, to
ensure that the WAL is safely on disk on the standby, in case the master
dies. So the limit would need to be configurable.
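The configurable limit Heikki describes could be as simple as a bound on how many full segments the receiver may write beyond the replay position. A sketch of that decision; the function name and the `max_ahead_segments` parameter are hypothetical, not an actual 9.0 setting:

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # 16 MB segments, the 9.0 default

def should_pause_streaming(replay_pos, write_pos, max_ahead_segments):
    """Return True once the walreceiver has written at least
    max_ahead_segments full segments beyond the replay position.
    Positions are byte offsets into the WAL stream."""
    return (write_pos - replay_pos) // WAL_SEG_SIZE >= max_ahead_segments
```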

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #6170: hot standby wedging on full-WAL disk

From
Robert Haas
Date:
On Mon, Aug 22, 2011 at 2:57 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> So the problem is that walreceiver merrily writes so much future WAL that it
> runs out of disk space? A limit on the maximum number of future WAL files to
> stream ahead would fix that, but I can't get very excited about it. Usually
> you do want to stream as much ahead as you can, to ensure that the WAL is
> safely on disk on the standby, in case the master dies. So the limit would
> need to be configurable.

It seems like perhaps what we really need is a way to make replaying
WAL (and getting rid of now-unneeded segments) take priority over
getting new ones.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #6170: hot standby wedging on full-WAL disk

From
Heikki Linnakangas
Date:
On 25.08.2011 19:11, Robert Haas wrote:
> On Mon, Aug 22, 2011 at 2:57 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> So the problem is that walreceiver merrily writes so much future WAL that it
>> runs out of disk space? A limit on the maximum number of future WAL files to
>> stream ahead would fix that, but I can't get very excited about it. Usually
>> you do want to stream as much ahead as you can, to ensure that the WAL is
>> safely on disk on the standby, in case the master dies. So the limit would
>> need to be configurable.
>
> It seems like perhaps what we really need is a way to make replaying
> WAL (and getting rid of now-unneeded segments) take priority over
> getting new ones.

With the defaults, we eventually start killing queries that get in the
way of WAL replay. Daniel had specifically disabled that. Of course,
even with the query-killer disabled, it's possible for WAL replay to
fall so far behind that you fill the disk, so a backstop might be
useful anyway, although that seems much less likely in practice, and if
your standby can't keep up you're in trouble anyway.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #6170: hot standby wedging on full-WAL disk

From
Daniel Farina
Date:
On Thu, Aug 25, 2011 at 10:16 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 25.08.2011 19:11, Robert Haas wrote:
>>
>> On Mon, Aug 22, 2011 at 2:57 AM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>
>>> So the problem is that walreceiver merrily writes so much future WAL that it
>>> runs out of disk space? A limit on the maximum number of future WAL files to
>>> stream ahead would fix that, but I can't get very excited about it. Usually
>>> you do want to stream as much ahead as you can, to ensure that the WAL is
>>> safely on disk on the standby, in case the master dies. So the limit would
>>> need to be configurable.
>>
>> It seems like perhaps what we really need is a way to make replaying
>> WAL (and getting rid of now-unneeded segments) take priority over
>> getting new ones.
>
> With the defaults we start to kill queries after a while that get in the way
> of WAL replay. Daniel had specifically disabled that. Of course, even with
> the query-killer disabled, it's possible for the WAL replay to fall so badly
> behind that you fill the disk, so a backstop might be useful anyway,
> although that seems a lot less likely in practice and if your standby can't
> keep up you're in trouble anyway.

I do think it's not a bad idea to have postgres prune unnecessary WAL
at least enough that it can fetch the WAL segment it wants -- basically
unsticking the restore_command so progress can be made. Right now
someone (like me) has to go and trim away what appear to be
unnecessary WAL segments in (what is currently) a manual process.
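That manual trimming amounts to deleting segments whose names sort below the segment containing the prior checkpoint. A sketch of the calculation under 9.0's naming scheme (16 MB segments; names are timeline + log id + segment number, each 8 hex digits); the function names are illustrative:

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # 16 MB, the 9.0 default

def lsn_to_segment(tli, lsn):
    """Map an LSN string like '0/5000020' (as printed by
    pg_controldata) to the 24-character WAL segment file name."""
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    return "%08X%08X%08X" % (tli, hi, lo // WAL_SEG_SIZE)

def prunable(segment_names, tli, prior_checkpoint_lsn):
    """Segments that sort strictly below the prior checkpoint's
    segment: candidates for manual removal when the disk fills."""
    cutoff = lsn_to_segment(tli, prior_checkpoint_lsn)
    return sorted(name for name in segment_names if name < cutoff)
```

For archive directories, contrib's pg_archivecleanup performs a similar cleanup given the oldest file to keep.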

Also, I'm not sure whether the segments downloaded via
restore_command during the fall-behind period are "counted" towards
replay when un-sticking after a restart of postgres: in particular, I
believe PG will want to copy the segments a second time, although
I'm not 100% sure right now.  Regardless, not being able to restart
properly or to make progress after killing the offending backend are
unhappy things.

More thoughts?

--
fdr