Обсуждение: spurious wrap-around shutdown

Поиск

Список

Период

Сортировка

spurious wrap-around shutdown

От

Jeff Janes

Дата:

16 июня 2013 г., 21:54:32

In 9.3 HEAD I am getting what seems to be spurious wrap-around shutdowns.

postgres=# SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;

datname | datfrozenxid | age

-----------+--------------+-----------

template1 | 2621759843 | 0

template0 | 2621759843 | 0

postgres | 2571759843 | 50000000

jjanes | 2437230921 | 184528922

postgres=# select txid_current();

ERROR: database is not accepting commands to avoid wraparound data loss in database "jjanes"

HINT: Stop the postmaster and use a standalone backend to vacuum that database.

You might also need to commit or roll back old prepared transactions.

184,528,922 is well short of 2 billion, so what is going on?

I thought maybe the ShmemVariableCache were not getting updated when vacuum finished, but if I restart the server (forcing shared memory to get rebuilt from disk) the condition continues.

I tried setting a breakpoint on SetTransactionIdLimit, but that seems to get executed on startup before the -W flag takes effect, so I can't find it.

Any tips on how to debug this? I figure the next step is running git bisect, but that is sure to be tedious.

I'm using a variant of the below to reach wraparound quicker, perhaps that is introducing a bug?

http://www.postgresql.org/message-id/20130207203216.GE5172@alvh.no-ip.org

Cheers,

Jeff

Re: spurious wrap-around shutdown

От

Andres Freund

Дата:

16 июня 2013 г., 23:38:21

On 2013-06-16 11:54:24 -0700, Jeff Janes wrote:
> In 9.3 HEAD I am getting what seems to be spurious wrap-around shutdowns.
> 
> 
> postgres=# SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;
> 
>   datname  | datfrozenxid |    age
> -----------+--------------+-----------
>  template1 |   2621759843 |         0
>  template0 |   2621759843 |         0
>  postgres  |   2571759843 |  50000000
>  jjanes    |   2437230921 | 184528922
> 
> 
> postgres=# select txid_current();
> ERROR:  database is not accepting commands to avoid wraparound data loss in
> database "jjanes"
> HINT:  Stop the postmaster and use a standalone backend to vacuum that
> database.
> You might also need to commit or roll back old prepared transactions.
> 
> 
> 184,528,922  is well short of 2 billion, so what is going on?

I guess you're sure you don't have any old prepared xacts running
around?

> I thought maybe the ShmemVariableCache were not getting updated when vacuum
> finished, but if I restart the server (forcing shared memory to get rebuilt
> from disk) the condition continues.
> 
> I tried setting a breakpoint on SetTransactionIdLimit, but that seems to
> get executed on startup before the -W flag takes effect, so I can't find it.
> 
> Any tips on how to debug this?  I figure the next step is running git
> bisect, but that is sure to be tedious.

I'd first add the actual xids limits that are assumed to be dangerous to
the error messages in GetNewTransactionId(). That already might give a
hint.

> I'm using a variant of the below to reach wraparound quicker, perhaps that
> is introducing a bug?
> 
> http://www.postgresql.org/message-id/20130207203216.GE5172@alvh.no-ip.org

I don't really trust that patch because it skips loads of checks since
it only repeats part of the work that GetNewTransactionId does. I don't
immediately see what the problem that could cause though.
IIRC I had postes a patch in that thread that looped around
GetNewTransactionId() in that thread. It might be worthwile to test
whether that also reproduces the issue.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

spurious wrap-around shutdown

От

Jeff Janes

Дата:

18 июня 2013 г., 05:55:55

On Sun, Jun 16, 2013 at 11:54 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

In 9.3 HEAD I am getting what seems to be spurious wrap-around shutdowns.

postgres=# SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;

datname | datfrozenxid | age
-----------+--------------+-----------
template1 | 2621759843 | 0
template0 | 2621759843 | 0
postgres | 2571759843 | 50000000
jjanes | 2437230921 | 184528922

While the behavior is weird, it is not a regression (also present in 9.2 with suitable changes in timing) and the shutdown is not spurious.

If I execute the above query immediately after the shutdown, I see what I would expect, jjanes has an age of about 2^31.

The one table that is holding everything back is already getting autovac for wraparound at that point, and eventually that vacuum finishes. When done, pg_class and pg_database are updated (I don't know how they get updated without trying to assign another transaction), and then I get the above query results.

I would think the database would re-allow new transactions at this point, but it does not. I don't know why.

Since this isn't a regression in 9.3, I probably won't pursue it any more at this time, unless encouraged to.

Cheers,

Jeff

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: spurious wrap-around shutdown

spurious wrap-around shutdown

Re: spurious wrap-around shutdown

spurious wrap-around shutdown