Обсуждение: spurious wrap-around shutdown
In 9.3 HEAD I am getting what seems to be spurious wrap-around shutdowns.
postgres=# SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;
datname | datfrozenxid | age
-----------+--------------+-----------
template1 | 2621759843 | 0
template0 | 2621759843 | 0
postgres | 2571759843 | 50000000
jjanes | 2437230921 | 184528922
postgres=# select txid_current();
ERROR: database is not accepting commands to avoid wraparound data loss in database "jjanes"
HINT: Stop the postmaster and use a standalone backend to vacuum that database.
You might also need to commit or roll back old prepared transactions.
184,528,922 is well short of 2 billion, so what is going on?
I thought maybe the ShmemVariableCache were not getting updated when vacuum finished, but if I restart the server (forcing shared memory to get rebuilt from disk) the condition continues.
I tried setting a breakpoint on SetTransactionIdLimit, but that seems to get executed on startup before the -W flag takes effect, so I can't find it.
Any tips on how to debug this? I figure the next step is running git bisect, but that is sure to be tedious.
I'm using a variant of the below to reach wraparound quicker, perhaps that is introducing a bug?
Cheers,
Jeff
On 2013-06-16 11:54:24 -0700, Jeff Janes wrote: > In 9.3 HEAD I am getting what seems to be spurious wrap-around shutdowns. > > > postgres=# SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database; > > datname | datfrozenxid | age > -----------+--------------+----------- > template1 | 2621759843 | 0 > template0 | 2621759843 | 0 > postgres | 2571759843 | 50000000 > jjanes | 2437230921 | 184528922 > > > postgres=# select txid_current(); > ERROR: database is not accepting commands to avoid wraparound data loss in > database "jjanes" > HINT: Stop the postmaster and use a standalone backend to vacuum that > database. > You might also need to commit or roll back old prepared transactions. > > > 184,528,922 is well short of 2 billion, so what is going on? I guess you're sure you don't have any old prepared xacts running around? > I thought maybe the ShmemVariableCache were not getting updated when vacuum > finished, but if I restart the server (forcing shared memory to get rebuilt > from disk) the condition continues. > > I tried setting a breakpoint on SetTransactionIdLimit, but that seems to > get executed on startup before the -W flag takes effect, so I can't find it. > > Any tips on how to debug this? I figure the next step is running git > bisect, but that is sure to be tedious. I'd first add the actual xids limits that are assumed to be dangerous to the error messages in GetNewTransactionId(). That already might give a hint. > I'm using a variant of the below to reach wraparound quicker, perhaps that > is introducing a bug? > > http://www.postgresql.org/message-id/20130207203216.GE5172@alvh.no-ip.org I don't really trust that patch because it skips loads of checks since it only repeats part of the work that GetNewTransactionId does. I don't immediately see what the problem that could cause though. IIRC I had postes a patch in that thread that looped around GetNewTransactionId() in that thread. It might be worthwile to test whether that also reproduces the issue. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jun 16, 2013 at 11:54 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
In 9.3 HEAD I am getting what seems to be spurious wrap-around shutdowns.postgres=# SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;datname | datfrozenxid | age-----------+--------------+-----------template1 | 2621759843 | 0template0 | 2621759843 | 0postgres | 2571759843 | 50000000jjanes | 2437230921 | 184528922
While the behavior is weird, it is not a regression (also present in 9.2 with suitable changes in timing) and the shutdown is not spurious.
If I execute the above query immediately after the shutdown, I see what I would expect, jjanes has an age of about 2^31.
The one table that is holding everything back is already getting autovac for wraparound at that point, and eventually that vacuum finishes. When done, pg_class and pg_database are updated (I don't know how they get updated without trying to assign another transaction), and then I get the above query results.
I would think the database would re-allow new transactions at this point, but it does not. I don't know why.
Since this isn't a regression in 9.3, I probably won't pursue it any more at this time, unless encouraged to.
Cheers,
Jeff