Thread: [ADMIN] Recovery from Transaction wraparound failure


[ADMIN] Recovery from Transaction wraparound failure

From:
Ganesh Kannan
Date:


Hi All,


One of the production databases that we own is experiencing a transaction wraparound issue. This is a large database with more than 2000 tables, 70 TB in total size; most tables run between 14 and 20 GB. The workload is predominantly read-only, but the database receives millions of new rows each day in batches. About 20 tables were never vacuumed manually or picked up by autovacuum, and I suspect those may be the ones causing the issue.


At first, I started seeing this error for all writes:

sqlerrm: database is not accepting commands to avoid wraparound data loss in database "db1", sqlstate: 54000


1) After reading the documentation, I thought I could vacuum those 20 never-vacuumed tables individually (the ones with relfrozenxid age over 2 billion) in single-user mode, but it is not working (error copied below). Is there a way to vacuum individual tables in single-user mode? I would rather not vacuum all tables (a plain "VACUUM" command), as that would probably take several days.


backend>  vacuum freeze schema.wd_p51_y2015;       

2017-02-07 23:25:15 EST [54045]: [8-1] user=,db=,app=,client= WARNING:  database "db1" must be vacuumed within 999999 transactions

2017-02-07 23:25:15 EST [54045]: [9-1] user=,db=,app=,client= HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.

You might also need to commit or roll back old prepared transactions.

2) If vacuuming the whole database is my only option, how can I run single-user mode in the background with a vacuum going? Could someone please share an example? I can only execute "postgres --single DBNAME" in interactive mode and could not figure out how to send it to the background without the server terminating on me. I only have remote access to this db server and need a way to kick off the vacuum in single-user mode in the background.
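For reference, the single-user backend reads commands from standard input, so a database-wide vacuum can be started non-interactively by piping the command in and detaching the process. A sketch only; the data directory path and database name below are placeholders:

```shell
# Sketch: run VACUUM in single-user mode without an interactive session.
# /var/lib/pgsql/data and db1 are placeholders for the real data
# directory and database name.
echo "VACUUM;" | nohup postgres --single -D /var/lib/pgsql/data db1 \
    > vacuum.log 2>&1 &
```

nohup plus the output redirection keeps the backend running after the remote session disconnects; progress can be followed in vacuum.log.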


3) Restoring from backup is an option for us, but given the database size and the time to rebuild indexes, I am hoping to hear opinions from an expert or two first.


Thanks for your help in advance,

Ganesh Kannan 

Re: [ADMIN] Recovery from Transaction wraparound failure

From:
Ganesh Kannan
Date:

Update...


I let the database-wide vacuum run overnight, and I now see about 10 tables that were never vacuumed (down from 20).


select t.relname, t.last_vacuum, t.last_autovacuum, c.relfrozenxid, age(c.relfrozenxid)
 from pg_stat_user_tables t, pg_class  c
where schemaname='weather_data'
and c.relkind='r' 
and t.relid=c.oid
and last_vacuum is null and last_autovacuum is null
order by age(c.relfrozenxid) desc;

Adding that there are more than 1000 tables (each 14 GB or so) that match age(relfrozenxid)  > 

I could take the database into single-user mode again and let it run, but I am not sure whether vacuum would get to those 10 tables first. Can someone shed some light on the order in which manual vacuum or autovacuum picks up tables?
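As far as I know, neither a database-wide VACUUM nor autovacuum guarantees any particular table order, so one workaround is to find the tables with the oldest XIDs and vacuum them individually, oldest first. A sketch (relkind = 'r' limits the result to ordinary tables):

```sql
-- Sketch: list tables by XID age, oldest first, so the most urgent
-- ones can be vacuumed individually before the rest.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;
```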


Regarding running postgres --single in the background: I used disown to let it run overnight, but I had to kill the process to reopen the database for read-only sessions. When I checked pg_stat_user_tables.last_vacuum, I did not see that any table had been vacuumed, even though the process ran for 8 hours. Does killing the single-user backend prevent vacuum from registering the tables it vacuumed?


Thanks again; I hope there is enough intrigue and information here to get someone willing to help me out.

Ganesh




Re: [ADMIN] Recovery from Transaction wraparound failure

From:
Ganesh Kannan
Date:

Sorry, I missed this:

Adding that there are more than 1000 tables (each 14 GB or so) that match age(relfrozenxid) > (autovacuum_freeze_max_age (200 million) minus vacuum_freeze_min_age (50 million)).






Re: [ADMIN] Recovery from Transaction wraparound failure

From:
Tom Lane
Date:
Ganesh Kannan <ganesh.kannan@weatheranalytics.com> writes:
> 1) After reading the documentation, I thought I could try to vacuum
> those 20 tables that were never vacuumed individually ( with relfrozenid
> more than 2 billion) in single-user mode, but it is not working (error
> copied below).

That is not an error, it's just a warning:

> backend>  vacuum freeze schema.wd_p51_y2015;
> 2017-02-07 23:25:15 EST [54045]: [8-1] user=,db=,app=,client= WARNING:  database "db1" must be vacuumed within 999999 transactions
> 2017-02-07 23:25:15 EST [54045]: [9-1] user=,db=,app=,client= HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.

It's telling you that you haven't finished fixing the problem, not that
this command didn't do anything.

> Is there a way to do vacuum of individual tables in
> single-user mode?

You're doing it.  I don't think you really need the "freeze" modifier
though, and that may be forcing more I/O than is needed to get out of
trouble.

            regards, tom lane


Re: [ADMIN] Recovery from Transaction wraparound failure

From:
Ganesh Kannan
Date:

Hi Tom,

Thank you very much for the timely help. I continued running vacuum on the selected tables in single-user mode (ignoring the warning), and the database is back to accepting writes.
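For anyone following along, the per-table commands in single-user mode look roughly like this (the data directory and table names are illustrative; per Tom's note, plain VACUUM without the FREEZE modifier is enough to advance relfrozenxid out of the danger zone):

```
$ postgres --single -D /var/lib/pgsql/data db1

backend> VACUUM weather_data.wd_p51_y2015;
backend> VACUUM weather_data.wd_p52_y2015;
```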


Thanks

Ganesh




