Discussion: full vacuum of a very large table
Got a situation where a 130 mil row (137GB) table needs to be brought down in size to 10 mil records (most recent) with the least amount of downtime.

Would a full vacuum be faster on:
- 120 mil rows deleted and 10 mil active (delete most of them, then vacuum full), or
- 10 mil deleted and 120 mil active (delete small batches and vacuum full after each delete)?

Any other suggestions?

Thanks,
Nic
Personally, provided you have the room, I would build a new table off to the side and then migrate what you need to keep into the new table. When done, and satisfied that you have all of the candidate rows, rename the original table to "x_tablename" and rename the newly created table into place to take over. If all is good, simply drop the x_tablename table.
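A minimal sketch of this build-aside-and-swap approach (all names and the retention predicate are hypothetical; indexes, constraints, triggers, and grants have to be recreated on the new table):

-- Copy only the rows you want to keep into a side table.
CREATE TABLE big_table_new AS
    SELECT * FROM big_table
    WHERE created_at >= now() - interval '30 days';  -- the ~10 mil "recent" rows

-- Recreate the indexes the original table had.
CREATE INDEX big_table_new_created_at_idx ON big_table_new (created_at);

-- Swap the tables in a single transaction.
BEGIN;
ALTER TABLE big_table RENAME TO x_big_table;
ALTER TABLE big_table_new RENAME TO big_table;
COMMIT;

ANALYZE big_table;
-- When satisfied that nothing is missing:
-- DROP TABLE x_big_table;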
On Tue, Mar 29, 2011 at 9:26 PM, Nic Chidu <nic@chidu.net> wrote:
> Got a situation where a 130 mil rows (137GB) table needs to be brought down in size to 10 mil records (most recent)
> with the least amount of downtime. [snip]
The best way is to take a dump of the table after deleting the unused rows, then restore it back into the database. A dump and reload would be faster than a VACUUM FULL.
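A rough sketch of the dump-and-reload route (database name, table name, and predicate are hypothetical). First, in psql:

DELETE FROM big_table WHERE created_at < now() - interval '30 days';

Then, from the shell:

pg_dump -Fc -t big_table mydb > big_table.dump
psql -d mydb -c 'DROP TABLE big_table;'
pg_restore -d mydb big_table.dump
psql -d mydb -c 'ANALYZE big_table;'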
--Raghu Ram
On Tue, Mar 29, 2011 at 9:26 PM, Nic Chidu <nic@chidu.net> wrote:
> Got a situation where a 130 mil rows (137GB) table needs to be brought down in size to 10 mil records (most recent)
> with the least amount of downtime. [snip]
What version of PG are you on?

If PG > 8.2 and you have disk space twice the size of the target table and its indexes, you can reorganize the table using pg_reorg without locking it against concurrent selects and updates.
--
Thanks,
Ashish Nauriyal
EnterpriseDB Corporation
The Enterprise Postgres Company
On 03/29/2011 09:04 AM, Plugge, Joe R. wrote:
> Personally, provided you have the room, I would build a new table off to the side and then migrate what you need to keep to the new table... [snip]

This looks attractive but can cause issues if there are views, foreign keys, etc. that depend on this table.

The phrase "most recent" leads me to believe this is time-based data. This might be a good time to partition your table to avoid this issue in the future. If you are clever about it (and try it out on a test environment), you might be able to create inherited tables off your main table and then move the "live" data from the parent to the child in unobtrusively sized chunks (see the sketch after this message). When no live data remains in the parent table, truncate the parent table (only).

Moving forward, set up your application/processes to put the data into the appropriately sized (day? week? month? year?) child table. When the time comes, just archive and drop the child.

Cheers,
Steve
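A minimal sketch of the inheritance-based approach Steve describes (all names are hypothetical; a single month is moved in one pass here, whereas in practice you would batch by a narrower time range to limit lock and I/O impact):

-- Child table holding one month of data, inheriting from the parent.
CREATE TABLE big_table_2011_03 (
    CHECK (created_at >= DATE '2011-03-01' AND created_at < DATE '2011-04-01')
) INHERITS (big_table);

-- Move the live rows from the parent into the child.
INSERT INTO big_table_2011_03
    SELECT * FROM ONLY big_table
    WHERE created_at >= DATE '2011-03-01' AND created_at < DATE '2011-04-01';

DELETE FROM ONLY big_table
    WHERE created_at >= DATE '2011-03-01' AND created_at < DATE '2011-04-01';

-- Once no live rows remain in the parent:
TRUNCATE ONLY big_table;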
--- On Tue, 3/29/11, Nic Chidu <nic@chidu.net> wrote:
> Got a situation where a 130 mil rows (137GB) table needs to be brought down in size to 10 mil records (most recent) with the least amount of downtime. [snip]

Nic,

Since you know the WHERE clause to delete the 120 mil rows, why not use the converse of that to select the 10 mil rows to retain into another table, then drop the original table? No vacuum required!

Be sure to use the "create table as select ..." syntax to avoid WAL during creation of the new table, and use a transaction to drop the original table and rename the new one. That way users will be querying the original table right up until the switch-over, when they will start using the new table.

Foreign keys and other constraints may complicate things a bit, so check those out first. Also, don't forget to index and analyze the new table before the switch-over.

Good luck!

Bob Lunney
Yeah, this is the best solution: you don't need many resources, the DB does less work, and you will save a lot here. Just:
- simple inserts from the base table into a new table with your condition,
- verify,
- rename the base table to some other name, and the new table to the base table name,
- analyze the new table.
On Tue, Mar 29, 2011 at 9:34 PM, Plugge, Joe R. <JRPlugge@west.com> wrote:
> Personally, provided you have the room, I would build a new table off to the side and then migrate what you need to keep to the new table... [snip]
On 03/29/2011 08:56 AM, Nic Chidu wrote:
> Got a situation where a 130 mil rows (137GB) table needs to be brought down in size to 10 mil records (most recent)
> with the least amount of downtime. [snip]

I didn't really ask earlier, but what amount of downtime is acceptable and how much space is available on your drive?

If a few minutes is acceptable and you aren't completely against the wall in disk space, the easiest is probably to delete all the old rows and cluster the table. CLUSTER requires an exclusive lock, but so does VACUUM FULL, and in my experience with larger tables a CLUSTER is an order of magnitude or more faster (and you get shiny new indexes as well). Analyze the table immediately afterward (see the sketch after this message).

The exact solution will depend on the nature of your operation (what depends on this table, what is the nature of data inserts and queries, etc.). If it is critical that you be able to continuously insert data, you might copy the live rows up to some recent point in time to a holding table, then lock the table, copy the recently added rows, truncate and unlock the table, and backfill it from the other table. This will minimize the unavailability for inserts but may not be feasible if you have constant query requirements, foreign-key constraints, or the like.

If availability is truly critical, be sure to test whatever approach you take in advance.

Cheers,
Steve
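A minimal sketch of the delete-then-cluster route (index name and predicate are hypothetical; note that on 8.3 and earlier the syntax is "CLUSTER indexname ON tablename"):

-- Drop the ~120 mil old rows.
DELETE FROM big_table WHERE created_at < now() - interval '30 days';

-- Rewrite the table in index order; this rebuilds every index and
-- holds an ACCESS EXCLUSIVE lock for the duration.
CLUSTER big_table USING big_table_created_at_idx;

ANALYZE big_table;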