Re: [GENERAL] efficiently migrating 'old' data from one table toanother
От | Adrian Klaver |
---|---|
Тема | Re: [GENERAL] efficiently migrating 'old' data from one table toanother |
Дата | |
Msg-id | cda6e688-ad35-096e-24ab-3d5c2d789a55@aklaver.com обсуждение исходный текст |
Ответ на | [GENERAL] efficiently migrating 'old' data from one table to another (Jonathan Vanasco <postgres@2xlp.com>) |
Ответы |
Re: [GENERAL] efficiently migrating 'old' data from one table to another
|
Список | pgsql-general |
On 01/12/2017 12:06 PM, Jonathan Vanasco wrote: > I'm just wondering if there's a more efficient way of handling a certain periodic data migration. > > We have a pair of tables with this structure: > > table_a__live > column_1 INT > column_2 INT > record_timestamp TIMESTAMP > > table_a__archive > column_1 INT > column_2 INT > record_timestamp TIMESTAMP > > periodically, we must migrate items that are 'stale' from `table_a__live ` to `table_a__archive`. The entries are copiedover to the archive, then deleted. > > The staleness is calculated based on age-- so we need to use INTERVAL. the "live" table can have anywhere from 100k to20MM records. > > the primary key on `table_a__live` is a composite of column_1 & column_2, > > In order to minimize scanning the table, we opted to hint migrations with a dedicated column: > > ALTER TABLE table_a__live ADD is_migrate BOOLEAN DEFAULT NULL; > CREATE INDEX idx_table_a__live_migrate ON table_a__live(is_migrate) WHERE is_migrate IS NOT NULL; > > so our migration is then based on that `is_migrate` column: > > BEGIN; > UPDATE table_a__live SET is_migrate = TRUE WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL'1 month'; > INSERT INTO table_a__archive (column_1, column_2, record_timestamp) SELECT column_1, column_2, record_timestamp FROMtable_a__live WHERE is_migrate IS TRUE; > DELETE FROM table_a__live WHERE is_migrate IS TRUE; > COMMIT; > > The inserts & deletes are blazing fast, but the UPDATE is a bit slow from postgres re-writing all the rows. Maybe I am missing something, but why do the UPDATE? Why not?: BEGIN; INSERT INTO table_a__archive (column_1, column_2, record_timestamp) SELECT column_1, column_2, record_timestamp FROM table_a__live WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month'; DELETE FROM table_a__live WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month'; COMMIT; With an index on record_timestamp. > > can anyone suggest a better approach? > > I considered copying everything to a tmp table then inserting/deleting based on that table -- but there's a lot of disk-ioon that approach too. > > > fwiw we're on postgres9.6.1 > -- Adrian Klaver adrian.klaver@aklaver.com
В списке pgsql-general по дате отправления: