Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL
| From | Andy Hartman |
|---|---|
| Subject | Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL |
| Date | |
| Msg-id | CAEZv3cpESEGDUu-W5WSDo=LqORjk122YR7UOEdui6ujpTU-eAQ@mail.gmail.com |
| In reply to | Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL (Ron Johnson <ronljohnsonjr@gmail.com>) |
| Responses | Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL |
| List | pgsql-admin |
What was the duration, start to finish, of the migration of the 6 TB of data? And what do you use for a quick backup once the PG data has been archived?
Thanks.
On Fri, May 30, 2025 at 11:29 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
> On Fri, May 30, 2025 at 3:51 AM Motog Plus <mplus7535@gmail.com> wrote:
>
>> Hi Team,
>>
>> We are currently planning a data archival initiative for our production PostgreSQL databases and would appreciate suggestions or insights from the community regarding best practices and proven approaches.
>>
>> **Scenario:**
>> - We have a few large tables (several hundred million rows) where we want to archive historical data (e.g., older than 1 year).
>> - The archived data should be moved to a separate PostgreSQL database (on the same or a different server).
>> - Our goals are: efficient data movement, minimal downtime, and safe deletion from the source after successful archival.
>> - PostgreSQL version: 15.12
>> - Both source and target databases are PostgreSQL.
>>
>> We explored using `COPY TO` and `COPY FROM` with CSV files, uploaded to SharePoint or a similar storage system. However, our infrastructure team raised concerns around the computational load of large CSV processing and potential security implications with file transfers.
>>
>> We'd like to understand:
>>
>> - What approaches have worked well for you in practice?
>
> This is how I migrated 6TB of data from an Oracle database to PostgreSQL, and then implemented quarterly archiving of the PG database:
>
> - COPY FROM (SELECT * FROM live_table WHERE date_fld in some_manageable_date_range) TO STDOUT
> - Compress
> - scp
> - COPY TO archive_table
> - Index
> - DELETE FROM live_table WHERE date_fld in some_manageable_date_range (this I only did in the PG archive process)
>
> (Naturally, the Oracle migration used Oracle-specific commands.)
>
>> - Are there specific tools or strategies you'd recommend for ongoing archival?
>
> I write generic bash loops to which you pass an array that contains the table name, PK, date column and date range.
>
> Given a list of tables, it did the COPY FROM, lz4 and scp. Once that finished successfully, another script dropped archive indices on the current table, then ran the COPY TO and CREATE INDEX statements. A third script did the deletes.
>
> This works even when the live database tables are all connected via FK. You just need to carefully order the tables in your script.
>
>> - Any performance or consistency issues we should watch out for?
>
> My rules for scripting are "bite-sized pieces" and "check those return codes!".
>
>> Your insights or any relevant documentation/pointers would be immensely helpful.
>
> Index support uber alles. When deleting from a table which relies on a foreign key link to a table which _does_ have a date field, don't hesitate to join on that table.
>
> And DELETE of bite-sized chunks is faster than people give it credit for.
>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
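For readers following along, here is a minimal sketch of the kind of per-table bash loop described above, assuming hypothetical host names, database names, table specs, and paths (none of these appear in the original posts); the actual scripts in the thread also split the index drops/rebuilds and the deletes into separate passes.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the per-table archival loop described in the quoted
# reply. Hosts, databases, table specs, and paths are illustrative assumptions.
set -euo pipefail   # "check those return codes!" -- abort on any failure

SRC_DB="live_db"                       # assumed source database
ARCHIVE_HOST="archive.example.com"     # assumed archive server
ARCHIVE_DIR="/var/lib/pg_archive"      # assumed landing directory

# Each entry: table|date_column|range_start|range_end  ("bite-sized" ranges)
TABLES=(
  "orders|created_at|2024-01-01|2024-04-01"
  "order_items|created_at|2024-01-01|2024-04-01"
)

for spec in "${TABLES[@]}"; do
  IFS='|' read -r tbl datecol from to <<< "$spec"
  dump="/tmp/${tbl}_${from}_${to}.copy.lz4"

  # 1. COPY one manageable date range to STDOUT and compress it on the fly.
  psql -d "$SRC_DB" -v ON_ERROR_STOP=1 -c \
    "COPY (SELECT * FROM ${tbl}
            WHERE ${datecol} >= '${from}' AND ${datecol} < '${to}')
       TO STDOUT" | lz4 -q > "$dump"

  # 2. Ship the compressed dump to the archive server.
  scp -q "$dump" "${ARCHIVE_HOST}:${ARCHIVE_DIR}/"

  # 3. Load it into the archive database (run there, or via ssh as here);
  #    index rebuilds on the archive table would follow in a separate step.
  ssh "$ARCHIVE_HOST" "lz4 -dcq ${ARCHIVE_DIR}/$(basename "$dump") \
      | psql -d archive_db -v ON_ERROR_STOP=1 -c 'COPY ${tbl} FROM STDIN'"

  # 4. Only after the archive copy is confirmed, delete the same range from
  #    the live table -- still one bite-sized chunk at a time.
  psql -d "$SRC_DB" -v ON_ERROR_STOP=1 -c \
    "DELETE FROM ${tbl}
      WHERE ${datecol} >= '${from}' AND ${datecol} < '${to}'"
done
```

With `set -euo pipefail`, any failed export, transfer, or load stops the loop before the DELETE runs, which is the "check those return codes" rule in practice; and, as Ron notes, FK-linked tables just need to be ordered carefully in the TABLES array.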