Re: Shortest offline window on database migration
От | Tomas Vondra |
---|---|
Тема | Re: Shortest offline window on database migration |
Дата | |
Msg-id | 20190530152914.l2h5f4fomro37ieh@development обсуждение исходный текст |
Ответ на | Shortest offline window on database migration (Haroldo Kerry <hkerry@callix.com.br>) |
Ответы |
RE: Shortest offline window on database migration
|
Список | pgsql-performance |
On Thu, May 30, 2019 at 12:08:04PM -0300, Haroldo Kerry wrote: >Hello, > >We are migrating our PostgreSQL 9.6.10 database (with streaming replication >active) to a faster disk array. >We are using this opportunity to enable checksums, so we will have to do a >full backup-restore. >The database size is about 500GB, it takes about 2h:30min for a full >backup, and then about 1h to fully restore it with checksum enabled on the >new array, plus 2h to recreate the replica on the old array. > >Although all synthetic tests (pgbench) indicate the new disk array is >faster, we will only be 100% confident once we see its performance on >production, so our backup plan is using our replica database on the older >array. If the new array performance is poor during production ramp up, we >can switch to the replica with little impact to our customers. > >Problem is the offline window for backup, restore the full database with >checksum and recreate the replica is about 5h:30m. > >One thing that occurred to us to shorten the offline window was restoring >the database to both the master and replica in parallel (of course we would >configure the replica as master do restore the database), that would shave >1h of the total time. Although this is not documented we thought that >restoring the same database to identical servers would result in binary >identical data files. > >We tried this in lab. As this is not a kosher way to create a replica, we >ran a checksum comparison of all data files, and we ended up having a lot >of differences. Bummer. Both master and replica worked (no errors on logs), >but we ended up insecure about this path because of the binary differences >on data files. >But in principle it should work, right? What should work? Backup using pg_dump and restore certainly won't give you the same binary files - the commit timestamps will be different, operations may happen in a different order (esp. with parallel restore), and so on. And the instances don't start as a copy anyway, so there will be different system IDs, etc. So no, this is not a valid way to provision master/standby cluster. >Has anyone been through this type of problem? > Unfortunately, I don't think there's a much better solution that what you initially described - dump/restore, and then creating a replica. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-performance по дате отправления: