Re: Streaming a base backup from master
From    | Greg Stark
Subject | Re: Streaming a base backup from master
Date    |
Msg-id  | AANLkTimVrLsH=ox4=WnxwYAsy4LSKYRjAKkmRW=nFOJ8@mail.gmail.com
In reply to | Re: Streaming a base backup from master (Martijn van Oosterhout <kleptog@svana.org>)
Responses | Re: Streaming a base backup from master
List    | pgsql-hackers
On Sun, Sep 5, 2010 at 4:51 PM, Martijn van Oosterhout <kleptog@svana.org> wrote:
> If you're working from a known good version of the database at some
> point, yes you are right you have more interesting options. If you
> don't you want something that will fix it.

Sure, in that case you want to restore from backup; whatever tool you use, the net result is the same. I'm not sure rsync is actually going to be much faster, though, since it still has to read the entire existing database, which a normal restore doesn't. If the database has changed significantly that's a lot of extra I/O, and you're probably on a local network with plenty of bandwidth available.

What I'm talking about is how you *take* backups. Currently you have to take a full backup, which on a large data warehouse can be a big job. If only a small percentage of the database is changing, you could use rsync to reduce the network bandwidth needed to transfer your backup, but you still have to read the entire database and write out the entire backup.

Incremental backups mean reading just the data blocks that have been modified and writing out a backup file containing only those blocks. When it comes time to restore, you restore the last full backup, then any incremental backups taken since, then replay whatever logs are needed to bring the database to a consistent state.

I think that description pretty much settles the question in my mind. The implementation choice of scanning the WAL to find all the changed blocks is most relevant to the use cases where incremental backups are useful. If you still have to read the entire database, there's not much to be gained except storage space. If you scan the WAL, you can avoid reading most of your large data warehouse when generating the incremental and read only the busy portion.

In the use case where the database is extremely busy but is writing and rewriting the same small set of blocks over and over, even scanning the WAL might not be ideal.
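The WAL-scanning approach above can be sketched roughly as follows. This is a minimal illustration, not PostgreSQL code: the record format (an LSN plus a relation file and block number) and the function name are invented for the example; real WAL records carry much more.

```python
# Hypothetical sketch: find the blocks an incremental backup must copy
# by scanning WAL records written after the last backup's position.
# Record format (lsn, relfile, blockno) is invented for illustration.

def changed_blocks(wal_records, since_lsn):
    """Return the set of (relfile, blockno) pairs touched after since_lsn."""
    touched = set()
    for lsn, relfile, blockno in wal_records:
        if lsn > since_lsn:
            touched.add((relfile, blockno))
    return touched

# An incremental backup then reads only those blocks, not the whole database.
wal = [(100, "rel_a", 0), (150, "rel_a", 7), (200, "rel_b", 3), (210, "rel_a", 7)]
print(sorted(changed_blocks(wal, 120)))
# -> [('rel_a', 7), ('rel_b', 3)]
```

Note that the hot-block case described above shows up here too: a workload rewriting block 7 over and over still yields only one entry in the result, but the scan itself must walk every WAL record.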
For that use case it might be more useful to generate, at every checkpoint, a kind of wal-summary listing all the blocks touched since the last checkpoint. But that could be a later optimization.

-- greg
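The per-checkpoint wal-summary idea can be sketched like this. Again a hypothetical illustration, not an existing PostgreSQL facility: all class and method names are invented. The point is that an incremental backup can union a handful of small summaries instead of rescanning the full WAL stream.

```python
# Hypothetical sketch of a per-checkpoint "wal-summary": at each
# checkpoint, freeze the set of blocks touched since the previous one.
# A backup then unions the summaries it needs. Names are invented.

class WalSummarizer:
    def __init__(self):
        self.current = set()    # blocks touched since the last checkpoint
        self.summaries = []     # one frozen set per completed checkpoint

    def note_block(self, relfile, blockno):
        self.current.add((relfile, blockno))

    def checkpoint(self):
        self.summaries.append(frozenset(self.current))
        self.current = set()

    def blocks_since(self, checkpoint_index):
        """Union of all summaries after the given checkpoint, plus in-flight changes."""
        out = set()
        for s in self.summaries[checkpoint_index:]:
            out |= s
        return out | self.current
```

Because each summary is a set, a busy database that rewrites the same few blocks between checkpoints produces a tiny summary no matter how much WAL it generates, which is exactly the case where plain WAL scanning falls down.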