Re: File based Incremental backup v8

From Gabriele Bartolini
Subject Re: File based Incremental backup v8
Date
Msg-id CAHNtfO6urAVT22U2vaaY540Fs4RmiPnbygnLBhJ8Gk5g-u92aA@mail.gmail.com
In response to Re: File based Incremental backup v8  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: File based Incremental backup v8  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi Robert,

2015-03-06 3:10 GMT+11:00 Robert Haas <robertmhaas@gmail.com>:
> But I agree with Fujii to the extent that I see little value in
> committing this patch in the form proposed.  Being smart enough to use
> the LSN to identify changed blocks, but then sending the entirety of
> every file anyway because you don't want to go to the trouble of
> figuring out how to revise the wire protocol to identify the
> individual blocks being sent and write the tools to reconstruct a full
> backup based on that data, does not seem like enough of a win.

I believe the main point is to look at this from a user interface point of view. If and when we switch to block-level incremental support, the change will be completely transparent to the end user, even if we start with a file-level approach with an LSN check.
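
To make the distinction under discussion concrete, here is a minimal sketch, not the code from Marco's patch. It assumes the default 8192-byte block size, a little-endian platform, and the standard page header layout in which `pd_lsn` occupies the first 8 bytes of each page as two 32-bit words (high word first). The helper names (`page_lsn`, `changed_blocks`, `must_send_file`, `make_page`) and the inclusive `>=` comparison are illustrative assumptions.

```python
import struct

PAGE_SIZE = 8192  # assumption: default PostgreSQL block size (BLCKSZ)


def page_lsn(page: bytes) -> int:
    """Read pd_lsn from the first 8 bytes of a page header.

    Assumes little-endian storage of the two 32-bit halves
    (xlogid, xrecoff), as on x86 platforms.
    """
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff


def changed_blocks(data: bytes, since_lsn: int) -> list[int]:
    """Block-level view: numbers of the blocks whose LSN is >= since_lsn."""
    n_blocks = len(data) // PAGE_SIZE  # a trailing partial page is ignored
    return [
        i
        for i in range(n_blocks)
        if page_lsn(data[i * PAGE_SIZE:(i + 1) * PAGE_SIZE]) >= since_lsn
    ]


def must_send_file(data: bytes, since_lsn: int) -> bool:
    """File-level view: send the whole file if any block has changed."""
    return bool(changed_blocks(data, since_lsn))


def make_page(lsn: int) -> bytes:
    """Hypothetical helper for the example: an empty page stamped with lsn."""
    header = struct.pack("<II", lsn >> 32, lsn & 0xFFFFFFFF)
    return header + bytes(PAGE_SIZE - len(header))


if __name__ == "__main__":
    relation = make_page(0x100) + make_page(0x300) + make_page(0x200)
    print(changed_blocks(relation, 0x250))   # block-level answer
    print(must_send_file(relation, 0x250))   # file-level answer
```

Both functions take the same input (a file and the LSN of the previous backup), which is why a later move from the file-level decision to the block-level one need not change what the user sees.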

The win is already demonstrated by the average space and time saved by users of very large databases (VLDBs) with a good chunk of read-only data. Our Barman users with incremental backup (released recently - its algorithm is comparable to the file-level backup proposed by Marco) see on average a data deduplication ratio of 50% to 70% of the cluster size.

A tangible example is Navionics, which saves 8.2TB a week thanks to this approach (and backs up in 17 hours instead of 50): http://blog.2ndquadrant.com/incremental-backup-barman-1-4-0/

However, even smaller databases will benefit. Clearly, very small databases and frequently updated ones won't be interested in incremental backup, but that has never been the use case for this feature.

I believe that if we still think that this approach is not worth it, we are making a big mistake. The way I see it, this patch follows an agile approach and it is an important step towards incremental backup on a block basis.
 
> As Fujii says, if we ship this patch as written, people will just keep
> using the timestamp-based approach anyway.

I think that allowing users to take incremental backups through streaming replication (even if file-based) will give system and database administrators more flexibility in their disaster recovery solutions.

Thanks,
Gabriele
