Re: Proposal: Incremental Backup
From | Claudio Freire |
---|---|
Subject | Re: Proposal: Incremental Backup |
Date | |
Msg-id | CAGTBQpaMV6+BvL_N66BuyBD8JsauW-p5o-FajNyVoKvsshiv2Q@mail.gmail.com |
In response to | Re: Proposal: Incremental Backup (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Fri, Jul 25, 2014 at 3:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
>> <marco.nenciarini@2ndquadrant.it> wrote:
>>> 1. Proposal
>>> =================================
>>> Our proposal is to introduce the concept of a backup profile. The backup
>>> profile consists of a file with one line per file detailing tablespace,
>>> path, modification time, size and checksum.
>>> Using that file the BASE_BACKUP command can decide which file needs to
>>> be sent again and which is not changed. The algorithm should be very
>>> similar to rsync, but since our files are never bigger than 1 GB per
>>> file that is probably granular enough not to worry about copying parts
>>> of files, just whole files.
>>
>> That wouldn't be nearly as useful as the LSN-based approach mentioned before.
>>
>> I've had my share of rsyncing live databases (when resizing
>> filesystems, not for backup, but the anecdotal evidence applies
>> anyhow) and with moderately write-heavy databases, even if you only
>> modify a tiny portion of the records, you end up modifying a huge
>> portion of the segments, because the free space choice is random.
>>
>> There have been patches going around to change the random nature of
>> that choice, but none are very likely to make a huge difference for
>> this application. In essence, file-level comparisons get you only a
>> mild speed-up, and are not worth the effort.
>>
>> I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
>> the I/O of inspecting the LSN of entire segments (necessary
>> optimization for huge multi-TB databases) and backs up only the
>> portions modified when segments do contain changes, so it's the best
>> of both worlds. Any partial implementation would either require lots
>> of I/O (LSN only) or save very little (file only) unless it's an
>> almost read-only database.
>
> I agree with much of that. However, I'd question whether we can
> really seriously expect to rely on file modification times for
> critical data-integrity operations. I wouldn't like it if somebody
> ran ntpdate to fix the time while the base backup was running, and it
> set the time backward, and the next differential backup consequently
> omitted some blocks that had been modified during the base backup.

I was thinking the same. But that timestamp could be saved on the file
itself, or some other catalog, like a "trusted metadata" implemented
by pg itself, and it could be an LSN range instead of a timestamp
really.
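
To make the hybrid file+lsn idea concrete, here is a rough sketch in Python, not a proposed implementation. It assumes a hypothetical per-segment record (`SegmentMeta`, standing in for the "trusted metadata") that holds the highest LSN written to that segment, the default 8 kB block size, and a little-endian machine. The only on-disk detail it relies on is that each page header begins with the page LSN, stored as two 32-bit halves, high half first. `make_page` is just a helper to build test data.

```python
import struct
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 8192  # default PostgreSQL block size (assumed)


@dataclass
class SegmentMeta:
    """Hypothetical trusted metadata kept per segment file."""
    max_lsn: int  # highest LSN ever written to the segment


def page_lsn(segment: bytes, offset: int) -> int:
    # The page header starts with the page LSN: high 32 bits, then
    # low 32 bits. Little-endian byte order is an assumption here.
    hi, lo = struct.unpack_from("<II", segment, offset)
    return (hi << 32) | lo


def changed_pages(segment: bytes, meta: SegmentMeta, since_lsn: int) -> List[int]:
    """Return the block numbers that changed after since_lsn."""
    # File-level step: the metadata alone rules out the whole segment,
    # so none of its pages have to be read.
    if meta.max_lsn <= since_lsn:
        return []
    # Block-level step: only now pay the I/O of checking each page LSN.
    return [
        offset // PAGE_SIZE
        for offset in range(0, len(segment), PAGE_SIZE)
        if page_lsn(segment, offset) > since_lsn
    ]


def make_page(lsn: int) -> bytes:
    # Test helper: a zeroed page carrying only an LSN in its header.
    return struct.pack("<II", lsn >> 32, lsn & 0xFFFFFFFF) + bytes(PAGE_SIZE - 8)


segment = make_page(100) + make_page(250) + make_page(180)
meta = SegmentMeta(max_lsn=250)
print(changed_pages(segment, meta, since_lsn=200))  # [1]
print(changed_pages(segment, meta, since_lsn=300))  # []
```

The second call never looks at a page, which is the saving that matters on multi-TB databases. Because both tests compare LSNs rather than modification times, a clock set backward during the base backup cannot cause a block to be skipped.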