Re: trying again to get incremental backup

From: Andres Freund
Subject: Re: trying again to get incremental backup
Msg-id: 20230614194717.jyuw3okxup4cvtbt@awork3.anarazel.de
In reply to: trying again to get incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: trying again to get incremental backup  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hi,

On 2023-06-14 14:46:48 -0400, Robert Haas wrote:
> A few years ago, I sketched out a design for incremental backup, but
> no patch for incremental backup ever got committed. Instead, the whole
> thing evolved into a project to add backup manifests, which are nice,
> but not as nice as incremental backup would be. So I've decided to
> have another go at incremental backup itself. Attached are some WIP
> patches. Let me summarize the design and some open questions and
> problems with it that I've discovered. I welcome problem reports and
> test results from others, as well.

Cool!


> I originally had the idea of summarizing a certain number of MB of WAL
> per WAL summary file, and so I added a GUC wal_summarize_mb for that
> purpose. But then I realized that actually, you really want WAL
> summary file boundaries to line up with possible redo points, because
> when you do an incremental backup, you need a summary that stretches
> from the redo point of the checkpoint written at the start of the
> prior backup to the redo point of the checkpoint written at the start
> of the current backup. The block modifications that happen in that
> range of WAL records are the ones that need to be included in the
> incremental.

I assume this is "solely" required for keeping the incremental backups as
small as possible, rather than being required for correctness?
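
(For context, the way I picture the selection rule - a hypothetical
sketch, block_needs_backup() and its arguments are invented names, not
the patch's code:

    /* A block belongs in the incremental backup iff it was modified
     * in the WAL range between the two backups' redo points. */
    static bool
    block_needs_backup(XLogRecPtr block_last_mod_lsn,
                       XLogRecPtr prior_redo, XLogRecPtr current_redo)
    {
        return block_last_mod_lsn > prior_redo &&
               block_last_mod_lsn <= current_redo;
    }

If the summaries the backup consumes tile exactly that LSN range, the
union of their block sets matches this predicate; if they cover a
superset, the result is still correct, just bigger - hence my guess that
the alignment is about size rather than correctness.)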


> Unfortunately, there's no indication in the WAL itself
> that you've reached a redo point, but I wrote code that tries to
> notice when we've reached the redo point stored in shared memory and
> stops the summary there. But I eventually realized that's not good
> enough either, because if summarization zoomed past the redo point
> before noticing the updated redo point in shared memory, then the
> backup sat around waiting for the next summary file to be generated so
> it had enough summaries to proceed with the backup, while the
> summarizer was in no hurry to finish up the current file and just sat
> there waiting for more WAL to be generated. Eventually the incremental
> backup would just time out. I tried to fix that by making it so that
> if somebody's waiting for a summary file to be generated, they can let
> the summarizer know about that and it can write a summary file ending
> at the LSN up to which it has read and then begin a new file from
> there. That seems to fix the hangs, but now I've got three
> overlapping, interconnected systems for deciding where to end the
> current summary file, and maybe that's OK, but I have a feeling there
> might be a better way.

Could we just recompute the WAL summary for the [redo, end of chunk] range
of the relevant summary file?
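
I.e., roughly this, as pure pseudocode - every name here is invented:

    /* If the redo point we need falls inside an already-written chunk,
     * re-read that chunk's WAL and emit two summaries split at redo. */
    if (chunk->start_lsn < redo && redo < chunk->end_lsn)
    {
        write_summary_file(summarize_wal_range(chunk->start_lsn, redo));
        write_summary_file(summarize_wal_range(redo, chunk->end_lsn));
        remove_summary_file(chunk);
    }

That'd cost re-reading one chunk's worth of WAL in the uncommon case, but
would avoid the interlocked "please stop early" signalling between backup
and summarizer.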


> Dilip had an interesting potential solution to this problem, which was
> to always emit a special WAL record at the redo pointer. That is, when
> we fix the redo pointer for the checkpoint record we're about to
> write, also insert a WAL record there. That way, when the summarizer
> reaches that sentinel record, it knows it should stop the summary just
> before. I'm not sure whether this approach is viable, especially from
> a performance and concurrency perspective, and I'm not sure whether
> people here would like it, but it does seem like it would make things
> a whole lot simpler for this patch set.

FWIW, I like the idea of a special WAL record at that point, independent of
this feature. It wouldn't be a meaningful overhead compared to the cost of a
checkpoint, and it seems like it'd be quite useful for debugging. But I can
see uses going beyond that - we occasionally have been discussing associating
additional data with redo points, and that'd be a lot easier to deal with
during recovery with such a record.

I don't really see a performance and concurrency angle right now - what are
you wondering about?
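
The naive shape I have in mind, as a sketch - XLOG_CHECKPOINT_REDO is an
invented record name, and the hard part, inserting the record atomically
with fixing the redo location, is exactly what I'm glossing over:

    /* In CreateCheckPoint(), at the point where the redo location is
     * determined: insert an (empty) sentinel record and use its start
     * position as the redo pointer. */
    XLogBeginInsert();
    (void) XLogInsert(RM_XLOG_ID, XLOG_CHECKPOINT_REDO);
    checkPoint.redo = ProcLastRecPtr;   /* start LSN of that record */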


> Another thing that I'm not too sure about is: what happens if we find
> a relation file on disk that doesn't appear in the backup_manifest for
> the previous backup and isn't mentioned in the WAL summaries either?

Wouldn't that commonly happen for unlogged relations at least?

I suspect there's also other ways to end up with such additional files,
e.g. by crashing during the creation of a new relation.


> A few less-serious problems with the patch:
> 
> - We don't have an incremental JSON parser, so if you have a
> backup_manifest > 1GB, pg_basebackup --incremental is going to fail.
> That's also true of the existing code in pg_verifybackup, and for the
> same reason. I talked to Andrew Dunstan at one point about adapting
> our JSON parser to support incremental parsing, and he had a patch for
> that, but I think he found some problems with it and I'm not sure what
> the current status is.

As a stopgap measure, can't we just use the relevant flag to allow larger
allocations?
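
I.e. something like this sketch - the buffer handling around it is made
up, but palloc_extended() with MCXT_ALLOC_HUGE exists in both backend and
frontend:

    /* Slurp the entire manifest, bypassing the 1GB MaxAllocSize cap.
     * A real version would loop, since a single read() can be short. */
    buf = palloc_extended((Size) statbuf.st_size + 1, MCXT_ALLOC_HUGE);
    if (read(fd, buf, statbuf.st_size) != statbuf.st_size)
        pg_fatal("could not read \"%s\": %m", manifest_path);
    buf[statbuf.st_size] = '\0';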


> - The patch does support differential backup, aka an incremental atop
> another incremental. There's no particular limit to how long a chain
> of backups can be. However, pg_combinebackup currently requires that
> the first backup is a full backup and all the later ones are
> incremental backups. So if you have a full backup a and an incremental
> backup b and a differential backup c, you can combine a, b, and c to get
> a full backup equivalent to one you would have gotten if you had taken
> a full backup at the time you took c. However, you can't combine b and
> c with each other without combining them with a, and that might be
> desirable in some situations. You might want to collapse a bunch of
> older differential backups into a single one that covers the whole
> time range of all of them. I think that the file format can support
> that, but the tool is currently too dumb.

That seems like a feature for the future...
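
(Until then, collapsing always has to reach back to the full backup,
i.e. presumably something like

    pg_combinebackup -o restored /backups/a /backups/b /backups/c

- assuming -o names the output directory - rather than a b+c-only merge.)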


> - We only know how to operate on directories, not tar files. I thought
> about that when working on pg_verifybackup as well, but I didn't do
> anything about it. It would be nice to go back and make that tool work
> on tar-format backups, and this one, too. I don't think there would be
> a whole lot of point trying to operate on compressed tar files because
> you need random access and that seems hard on a compressed file, but
> on uncompressed files it seems at least theoretically doable. I'm not
> sure whether anyone would care that much about this, though, even
> though it does sound pretty cool.

I don't know the tar format well, but my understanding is that it doesn't have
a "central metadata" portion. I.e. doing something like this would entail
scanning the tar file sequentially, skipping file contents?  And wouldn't you
have to create an entirely new tar file for the modified output? That kind of
makes it not so incremental ;)

IOW, I'm not sure this is ever worth bothering about, and it certainly
doesn't seem worth bothering about now. But I might just be missing
something.
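
For reference, the sequential scan itself is cheap to write - ustar
headers are 512-byte blocks with the member size stored as octal text at
offset 124, so skipping contents looks roughly like this standalone
sketch (not proposed code):

    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char **argv)
    {
        char    hdr[512];
        FILE   *f;

        if (argc != 2 || (f = fopen(argv[1], "rb")) == NULL)
            return 1;
        while (fread(hdr, 1, 512, f) == 512 && hdr[0] != '\0')
        {
            /* size field: 12 bytes of octal text at offset 124 */
            unsigned long size = strtoul(hdr + 124, NULL, 8);

            printf("%s (%lu bytes)\n", hdr, size);  /* name at offset 0 */
            /* contents are padded out to a 512-byte boundary */
            if (fseek(f, (long) ((size + 511) & ~511UL), SEEK_CUR) != 0)
                break;
        }
        fclose(f);
        return 0;
    }

But the point stands - there's no index, and rewriting anything means
producing a whole new archive.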


Greetings,

Andres Freund


