Re: trying again to get incremental backup

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: trying again to get incremental backup
Дата
Msg-id CA+TgmoafkMay-O7KFyxwYxaT5pZ+ku+qM8hLM88ZzEyq2g-LtQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: trying again to get incremental backup  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Ответы Re: trying again to get incremental backup  (Dilip Kumar <dilipbalaut@gmail.com>)
Re: trying again to get incremental backup  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: trying again to get incremental backup  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Re: trying again to get incremental backup  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Список pgsql-hackers
On Mon, Nov 13, 2023 at 11:25 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Great stuff you got here.  I'm doing a first pass trying to grok the
> whole thing for more substantive comments, but in the meantime here are
> some cosmetic ones.

Thanks, thanks, and thanks.

I've fixed some things that you mentioned in the attached version.
Other comments below.

> In blkreftable.c, I think the definition of SH_EQUAL should have an
> outer layer of parentheses.  Also, it would be good to provide and use a
> function to initialize a BlockRefTableKey from the RelFileNode and
> forknum components, and ensure that any padding bytes are zeroed.
> Otherwise it's not going to be a great hash key.  On my platform there
> aren't any (padding bytes), but I think it's unwise to rely on that.

I'm having trouble understanding the second part of this suggestion.
Note that in a frontend context, SH_RAW_ALLOCATOR is pg_malloc0, and
in a backend context, we get the default, which is
MemoryContextAllocZero. Maybe there's some case this doesn't cover,
though?

> These forward struct declarations are not buying you anything, I'd
> remove them:

I've had problems from time to time when I don't do this. I'll remove
it here, but I'm not convinced that it's always useless.

> I don't much like the way header files in src/bin/pg_combinebackup files
> are structured.  Particularly, causing a simplehash to be "instantiated"
> just because load_manifest.h is included seems poised to cause pain.  I
> think there should be a file with the basic struct declarations (no
> simplehash); and then maybe since both pg_basebackup and
> pg_combinebackup seem to need the same simplehash, create a separate
> header file containing just that..  But, did you notice that anything
> that includes reconstruct.h will instantiate the simplehash stuff,
> because it includes load_manifest.h?  It may be unwise to have the
> simplehash in a header file.  Maybe just declare it in each .c file that
> needs it.  The duplicity is not that large.

I think that I did this correctly. AIUI, if you're defining a
simplehash that only one source file needs, you make the scope
"static" and do both SH_DECLARE and SH_DEFILE it in that file. If you
need it to be shared by multiple files, you make it "extern" in the
header file, do SH_DECLARE there, and SH_DEFINE in one of those source
files. Or you could make the scope "static inline" in the header file
and then you'd both SH_DECLARE and SH_DEFINE it in the header file.

If I were to do as you suggest here, I think I'd end up with 2 copies
of the compiled code for this instead of one, and if they ever got out
of sync everything would break silently.

> Why leave unnamed arguments in function declarations?  For example, in
>
> static void manifest_process_file(JsonManifestParseContext *,
>                                   char *pathname,
>                                   size_t size,
>                                   pg_checksum_type checksum_type,
>                                   int checksum_length,
>                                   uint8 *checksum_payload);
> the first argument lacks a name.  Is this just an oversight, I hope?

I mean, I've changed it now, but I don't think it's worth getting too
excited about. "int checksum_length" is much better documentation than
just "int," but "JsonManifestParseContext *context" is just noise,
IMHO. You can argue that it's better for consistency that way, but
whatever.

> In GetFileBackupMethod(), which arguments are in and which are out?
> The comment doesn't say, and it's not obvious why we pass both the file
> path as well as the individual constituent pieces for it.

The header comment does document which values are potentially set on
return. I guess I thought it was clear enough that the stuff not
documented to be output parameters was input parameters. Most of them
aren't even pointers, so they have to be input parameters. The only
exception is 'path', which I have some difficulty thinking that anyone
is going to imagine to be an input pointer.

Maybe you could propose a more specific rewording of this comment?
FWIW, I'm not altogether sure whether this function is going to get
more heavily adjusted in a rev or three of the patch set, so maybe we
want to wait to sort this out until this is closer to final, but OTOH
if I know what you have in mind for the current version, I might be
more likely to keep it in a good place if I end up changing it.

> DO_NOT_BACKUP_FILE appears not to be set anywhere.  Do you expect to use
> this later?  If not, maybe remove it.

Woops, that was a holdover from an earlier version.

> There are two functions named record_manifest_details_for_file() in
> different programs.  I think this sort of arrangement is not great, as
> it is confusing confusing to follow.  It would be better if those two
> routines were called something like, say, verifybackup_perfile_cb and
> combinebackup_perfile_cb instead; then in the function comment say
> something like
> /*
>  * JsonManifestParseContext->perfile_cb implementation for pg_combinebackup.
>  *
>  * Record details extracted from the backup manifest for one file,
>  * because we like to keep things tracked or whatever.
>  */
> so it's easy to track down what does what and why.  Same with
> perwalrange_cb.  "perfile" looks bothersome to me as a name entity.  Why
> not per_file_cb? and per_walrange_cb?

I had trouble figuring out how to name this stuff. I did notice the
awkwardness, but surely nobody can think that two functions with the
same name in different binaries can be actually the same function.

If we want to inject more underscores here, my vote is to go all the
way and make it per_wal_range_cb.

> In walsummarizer.c, HandleWalSummarizerInterrupts is called in
> summarizer_read_local_xlog_page but SummarizeWAL() doesn't do that.
> Maybe it should?

I replaced all the CHECK_FOR_INTERRUPTS() in that file with
HandleWalSummarizerInterrupts(). Does that seem right?

> I think this path is not going to be very human-likeable.
>                 snprintf(final_path, MAXPGPATH,
>                                  XLOGDIR "/summaries/%08X%08X%08X%08X%08X.summary",
>                                  tli,
>                                  LSN_FORMAT_ARGS(summary_start_lsn),
>                                  LSN_FORMAT_ARGS(summary_end_lsn));
> Why not add a dash between the TLI and between both LSNs, or something
> like that?  (Also, are we really printing TLIs as 8-byte hexs?)

Dealing with the last part first, we already do that in every WAL file
name. I actually think these file names are easier to work with than
WAL file names, because 000000010000000000000020 is not the WAL
starting at 0/20, but rather the WAL starting at 0/20000000. To know
at what LSN a WAL file starts, you have to mentally delete characters
17 through 22, which will always be zero, and instead add six zeroes
at the end. I don't think whoever came up with that file naming
convention deserves an award, unless it's a raspberry award. With
these names, you get something like
0000000100000000015125B800000000015128F0.summary and you can sort of
see that 1512 repeats so the LSN went from something ending in 5B8 to
something ending in 8F0. I actually think it's way better.

But I have a hard time arguing that it wouldn't be more readable still
if we put some separator characters in there. I didn't do that because
then they'd look less like WAL file names, but maybe that's not really
a problem. A possible reason not to bother is that these files are
less necessary for humans to care about than WAL files, since they
don't need to be archived or transported between nodes in any way.
Basically I think this is probably fine the way it is, but if you or
others think it's really important to change it, I can do that. Just
as long as we don't spend 50 emails arguing about which separator
character to use.

--
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Christensen
Дата:
Сообщение: Re: [PATCHES] Post-special page storage TDE support
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Split index and table statistics into different types of stats