Re: trying again to get incremental backup

From: Jakub Wartak
Subject: Re: trying again to get incremental backup
Date:
Msg-id: CAKZiRmwTF8mj24f92mh+fiGiceO+myX6CnuYo6u2EJ+ok+Hy4Q@mail.gmail.com
In response to: Re: trying again to get incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: trying again to get incremental backup  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hi Robert,

On Tue, Dec 19, 2023 at 9:36 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Dec 15, 2023 at 5:36 AM Jakub Wartak
> <jakub.wartak@enterprisedb.com> wrote:
> > I've played with initdb/pg_upgrade (17->17) and I don't get a DBID
> > mismatch (of course they do differ after initdb), but I get this
> > instead:
> >
> >  $ pg_basebackup -c fast -D /tmp/incr2.after.upgrade -p 5432
> > --incremental /tmp/incr1.before.upgrade/backup_manifest
> > WARNING:  aborting backup due to backend exiting before pg_backup_stop
> > was called
> > pg_basebackup: error: could not initiate base backup: ERROR:  timeline
> > 2 found in manifest, but not in this server's history
> > pg_basebackup: removing data directory "/tmp/incr2.after.upgrade"
> >
> > Also in the manifest I don't see DBID ?
> > Maybe it's a nuisance and all I'm trying to see is that if an
> > automated cronjob with pg_basebackup --incremental hits a freshly
> > upgraded cluster, that error message without errhint() is going to
> > scare some Junior DBAs.
>
> Yeah. I think we should add the system identifier to the manifest, but
> I think that should be left for a future project, as I don't think the
> lack of it is a good reason to stop all progress here. When we have
> that, we can give more reliable error messages about system mismatches
> at an earlier stage. Unfortunately, I don't think that the timeline
> messages you're seeing here are going to apply in every case: suppose
> you have two unrelated servers that are both on timeline 1. I think
> you could use a base backup from one of those servers and use it as
> the basis for the incremental from the other, and I think that if you
> did it right you might fail to hit any sanity check that would block
> that. pg_combinebackup will realize there's a problem, because it has
> the whole cluster to work with, not just the manifest, and will notice
> the mismatching system identifiers, but that's kind of late to find
> out that you made a big mistake. However, right now, it's the best we
> can do.
>

OK, understood.
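
On a related note, until the system identifier makes it into the manifest,
a cheap guard for such an automated cron job is to compare the control data
of the previous full backup with the target server before attempting the
incremental. A minimal sketch, assuming the backup path and port from my
tests below:

$ pg_controldata /backups/backups/full | grep 'system identifier'
$ psql -p 5432 -Atc "SELECT system_identifier FROM pg_control_system()"

If the two numbers differ, the job can bail out early instead of finding
out later at pg_combinebackup time.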

> > The incrementals are being generated, but just for the first (0)
> > segment of the relation?
>
> I committed the first two patches from the series I posted yesterday.
> The first should fix this, and the second relocates parse_manifest.c.
> That patch hasn't changed in a while and seems unlikely to attract
> major objections. There's no real reason to commit it until we're
> ready to move forward with the main patches, but I think we're very
> close to that now, so I did.
>
> Here's a rebase for cfbot.

Test results for the v15 patchset (posted yesterday) are GOOD:

1. make check-world - GOOD
2. cfbot was GOOD
3. the devel/master bug present in
parse_filename_for_nontemp_relation() seems to be gone (in local
testing)
4. some further tests (the chained-incremental flow they share is sketched right after this list):
test_across_wallevelminimal.sh - GOOD
test_incr_after_timelineincrease.sh - GOOD
test_incr_on_standby_after_promote.sh - GOOD
test_many_incrementals_dbcreate.sh - GOOD
test_many_incrementals.sh - GOOD
test_multixact.sh - GOOD
test_pending_2pc.sh - GOOD
test_reindex_and_vacuum_full.sh - GOOD
test_repro_assert.sh
test_standby_incr_just_backup.sh - GOOD
test_stuck_walsum.sh - GOOD
test_truncaterollback.sh - GOOD
test_unlogged_table.sh - GOOD
test_full_pri__incr_stby__restore_on_pri.sh - GOOD
test_full_pri__incr_stby__restore_on_stby.sh - GOOD
test_full_stby__incr_stby__restore_on_pri.sh - GOOD
test_full_stby__incr_stby__restore_on_stby.sh - GOOD
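
For reference, the chained-incremental flow that most of those scripts
share boils down to something like this (hypothetical paths, same syntax
as used earlier in the thread):

$ pg_basebackup -c fast -D /backups/full -p 5432
$ pg_basebackup -c fast -D /backups/incr.1 -p 5432 \
      --incremental /backups/full/backup_manifest
$ pg_basebackup -c fast -D /backups/incr.2 -p 5432 \
      --incremental /backups/incr.1/backup_manifest
$ pg_combinebackup -o /backups/restored /backups/full /backups/incr.1 /backups/incr.2

Each script then varies what happens between the backups (wal_level
changes, timeline increase, promotion, CREATE DATABASE, VACUUM FULL,
2PC, unlogged tables, and so on).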

5. the more real-world pgbench test with localized segment writes
using `\set aid random_exponential...` [1] (a rough sketch of the
workload is at the end of this mail) shows much greater efficiency
in terms of backup space use now; du -sm shows:

210229  /backups/backups/full
250     /backups/backups/incr.1
255     /backups/backups/incr.2
[..]
348     /backups/backups/incr.13
408     /backups/backups/incr.14 // latest (20th of Dec, 10:40)
6673    /backups/archive/

The DB size, as reported by \l+, was 205 GB.
That pgbench was running for ~27h (19th Dec 08:39 -> 20th Dec 11:30)
at a slow, rate-limited 100 TPS (-R), so no insane amounts of WAL.
Time to reconstruct the chain of 14 incremental backups was 45 mins
(pg_combinebackup -o /var/lib/postgres/17/data /backups/backups/full
/backups/backups/incr.1 (..)  /backups/backups/incr.14).
The DB after recovery was OK and working fine.
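
For completeness, the skewed-aid workload was along these lines; this is
an illustrative sketch only (the exact script and parameters are in [1],
and the exponential parameter 5.0 here is just a placeholder):

$ cat > exp.sql <<'EOF'
\set aid random_exponential(1, 100000 * :scale, 5.0)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
EOF
$ pgbench -n -M prepared -c 8 -j 8 -R 100 -T 97200 -f exp.sql

The exponential draw keeps most writes concentrated in the low-aid
segments of pgbench_accounts, which is what keeps the incrementals so
small relative to the full backup.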

-J.


