Re: BUG #15323: wal_keep_segments must be >= 1 for WAL archiving +streaming to work
From | Stephen Frost |
---|---|
Subject | Re: BUG #15323: wal_keep_segments must be >= 1 for WAL archiving +streaming to work |
Date | |
Msg-id | 20180813161904.GR3326@tamriel.snowman.net |
In reply to | Re: BUG #15323: wal_keep_segments must be >= 1 for WAL archiving +streaming to work (Andres Freund <andres@anarazel.de>) |
List | pgsql-bugs |
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2018-08-13 11:42:47 -0400, Stephen Frost wrote:
> > > > This should really work even without replication slots though.
> > >
> > > Why? I fail to see what'd be gained by adding an "always retain one
> > > segment" rule. It'd not make the setup any more reliable. If anything
> > > it'd make it harder to spot issues in test setups.
> >
> > What exactly is wrong with the setup where this should be failing?
>
> If you want to rely on archiving, you either need to be ok with
> arbitrary delays in low activity periods, or use archive_timeout.
>
> If you want to rely on streaming, you need an appropriate WAL retention
> policy, i.e. wal_keep_segments or replication slots.
>
> The setup at hand doesn't want arbitrary delay in archiving
> situations but doesn't use archive_timeout, and it doesn't retain the
> necessary WAL for streaming.

The setup doesn't rely on *only* archiving or *only* streaming, though: it's set up specifically to work with both. What's happening is that PG is failing to ensure that, when both are used, a replica is able to fully catch up and follow a primary, and there's zero excuse for that as far as I'm concerned. We document explicitly that it should work, and it does in almost all cases except this one. That's not because we don't know what it'd take to make it work, but simply because we fail to account for the primary possibly having removed a WAL file that a replica following the archive will want to start from, because that's the last WAL file which was archived.

I don't think there's any particular concern about arbitrary delay in archiving here, nor does this setup need archive_timeout to be set. Setting archive_timeout likely would also have "solved" this issue, but, again, not because archive_timeout really needs to be set; it would simply have papered over the issue.
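To put a rough number on the cost of the archive_timeout workaround, here is a small Python sketch. It relies on the documented behaviour that a segment closed early by a forced switch is still archived at full size. It assumes the default 16 MB segment size and that at least a little WAL is written in every interval, so each timeout really does force a switch. The function name is hypothetical.

```python
# Hypothetical worst-case estimate of WAL archived during a quiet period
# when archive_timeout forces a segment switch every time it elapses.
DEFAULT_SEGMENT_MB = 16  # PostgreSQL's default WAL segment size


def forced_switch_archive_mb(archive_timeout_s: int, quiet_period_s: int,
                             segment_mb: int = DEFAULT_SEGMENT_MB) -> int:
    """Return the MB archived by forced switches over quiet_period_s.

    Assumes a little WAL is written in every interval (so each timeout
    actually triggers a switch) and that every forced switch archives a
    full-size segment, however little of it was used.
    """
    if archive_timeout_s <= 0:
        # archive_timeout = 0 disables forced switches entirely.
        return 0
    switches = quiet_period_s // archive_timeout_s
    return switches * segment_mb


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    for timeout in (0, 60, 300):
        mb = forced_switch_archive_mb(timeout, one_day)
        print(f"archive_timeout={timeout}s -> {mb} MB/day")
```

Under those assumptions, archive_timeout = 60 means 1440 switches and 23040 MB (about 22.5 GB) archived per day, mostly padding, while 300 seconds still comes to 4608 MB per day.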
What setting archive_timeout would actually have done is cause a lot more WAL to be archived than is necessary.

I simply don't buy the argument that we're doing the right thing here because there are ways to get around it: using wal_keep_segments, setting archive_timeout, using replication slots, or just hoping that more data will be written to force another WAL segment out, which will let the replica get a bit further ahead, connect to the primary, and find a WAL file that's still available and just hasn't happened to be removed yet.

As far as I can tell, this could happen when building out a new replica too: restore from a backup, let the system replay from the WAL archive until it reaches the last WAL segment there, and then have it try to connect to the primary. If the primary has already removed that last WAL segment, has moved on to the next one, and isn't generating much WAL itself, the replica can't get any further.

Thanks!

Stephen
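The failure mode can be illustrated with a deliberately simplified Python model. It assumes, following the description in this message, that a replica which has replayed everything in the archive asks the primary to stream from the start of the last archived segment. It also assumes that a primary with no replication slot keeps only its current segment plus wal_keep_segments older ones; real retention also depends on checkpoints. All function names are hypothetical. Only the segment file naming follows PostgreSQL's scheme, shown here for the default 16 MB segment size.

```python
# Simplified, hypothetical model of the archive-then-stream race.
SEG_SIZE = 16 * 1024 * 1024  # default WAL segment size in bytes


def wal_file_name(timeline: int, segno: int, seg_size: int = SEG_SIZE) -> str:
    """WAL file name for segment number segno on the given timeline."""
    segs_per_xlogid = 0x100000000 // seg_size
    return (f"{timeline:08X}"
            f"{segno // segs_per_xlogid:08X}"
            f"{segno % segs_per_xlogid:08X}")


def oldest_retained_segno(current_segno: int, wal_keep_segments: int) -> int:
    """Assumed retention: current segment plus wal_keep_segments older ones.

    Ignores replication slots and checkpoint timing on purpose.
    """
    return max(0, current_segno - wal_keep_segments)


def replica_can_stream(last_archived_segno: int, oldest_on_primary: int) -> bool:
    """Replica asks to stream from the start of the last archived segment."""
    return last_archived_segno >= oldest_on_primary


if __name__ == "__main__":
    # Quiet primary: still writing 0x2A, archive ends at 0x29.
    current, last_archived = 0x2A, 0x29
    for keep in (0, 1):
        ok = replica_can_stream(last_archived,
                                oldest_retained_segno(current, keep))
        name = wal_file_name(1, last_archived)
        print(f"wal_keep_segments={keep}: {name} -> "
              f"{'ok' if ok else 'already removed'}")
```

Run as a script, it reports 000000010000000000000029 as already removed with wal_keep_segments=0 and as ok with wal_keep_segments=1, which is the behaviour the bug title describes.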