Re: Replication failure, slave requesting old segments

From Stephen Frost
Subject Re: Replication failure, slave requesting old segments
Msg-id 20180813135903.GL3326@tamriel.snowman.net
In reply to Re: Replication failure, slave requesting old segments  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Greetings,

* Adrian Klaver (adrian.klaver@aklaver.com) wrote:
> On 08/13/2018 05:39 AM, Stephen Frost wrote:
> >* Phil Endecott (spam_from_pgsql_lists@chezphil.org) wrote:
> >>Adrian Klaver wrote:
> >>>On 08/12/2018 02:56 PM, Phil Endecott wrote:
> >>>>Anyway.  Do others agree that my issue was the result of
> >>>>wal_keep_segments=0 ?
> >>>
> >>>Only as a sub-issue of the slave losing contact with the master. The basic
> >>>problem is maintaining two separate operations, archiving and streaming,
> >>>in sync. If either or some combination of both lose synchronization then
> >>>it is anyone's guess on what is appropriate for wal_keep_segments.
> >
> >Uh, no, having an archive_command and a restore_command configured
> >correctly should remove the need to worry about what wal_keep_segments is
> >set to because anything not on the primary really should be available
> >through what's been archived and PG shouldn't have any trouble figuring
> >that out and working with it.
> >
> >If all you've got is streaming replication then, sure, you have no idea
> >what to set wal_keep_segments to because the replica could be offline
> >for an indeterminate amount of time, but as long as you're keeping track
> >of all the WAL through archive_command, that shouldn't be an issue.
>
> Therein lies the rub. As I stated previously the bigger issue is syncing two
> different operations, archiving and streaming.

That's not correct though, there isn't a big issue regarding syncing of
those two operations.

> The OP got caught short
> assuming the archiving would handle the situation where the streaming was
> down for a period. In his particular setup and for this particular situation
> a wal_keep_segments of 1 would have helped. I do not see this as a default
> value though as it depends on too many variables outside the reach of the
> database, most notably the success of the archive command.

What's been pointed out here is that even if everything is working
(archive_command, restore_command, etc) there's a possible gap in the
transition between replaying from the archive and starting streaming
replication, where the replica might not be able to start streaming.

That's an entirely *PostgreSQL* issue, as far as I see it, and hasn't
got anything to do with his particular setup except that he managed to
expose the issue.
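
For concreteness, a setup of the kind being discussed might look roughly
like the sketch below. This is only an illustration, not the original
poster's actual configuration: the archive path, host name and user are
made up, and the parameter names are those of the PostgreSQL releases
current at the time of this thread (with recovery.conf on the replica).

```
# postgresql.conf on the primary (sketch; /mnt/archive is a hypothetical path)
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f'
wal_keep_segments = 0    # the value the original poster asked about

# recovery.conf on the replica (sketch; host and user are hypothetical)
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'
restore_command = 'cp /mnt/archive/%f %p'
```

With both pieces in place, the replica replays WAL from the archive via
restore_command and then tries to connect through primary_conninfo to
stream; the gap described above sits at that hand-off.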

> First is the
> command even valid, two is the network link reliable, three is there even a
> network link, is there more than one network link, four is the restore
> command valid? That is just off the top of my head, more caffeine and I could
> come up with more. Saying that having archiving, streaming and a
> wal_keep_segments=1 has you covered, is misleading. I don't see it as
> detrimental to performance but I do see more posts down the road from folks
> who are surprised when it does not cover their case. Personally I think it
> better to be up front that this requires more thought or a third party
> solution that has done the thinking.

This is all down to "and you should also monitor to make sure things
continue working" which I certainly agree with but that doesn't mean we
shouldn't fix this issue.

This entire side-discussion feels like it's really off in the weeds.
The next steps which I outlined a while ago seem to still be entirely
appropriate and we should figure out a way to solve this issue so that,
when everything else is working (archive command, restore command,
replica is able to connect to the primary, etc), PG behaves sanely
and is able to catch up with and connect to the primary and resume
streaming.

Thanks!

Stephen
