Re: Loss of replication after simple misconfiguration

From: hubert depesz lubaczewski
Subject: Re: Loss of replication after simple misconfiguration
Msg-id: 20200410072651.GA16098@depesz.com
In reply to: Re: Loss of replication after simple misconfiguration  (Michael Paquier <michael@paquier.xyz>)
Responses: Re: Loss of replication after simple misconfiguration  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-bugs

On Fri, Apr 10, 2020 at 01:14:34PM +0900, Michael Paquier wrote:
> Hmm.  We have a gap in tests here as we don't have any tests stressing
> switchovers when it comes to track_commit_timestamps.  Anyway, could
> you confirm that I got the problem right?  Here is the flow I am getting
> from the information of upthread, roughly:
> 1) Primary/standby cluster, both using max_worker_processes = 8, and
> track_commit_timestamp = off.
> 2) In order to begin the switchover, first stop cleanly the primary.
> 3) Update configuration of the standby as follows, promote it and
> restart it:
> track_commit_timestamp = on
> max_worker_processes = 50
> 4) Enable streaming on the old primary to make it a standby, starting
> it fails because of the unmatching setting for max_worker_processes.
> 5) Re-adjust max_worker_processes correctly on the new standby, start
> it.  Then this startup should fail at the lookup of pg_commit_ts/.

Well, no.

In our case it was *at least* this scenario:

1. master and slave both with max_worker_processes and
track_commit_timestamp off.
2. config files get changed on both to include track_commit_timestamp on
3. slave gets restarted
4. config files get changed on both to include max_worker_processes = 50
5. master gets stopped by "power outage"
6. after master re-starts, replication to slave dies.
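
To make the ordering in the scenario above easier to follow, here is a minimal bookkeeping sketch in Python. It does not talk to PostgreSQL; it only tracks which values sit in each node's config file versus which values the running server actually uses, relying on the fact that both track_commit_timestamp and max_worker_processes only take effect at server start. The Node class, the replay_first_scenario helper and the starting value of 8 (the PostgreSQL default for max_worker_processes) are assumptions made for illustration, not something stated in the report.

```python
from dataclasses import dataclass, field

# Assumed starting values; 8 is the PostgreSQL default for max_worker_processes.
INITIAL = {"track_commit_timestamp": "off", "max_worker_processes": 8}


@dataclass
class Node:
    """Hypothetical model of one server: config file vs. values in effect."""
    name: str
    file: dict = field(default_factory=lambda: dict(INITIAL))
    running: dict = field(default_factory=lambda: dict(INITIAL))

    def edit_config(self, **settings):
        # Editing the config file does not touch the running server.
        self.file.update(settings)

    def restart(self):
        # Both parameters are only read at server start.
        self.running = dict(self.file)


def replay_first_scenario():
    master, slave = Node("master"), Node("slave")
    for node in (master, slave):            # step 2
        node.edit_config(track_commit_timestamp="on")
    slave.restart()                         # step 3
    for node in (master, slave):            # step 4
        node.edit_config(max_worker_processes=50)
    master.restart()                        # steps 5 and 6: crash, then restart
    return master, slave


master, slave = replay_first_scenario()
print("master:", master.running)
print("slave: ", slave.running)
```

Under these assumptions the master picks up both changes in a single restart, while the slave is still running with the old max_worker_processes. This only illustrates the state of each node; it does not claim to identify the cause of the failure.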

But it could also have been a different scenario:

1. master and slave both with max_worker_processes and
track_commit_timestamp off.
2. config files get changed on both to include track_commit_timestamp on
3. slave gets restarted (or maybe not, we can't be sure)
4. config files get changed on both to include max_worker_processes = 50
5. a set of 2 new slaves (slave2 and slave3) is set up off slave, both
   with max_worker_processes = 50 and track_commit_timestamp = on
6. slave3 is modified to stream off slave2
7. master crashes
8. after the restarts, one of the slaves (maybe more?) lost its replication
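
For a topology like the one above, one thing that can be checked mechanically is the documented hot-standby rule that certain parameters on a standby must be equal to or greater than the value on the primary. The sketch below is a hypothetical helper (undersized_params is not a PostgreSQL function), and the values fed to it are assumptions: 50 for the nodes that were started with the new setting, and the default of 8 for the original slave in case it was never restarted after step 4.

```python
# Parameters that the hot-standby documentation requires to be
# at least as large on a standby as on the primary.
HOT_STANDBY_MIN_PARAMS = (
    "max_connections",
    "max_prepared_transactions",
    "max_locks_per_transaction",
    "max_wal_senders",
    "max_worker_processes",
)


def undersized_params(primary, standby):
    """Return the parameters whose standby value is below the primary's."""
    return [
        name
        for name in HOT_STANDBY_MIN_PARAMS
        if name in primary and name in standby and standby[name] < primary[name]
    ]


# Assumed running values after the master has restarted.
primary = {"max_worker_processes": 50}
standbys = {
    "slave": {"max_worker_processes": 8},    # assumed: not restarted since step 4
    "slave2": {"max_worker_processes": 50},
    "slave3": {"max_worker_processes": 50},
}

report = {name: undersized_params(primary, cfg) for name, cfg in standbys.items()}
print(report)
```

In practice the dictionaries could be filled from pg_settings on each node. The check only covers this one class of mismatch and says nothing about timing-dependent problems.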

Andrew suggested yesterday on IRC that it could be a timing issue, so
testing for it might be complicated - hence my inability to replicate
the problem in a test environment.

I will try to do the tests using extended scenarios with slave2 and
slave3, but I'm not overly optimistic about replicating this particular
case.

Best regards,

depesz



