Re: Synchronous commit behavior during network outage
From | Andrey Borodin |
---|---|
Subject | Re: Synchronous commit behavior during network outage |
Date | |
Msg-id | 8848B234-F534-44BE-9EE8-43BC6D28B297@yandex-team.ru |
In reply to | Re: Synchronous commit behavior during network outage (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Synchronous commit behavior during network outage |
List | pgsql-hackers |
> On 29 June 2021, at 23:35, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2021-06-29 at 11:48 +0500, Andrey Borodin wrote:
>>> On 29 June 2021, at 03:56, Jeff Davis <pgsql@j-davis.com> wrote:
>>>
>>> The patch may be somewhat controversial, so I'll wait for feedback
>>> before documenting it properly.
>>
>> The patch seems similar to [0]. But I like your wording :)
>> I'd be happy if we go with any version of these idea.
>
> Thank you, somehow I missed that one, we should combine the CF entries.
>
> My patch also covers the backend termination case. Is there a reason
> you left that case out?

Yes, backend termination is used by the HA tool before rewinding the node. Initially I treated termination as PANIC and got a ton of coredumps during failover drills.

There is one more caveat we need to fix: we should prevent instant recovery from happening. The HA tool must know that our process was restarted.
Consider the following scenario:
1. Node A is primary with sync rep.
2. A goes through a network partition; somewhere node B is promoted.
3. All backends of A are stuck in sync rep until the HA tool discovers that A is a failed node.
4. One backend crashes with a segfault in some buggy extension, or OOM, or whatever.
5. The Postgres server does restartless crash recovery, making local-but-not-replicated data visible.

We should prevent 5 as well, just as we prevent cancels. The HA tool will discover the postmaster failure and will recheck in the coordination system whether it can bring Postgres up locally.

Thanks!

Best regards, Andrey Borodin.
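
For illustration only, the five-step scenario above can be played out in a toy model. This is a sketch under loud assumptions, not PostgreSQL code and not part of either patch: the class `ToyPrimary`, its fields, and the `restartless` flag are all hypothetical names, and "visibility" is reduced to set membership.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrimary:
    """Toy stand-in for node A (hypothetical, not PostgreSQL internals)."""
    standby_reachable: bool = True
    wal: list = field(default_factory=list)     # commit records flushed locally
    acked: set = field(default_factory=set)     # commits confirmed by the standby
    visible: set = field(default_factory=set)   # commits other sessions can see
    waiting: set = field(default_factory=set)   # backends stuck in the sync rep wait

    def commit(self, xid):
        self.wal.append(xid)                    # the local flush always happens first
        if self.standby_reachable:
            self.acked.add(xid)
            self.visible.add(xid)
        else:
            self.waiting.add(xid)               # backend blocks; commit stays invisible

    def crash_recovery(self, restartless=True):
        """One backend crashed, so every backend is reset."""
        self.waiting.clear()
        if restartless:
            # Replay covers every local commit record, acknowledged or not.
            self.visible = set(self.wal)
            return "running"
        # Assumed alternative: stay down and let the HA tool decide.
        return "stopped"

    def unreplicated_visible(self):
        return self.visible - self.acked


node_a = ToyPrimary()                 # step 1: primary with sync rep
node_a.commit(1)
node_a.standby_reachable = False      # step 2: network partition
node_a.commit(2)                      # step 3: backend stuck in sync rep
node_a.crash_recovery()               # steps 4-5: crash, then instant recovery
print(node_a.unreplicated_visible())  # {2}
```

With `restartless=False` the same sequence leaves transaction 2 invisible, which is the behavior argued for above: the node stays down until the HA tool has consulted the coordination system.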