Re: Synchronous commit behavior during network outage

From	Andrey Borodin
Subject	Re: Synchronous commit behavior during network outage
Date	30 June 2021, 12:28:28
Msg-id	8848B234-F534-44BE-9EE8-43BC6D28B297@yandex-team.ru
In reply to	Re: Synchronous commit behavior during network outage (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: Synchronous commit behavior during network outage
List	pgsql-hackers

> On 29 June 2021, at 23:35, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2021-06-29 at 11:48 +0500, Andrey Borodin wrote:
>>> On 29 June 2021, at 03:56, Jeff Davis <pgsql@j-davis.com> wrote:
>>>
>>> The patch may be somewhat controversial, so I'll wait for feedback
>>> before documenting it properly.
>>
>> The patch seems similar to [0]. But I like your wording :)
>> I'd be happy if we go with any version of these idea.
>
> Thank you, somehow I missed that one, we should combine the CF entries.
>
> My patch also covers the backend termination case. Is there a reason
> you left that case out?
Yes, backend termination is used by the HA tool before rewinding the node. Initially I was treating termination as PANIC
and got a ton of core dumps during failovers in drills.

There is one more caveat we need to fix: we should prevent instant recovery from happening. The HA tool must know that our
process was restarted.
Consider the following scenario:
1. Node A is primary with sync rep.
2. A goes through a network partition; somewhere, node B is promoted.
3. All backends of A are stuck in sync rep until the HA tool discovers that A is a failed node.
4. One backend crashes with a segfault in some buggy extension, or OOM, or whatever.
5. The Postgres server does restartless crash recovery, making local-but-not-replicated data visible.

We should prevent 5 as well, just as we prevent cancels. The HA tool will then discover the postmaster failure and recheck in the
coordination system whether it can bring Postgres up locally.
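
As a rough illustration of that last step, here is a minimal sketch in Python of the decision an HA tool could make once it notices that the postmaster has exited. All names here (ClusterView, Action, decide_after_postmaster_exit) are hypothetical; they do not correspond to any real HA tool or to a PostgreSQL API, and a real tool would also handle leader election and timeouts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    START_AS_PRIMARY = "start_as_primary"
    REWIND_AND_FOLLOW = "rewind_and_follow"
    STAY_DOWN = "stay_down"


@dataclass
class ClusterView:
    """Hypothetical snapshot of what the coordination system reports."""
    reachable: bool          # could we reach the coordination system at all?
    leader: Optional[str]    # name of the node recorded as primary, if any


def decide_after_postmaster_exit(node: str, view: ClusterView) -> Action:
    """Decide what to do with a node whose postmaster has just exited.

    Assumption: Postgres is never restarted automatically; it stays down
    until this check has been made.
    """
    if not view.reachable:
        # Partitioned from the coordination system: starting Postgres here
        # could expose local-but-not-replicated data, so keep it down.
        return Action.STAY_DOWN
    if view.leader == node:
        # We are still the recorded primary, so a local start is allowed.
        return Action.START_AS_PRIMARY
    if view.leader is None:
        # Nobody holds the leader role; wait rather than guess.
        return Action.STAY_DOWN
    # Another node (B in the scenario above) was promoted in the meantime.
    return Action.REWIND_AND_FOLLOW
```

In the scenario above, node A would get STAY_DOWN while it is partitioned and REWIND_AND_FOLLOW once it can see that B was promoted, so the unreplicated commits never become visible on A.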

Thanks!

Best regards, Andrey Borodin.
