Re: Commit to primary with unavailable sync standby

From: Fabio Ugo Venchiarutti
Subject: Re: Commit to primary with unavailable sync standby
Date:
Msg-id: ff343599-1fa2-9b85-f1bd-257275c477bb@ocado.com
In reply to: Re: Commit to primary with unavailable sync standby  (Maksim Milyutin <milyutinma@gmail.com>)
Responses: Re: Commit to primary with unavailable sync standby  (Maksim Milyutin <milyutinma@gmail.com>)
List: pgsql-general

On 19/12/2019 13:58, Maksim Milyutin wrote:
> On 19.12.2019 14:04, Andrey Borodin wrote:
>
>> Hi!
>
>
> Hi!
>
> FYI, this topic was up recently in -hackers
> https://www.postgresql.org/message-id/CAEET0ZHG5oFF7iEcbY6TZadh1mosLmfz1HLm311P9VOt7Z+jeg@mail.gmail.com
>
>
>
>> I cannot figure out proper way to implement safe HA upsert. I will be
>> very grateful if someone would help me.
>>
>> Imagine we have primary server after failover. It is
>> network-partitioned. We are doing INSERT ON CONFLICT DO NOTHING; that
>> eventually timed out.
>>
>> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>>      INSERT INTO t(
>>          pk,
>>          v,
>>          dt
>>      )
>>      VALUES
>>      (
>>          5,
>>          'text',
>>          now()
>>      )
>>      ON CONFLICT (pk) DO NOTHING
>>      RETURNING pk,
>>                v,
>>                dt)
>>     SELECT new_doc.pk from new_doc;
>> ^CCancel request sent
>> WARNING:  01000: canceling wait for synchronous replication due to
>> user request
>> DETAIL:  The transaction has already committed locally, but might not
>> have been replicated to the standby.
>> LOCATION:  SyncRepWaitForLSN, syncrep.c:264
>> Time: 2173.770 ms (00:02.174)
>>
>> Here our driver decided that something goes wrong and we retry query.
>>
>> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>>      INSERT INTO t(
>>          pk,
>>          v,
>>          dt
>>      )
>>      VALUES
>>      (
>>          5,
>>          'text',
>>          now()
>>      )
>>      ON CONFLICT (pk) DO NOTHING
>>      RETURNING pk,
>>                v,
>>                dt)
>>     SELECT new_doc.pk from new_doc;
>>   pk
>> ----
>> (0 rows)
>>
>> Time: 4.785 ms
>>
>> Now we have split-brain, because we acknowledged that row to client.
>> How can I fix this?
>>
>> There must be some obvious trick, but I cannot see it... Or maybe
>> cancel of sync replication should be disallowed and termination should
>> be treated as system failure?
>>
>
> I think the most appropriate way to handle such issues is for the client
> driver to catch such warnings (with the message about the local commit)
> and mark the status of the posted transaction as undetermined. If the
> connection with the sync replica comes back, this transaction eventually
> commits; but after autofailover is triggered *without this commit having
> been replicated to the replica*, this commit is effectively aborted.
> Therefore the client has to wait some time (exceeding the duration of
> autofailover) and then check the status of the commit (logically, based
> on the committed data).
>
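
A minimal driver-side sketch of that handling, purely illustrative: the marker string matches the WARNING text in the session above, but `classify_commit()` and its inputs are made-up names, not any real driver's API.

```python
# Hypothetical driver-side helper: decide what to report for a transaction
# whose COMMIT wait was interrupted. The marker matches the WARNING text
# shown earlier in the thread; the function and its inputs are illustrative.

LOCAL_COMMIT_MARKER = "canceling wait for synchronous replication"

def classify_commit(error, notices):
    """Return 'committed', 'aborted' or 'undetermined' for one transaction.

    error   -- exception raised by the driver, or None on apparent success
    notices -- server warning/notice strings collected for the session
    """
    if any(LOCAL_COMMIT_MARKER in n for n in notices):
        # Committed locally, possibly not replicated: the real outcome
        # depends on whether this node survives the next failover, so the
        # client must re-check later, based on the committed data itself.
        return "undetermined"
    if error is not None:
        return "aborted"
    return "committed"
```

The "undetermined" status is the key point: the client must neither retry blindly (risking the duplicate-visibility problem shown above) nor assume success.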
> The problem here is that the locally committed data becomes visible to
> future transactions (before autofailover), which violates the property
> of consistent reads from the master. IMO the more correct behavior for
> PostgreSQL here would be to ignore any cancel / termination requests
> while the backend is waiting for a response from sync replicas.
>
> However, there is another way to get locally applied commits: via
> restart of the master after initial recovery. This case is described in
> the docs
> https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA
> . But here the HA orchestrator agent can close access from external
> users (via pg_hba.conf manipulations) until the PostgreSQL instance
> synchronizes


And this is where the unsafety lies: that assumes the isolated master is
in a sane enough state to apply a self-ban (and that it can do so in
near-zero time).


Although the retry logic in Andrey's case is probably not ideal (and you
offered a more correct approach to synchronous commit), there are many
"grey area" failure modes that, in his scenario, would prevent a given
node from sealing itself off fast enough, if at all (e.g. PID exhaustion
causing fork()/system() to fail while backends are already up and
happily flushing WAL).


This is particularly relevant to situations where only a subset of
critical transactions set synchronous_commit to remote_*: it'd still be
undesirable to sink "tier 2" data into a stale primary for any
significant length of time.


Distributed systems like etcd and Cassandra have the notion of a
"coordinator node" in the context of a request (not having to deal with
an "authoritative" transaction makes this easier).


In the case of Postgres (or any RDBMS, really), all I can think of is
either an inline proxy performing some validation as part of the
forwarding (which is what we did internally, but that has not been
green-lit for FOSS :( ), or some logic in the backend that also rejects
asynchronous commits if some condition is not met (e.g. fewer than
<quorum - 1> synchronous standby nodes present: a built-in version of
the pg_stat_replication look-aside CTE I suggested earlier).
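
For reference, a sketch of what such a look-aside guard could look like, rendered from Python for illustration. The table and columns mirror the thread's example; the quorum threshold and the sync_state filter are assumptions. Note this narrows but does not close the race: the standby count can change between the check and the commit.

```python
# Hypothetical sketch of the pg_stat_replication "look-aside CTE" guard:
# the INSERT only lands when enough sync-capable standbys are attached.
# Table/columns mirror the thread's example; the threshold is an assumption.

def guarded_insert_sql(min_sync_standbys):
    """Render an INSERT gated on the number of attached sync standbys."""
    return f"""
WITH quorum AS (
    SELECT count(*) AS n
    FROM pg_stat_replication
    WHERE sync_state IN ('sync', 'quorum')
)
INSERT INTO t (pk, v, dt)
SELECT 5, 'text', now()
FROM quorum
WHERE quorum.n >= {int(min_sync_standbys)}
ON CONFLICT (pk) DO NOTHING
RETURNING pk
"""
```

A zero-row result then means either a conflict on pk or a missing quorum, which the client must disambiguate.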





> its changes with all sync replicas, as it's implemented in Stolon:
> https://github.com/sorintlab/stolon/blob/master/doc/syncrepl.md#handling-postgresql-sync-repl-limits-under-such-circumstances
>
>
> Best regards,
> Maksim Milyutin
>
>



--
Regards

Fabio Ugo Venchiarutti
OSPCFC Network Engineering Dpt.
Ocado Technology

--


Notice:
This email is confidential and may contain copyright material of
members of the Ocado Group. Opinions and views expressed in this message
may not necessarily reflect the opinions and views of the members of the
Ocado Group.

If you are not the intended recipient, please notify us
immediately and delete all copies of this message. Please note that it is
your responsibility to scan this message for viruses.

References to the
"Ocado Group" are to Ocado Group plc (registered in England and Wales with
number 7098618) and its subsidiary undertakings (as that expression is
defined in the Companies Act 2006) from time to time. The registered office
of Ocado Group plc is Buildings One & Two, Trident Place, Mosquito Way,
Hatfield, Hertfordshire, AL10 9UL.


