Re: logical decoding and replication of sequences, take 2
From: Tomas Vondra
Subject: Re: logical decoding and replication of sequences, take 2
Date:
Msg-id: b6f908c8-1b39-3367-dea8-d71093669155@enterprisedb.com
In reply to: Re: logical decoding and replication of sequences, take 2 (Andres Freund <andres@anarazel.de>)
Responses:
  Re: logical decoding and replication of sequences, take 2
  Re: logical decoding and replication of sequences, take 2
List: pgsql-hackers
Hi,

Here's a rebased version of the sequence decoding patch.

0001 is a fix for a pre-existing issue in logicalmsg_decode, which attempts to build a snapshot before reaching a consistent state. AFAICS this only affects assert-enabled builds and is otherwise harmless, because we are not actually using the snapshot (apply gets a valid snapshot from the transaction). This is mostly the fix I shared in November, except that I kept the call in decode.c (per comment from Andres). I haven't added any argument to SnapBuildProcessChange because we may need to backpatch this (and it didn't seem much simpler, IMHO). A rough sketch of the guard is at the end of this message.

0002 is a rebased version of the original approach, committed as 0da92dc530 (and then reverted in 2c7ea57e56). It includes the same fix as 0001 (for the sequence messages), which was the primary reason for the revert. The rebase was not quite straightforward, due to extensive changes in how publications deal with tables/schemas, and so on. This version adopts those changes, but other than that it behaves just like the original patch.

So this abandons the approach with COMMIT-time logging for sequences accessed/modified by the transaction, proposed in response to the revert. It seemed like a good (and simpler) alternative, but there were far too many issues - higher overhead, ordering of records for concurrent transactions, making it reliable, etc.

I think the main remaining question is what the goal of this patch is, or rather what "guarantees" we expect from it - what we expect to see on the replica after incrementing a sequence on the primary. Robert described [1] a model and argued the standby should not "invent" new states. It's a long / detailed explanation, and I'm not going to try to shorten it here because that would inevitably omit various details. So better to read it whole ...

Anyway, I don't think this approach (essentially treating most sequence increments as non-transactional) breaks any consistency guarantees or introduces any "new" states that would not be observable on the primary. In a way, this treats non-transactional sequence increments as separate transactions, and applies them directly. If you read the sequence in between two commits, you might see any "intermediate" state of the sequence - that's the nature of non-transactional changes.

We could "postpone" applying the decoded changes until the next commit, which might improve performance if a transaction is long enough to cover many sequence increments. But that's more a performance optimization than a matter of correctness, IMHO.

One caveat is that because of how WAL works for sequences, we're actually decoding changes "ahead", so if you read the sequence on the subscriber it will appear to be slightly ahead (by up to ~32 values) - see the toy model at the end of this message. This could be eliminated by setting SEQ_LOG_VALS to 0, which however increases the sequence costs, of course.

This however brings me back to the original question of what the purpose of this patch is - and that's essentially keeping sequences up to date to make them usable after a failover. We can't generate values from the sequence on the subscriber, because they'd just get overwritten. And from this point of view it's also fine that the sequence is slightly ahead, because that's what happens after crash recovery anyway. And we're not guaranteeing the sequences to be gap-less.

regards

[1] https://www.postgresql.org/message-id/CA%2BTgmoaYG7672OgdwpGm5cOwy8_ftbs%3D3u-YMvR9fiJwQUzgrQ%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
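
For illustration, here is a minimal sketch of the shape of the 0001 fix - this is not the exact diff from the attached patch, just the idea of having logicalmsg_decode skip non-transactional messages while the snapshot builder has not yet reached a consistent state (variable names follow the surrounding decode.c code):

    /* in logicalmsg_decode(), before building/using a snapshot */
    if (!message->transactional &&
        (SnapBuildCurrentState(builder) != SNAPBUILD_CONSISTENT ||
         SnapBuildXactNeedsSkip(builder, buf->origptr)))
        return;

That is, non-transactional messages emitted before the consistent point are simply skipped, instead of trying to build a snapshot for them.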
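
And a self-contained toy model of why the subscriber can appear up to ~SEQ_LOG_VALS (32) values ahead - this is not the real sequence.c code, just a simplified illustration of the pre-logging behavior:

    #include <stdio.h>

    #define SEQ_LOG_VALS 32     /* how far ahead each WAL record is pre-logged */

    int main(void)
    {
        long last_value = 0;    /* last value actually handed out by nextval() */
        long wal_value = 0;     /* value recorded in the most recent WAL record */
        int  log_cnt = 0;       /* increments still covered by that record */

        for (int call = 1; call <= 40; call++)
        {
            if (log_cnt == 0)
            {
                /* allowance exhausted: write a record SEQ_LOG_VALS ahead */
                wal_value = last_value + 1 + SEQ_LOG_VALS;
                log_cnt = SEQ_LOG_VALS + 1;
            }
            last_value++;       /* value returned to the caller */
            log_cnt--;
        }

        /* decoding sees wal_value, callers have only seen last_value */
        printf("callers saw %ld, decoded state is %ld (ahead by %ld)\n",
               last_value, wal_value, wal_value - last_value);
        return 0;
    }

After 40 nextval() calls the decoded state is ahead by whatever remains of the pre-logged allowance (here 26), and it never lags behind - which is exactly the "slightly ahead" behavior described above.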
Attachments