Re: logical decoding and replication of sequences, take 2
| From | Tomas Vondra |
|---|---|
| Subject | Re: logical decoding and replication of sequences, take 2 |
| Date | |
| Msg-id | 09613730-5ee9-4cc3-82d8-f089be90aa64@enterprisedb.com |
| In response to | Re: logical decoding and replication of sequences, take 2 (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
| Responses | Re: logical decoding and replication of sequences, take 2 |
| List | pgsql-hackers |
Hi,

Here's a new version of this patch series. It rebases the 2023/12/03 version, and there are a couple of improvements to address the performance and correctness questions.

Since the 2023/12/03 version was posted, there were a couple of off-list discussions with several people - with Amit, as mentioned in [1], and then also internally and at pgconf.eu. My personal (very brief) takeaway from these discussions is this:

1) desirability: We want a built-in way to handle sequences in logical replication. I think everyone agrees this is not a way to do distributed sequences in an active-active setup, but that there are other use cases that need this feature - typically upgrades / logical failover. Multiple approaches were discussed (support in logical replication, or a separate tool to be executed on the logical replica). Both might work, and people usually end up with some sort of custom tool anyway. But that's cumbersome, and the consensus seems to be that the logical replication feature is the better option.

2) performance: There was concern about the performance impact, and that it affects everyone, including those who don't replicate sequences (as the overhead is mostly incurred before calls to the output plugin etc.). I do agree with this, but I don't think sequences can be decoded in a much cheaper way.

There was a proposal [2] that maybe we could batch the non-transactional sequence changes in the "next" transaction, and distribute them similarly to how SnapBuildDistributeNewCatalogSnapshot() distributes catalog snapshots. But I doubt that would actually work. Or more precisely - even if we can make the code work, I think it would not solve the issue for some common cases. Consider, for example, a case with many concurrent top-level transactions, which would make distributing the changes quite expensive. And I'd bet sequence changes are far more common than catalog changes.

However, I think we ultimately agreed that the overhead is acceptable if it only applies to use cases that actually need to decode sequences. So if there was a way to skip sequence decoding when it's not necessary, that would work. Unfortunately, that can't be based on simply checking which callbacks are defined by the output plugin, because e.g. pgoutput needs to handle both cases (so the callbacks need to be defined). Nor can it be determined based on what's included in the publication (as that's not available that early). The agreement was that the best way is to have a CREATE SUBSCRIPTION option that instructs the upstream to decode sequences. By default this option is 'off' (because that's the no-overhead case), but it can be enabled for each subscription. This is what 0005 implements, and interestingly enough, this is what an earlier version [3] from 2023/04/02 did.

This means that if you add a sequence to the publication but leave "sequences=off" in CREATE SUBSCRIPTION, the sequence won't be replicated after all. That may seem a bit surprising, and I don't like it, but I don't think there's a better way to do this.
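To make the user-visible behavior concrete, here is a rough sketch. The 'sequences' subscription option and its 'off' default are the ones described above; the publication-side grammar for sequences shown here is only an assumption about what the 0003 patch might provide, so treat this as an illustration rather than the exact syntax of the patch series.

```sql
-- Publisher: publish sequences. Illustrative syntax only - the exact
-- publication-side grammar is an assumption about the 0003 patch.
CREATE PUBLICATION pub_seq FOR ALL SEQUENCES;

-- Subscriber: sequence decoding is disabled by default, so this
-- subscription does NOT replicate the published sequences (the
-- "surprising" case mentioned above).
CREATE SUBSCRIPTION sub_plain
    CONNECTION 'host=publisher dbname=test'
    PUBLICATION pub_seq;

-- Subscriber: opting in per subscription instructs the upstream to
-- decode sequences, so only this subscription's walsender pays the
-- extra decoding overhead.
CREATE SUBSCRIPTION sub_seq
    CONNECTION 'host=publisher dbname=test'
    PUBLICATION pub_seq
    WITH (sequences = on);
```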
3) correctness: The last point is about making the "transactional" flag correct when the snapshot state changes mid-transaction, originally pointed out by Dilip [4]. Per [5] this however happens to work correctly, because while we identify the change as 'non-transactional' (which is incorrect), we immediately throw it away again (so we don't try to apply it, which would error out). One option would be to document/describe this in the comments, per 0006. This means that when ReorderBufferSequenceIsTransactional() returns 'true', it's correct. But if it returns 'false', it means 'maybe'.

I agree it seems a bit strange, but with the extra comments I think it's OK. It simply means that if we get transactional=false incorrectly, we're guaranteed not to process it. Maybe we could rename the function to make this clear from the name.

The other solution proposed in the thread [6] was to always decode the relfilenode and add it to the hash table. 0007 does this, and it works. But I agree this seems possibly worse than 0006 - it means we may be adding entries to the hash table, and it's not clear when exactly we'll clean them up, etc. It'd be the only place processing stuff before the snapshot reaches FULL.

I personally would go with 0006, i.e. just explaining why doing it this way is correct.

regards

[1] https://www.postgresql.org/message-id/12822961-b7de-9d59-dd27-2e3dc3980c7e%40enterprisedb.com
[2] https://www.postgresql.org/message-id/CAFiTN-vm3-bGfm-uJdzRLERMHozW8xjZHu4rdmtWR-rP-SJYMQ%40mail.gmail.com
[3] https://www.postgresql.org/message-id/1f96b282-cb90-8302-cee8-7b3f5576a31c%40enterprisedb.com
[4] https://www.postgresql.org/message-id/CAFiTN-vAx-Y%2B19ROKOcWnGf7ix2VOTUebpzteaGw9XQyCAeK6g%40mail.gmail.com
[5] https://www.postgresql.org/message-id/CAA4eK1LFise9iN%2BNN%3Dagrk4prR1qD%2BebvzNjKAWUog2%2Bhy3HxQ%40mail.gmail.com
[6] https://www.postgresql.org/message-id/CAFiTN-sYpyUBabxopJysqH3DAp4OZUCTi6m_qtgt8d32vDcWSA%40mail.gmail.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
- v20240111-0001-Logical-decoding-of-sequences.patch
- v20240111-0002-Add-decoding-of-sequences-to-test_decoding.patch
- v20240111-0003-Add-decoding-of-sequences-to-built-in-repl.patch
- v20240111-0004-global-hash-table-of-sequence-relfilenodes.patch
- v20240111-0005-CREATE-SUBSCRIPTION-flag-to-enable-sequenc.patch
- v20240111-0006-add-comment-about-SNAPBUILD_FULL.patch
- v20240111-0007-decode-all-sequence-relfilenodes.patch