Re: logical decoding and replication of sequences
От | Tomas Vondra |
---|---|
Тема | Re: logical decoding and replication of sequences |
Дата | |
Msg-id | 2cd38bab-c874-8e0b-98e7-d9abaaf9806a@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: logical decoding and replication of sequences (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Ответы |
Re: logical decoding and replication of sequences
Re: logical decoding and replication of sequences |
Список | pgsql-hackers |
Hi, Here's a slightly improved version of the patch, fixing a couple issues I failed to notice. It also addresses a couple of the issues described in the last message, although mostly to show what would need to be done. 1) Handle sequences dropped in the xact by calling try_relation_open, and just doing nothing if not found. Otherwise we'd get a failure in reorderbuffer, when decoding the change. 2) Fixed nextval_internal() to log the correct last_value (the one we write into WAL). 3) Reread the sequence state in AtEOXact_Sequences, to deal with the ordering issue described before. This makes (2) somewhat pointless, because we just read whatever is on disk at that point. But having both makes it easier to experiment / see what'd happen. 4) Log the stats in DefineSequence() - Without this we'd not have the initial sequence state in the WAL, because only nextval/setval etc. do the logging. The old approach (decoding the sequence tuple) does not have this issue. The (3) changes the behavior in a somewhat strange way. Consider this case with two concurrent transactions: T1: BEGIN; T2: BEGIN; T1: SELECT nextval('s') FROM generate_series(1,100) s(i); T2: SELECT nextval('s') FROM generate_series(1,100) s(i); T1: COMMIT; T2: COMMIT; The result is that both transactions have used the same sequence, and so will re-read the state from disk. But at that point the state is exactly the same, so we'll log the same thing twice. There's a much deeper issue, though. The current patch only logs the sequence if the session generated WAL when incrementing the sequence (which happens every 32 values). But other sessions may already use values from this range, so consider for example this: T1: BEGIN; T1: SELECT nextval('s') FROM generate_series(1,100) s(i); T2: BEGIN; T2: SELECT nextval('s'); T2: COMMIT; T1: ROLLBACK; Which unfortunately means T2 already used a value, but the increment may not be logged at that time (or ever). This seems like a fatal issue, because it means we need to log *all* sequences the transaction touches, not just those that wrote the increment to WAL. That might still work for large transactions consuming many sequence values, but it's pretty inefficient for small OLTP transactions that only need one or two values from the sequence. So I think just decoding the sequence tuples is a better solution - for large transactions (consuming many values from the sequence) it may be more expensive (i.e. send more records to replica). But I doubt that matters too much - it's likely negligible compared to other data for large transactions. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: