Re: sequences vs. synchronous replication
От | Tomas Vondra |
---|---|
Тема | Re: sequences vs. synchronous replication |
Дата | |
Msg-id | 7869761b-fdc7-6f7d-dc95-9ba8315a8ecf@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: sequences vs. synchronous replication (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Ответы |
Re: sequences vs. synchronous replication
|
Список | pgsql-hackers |
On 12/18/21 07:00, Tomas Vondra wrote: > > > On 12/18/21 05:52, Tom Lane wrote: >> Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >>> The problem is exactly the same as in [1] - the aborted transaction >>> generated WAL, but RecordTransactionAbort() ignores that and does not >>> update LogwrtResult.Write, with the reasoning that aborted transactions >>> do not matter. But sequences violate that, because we only write WAL >>> once every 32 increments, so the following nextval() gets "committed" >>> without waiting for the replica (because it did not produce WAL). >> >> Ugh. >> >>> I'm not sure this is a clear data corruption bug, but it surely walks >>> and quacks like one. My proposal is to fix this by tracking the lsn of >>> the last LSN for a sequence increment, and then check that LSN in >>> RecordTransactionCommit() before calling XLogFlush(). >> >> (1) Does that work if the aborted increment was in a different >> session? I think it is okay but I'm tired enough to not be sure. >> > > Good point - it doesn't :-( At least not by simply storing LSN in a > global variable or something like that. > > The second backend needs to know the LSN of the last WAL-logged sequence > increment, but only the first backend knows that. So we'd need to share > that between backends somehow. I doubt we want to track LSN for every > individual sequence (because for clusters with many dbs / sequences that > may be a lot). > > Perhaps we could track just a fixed number o LSN values in shared memory > (say, 1024), and update/read just the element determined by hash(oid). > That is, the backend WAL-logging sequence with given oid would set the > current LSN to array[hash(oid) % 1024], and backend doing nextval() > would simply remember the LSN in that slot. Yes, if there are conflicts > that'll flush more than needed. > Here's a PoC demonstrating this idea. I'm not convinced it's the right way to deal with this - it surely seems more like a duct tape fix than a clean solution. But it does the trick. I wonder if storing this in shmem is good enough - we lose the LSN info on restart, but the checkpoint should trigger FPI which makes it OK. A bigger question is whether sequences are the only thing affected by this. If you look at RecordTransactionCommit() then we skip flush/wait in two cases: 1) !wrote_xlog - if the xact did not produce WAL 2) !markXidCommitted - if the xact does not have a valid XID Both apply to sequences, and the PoC patch tweaks them. But maybe there are other places where we don't generate WAL and/or assign XID in some cases, to save time? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: