Re: [HACKERS] SERIALIZABLE on standby servers
From | Thomas Munro |
---|---|
Subject | Re: [HACKERS] SERIALIZABLE on standby servers |
Date | |
Msg-id | CA+hUKGLRUwKNs0w+YzfsNF62O0L_gfbTFOJND1yMTorg95rJOw@mail.gmail.com |
In response to | Re: [HACKERS] SERIALIZABLE on standby servers (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-hackers |
On Mon, Jun 15, 2020 at 5:00 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Dec 28, 2018 at 02:21:44PM +1300, Thomas Munro wrote:
> > Just to be clear, although this patch is registered in the commitfest
> > and currently applies and tests pass, it is prototype/WIP code with
> > significant problems that remain to be resolved. I sort of wish there
> > were a way to indicate that in the CF but since there isn't, I'm
> > saying that here. What I hope to get from Kevin, Simon or other
> > reviewers is some feedback on the general approach and problems
> > discussed upthread (and other problems and better ideas I might have
> > missed). So it's not seriously proposed for commit in this CF.
>
> No feedback has actually come, so I have moved it to next CF.

Having been nerd-sniped by SSI again, I spent some time this weekend rebasing this old patch, making a few improvements, and reformulating the problems to be solved as I see them. It's very roughly based on Kevin Grittner and Dan Ports' description of how you could give SERIALIZABLE a useful meaning on hot standbys. The short version of the theory is that you can make it work like SERIALIZABLE READ ONLY DEFERRABLE by adding a bit of extra information into the WAL stream.

Problems:

1. As a prerequisite, we'd need to teach primary servers to make transactions visible in the same order that they log commits. Otherwise, we permit nonsense like seeing TX1 but not TX2 on the primary, and TX2 but not TX1 on the replica. You can probably argue that our read replicas don't satisfy the lower isolation levels, let alone serializable.

2. Similarly, it's probably not OK that PreCommit_CheckForSerializationFailure() determines MySerializableXact->snapshotSafetyAfterThisCommit. That may not happen in exactly the same order as commits are logged. Or maybe there is some argument for why that is OK, based on what we're doing with prepareSeqNo, or maybe we can do something with that to detect disorder.

3. The patch doesn't yet attempt to checkpoint the snapshot safety state. That's needed to start up in a sane state, without having to wait for WAL activity.

4. XactLogSnapshotSafetyRecord() flushes the WAL an extra time after a commit is flushed, which I put in for testing; that's silly... somehow it needs to be better integrated so we don't generate two sync I/Os in a row.

5. You probably want a way to turn off the extra WAL records and SERIALIZABLEXACT consumption if you're using SERIALIZABLE on a primary but not on the standby. Or maybe there is some way to make it come on automatically.

I think I have cleared up the matter of xmin tracking for "hypothetical" SERIALIZABLEXACTs mentioned earlier. It's not needed, so should be set to InvalidTransactionId, and I added a comment to explain why.

I also wrote a TAP test to exercise this thing. It is the same schedule as src/test/isolation/specs/read-only-anomaly-3.spec, except that transaction 3 runs on a streaming replica.

One thing to point out is that this patch only aims to make it so that streaming replicas can't observe a state that would have caused a transaction to abort if it had been observed on the primary. The TAP test still has to insert its own wait-for-LSN loop to make sure step "s1c" is replayed before "s3r" runs. We could use synchronous_commit=remote_apply, and that'd probably work just as well for this particular test, but I'm not sure how to square that with fixing problem #1 above.

The perl hackery I used to do overlapping transactions in a TAP test is pretty crufty. I guess we'd ideally have the isolation tester support per-session connection strings, and somehow get some perl code to orchestrate the cluster setup but then run the real isolation tester. Or something like that.
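
For readers unfamiliar with the wait-for-LSN loop the message mentions, here is a minimal sketch of the idea. It is not the code from the patch's TAP test, which is written in Perl and is not shown here. The assumed flow is: capture a target position on the primary after step "s1c" commits (for example with `pg_current_wal_lsn()`), then poll the standby's `pg_last_wal_replay_lsn()` until it has reached that position before running "s3r". The names `parse_lsn`, `wait_for_replay`, and the `fetch_replay_lsn` callable are hypothetical and exist only for this illustration; the callable stands in for whatever runs the query on the standby.

```python
import time
from typing import Callable, Optional


def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN form ("hi/lo", both hex) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wait_for_replay(
    target_lsn: str,
    fetch_replay_lsn: Callable[[], Optional[str]],  # hypothetical: queries the standby
    timeout: float = 30.0,
    interval: float = 0.1,
) -> bool:
    """Poll until the standby has replayed at least target_lsn.

    Returns True once the replay position reaches the target, or False if
    the timeout expires first.
    """
    target = parse_lsn(target_lsn)
    deadline = time.monotonic() + timeout
    while True:
        current = fetch_replay_lsn()
        # The replay position can be NULL (None here) before any WAL is replayed.
        if current is not None and parse_lsn(current) >= target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for a real standby: replay advances on each poll.
    positions = iter([None, "0/3000060", "0/30000F8", "0/3000148"])
    reached = wait_for_replay("0/30000F8", lambda: next(positions), interval=0.0)
    print("safe to run s3r" if reached else "timed out waiting for replay")
```

A loop like this only orders the test's steps. As the message notes, it does nothing about the commit-visibility ordering described in problem 1.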