Re: [PATCH] Replica sends an incorrect epoch in its hot standby feedback to the Master
From | Thomas Munro |
---|---|
Subject | Re: [PATCH] Replica sends an incorrect epoch in its hot standby feedback to the Master |
Date | |
Msg-id | CA+hUKG+eCrs7kzpdGWDXLaGtrvRsk-a-bwTKY=wT=-oxfr-Wtw@mail.gmail.com |
In reply to | [PATCH] Replica sends an incorrect epoch in its hot standby feedback to the Master ("Palamadai, Eka" <ekanatha@amazon.com>) |
Responses | Re: [PATCH] Replica sends an incorrect epoch in its hot standby feedback to the Master |
List | pgsql-hackers |
On Fri, Feb 7, 2020 at 1:03 PM Palamadai, Eka <ekanatha@amazon.com> wrote:
> The below problem occurs in Postgres versions 11, 10, and 9.6. However, it doesn’t occur since Postgres version 12, since the commit [6] to add basic infrastructure for 64-bit transaction IDs indirectly fixed it.

I'm happy that that stuff is already fixing bugs we didn't know we had, but, yeah, it looks like it really only fixed it incidentally by moving all the duplicated "assign if higher" code into a function, not through the magical power of 64-bit xids.

> The replica sends an incorrect epoch in its hot standby feedback to the master in the scenario outlined below, where a checkpoint is interleaved with the execution of 2 transactions at the master. The incorrect epoch in the feedback causes the master to ignore the “oldest Xmin” X sent by the replica. If a heap page prune [1] or vacuum were executed at the master immediately thereafter, they may use a newer “oldest Xmin” Y > X, and prematurely delete a tuple T such that X < t_xmax(T) < Y, which is still in use at the replica as part of a long running read query Q. Subsequently, when the replica replays the deletion of T as part of its WAL replay, it cancels the long running query Q, causing unnecessary pain to customers.

Ouch. Thanks for this analysis!

> The variable “ShmemVariableCache->nextXid” (or “nextXid” for short) should be monotonically increasing unless it wraps around to the next epoch. However, in the above sequence, this property is violated on the replica in the function “RecordKnownAssignedTransactionIds” [3], when the WAL replay for the insertion at step 6 is executed at the replica.

I haven't tried your repro or studied this closely yet, but yes, that assignment to nextXid does indeed look pretty fishy. Other similar code elsewhere always does a check like in your patch, before clobbering nextXid.