Re: prevent immature WAL streaming
From | Alvaro Herrera |
---|---|
Subject | Re: prevent immature WAL streaming |
Date | |
Msg-id | 202111232040.fzyfnrdwtxu6@alvherre.pgsql |
In response to | Re: prevent immature WAL streaming (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: prevent immature WAL streaming |
List | pgsql-hackers |
On 2021-Nov-23, Tom Lane wrote:

> We're *still* not out of the woods with 026_overwrite_contrecord.pl,
> as we are continuing to see occasional "mismatching overwritten LSN"
> failures, further down in the test where it tries to start up the
> standby:

Augh.

> Looking at adjacent successful runs, it seems that the exact point
> where the "missing contrecord" starts varies substantially, even after
> our previous fix to disable autovacuum in this test. How could that be?

Well, there is intentionally some variability. Maybe not as much as one
would wish, but I expect that that should explain why that point is not
always the same.

> It's probably for the best though, because I think this is exposing
> an actual bug that we would not have seen if the start point were
> completely consistent. I have not dug into the code, but it looks to
> me like if the "consistent recovery state" is reached exactly at a
> page boundary (0/1FFE000 in all these cases), then the standby expects
> that to be what the OVERWRITE_CONTRECORD record will point at. But
> actually it points to the first WAL record on that page, resulting
> in a bogus failure.

So what is happening is that we set state->overwrittenRecPtr to the LSN
of page start, ignoring the page header. Is that the LSN of the first
record in a page? I'll see if I can reproduce the problem.

-- 
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"La persona que no quería pecar / estaba obligada a sentarse en duras y
empinadas sillas / desprovistas, por cierto de blandos atenuantes"
(Patricio Vogel)
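
To make the page-start versus first-record distinction concrete, here is a minimal sketch of the arithmetic in Python. It is not the PostgreSQL code. The page size, segment size, and the two page-header sizes are assumptions for a default build, and the helper names (parse_lsn, format_lsn, first_record_on_page) are hypothetical. It also assumes the record begins directly after the page header, with no continuation data in front of it.

```python
# Assumed constants for a default PostgreSQL build; none come from the message.
XLOG_BLCKSZ = 8192                    # assumed WAL page size in bytes
WAL_SEGMENT_SIZE = 16 * 1024 * 1024   # assumed WAL segment size in bytes
SHORT_PAGE_HEADER = 24                # assumed size of an ordinary page header
LONG_PAGE_HEADER = 40                 # assumed size of the first page header in a segment


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn):
    """Format a 64-bit integer LSN back into 'hi/lo' hex notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def first_record_on_page(lsn):
    """Hypothetical helper: if lsn sits exactly on a page boundary, return the
    position just past the page header; otherwise return lsn unchanged."""
    if lsn % XLOG_BLCKSZ != 0:
        return lsn
    if lsn % WAL_SEGMENT_SIZE == 0:
        return lsn + LONG_PAGE_HEADER
    return lsn + SHORT_PAGE_HEADER


page_start = parse_lsn("0/1FFE000")
print(format_lsn(page_start))                        # 0/1FFE000
print(format_lsn(first_record_on_page(page_start)))  # 0/1FFE018
```

Under these assumptions the two values differ by the header size, so comparing a pointer to the page start with a pointer to the first record on that page would report a mismatch even though both refer to the same page.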