Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
From | Melanie Plageman |
---|---|
Subject | Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae |
Date | |
Msg-id | CAAKRu_Z50WSPWLYg-2NC4TDBSyTLMRL_jG=K+txByTAeu5nNXA@mail.gmail.com |
In response to | Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae (Melanie Plageman <melanieplageman@gmail.com>) |
List | pgsql-bugs |

On Thu, Jun 20, 2024 at 11:49 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 6:51 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > Finally, upthread there is discussion of how we could end up doing a
> > catalog lookup after vacuum_get_cutoffs() and before the tuple
> > visibility check on 16. Assuming this is true, we would want to
> > backport the fix to 16 as well. I could use some help getting a repro
> > (using btree index deletion for example) of the infinite loop on 16.
>
> So, I ended up working on a new repro that works by forcing a round of
> index vacuuming after the standby reconnects and before pruning a dead
> tuple whose xmax is older than OldestXmin.
>
> At the end of the round of index vacuuming, _bt_pendingfsm_finalize()
> calls GetOldestNonRemovableTransactionId(), thereby updating the
> backend's GlobalVisState and moving maybe_needed backwards.
>
> Then vacuum's first pass will continue with pruning and find our later
> inserted and updated tuple HEAPTUPLE_RECENTLY_DEAD when compared to
> maybe_needed but HEAPTUPLE_DEAD when compared to OldestXmin.
>
> I make sure that the standby reconnects between vacuum_get_cutoffs()
> (vacuum_set_xid_limits() on 14/15) and pruning because I have a cursor
> on the page keeping VACUUM FREEZE from getting a cleanup lock.
>
> See the repros for step-by-step explanations of how it works.
>
> With this, I can repro the infinite loop on 14-16.
>
> Backporting 1ccc1e05ae fixes 16 but, with the new repro, 14 and 15
> error out with "cannot freeze committed xmax". I'm going to
> investigate further why this is happening. It definitely makes me
> wonder about the fix.

It turns out it was also erroring out on 16 (i.e. backporting
1ccc1e05ae did not fix anything), but I didn't notice it because the
perl TAP test passed. I also discovered we can hit this error in
master, so I started a thread about that here [1].

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_bDD7oq9ZwB2OJqub5BovMG6UjEYsoK2LVttadjEqyRGg%40mail.gmail.com
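
For readers following the repro, the disagreement between the two horizons described above can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: the function name, the enum, and the XID values are hypothetical, XIDs are treated as plain integers (no wraparound), and the deleting transaction is assumed to have committed. It shows the same tuple being classified differently depending on whether its xmax is compared to OldestXmin or to a maybe_needed that has moved backwards.

```python
from enum import Enum


class TupleStatus(Enum):
    # Names mirror the two states mentioned in the email.
    RECENTLY_DEAD = "HEAPTUPLE_RECENTLY_DEAD"
    DEAD = "HEAPTUPLE_DEAD"


def classify_deleted_tuple(xmax: int, horizon: int) -> TupleStatus:
    """Toy rule: a deleted tuple is removable only if its xmax precedes the horizon.

    Simplification: plain integer comparison, so XID wraparound is ignored.
    """
    return TupleStatus.DEAD if xmax < horizon else TupleStatus.RECENTLY_DEAD


# Hypothetical XIDs, chosen only for illustration.
oldest_xmin = 100   # cutoff computed once, at the start of the vacuum
maybe_needed = 90   # horizon after it moved backwards mid-vacuum
tuple_xmax = 95     # older than oldest_xmin, newer than maybe_needed

vacuum_view = classify_deleted_tuple(tuple_xmax, oldest_xmin)
pruning_view = classify_deleted_tuple(tuple_xmax, maybe_needed)

print(vacuum_view.value, pruning_view.value)
# prints: HEAPTUPLE_DEAD HEAPTUPLE_RECENTLY_DEAD
```

In this model, any xmax in the range [maybe_needed, OldestXmin) produces the mismatch, which is why the repro needs the horizon to move backwards between computing the cutoffs and pruning the page.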