Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
From | Noah Misch |
---|---|
Subject | Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae |
Date | |
Msg-id | 20240416193402.9a.nmisch@google.com |
In response to | Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae (Andres Freund <andres@anarazel.de>) |
List | pgsql-bugs |
On Tue, Apr 16, 2024 at 11:01:08AM -0700, Andres Freund wrote:
> On 2024-04-15 20:58:25 -0700, Noah Misch wrote:
> > On Mon, Apr 15, 2024 at 02:10:20PM -0700, Andres Freund wrote:
> > > On 2024-04-15 13:52:04 -0700, Noah Misch wrote:
> > > > I have observed the infinite loop in production with v15.5, so that
> > > > non-reproduce outcome is a limitation in the test procedure. (v14.2 added
> > > > those two commits.)
> > >
> > > How closely have you analyzed those production occurences? It's not too hard
> > > to imagine some form of corruption that leads to such a loop, but which isn't
> > > related to the horizon going backwards? E.g. a corrupted HOT chain can lead
> > > to heap_page_prune() not acting on a DEAD tuple, but lazy_scan_prune() would
> > > then encounter a DEAD tuple.

I've not seen this recur for any one table, so I think we can rule out
corruption modes that would reach the loop every time. (If a hypothesized
loop explanation calls for both corruption and horizon movement, that could
still apply.)

> > One occurrence had these facts:
> >
> > HeapTupleHeaderGetXmin = 95271613
> > HeapTupleHeaderGetUpdateXid = 95280147
> > vacrel->OldestXmin = 95317451
> > vacrel->vistest->definitely_needed = 95318928
> > vacrel->vistest->maybe_needed = 93624425
> >
> > How compatible are those with the corruption vectors you have in view?
>
> Do you have more information about the page this was on? E.g. pageinspect
> output? Or at least the infomasks of that tuple?

No, unfortunately.

> I assume this was a normal
> data table (i.e. not a [shared|user] catalog table or temp table)?

Normal data table

> Do you know what ComputeXidHorizonsResultLastXmin, RecentXmin were set to?

No.

> > I tried briefly to understand
> > https://postgr.es/m/flat/20240415173913.4zyyrwaftujxthf2@awork3.anarazel.de
> > but I felt verifying its argument was going to be a big job for me. Would
> > those errors happen transiently, like the infinite loop, or would they
> > persist until something resets the tuple fields (e.g. ATRewriteTables())?
>
> I think they'd be transient, because the visibility information during the
> next vacuum would presumably not be "skewed" anymore?

That is good.

> Of course it's possible
> you'd re-encounter the problem, if you constantly have horizons going back and
> forth. But I'd still classify that as transient.

Certainly.
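
To illustrate how the values reported above relate to one another, here is a minimal Python sketch. It is not PostgreSQL code: the two helper functions are hypothetical simplifications, the updating transaction is assumed to have committed, and XID wraparound is ignored. It only checks where HeapTupleHeaderGetUpdateXid falls relative to vacrel->OldestXmin and to the maybe_needed/definitely_needed range.

```python
# Illustrative model only. Variable names mirror the PostgreSQL identifiers
# quoted in the message; the functions are hypothetical simplifications and
# use plain integer comparison (no XID wraparound handling).

def classify_against_vistest(xid, maybe_needed, definitely_needed):
    """Three-way classification of an XID against a horizon range."""
    if xid < maybe_needed:
        return "removable"
    if xid >= definitely_needed:
        return "not removable"
    # In between: the range alone cannot decide; the horizons would have
    # to be recomputed before giving an answer.
    return "uncertain"


def dead_per_oldest_xmin(update_xid, oldest_xmin):
    """Assuming the updater committed, the old tuple version counts as
    DEAD once the updater's XID is older than OldestXmin."""
    return update_xid < oldest_xmin


# Values reported for one production occurrence.
xmin = 95271613
update_xid = 95280147
oldest_xmin = 95317451
definitely_needed = 95318928
maybe_needed = 93624425

vistest_result = classify_against_vistest(
    update_xid, maybe_needed, definitely_needed
)
oldest_xmin_result = dead_per_oldest_xmin(update_xid, oldest_xmin)

print("vistest classification:", vistest_result)
print("DEAD per OldestXmin:", oldest_xmin_result)
print("maybe_needed is behind OldestXmin by:", oldest_xmin - maybe_needed)
```

Under these simplifying assumptions, the update XID is older than OldestXmin yet lies inside the range that the maybe_needed/definitely_needed pair cannot decide on its own, so a check based on OldestXmin and a check based on that range are not guaranteed to agree.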