Re: [BUG] failed assertion in EnsurePortalSnapshotExists()
От | Drouvot, Bertrand |
---|---|
Тема | Re: [BUG] failed assertion in EnsurePortalSnapshotExists() |
Дата | |
Msg-id | 71af37bc-fc25-ed86-acf2-176d153ffb9b@amazon.com обсуждение исходный текст |
Ответ на | Re: [BUG] failed assertion in EnsurePortalSnapshotExists() (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: [BUG] failed assertion in EnsurePortalSnapshotExists()
|
Список | pgsql-hackers |
Hi, On 9/28/21 6:50 PM, Tom Lane wrote: > "Drouvot, Bertrand" <bdrouvot@amazon.com> writes: >> Does it make sense (as it is currently) to set the ActiveSnapshot to >> NULL and not ensuring the same is done for ActivePortal->portalSnapshot? > I think that patch is just a kluge :-( right, sounded too simple to be "true". > > After tracing through this I've figured out what is happening, and > why you need to put the exception block inside a loop to make it > happen. The first iteration goes fine, and we end up with an empty > ActiveSnapshot stack (and no portalSnapshots) because that's how > the COMMIT leaves it. In the second iteration, we try to > re-establish the portal snapshot, but at the point where > EnsurePortalSnapshotExists is called (from the first INSERT > command) we are already inside a subtransaction thanks to the > plpgsql exception block. This means that the stacked ActiveSnapshot > has as_level = 2, although the Portal owning it belongs to the > outer transaction. So at the next exception, AtSubAbort_Snapshot > zaps the stacked ActiveSnapshot, but the Portal stays alive and > now it has a dangling portalSnapshot pointer. Thanks for the explanation! > Basically there seem to be two ways to fix this, both requiring > API additions to snapmgr.c (hence, cc'ing Alvaro for opinion): > > 1. Provide a variant of PushActiveSnapshot that allows the caller > to specify the as_level to use, and then have > EnsurePortalSnapshotExists, as well as other places that create > Portal-associated snapshots, use this with as_level equal to the > Portal's createSubid. The main hazard I see here is that this could > theoretically allow the ActiveSnapshot stack to end up with > out-of-order as_level values, causing issues. For the moment we > could probably just add assertions to check that the passed as_level > is >= next level, or maybe even that this variant is only called with > empty ActiveSnapshot stack. I think we may get a non empty ActiveSnapshot stack with prepared statements, so tempted to do the assertion on the levels. > > 2. Provide a way for AtSubAbort_Portals to detect whether a > portalSnapshot pointer points to a snap that's going to be > deleted by AtSubAbort_Snapshot, and then just have it clear any > portalSnapshots that are about to become dangling. (This'd amount > to accepting the possibility that portalSnapshot is of a different > subxact level from the portal, and dealing with the situation.) > > I initially thought #2 would be the way to go, but it turns out > to be a bit messy since what we have is a Snapshot pointer not an > ActiveSnapshotElt pointer. We'd have to do something like search the > ActiveSnapshot stack looking for pointer equality to the caller's > Snapshot pointer, which seems fragile --- do we assume as_snap is > unique for any other purpose? > > That being the case, I'm now leaning to #1. Thoughts? I'm also inclined to #1. Please find attached a patch proposal for #1 that also adds a new test. Thanks Bertrand
Вложения
В списке pgsql-hackers по дате отправления: