Discussion: Insufficient locking for ALTER DEFAULT PRIVILEGES
I came across the following bug this week:

Session 0:
begin;
create schema bug;
alter default privileges in schema bug grant all on tables to postgres;
commit;

Session 1:
begin;
alter default privileges in schema bug grant all on tables to postgres;

Session 2:
alter default privileges in schema bug grant all on tables to postgres;
<hangs>

Session 1:
commit;

Session 2:
ERROR: tuple concurrently updated
--
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr  PostgreSQL : Expertise, Formation et Support
Vik Fearing wrote:

> Session 1:
> begin;
> alter default privileges in schema bug grant all on tables to postgres;
>
> Session 2:
> alter default privileges in schema bug grant all on tables to postgres;
> <hangs>
>
> Session 1:
> commit;
>
> Session 2:
> ERROR: tuple concurrently updated

So it turns out we don't have any locking here at all.  I don't believe
we have it for all object types, but in most cases it's not as obnoxious
as this one.  But at least for relations we have some nice coding in
RangeVarGetRelidExtended and RangeVarGetAndCheckCreationNamespace that
protects things.

I was thinking of adding some similar locking-and-looping logic in
StoreDefaultACL: grab the tuple from the catalogs, LockDatabaseObject()
using the OID of the tuple so obtained; check the sinval counter like
RangeVarGetRelidExtended; if no change, we're okay; if it changed, go
grab the OID once again, and if it changed, restart from the top; if
the OID did not change, then we're done.

This sounds complicated, but it's actually reasonably straightforward
and contained within a single routine.

But then, it doesn't handle the case where two transactions try to
insert a row for the same combination at the same time.  One of them is
going to get the heap_insert() call to succeed and the other one is
going to get an ugly error message.

I wonder if I'm over-thinking this.  Other thoughts?
--
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote:

> So it turns out we don't have any locking here at all.  I don't believe
> we have it for all object types, but in most cases it's not as obnoxious
> as this one.  But at least for relations we have some nice coding in
> RangeVarGetRelidExtended and RangeVarGetAndCheckCreationNamespace that
> protect things.

Now that I actually check with a non-relation object, I see pretty much
the same error.  So probably, instead of some narrow bug fix, what we
need is a general solution for all object types.  I know this has been
discussed a number of times ...  Anyway, I see now that we should not
consider this a backpatchable bug fix, and I'm not doing the coding
either, at least not now.

Session 1:
alvherre=# begin;
BEGIN
alvherre=# create or replace function f() returns int language plpgsql
strict as $$ begin return 2; end; $$;
CREATE FUNCTION

Session 2:
alvherre=# create or replace function f() returns int language plpgsql
strict as $$ begin return 3; end; $$;
<blocks>

Session 1:
alvherre=# commit;
COMMIT

Session 2:
ERROR: tuple concurrently updated
--
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote:

> Now that I actually check with a non-relation object, I see pretty much
> the same error.  So probably, instead of some narrow bug fix, what we
> need is a general solution for all object types.  I know this has been
> discussed a number of times ...  Anyway, I see now that we should not
> consider this a backpatchable bug fix, and I'm not doing the coding
> either, at least not now.

Discussed this with a couple of 2ndQ colleagues and it became evident
that MVCC catalog scans probably make this problem much more prominent.
So historical branches are not affected all that much, but it's a real
issue on 9.4+.
--
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2015-06-21 11:45:24 -0300, Alvaro Herrera wrote:

> Discussed this with a couple of 2ndQ colleagues and it became evident
> that MVCC catalog scans probably make this problem much more prominent.
> So historical branches are not affected all that much, but it's a real
> issue on 9.4+.

Hm.  I don't see how those would make a marked difference.  The
snapshots for catalog scans are taken afresh for each scan (unless
cached).  There'll probably be some difference, but it'll be small.

Andres
On Sat, Jun 20, 2015 at 10:57 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Vik Fearing wrote:
>
> > Session 1:
> > begin;
> > alter default privileges in schema bug grant all on tables to postgres;
> >
> > Session 2:
> > alter default privileges in schema bug grant all on tables to postgres;
> > <hangs>
> >
> > Session 1:
> > commit;
> >
> > Session 2:
> > ERROR: tuple concurrently updated
>
> So it turns out we don't have any locking here at all. I don't believe
> we have it for all object types, but in most cases it's not as obnoxious
> as this one. But at least for relations we have some nice coding in
> RangeVarGetRelidExtended and RangeVarGetAndCheckCreationNamespace that
> protect things.
>
> I was thinking of adding some similar locking-and-looping logic in
> StoreDefaultACL: grab the tuple from catalogs, LockDatabaseObject()
> using the OID of the tuple so obtained; check the sinval counter like
> RangeVarGetRelidExtended, if no change we're okay; if it changed, go
> grab the OID once again, and if it changed, restart from the top; if
> the OID did not change, then we're done.
>
> This sounds complicated, but it's actually reasonably straightforward
> and contained within a single routine.
>
> But then, it doesn't handle the case that two transactions try to start
> a row for the same combination at the same time. One of them is going
> to get the heap_insert() call to succeed and the other one is going to
> get an ugly error message.
>
> I wonder if I'm over-thinking this. Other thoughts?
>
This problem won't occur for DML statements (two sessions trying to
update the tuple in the same way as described in the above example), and
the reason is that ExecUpdate() has the EvalPlanQual() mechanism to
avoid this.  I am talking about the code below:
ExecUpdate()
{
..
    heap_update();
    case HeapTupleUpdated:
..
        if (!ItemPointerEquals(tupleid, &hufd.ctid))
..
            epqslot = EvalPlanQual(estate,
..
}
Now for the similar case in simple_heap_update(), we throw an error;
can't we use a similar coding pattern (as in ExecUpdate) in
simple_heap_update()?
On Sun, Jun 21, 2015 at 11:11 AM, Andres Freund <andres@anarazel.de> wrote:

> On 2015-06-21 11:45:24 -0300, Alvaro Herrera wrote:
>> Discussed this with a couple of 2ndQ colleagues and it became evident
>> that MVCC catalog scans probably make this problem much more prominent.
>> So historical branches are not affected all that much, but it's a real
>> issue on 9.4+.
>
> Hm.  I don't see how those would make a marked difference.  The
> snapshots for catalog scans are taken afresh for each scan (unless
> cached).  There'll probably be some difference, but it'll be small.

Yeah, I think the same.  If those changes introduced a problem we didn't
have before, I'd like to see a reproducible test case.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company