Обсуждение: Problems with ALTER DOMAIN patch

Поиск
Список
Период
Сортировка

Problems with ALTER DOMAIN patch

От
Tom Lane
Дата:
I've been looking at the recently-committed ALTER DOMAIN patch, and I
think it's got some serious if not fatal problems.  Specifically, the
approach to adding/dropping constraints associated with domains doesn't
work.

1. Insufficient locking, guise 1: there's no protection against someone
else dropping a column or whole table between the time you find a
pg_attribute entry in get_rels_with_domain and the time you actually
process it in AlterDomainNotNull or AlterDomainAddConstraint.  The code
appears to think that holding RowExclusiveLock on pg_attribute will
protect it somehow, but that doesn't (and shouldn't) do any such thing.
This will result in at best an elog and at worst coredump when you try
to scan the no-longer-present table or column.

2. Insufficient locking, guise 2: there's no protection against someone
else adding a column or table while you're processing an ALTER DOMAIN,
either.  This means that constraint checks will be missed.  Example:

<< backend 1 >>

regression=# create domain mydom int4;
CREATE DOMAIN
regression=# begin;
BEGIN
regression=# alter domain mydom set not null;
ALTER DOMAIN

<< don't commit yet; in backend 2 do >>

regression=# create table foo (f1 mydom);
CREATE TABLE
regression=# insert into foo values(null);
INSERT 149688 1

<< now in backend 1: >>

regression=# commit;
COMMIT

<< now in backend 2: >>

regression=# insert into foo values(null);
ERROR:  Domain mydom does not allow NULL values
regression=# select * from foo;f1
----

(1 row)

Not a very watertight domain constraint, is it?  The begin/commit is not
necessary to cause a failure, it just makes it easy to make the window
for failure wide enough to hit in a manually entered example.

3. Too much locking, guise 1: the ALTER DOMAIN command will acquire
exclusive lock on every table that it scans, and will hold all these
locks until it commits.  This can easily result in deadlocks --- against
other ALTER DOMAIN commands, or just against any random other
transaction that is unlucky enough to try to write any two tables
touched by the ALTER DOMAIN.  AFAICS you don't need an exclusive lock,
you just want to prevent updates of the table until the domain changes
are committed, so ShareLock would be sufficient; that would reduce but
not eliminate the risk of deadlock.

4. Too much locking, guise 2: the ExclusiveLock acquired on pg_class by
get_rels_with_domain has no useful effect, since it's released again
at the end of the scan; it does manage to shut down most sorts of schema
changes while get_rels_with_domain runs, however.  This is bad enough,
but:

5. Performance sucks.  In the regression database on my machine, "alter
domain mydom set not null" takes over six seconds --- that's for a
freshly created domain that's not used *anywhere*.  This can be blamed
entirely on the inefficient implementation of get_rels_with_domain.
In a database with more tables performance would get much worse; it's
basically O(N^2).  And it's holding ExclusiveLock on pg_class the whole
time :-(.  (A reasonably efficient way to make the same search would be
to use pg_depend to look for columns that depend on the domain type ---
this might find a few indirect dependencies, but it would certainly be
lots faster than repeated seqscans over pg_attribute.)

6. Permission bogosity: as per discussion yesterday, ownership of a
schema does not grant ownership rights on contained objects.

7. No mechanism for causing constraint changes to actually propagate
after they are made.  This is more a fault of the design of the domain
constraint patch than it is of the alter patch, but nonetheless alter is
what exposes it.  The problem is particularly acute because you chose to
insert a domain's constraint expressions into coercion operations at
expression parsing time, which is far too early.  A stored rule that has
a coerce-to-domain operation in it will have a frozen idea of what
constraints it should be enforcing.  Probably the expression tree should
just have a "CoerceToDomain foo" node in it, and at executor startup
this node would have to look to the pg_type entry for foo to see exactly
what it should be enforcing at the moment.


Some of these are fixable, but I don't actually see any fix for point 2
short of creating some entirely new locking convention.  Currently, only
relations can be locked, but you'd really need an enforceable lock on
the type itself to make a watertight solution, I think.  Since we've
never had any sort of supported ALTER TYPE operation before, the issue
hasn't come up before ...
        regards, tom lane


Re: Problems with ALTER DOMAIN patch

От
Rod Taylor
Дата:
On Tue, 2002-12-10 at 12:39, Tom Lane wrote:
> I've been looking at the recently-committed ALTER DOMAIN patch, and I
> think it's got some serious if not fatal problems.  Specifically, the
> approach to adding/dropping constraints associated with domains doesn't
> work.
>
> 1. Insufficient locking, guise 1: there's no protection against someone
> else dropping a column or whole table between the time you find a

Ok.. I obviously have to spend some time to figure out how locking works
and exactly what it affects.

I had incorrectly assumed that since dropping a column required removal
of the pg_attribute entry, that holding a RowExclusive on it would
prevent that.

> 2. Insufficient locking, guise 2: there's no protection against someone
> else adding a column or table while you're processing an ALTER DOMAIN,
> either.  This means that constraint checks will be missed.  Example:

Locking the entry in pg_type doesn't prevent that?  Afterall, something
does a test to see if the type exists prior to allowing the client to
add it.

> 3. Too much locking, guise 1: the ALTER DOMAIN command will acquire
> exclusive lock on every table that it scans, and will hold all these
> locks until it commits.  This can easily result in deadlocks --- against
> other ALTER DOMAIN commands, or just against any random other
> transaction that is unlucky enough to try to write any two tables
> touched by the ALTER DOMAIN.  AFAICS you don't need an exclusive lock,
> you just want to prevent updates of the table until the domain changes
> are committed, so ShareLock would be sufficient; that would reduce but
> not eliminate the risk of deadlock.

I noticed a completed TODO item that allows multiple locks to be
obtained simultaneously, and had intended on using that for this -- but
was having a hard time tracking down an example.

> 4. Too much locking, guise 2: the ExclusiveLock acquired on pg_class by
> get_rels_with_domain has no useful effect, since it's released again
> at the end of the scan; it does manage to shut down most sorts of schema
> changes while get_rels_with_domain runs, however.  This is bad enough,
> but:

Yeah... Trying to transfer the lock to the attributes -- which as you've
shown doesn't do what I thought.

> 5. Performance sucks.  In the regression database on my machine, "alter
> domain mydom set not null" takes over six seconds --- that's for a
> freshly created domain that's not used *anywhere*.  This can be blamed
> entirely on the inefficient implementation of get_rels_with_domain.

Yes, I need to (and intend to) redo this with dependencies, but hadn't
figured out how.   I'm surprised it took 6 seconds though.  I hardly
notice any delay on a database with ~30 tables in it.

> 6. Permission bogosity: as per discussion yesterday, ownership of a
> schema does not grant ownership rights on contained objects.

Patch submitted yesterday to correct this.

> 7. No mechanism for causing constraint changes to actually propagate
> after they are made.  This is more a fault of the design of the domain
> constraint patch than it is of the alter patch, but nonetheless alter is
> what exposes it.  The problem is particularly acute because you chose to
> insert a domain's constraint expressions into coercion operations at
> expression parsing time, which is far too early.  A stored rule that has
> a coerce-to-domain operation in it will have a frozen idea of what
> constraints it should be enforcing.  Probably the expression tree should
> just have a "CoerceToDomain foo" node in it, and at executor startup
> this node would have to look to the pg_type entry for foo to see exactly
> what it should be enforcing at the moment.

Thanks for the explanations.  I'll see if I can 1) fix my poor knowledge
of locking, 2) Add to my notes that I need to test stuff with Rules from
now on, and 3) correct the above items.
--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: Problems with ALTER DOMAIN patch

От
Tom Lane
Дата:
Rod Taylor <rbt@rbt.ca> writes:
>> 2. Insufficient locking, guise 2: there's no protection against someone
>> else adding a column or table while you're processing an ALTER DOMAIN,
>> either.  This means that constraint checks will be missed.  Example:

> Locking the entry in pg_type doesn't prevent that?

If there were such a thing as "locking the entry in pg_type", it might
prevent that, but (a) there isn't, and (b) your code wouldn't invoke it 
if there were.  Reading a row should surely not be tantamount to
invoking an exclusive lock on it.

In any case, other backends might have the pg_type entry in their
syscaches, in which case their references to the type would be quite
free of any actual read of the pg_type row that might fall foul of
your hypothetical lock.

To make this work in a reliable way, there needs to be some concept
of acquiring a lock on the type as an entity, in the same way that
LockRelation acquires a lock on a relation as an entity --- which has
only the loosest possible connection to the notion of a lock on the
relation's pg_class row.  We have no such locks on types at present,
but I think it may be time to invent 'em.
        regards, tom lane


Re: Problems with ALTER DOMAIN patch

От
Rod Taylor
Дата:
On Tue, 2002-12-10 at 22:56, Tom Lane wrote:
> Rod Taylor <rbt@rbt.ca> writes:
> >> 2. Insufficient locking, guise 2: there's no protection against someone
> >> else adding a column or table while you're processing an ALTER DOMAIN,
> >> either.  This means that constraint checks will be missed.  Example:
>
> > Locking the entry in pg_type doesn't prevent that?
>
> If there were such a thing as "locking the entry in pg_type", it might
> prevent that, but (a) there isn't, and (b) your code wouldn't invoke it
> if there were.  Reading a row should surely not be tantamount to
> invoking an exclusive lock on it.

Hrm...  Yes.. I came to that conclusion while walking home. My concepts
of locking, and what actually happens in PostgreSQL are two completely
different things.

> In any case, other backends might have the pg_type entry in their
> syscaches, in which case their references to the type would be quite
> free of any actual read of the pg_type row that might fall foul of
> your hypothetical lock.

So... Basically I'm cooked.

> relation's pg_class row.  We have no such locks on types at present,
> but I think it may be time to invent 'em.

I'd be happy to use them once created.

Thanks again for the help.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: Problems with ALTER DOMAIN patch

От
Bruce Momjian
Дата:
Rod Taylor wrote:
> > relation's pg_class row.  We have no such locks on types at present,
> > but I think it may be time to invent 'em.
> 
> I'd be happy to use them once created.
> 
> Thanks again for the help.

Where does that leave the patch _until_ they are created?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Problems with ALTER DOMAIN patch

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Where does that leave the patch _until_ they are created?

I'd say "it's under death sentence unless fixed before 7.4 release".
I don't want to back it out in toto right now, because that will
interfere with other edits I'm in process of making (and also Rod
included some necessary fixes to the domain-constraint patch in the
alter-domain patch; which wasn't too clean of him but it's done).

For now, please put "fix or disable ALTER DOMAIN" on the must-do-
before-7.4 part of TODO.
        regards, tom lane


Re: Problems with ALTER DOMAIN patch

От
Tom Lane
Дата:
Rod Taylor <rbt@rbt.ca> writes:
> On Tue, 2002-12-10 at 22:56, Tom Lane wrote:
>> relation's pg_class row.  We have no such locks on types at present,
>> but I think it may be time to invent 'em.

> I'd be happy to use them once created.

I think you misunderstood me ;=) ... that was a none-too-subtle
suggestion that *you* should go invent 'em, seeing as how you're the
one pushing the feature that makes 'em necessary.

The lock manager itself deals with lock tags that could be almost
anything.  We currently only use lock tags that represent relations or
specific pages in relations, but I see no reason that there couldn't
also be lock tags representing types --- or other basic catalog entries.
(I am trying hard to repress the thought that we may already need
locking on other classes of entities as well.)  What we need now is a
little thought about exactly how to represent these different lock tags
(should be easy), and about what semantics to assign to different lock
modes applied to pg_type entities (perhaps not so easy).
        regards, tom lane


Re: Problems with ALTER DOMAIN patch

От
Rod Taylor
Дата:
On Wed, 2002-12-11 at 00:05, Tom Lane wrote:
> Rod Taylor <rbt@rbt.ca> writes:
> > On Tue, 2002-12-10 at 22:56, Tom Lane wrote:
> >> relation's pg_class row.  We have no such locks on types at present,
> >> but I think it may be time to invent 'em.
>
> > I'd be happy to use them once created.
>
> I think you misunderstood me ;=) ... that was a none-too-subtle

Nope... I understood.  But it could take quite a while.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: Problems with ALTER DOMAIN patch

От
Bruce Momjian
Дата:
Rod Taylor wrote:
-- Start of PGP signed section.
> On Wed, 2002-12-11 at 00:05, Tom Lane wrote:
> > Rod Taylor <rbt@rbt.ca> writes:
> > > On Tue, 2002-12-10 at 22:56, Tom Lane wrote:
> > >> relation's pg_class row.  We have no such locks on types at present,
> > >> but I think it may be time to invent 'em.
> > 
> > > I'd be happy to use them once created.
> > 
> > I think you misunderstood me ;=) ... that was a none-too-subtle
> 
> Nope... I understood.  But it could take quite a while.

I have an idea.  Rather than doing some complex locking for types, why
don't we just restrict ALTER DOMAIN to cases where we are the only one
attached to the database, as seen in dropdb().  Seems like an easy way
to get the feature in without adding a new locking method.  Also, it
would allow the regression test to work too because no one is attached
to 'regression' at the time of the test.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Problems with ALTER DOMAIN patch

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I have an idea.  Rather than doing some complex locking for types, why
> don't we just restrict ALTER DOMAIN to cases where we are the only one
> attached to the database, as seen in dropdb().

Yech!

> would allow the regression test to work too because no one is attached
> to 'regression' at the time of the test.

No, we'd just get random reports of unreproducible failures.

FWIW, I frequently attach to the regression database with a manual psql
while the regression tests are running.  It's a useful thing to do ---
I've more than once found bugs that way.
        regards, tom lane


Re: Problems with ALTER DOMAIN patch

От
Bruce Momjian
Дата:
It is an idea if no better one can be found, unless we don't want ALTER
DOMAIN at all, which doesn't seem good.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I have an idea.  Rather than doing some complex locking for types, why
> > don't we just restrict ALTER DOMAIN to cases where we are the only one
> > attached to the database, as seen in dropdb().
> 
> Yech!
> 
> > would allow the regression test to work too because no one is attached
> > to 'regression' at the time of the test.
> 
> No, we'd just get random reports of unreproducible failures.
> 
> FWIW, I frequently attach to the regression database with a manual psql
> while the regression tests are running.  It's a useful thing to do ---
> I've more than once found bugs that way.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Problems with ALTER DOMAIN patch

От
Rod Taylor
Дата:
On Wed, 2002-12-11 at 19:00, Bruce Momjian wrote:
> It is an idea if no better one can be found, unless we don't want ALTER
> DOMAIN at all, which doesn't seem good.

I'll make a proposal for 'Object' locks as suggested, and we'll see
where we go from there.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc