Обсуждение: Add CREATE support to event triggers

Поиск

Список

Период

Сортировка

Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

08 ноября 2013 г., 15:33:38

Hello,

Attached you can find a very-much-WIP patch to add CREATE info support
for event triggers (normalized commands).  This patch builds mainly on
two things:

1. Dimitri's "DDL rewrite" patch he submitted way back, in
   http://www.postgresql.org/message-id/m2zk1j9c44.fsf@2ndQuadrant.fr

I borrowed the whole ddl_rewrite.c code, and tweaked it a bit.  There
are several things still wrong with it and which will need to be fixed
before a final patch can even be contemplated; but there are some
questions that require a consensus answer before I go and fix it all,
because what it will look like will depend on said answers.


2. The ideas we used to build DROP support.  Mainly, the interesting
   thing here is the fact that we use a SRF to report, at
   ddl_command_end, all the objects that were created during execution
   of that command.  We do this by collecting them in a list in some raw
   form somewhere during ProcessUtility, and then spitting them out if
   the SRF is called.  I think the general idea is sound, although of
   course I admit there might be bugs in the implementation.

Note this patch doesn't try to add any kind of ALTER support.  I think
this is fine in principle, because we agreed that we would attack each
kind of command separately (divide to conquer and all that); but there
is a slight problem for some kind of objects that are represented partly
as ALTER state during creation; for example creating a table with a
sequence uses ALTER SEQ/OWNED BY internally at some point.  There might
be other cases I'm missing, also.  (The REFRESH command is nominally
also supported.)


Now about the questions I mentioned above:

a) It doesn't work to reverse-parse the statement nodes in all cases;
there are several unfixable bugs if we only do that.  In order to create
always-correct statements, we need access to the catalogs for the
created objects.  But if we are doing catalog access, then it seems to
me that we can do away with the statement parse nodes completely and
just reconstruct the objects from catalog information.  Shall we go that
route?

b) What's the best design of the SRF output?  This patch proposes two
columns, object identity and create statement.  Is there use for
anything else?  Class/object OIDs perhaps, schema OIDs for objects types
that have it?  I don't see any immediate need to that info, but perhaps
someone does.

c) The current patch stashes all objects in a list, whenever there's an
event trigger function.  But perhaps some users want to have event
triggers and not have any use for the CREATE statements.  So one idea is
to require users that want the objects reported to call a special
function in a ddl_command_start event trigger which enables collection;
if it's not called, objects are not reported.  This eliminates
performance impact for people not using it, but then maybe it will be
surprising for people that call the SRF and find that it always returns
empty.

d) There's a new routine uncookConstraintOrDefault.  This takes a raw
expression, runs transformExpr() on it, and then deparses it (possibly
setting up a deparse context based on some relation).  This is a
somewhat funny thing to be doing, so if there are other ideas on how to
handle this, I'm all ears.


This patch doesn't include doc changes or regression tests.  Those will
be added in a later version.  For now, you can see this code in action
by running installing an event trigger like this:

CREATE OR REPLACE FUNCTION snitch() RETURNS event_trigger LANGUAGE plpgsql AS $$
DECLARE
        r RECORD;
BEGIN
        FOR r IN SELECT * FROM pg_event_trigger_get_normalized_commands()
        LOOP
                RAISE NOTICE 'object created: id %, statement %',
                r.identity, r.command;
        END LOOP;
END;
$$;
CREATE EVENT TRIGGER snitch ON ddl_command_end EXECUTE PROCEDURE snitch();

And then running the DDL of your preference.  Be warned that there are
many rough edges, so some objects might be incompletely reported (or
bogusly so, in particular with regards to objects in search_path but not
in the first position of it); but if you see any crashes, please let me
know.  Also, many commands are not supported yet by the ddl_rewrite.c
code and they return "unsupported FOOBAR".  I didn't want to go to the
effort of writing code for them that might end up being ripped out if we
wanted to go the route of using only syscache to build CREATE
statements.  That part is pretty mechanical, and not much code anyway.
I just left whatever Dimitri already had in his patch.


I would have continued working some more on this patch before
submitting, except that I'm going on vacations for a few days starting
tomorrow and we have a rule that no complex patches can go in at CF4
without it having been discussed in previous commitfests.  My intention
is that the patch that arrives at CF4 is free of design surprises and
ready for a final review before commit, so if there's any disagreement
with the direction this is going, please speak up now.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Вложения

event-trigger-create.patch

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

12 ноября 2013 г., 14:33:28

On Fri, Nov 8, 2013 at 10:33 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Attached you can find a very-much-WIP patch to add CREATE info support
> for event triggers (normalized commands).  This patch builds mainly on
> two things:
>
> 1. Dimitri's "DDL rewrite" patch he submitted way back, in
>    http://www.postgresql.org/message-id/m2zk1j9c44.fsf@2ndQuadrant.fr
>
> I borrowed the whole ddl_rewrite.c code, and tweaked it a bit.  There
> are several things still wrong with it and which will need to be fixed
> before a final patch can even be contemplated; but there are some
> questions that require a consensus answer before I go and fix it all,
> because what it will look like will depend on said answers.

I'm still unhappy with this whole concept.  It adds a significant
maintenance burden that must be carried by everyone who adds new DDL
syntax in the future, and it's easy to imagine this area of the code
ending up very poorly tested and rife with bugs.  After all, event
triggers, as nifty as they are, are only going to be used by a small
minority of users.  And new DDL features are typically going to be
things that are fairly obscure, so again they will only be used by a
small minority of users.  I think we need to avoid the situation where
we have bugs that can only get found when those minorities happen to
intersect.  If we're going to have DDL rewrite code, then we need a
way of making sure that it gets tested in a comprehensive way on a
regular basis.

Here's one idea: create a contrib module that (somehow, via APIs to be
invented) runs every DDL command that gets executed through the
deparsing code, and then parses the result and executes *that* instead
of the original command.  Then, add a build target that runs the
regression test suite in that mode, and get the buildfarm configured
to run that build target regularly on at least some machines.  That
way, adding syntax to the regular regression test suite also serves to
test that the deparsing logic for that syntax is working.  If we do
this, there's still some maintenance burden associated with having DDL
deparsing code, but at least our chances of noticing when we've failed
to maintain it should be pretty good.

The other thing that bothers me here is that, while a normalized
command string sounds great in theory, as soon as you want to allow
(for example) mapping schema A on node 1 to schema B on node 2, the
wheels come off: you'll have to deparse that normalized command string
so you can change out the schema name and then reassemble it back into
a command string again.  So we're going to parse the user input, then
deparse it, hand over the results to the application code, which will
then parse it, modify that, and deparse it again.  At every step of
that process, any mistake will lead to subtle bugs in the resulting
system.  Larry Wall once wrote (approximately) that a good programming
language makes simple things simple and hard things possible; I think
this design fails the second prong of that test.

Now, I guess I can live with that if it's what everyone else wants,
but I don't have a great feeling about the long-term utility of it.
Exposing the data from the DDL statement in some structured way - like
what we've done with drops, or a JSON blob, or something like that,
feels much more robust and useful than a normalized command string to
me.  If the normalized command string can easily be built up from that
data, then you don't really need to expose the command string
separately.  If it can't, then you're not doing a good job exposing
the data in a usable form.  Saying "well, people can always parse the
normalized command string" is a cop-out.  Parsing SQL is *hard*; the
only external project I know of that parses our SQL syntax well is
pgpool, and that's because they copy our parser wholesale, surely not
the sort of solution we want to foist off on event trigger authors.

Finally, I'm very skeptical of the word "normalized".  To me, that
sounds like an alias for "modifying the command string in unspecified
ways that big brother thinks will be useful to event trigger authors".Color me skeptical.  What if somebody doesn't
wanttheir command

string normalized?  What if they want it normalized in a way that's
different from the way that we've chosen to normalize it?  I fear that
this whole direction amounts to "we don't know how to design a real
API so let's just do surgery on the command string and call whatever
pops out the API".  Maybe that's harsh, but if it is I don't know why.The normalization steps we build into this
processconstitute

assumptions about how the feature will be used, and they back the user
into using that feature in just that way and no other.

> 2. The ideas we used to build DROP support.  Mainly, the interesting
>    thing here is the fact that we use a SRF to report, at
>    ddl_command_end, all the objects that were created during execution
>    of that command.  We do this by collecting them in a list in some raw
>    form somewhere during ProcessUtility, and then spitting them out if
>    the SRF is called.  I think the general idea is sound, although of
>    course I admit there might be bugs in the implementation.

Agreed.

> Note this patch doesn't try to add any kind of ALTER support.  I think
> this is fine in principle, because we agreed that we would attack each
> kind of command separately (divide to conquer and all that);

Also agreed.

> but there
> is a slight problem for some kind of objects that are represented partly
> as ALTER state during creation; for example creating a table with a
> sequence uses ALTER SEQ/OWNED BY internally at some point.  There might
> be other cases I'm missing, also.  (The REFRESH command is nominally
> also supported.)

There are lots of places in the DDL code where we pass around
constructed parse trees as a substitute for real argument lists.  I
expect that many of those places will eventually get refactored away,
so it's important that this feature does not end up relying on
accidents of the current code structure.  For example, an
AlterTableStmt can actually do a whole bunch of different things in a
single statement: SOME of those are handled by a loop in
ProcessUtilitySlow() and OTHERS are handled internally by AlterTable.
I'm pretty well convinced that that division of labor is a bad design,
and I think it's important that this feature doesn't make that dubious
design decision into documented behavior.

> Now about the questions I mentioned above:
>
> a) It doesn't work to reverse-parse the statement nodes in all cases;
> there are several unfixable bugs if we only do that.  In order to create
> always-correct statements, we need access to the catalogs for the
> created objects.  But if we are doing catalog access, then it seems to
> me that we can do away with the statement parse nodes completely and
> just reconstruct the objects from catalog information.  Shall we go that
> route?

That works well for CREATE and is definitely appealing in some ways;
it probably means needing a whole lot of what's in pg_dump in the
backend also.  Of course, converting the pg_dump code to a library
that can be linked into either a client or the server would make a lot
of people happy.  Making pg_dump much dumber (at least as regards
future versions) and relying on new backend code to serve the same
purpose would perhaps be reasonable as well, although I know Tom is
against it.  But having two copies of that code doesn't sound very
good; and we'd need some way of thoroughly testing the one living in
the backend.

> c) The current patch stashes all objects in a list, whenever there's an
> event trigger function.  But perhaps some users want to have event
> triggers and not have any use for the CREATE statements.  So one idea is
> to require users that want the objects reported to call a special
> function in a ddl_command_start event trigger which enables collection;
> if it's not called, objects are not reported.  This eliminates
> performance impact for people not using it, but then maybe it will be
> surprising for people that call the SRF and find that it always returns
> empty.

This seems like premature optimization to me, but I think I lost the
last iteration of this argument.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Christopher Browne

Дата:

20 ноября 2013 г., 18:36:52

On Fri, Nov 8, 2013 at 10:33 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Hello,
>
> Attached you can find a very-much-WIP patch to add CREATE info support
> for event triggers (normalized commands).  This patch builds mainly on
> two things:
>
> 1. Dimitri's "DDL rewrite" patch he submitted way back, in
>    http://www.postgresql.org/message-id/m2zk1j9c44.fsf@2ndQuadrant.fr
>
> I borrowed the whole ddl_rewrite.c code, and tweaked it a bit.  There
> are several things still wrong with it and which will need to be fixed
> before a final patch can even be contemplated; but there are some
> questions that require a consensus answer before I go and fix it all,
> because what it will look like will depend on said answers.

I have tried this out; the patch applies fine.

Note that it induces (modulo my environment) a failure in "make check".

The "opr_sanity" test fails.

postgres@cbbrowne ~/p/s/t/regress> diff expected/opr_sanity.out
results/opr_sanity.out
348,350c348,351
<  oid | proname
< -----+---------
< (0 rows)
---
>  oid  |                 proname
> ------+------------------------------------------
>  3567 | pg_event_trigger_get_normalized_commands
> (1 row)

That's a minor problem; the trouble there is that the new function is not
yet documented.  Not a concern at this stage.

> 2. The ideas we used to build DROP support.  Mainly, the interesting
>    thing here is the fact that we use a SRF to report, at
>    ddl_command_end, all the objects that were created during execution
>    of that command.  We do this by collecting them in a list in some raw
>    form somewhere during ProcessUtility, and then spitting them out if
>    the SRF is called.  I think the general idea is sound, although of
>    course I admit there might be bugs in the implementation.
>
> Note this patch doesn't try to add any kind of ALTER support.  I think
> this is fine in principle, because we agreed that we would attack each
> kind of command separately (divide to conquer and all that); but there
> is a slight problem for some kind of objects that are represented partly
> as ALTER state during creation; for example creating a table with a
> sequence uses ALTER SEQ/OWNED BY internally at some point.  There might
> be other cases I'm missing, also.  (The REFRESH command is nominally
> also supported.)

I imagine that the things we create in earlier stages may help with later
stages, so it's worth *some* planning so we can hope not to build bits
now that push later enhancements into corners that they can't get out of.

But I'm not disagreeing at all.

> Now about the questions I mentioned above:
>
> a) It doesn't work to reverse-parse the statement nodes in all cases;
> there are several unfixable bugs if we only do that.  In order to create
> always-correct statements, we need access to the catalogs for the
> created objects.  But if we are doing catalog access, then it seems to
> me that we can do away with the statement parse nodes completely and
> just reconstruct the objects from catalog information.  Shall we go that
> route?

Here's a case where it doesn't work.

testevent@localhost->  create schema foo;
CREATE SCHEMA
testevent@localhost->  create domain foo.bar integer;
CREATE DOMAIN
testevent@localhost->  CREATE OR REPLACE FUNCTION snitch() RETURNS
event_trigger LANGUAGE plpgsql AS $$
testevent$# DECLARE
testevent$#         r RECORD;
testevent$# BEGIN
testevent$#         FOR r IN SELECT * FROM
pg_event_trigger_get_normalized_commands()
testevent$#         LOOP
testevent$#                 RAISE NOTICE 'object created: id %, statement %',
testevent$#                 r.identity, r.command;
testevent$#         END LOOP;
testevent$# END;
testevent$# $$;
CREATE FUNCTION
testevent@localhost->  CREATE EVENT TRIGGER snitch ON ddl_command_end
EXECUTE PROCEDURE snitch();
CREATE EVENT TRIGGER
testevent@localhost->  set search_path to public, foo;
SET
testevent@localhost->  create table foo.foo2 (acolumn bar);
NOTICE:  object created: id foo.foo2, statement CREATE TABLE foo.foo2
(acolumn bar)
CREATE TABLE

The trouble is that you have only normalized the table name.  The
domain, bar, needs its name normalized as well.

> b) What's the best design of the SRF output?  This patch proposes two
> columns, object identity and create statement.  Is there use for
> anything else?  Class/object OIDs perhaps, schema OIDs for objects types
> that have it?  I don't see any immediate need to that info, but perhaps
> someone does.

Probably an object type is needed as well, to know if it's a table or
a domain or a sequence or whatever.

I suspect that what will be needed to make it all usable is some sort of
"structured" form.  That is in keeping with Robert Haas' discomfort with
the normalized form.

My "minor" gripe is that you haven't normalized enough (e.g. - it should be
CREATE TABLE foo.foo2 (acolumn foo.bar), capturing the normalization of
data types that are referenced).

But Robert's quite right that users may want more than just to capture that
literally; they may want to modify it, for instance, by shifting to another
schema.  And it will be no fun at all if you have to construct an SQL parser
in order to change it.

The "big change" in Slony 2.2 was essentially to solve that problem; we had
been capturing logged updates as near-raw SQL, and discovered that we
really needed to capture it in a form that allows restructuring it without
needing to reparse it.  It seems to me that the same need applies here.

The answer we came up with in Slony was to change from "SQL fragment"
to "array of attribute names and attribute values."  I expect that is quite
unsuitable for CREATE events, being too "flat" a representation.

> c) The current patch stashes all objects in a list, whenever there's an
> event trigger function.  But perhaps some users want to have event
> triggers and not have any use for the CREATE statements.  So one idea is
> to require users that want the objects reported to call a special
> function in a ddl_command_start event trigger which enables collection;
> if it's not called, objects are not reported.  This eliminates
> performance impact for people not using it, but then maybe it will be
> surprising for people that call the SRF and find that it always returns
> empty.

Hmm.  I'm not sure I follow what the issue is here (I think Robert has
the same problem; if that's so, then it's probably worth a separate
explanation/discussion).

I think there are actually more options here than just worrying about
CREATE...

I can see caring/not caring about CREATE events.  Also caring/not
caring about different sorts of objects.

Thus, an event trigger might either filter on event type
(CREATE versus DROP versus ALTER vs ...), as well as on object
type (TABLE vs VIEW vs SCHEMA vs DOMAIN vs ...).

I could see that being either part of the trigger definition or as an
attribute established within the trigger.

Thus...

create event trigger foo snitch on ddl_command_end
on create, drop, alter
for table, view
execute procedure snitch();

And having internals

create or replace function snitch() returns event_trigger language plpgsql as
$$
declare  r record;
begin   for r in select * from pg_event_trigger_get_normalized_commands loop      raise notice 'action: % object type %
objectname % object id %

normalized statement %',      r.type, r.identity, r.id, r.command;   end loop
end;
$$;

And I suspect that there needs to be another value returned (not
necessarily by the same function) that is some form of the parse tree.

If the parse tree is already available in memory on the C side of things, then
making it accessible by calling a function that allows "walking" the tree and
invoking methods on the elements is a perfectly reasonable idea.  Ideally,
there's something that can be usefully called that is written in pl/pgsql;
mandating that we punt to C to do arbitrary magic doesn't seem nice.
(If that's not reading coherently, well, blame it on my mayor being on
crack!  ;-) )

> d) There's a new routine uncookConstraintOrDefault.  This takes a raw
> expression, runs transformExpr() on it, and then deparses it (possibly
> setting up a deparse context based on some relation).  This is a
> somewhat funny thing to be doing, so if there are other ideas on how to
> handle this, I'm all ears.

Is that perhaps some of the "magic" to address my perhaps incoherent
bit about "punt to C"?  It looks a bit small to be that...
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: Add CREATE support to event triggers

От

Andrew Tipton

Дата:

24 ноября 2013 г., 18:22:08

On Thu, Nov 21, 2013 at 2:36 AM, Christopher Browne <cbbrowne@gmail.com> wrote:
>> b) What's the best design of the SRF output?  This patch proposes two
>> columns, object identity and create statement.  Is there use for
>> anything else?  Class/object OIDs perhaps, schema OIDs for objects types
>> that have it?  I don't see any immediate need to that info, but perhaps
>> someone does.
>
> Probably an object type is needed as well, to know if it's a table or
> a domain or a sequence or whatever.
>
> I suspect that what will be needed to make it all usable is some sort of
> "structured" form.  That is in keeping with Robert Haas' discomfort with
> the normalized form.
>
> My "minor" gripe is that you haven't normalized enough (e.g. - it should be
> CREATE TABLE foo.foo2 (acolumn foo.bar), capturing the normalization of
> data types that are referenced).
>
> But Robert's quite right that users may want more than just to capture that
> literally; they may want to modify it, for instance, by shifting to another
> schema.  And it will be no fun at all if you have to construct an SQL parser
> in order to change it.


It's certainly much easier to transform a structured representation
into a valid SQL command string than it is to do the inverse.

You may be interested in an extension that I'm working on for a
client, which provides relation_create, relation_alter, and
relation_drop event triggers for 9.3:
 https://bitbucket.org/malloclabs/pg_schema_triggers

I decided to create a composite type for each event, which can be
accessed from within the event trigger by calling a special function.
For example, the relation_alter event supplies the relation Oid and
the "old" and "new" pg_class rows.  It's easy to then examine the old
vs. new rows and determine what has changed.


Kind regards,
Andrew Tipton

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

03 января 2014 г., 14:05:40

Robert Haas escribió:

> The other thing that bothers me here is that, while a normalized
> command string sounds great in theory, as soon as you want to allow
> (for example) mapping schema A on node 1 to schema B on node 2, the
> wheels come off: you'll have to deparse that normalized command string
> so you can change out the schema name and then reassemble it back into
> a command string again.  So we're going to parse the user input, then
> deparse it, hand over the results to the application code, which will
> then parse it, modify that, and deparse it again.

I have considered several ideas on this front, but most of them turn out
to be useless or too cumbersome to use.  What seems most adequate is to
build a command string containing certain patterns, and an array of
replacement values for such patterns; each pattern corresponds to one
element that somebody might want modified in the command.  As a trivial
example, a command such as
  CREATE TABLE foo (bar INTEGER);

would return a string like  CREATE TABLE ${table_schema}.${table_name} (bar INTEGER);

and the replacement array would be  {table_schema => "public", table_name => "foo"}

If we additionally provide a function to expand the replacements in the
string, we would have the base funcionality of a normalized command
string.  If somebody wants to move the table to some other schema, they
can simply modify the array to suit their taste, and again expand using
the provided function; this doesn't require parsing SQL.  It's likely
that there are lots of fine details that need exploring before this is a
fully workable idea -- I have just started work on it, so please bear
with me.

I think this is basically what you call "a JSON blob".

> Finally, I'm very skeptical of the word "normalized".  To me, that
> sounds like an alias for "modifying the command string in unspecified
> ways that big brother thinks will be useful to event trigger authors".
>  Color me skeptical.  What if somebody doesn't want their command
> string normalized?  What if they want it normalized in a way that's
> different from the way that we've chosen to normalize it?  I fear that
> this whole direction amounts to "we don't know how to design a real
> API so let's just do surgery on the command string and call whatever
> pops out the API".

You might criticize the example above by saying that I haven't
considered using a JSON array for the list of table elements; in a
sense, I would be being Big Brother and deciding that you (as the user)
don't need to mess up with the column/constraints list in a table you're
creating.  I thought about it and wasn't sure if there was a need to
implement that bit in the first iteration of this implementation.  One
neat thing about this string+replaceables idea is that we can later
change what replaceable elements the string has, thus providing more
functionality (thus, for example, perhaps the column list can be altered
in v2 that was a "constant" in v1), without breaking existing users of
the v1.

> > but there
> > is a slight problem for some kind of objects that are represented partly
> > as ALTER state during creation; for example creating a table with a
> > sequence uses ALTER SEQ/OWNED BY internally at some point.  There might
> > be other cases I'm missing, also.  (The REFRESH command is nominally
> > also supported.)
> 
> There are lots of places in the DDL code where we pass around
> constructed parse trees as a substitute for real argument lists.  I
> expect that many of those places will eventually get refactored away,
> so it's important that this feature does not end up relying on
> accidents of the current code structure.  For example, an
> AlterTableStmt can actually do a whole bunch of different things in a
> single statement: SOME of those are handled by a loop in
> ProcessUtilitySlow() and OTHERS are handled internally by AlterTable.
> I'm pretty well convinced that that division of labor is a bad design,
> and I think it's important that this feature doesn't make that dubious
> design decision into documented behavior.

Yeah, the submitted patch took care of these elements by invoking the
appropriate collection function at all the right places.  Most of it
happened right in ProcessUtilitySlow, but other bits were elsewhere (for
instance, sub-objects created in a complex CREATE SCHEMA command).  I
mentioned the ALTER SEQUENCE example above because that happens in a
code path that wasn't even close to the rest of the stuff.

> > Now about the questions I mentioned above:
> >
> > a) It doesn't work to reverse-parse the statement nodes in all cases;
> > there are several unfixable bugs if we only do that.  In order to create
> > always-correct statements, we need access to the catalogs for the
> > created objects.  But if we are doing catalog access, then it seems to
> > me that we can do away with the statement parse nodes completely and
> > just reconstruct the objects from catalog information.  Shall we go that
> > route?
> 
> That works well for CREATE and is definitely appealing in some ways;
> it probably means needing a whole lot of what's in pg_dump in the
> backend also.  Of course, converting the pg_dump code to a library
> that can be linked into either a client or the server would make a lot
> of people happy.  Making pg_dump much dumber (at least as regards
> future versions) and relying on new backend code to serve the same
> purpose would perhaps be reasonable as well, although I know Tom is
> against it.  But having two copies of that code doesn't sound very
> good; and we'd need some way of thoroughly testing the one living in
> the backend.

Agreed on requiring thorough testing.

> > c) The current patch stashes all objects in a list, whenever there's an
> > event trigger function.  But perhaps some users want to have event
> > triggers and not have any use for the CREATE statements.  So one idea is
> > to require users that want the objects reported to call a special
> > function in a ddl_command_start event trigger which enables collection;
> > if it's not called, objects are not reported.  This eliminates
> > performance impact for people not using it, but then maybe it will be
> > surprising for people that call the SRF and find that it always returns
> > empty.
> 
> This seems like premature optimization to me, but I think I lost the
> last iteration of this argument.

I don't think you did; we ended up using a design that both doesn't
require explicit user action, nor causes a performance hit.  I wasn't
sure about this, but thought I would mention just it in case.  Maybe we
will end up doing the same here and have a create_sql_objects event or
something like that --- not real sure of this part yet, as there's a lot
of infrastructure to design and code first.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

06 января 2014 г., 17:51:32

On Fri, Jan 3, 2014 at 9:05 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> The other thing that bothers me here is that, while a normalized
>> command string sounds great in theory, as soon as you want to allow
>> (for example) mapping schema A on node 1 to schema B on node 2, the
>> wheels come off: you'll have to deparse that normalized command string
>> so you can change out the schema name and then reassemble it back into
>> a command string again.  So we're going to parse the user input, then
>> deparse it, hand over the results to the application code, which will
>> then parse it, modify that, and deparse it again.
>
> I have considered several ideas on this front, but most of them turn out
> to be useless or too cumbersome to use.  What seems most adequate is to
> build a command string containing certain patterns, and an array of
> replacement values for such patterns; each pattern corresponds to one
> element that somebody might want modified in the command.  As a trivial
> example, a command such as
>
>    CREATE TABLE foo (bar INTEGER);
>
> would return a string like
>    CREATE TABLE ${table_schema}.${table_name} (bar INTEGER);
>
> and the replacement array would be
>    {table_schema => "public", table_name => "foo"}
>
> If we additionally provide a function to expand the replacements in the
> string, we would have the base funcionality of a normalized command
> string.  If somebody wants to move the table to some other schema, they
> can simply modify the array to suit their taste, and again expand using
> the provided function; this doesn't require parsing SQL.  It's likely
> that there are lots of fine details that need exploring before this is a
> fully workable idea -- I have just started work on it, so please bear
> with me.
>
> I think this is basically what you call "a JSON blob".

I think this direction has some potential.  I'm not sure it's right in
detail.  The exact scheme you propose above won't work if you want to
leave out the schema name altogether, and more generally it's not
going to help very much with anything other than substituting in
identifiers.  What if you want to add a column called satellite_id to
every table that gets created, for example?  What if you want to make
the tables UNLOGGED?  I don't see how that kind of things is going to
work at all cleanly.

What I can imagine that might work is that you get a JSON blob for a
create table statement and then you have a function
make_a_create_table_statement(json) that will turn it back into SQL.
You can pass a modified blob (adding, removing, or changing the
schema_name or any other element) instead and it will do its thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

06 января 2014 г., 18:17:45

Robert Haas escribió:

> I think this direction has some potential.  I'm not sure it's right in
> detail.  The exact scheme you propose above won't work if you want to
> leave out the schema name altogether, and more generally it's not
> going to help very much with anything other than substituting in
> identifiers.  What if you want to add a column called satellite_id to
> every table that gets created, for example?  What if you want to make
> the tables UNLOGGED?  I don't see how that kind of things is going to
> work at all cleanly.

Thanks for the discussion.  I am building some basic infrastructure to
make this possible, and will explore ideas to cover these oversights
(not posting anything concrete yet because I expect several iterations
to crash and burn before I have something sensible to post).

> What I can imagine that might work is that you get a JSON blob for a
> create table statement and then you have a function
> make_a_create_table_statement(json) that will turn it back into SQL.
> You can pass a modified blob (adding, removing, or changing the
> schema_name or any other element) instead and it will do its thing.

I agree, except that I would prefer not to have one function for each
DDL statement type; instead we would have a single function that knows
to expand arbitrary strings using arbitrary JSON parameter objects, and
let ddl_rewrite.c (or whatever we call that file) deal with all the
ugliness.

One function per statement would increase the maintenance cost more, I
think (three extra pg_proc entries every time you add a new object type?
Ugh.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

06 января 2014 г., 19:03:26

On Mon, Jan 6, 2014 at 1:17 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I agree, except that I would prefer not to have one function for each
> DDL statement type; instead we would have a single function that knows
> to expand arbitrary strings using arbitrary JSON parameter objects, and
> let ddl_rewrite.c (or whatever we call that file) deal with all the
> ugliness.

Yeah, that might work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

08 января 2014 г., 20:42:32

Alvaro Herrera escribió:
> Robert Haas escribió:
> 
> > I think this direction has some potential.  I'm not sure it's right in
> > detail.  The exact scheme you propose above won't work if you want to
> > leave out the schema name altogether, and more generally it's not
> > going to help very much with anything other than substituting in
> > identifiers.  What if you want to add a column called satellite_id to
> > every table that gets created, for example?  What if you want to make
> > the tables UNLOGGED?  I don't see how that kind of things is going to
> > work at all cleanly.
> 
> Thanks for the discussion.  I am building some basic infrastructure to
> make this possible, and will explore ideas to cover these oversights
> (not posting anything concrete yet because I expect several iterations
> to crash and burn before I have something sensible to post).

Here's a working example.  Suppose the user runs

CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy";

In an event trigger, the function pg_event_trigger_get_creation_commands()
returns the following JSON blob:

{"authorization":{"authorization_role":"some guy",                 "output":"AUTHORIZATION
%i{authorization_role}"},"if_not_exists":"IFNOT EXISTS","name":"some schema","output":"CREATE SCHEMA %{if_not_exists}
%i{name}%{authorization}"}
 

wherein I have chosen to have a JSON element with the hardcoded name of
"output" which is what needs to be expanded; for each %{} parameter
found in it, there is an equally-named element in the JSON blob.  This
can be a string, a NULL, or another JSON object.

If it's a string, it expands to that value; if it's an object,
recursively an "output" element is expanded in the same way, and the
expanded string is used.

If there's a NULL element when expanding an object, the whole thing
expands to empty.  For example, if no AUTHORIZATION
clause is specified, "authorization" element is still there, but the
"authorization_role" element within it is NULL, and so the whole
AUTHORIZATION clause expands to empty and the resulting command contains
no authorization clause.  This is useful to support the case that
someone doesn't have an AUTHORIZATION clause in the CREATE SCHEMA
command, and the event trigger injects one simply by setting the
authorization_role to some role name.

IF NOT EXISTS is handled by defining it to either the string IF NOT
EXISTS or to empty if no such clause was specified.

The user can modify elements in the JSON to get a different version of
the command.  (I reckon the "output" can also be modified, but this is
probably a bad idea in most/all cases.  I don't think there's a need to
prohibit this explicitely.)  Also, someone might define "if_not_exists"
to something completely unrelated, but that would be their own fault.
(Maybe we can have some cross-check that the if_not_exists element in
JSON cannot be anything other than "IF NOT EXISTS" or the empty string;
and that the "output" element remains the same at expansion time than it
was at generation time.  Perhaps we should even hide the "output"
element from the user completely and only add them to the JSON at time
of expansion.  Not sure it's worth the trouble.)

There is another function,
pg_event_trigger_expand_creation_command(json), which will expand the
above JSON blob and return the following text:

CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy"

Note the identifiers are properly quoted (there are quotes in the JSON
blob, but they correspond to JSON's own delimiters).  I have defined a
'i' modifier to have %i{} elements, which means that the element is an
identifier which might need quoting.

I have also defined a %d{} modifier that means to use the element to
expand a possibly-qualified dotted name.  (There would be no "output"
element in this case.)  This is to support the case where you have 

CREATE TABLE public.foo
which results in
{"table_name":{"schema":"public",              "relname":"foo"}}

and you want to edit the "table_name" element in the root JSON and set
the schema to something else (perhaps NULL), so in the event trigger
after expansion you can end up with "CREATE TABLE foo" or "CREATE TABLE
private.foo" or whatever.

Most likely there are some more rules that will need to be created, but
so far this looks sensible.

I'm going to play some more with the %d{} stuff, and also with the idea
of representing table elements such as columns and constraints as an
array.  In the meantime please let me know whether this makes sense.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Pavel Stehule

Дата:

08 января 2014 г., 22:12:42

<p dir="ltr">Hello<p dir="ltr">I don't like this direction. What we can do with JSON from plpgsql? More, JSON is not
toorobust format against some future changes.<p dir="ltr">Regards<p dir="ltr">Pavel<div class="gmail_quote">Dne
8.1.201421:43 "Alvaro Herrera" <<a href="mailto:alvherre@2ndquadrant.com">alvherre@2ndquadrant.com</a>>
napsal(a):<brtype="attribution" /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">Alvaro Herrera escribió:<br /> > Robert Haas escribió:<br /> ><br /> > > I think
thisdirection has some potential.  I'm not sure it's right in<br /> > > detail.  The exact scheme you propose
abovewon't work if you want to<br /> > > leave out the schema name altogether, and more generally it's not<br />
>> going to help very much with anything other than substituting in<br /> > > identifiers.  What if you
wantto add a column called satellite_id to<br /> > > every table that gets created, for example?  What if you
wantto make<br /> > > the tables UNLOGGED?  I don't see how that kind of things is going to<br /> > > work
atall cleanly.<br /> ><br /> > Thanks for the discussion.  I am building some basic infrastructure to<br /> >
makethis possible, and will explore ideas to cover these oversights<br /> > (not posting anything concrete yet
becauseI expect several iterations<br /> > to crash and burn before I have something sensible to post).<br /><br />
Here'sa working example.  Suppose the user runs<br /><br /> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION
"someguy";<br /><br /> In an event trigger, the function pg_event_trigger_get_creation_commands()<br /> returns the
followingJSON blob:<br /><br /> {"authorization":{"authorization_role":"some guy",<br />                  
"output":"AUTHORIZATION%i{authorization_role}"},<br />  "if_not_exists":"IF NOT EXISTS",<br />  "name":"some
schema",<br/>  "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"}<br /><br /> wherein I have chosen
tohave a JSON element with the hardcoded name of<br /> "output" which is what needs to be expanded; for each %{}
parameter<br/> found in it, there is an equally-named element in the JSON blob.  This<br /> can be a string, a NULL, or
anotherJSON object.<br /><br /> If it's a string, it expands to that value; if it's an object,<br /> recursively an
"output"element is expanded in the same way, and the<br /> expanded string is used.<br /><br /> If there's a NULL
elementwhen expanding an object, the whole thing<br /> expands to empty.  For example, if no AUTHORIZATION<br /> clause
isspecified, "authorization" element is still there, but the<br /> "authorization_role" element within it is NULL, and
sothe whole<br /> AUTHORIZATION clause expands to empty and the resulting command contains<br /> no authorization
clause. This is useful to support the case that<br /> someone doesn't have an AUTHORIZATION clause in the CREATE
SCHEMA<br/> command, and the event trigger injects one simply by setting the<br /> authorization_role to some role
name.<br/><br /> IF NOT EXISTS is handled by defining it to either the string IF NOT<br /> EXISTS or to empty if no
suchclause was specified.<br /><br /> The user can modify elements in the JSON to get a different version of<br /> the
command. (I reckon the "output" can also be modified, but this is<br /> probably a bad idea in most/all cases.  I don't
thinkthere's a need to<br /> prohibit this explicitely.)  Also, someone might define "if_not_exists"<br /> to something
completelyunrelated, but that would be their own fault.<br /> (Maybe we can have some cross-check that the
if_not_existselement in<br /> JSON cannot be anything other than "IF NOT EXISTS" or the empty string;<br /> and that
the"output" element remains the same at expansion time than it<br /> was at generation time.  Perhaps we should even
hidethe "output"<br /> element from the user completely and only add them to the JSON at time<br /> of expansion.  Not
sureit's worth the trouble.)<br /><br /> There is another function,<br />
pg_event_trigger_expand_creation_command(json),which will expand the<br /> above JSON blob and return the following
text:<br/><br /> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy"<br /><br /> Note the identifiers
areproperly quoted (there are quotes in the JSON<br /> blob, but they correspond to JSON's own delimiters).  I have
defineda<br /> 'i' modifier to have %i{} elements, which means that the element is an<br /> identifier which might need
quoting.<br/><br /> I have also defined a %d{} modifier that means to use the element to<br /> expand a
possibly-qualifieddotted name.  (There would be no "output"<br /> element in this case.)  This is to support the case
whereyou have<br /><br /> CREATE TABLE public.foo<br /> which results in<br /> {"table_name":{"schema":"public",<br />
              "relname":"foo"}}<br /><br /> and you want to edit the "table_name" element in the root JSON and set<br
/>the schema to something else (perhaps NULL), so in the event trigger<br /> after expansion you can end up with
"CREATETABLE foo" or "CREATE TABLE<br /> private.foo" or whatever.<br /><br /> Most likely there are some more rules
thatwill need to be created, but<br /> so far this looks sensible.<br /><br /> I'm going to play some more with the
%d{}stuff, and also with the idea<br /> of representing table elements such as columns and constraints as an<br />
array. In the meantime please let me know whether this makes sense.<br /><br /> --<br /> Álvaro Herrera              
 <ahref="http://www.2ndQuadrant.com/" target="_blank">http://www.2ndQuadrant.com/</a><br /> PostgreSQL Development,
24x7Support, Training & Services<br /><br /><br /> --<br /> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your
subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></blockquote></div>

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

08 января 2014 г., 22:17:30

Pavel Stehule escribió:
> Hello
> 
> I don't like this direction. What we can do with JSON from plpgsql?

We have plenty of JSON functions and operators in SQL, and more to come
soon.  Is that not enough?

> More, JSON is not too robust format against some future changes.

Not sure what you mean.  This JSON is generated and consumed by our own
code, so we only need to concern ourselves with making sure that we can
consume (in the expansion function) what we generated in the previous
phase.  There is no transmission to the exterior.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

08 января 2014 г., 22:49:11

CC to hackers restored.

Pavel Stehule escribió:
> Dne 8.1.2014 23:17 "Alvaro Herrera" <alvherre@2ndquadrant.com> napsal(a):
> >
> > Pavel Stehule escribió:
> > > Hello
> > >
> > > I don't like this direction. What we can do with JSON from plpgsql?
> >
> > We have plenty of JSON functions and operators in SQL, and more to come
> > soon.  Is that not enough?
> 
> No, is not. Im sure. It is wrong a request to parse  system internal data,
> that is available in structured form. You create string that should be
> parsed same time.
> 
> Few functions with OUT parameters are beter than any semistructured string.

That was shot down, for good reasons: we assume that the users of this
are going to want to modify the command before considering it final.
Maybe they want to add a column to each table being created, or they
want to change the tablespace if the table name ends with "_big", or
they want to change the schema in which it is created.

This JSON representations lets you receive the table creation data in a
well-known JSON schema; you can tweak individual elements without having
to parse the SQL command.  And when you're done tweaking, there's a
function that lets you produce the SQL command that corresponds to the
original with the tweaks you just did.

(Please note that, thus far, this facility DOES NOT let you change the
table that was created, at least not directly: these event triggers are
run AFTER the creation command has completed.  You can tweak the command
that would be sent to a remote server in a replication swarm, for
example.  Or create a mirror table for audit purposes.  Or perhaps even
generate an ALTER TABLE command for the new table.)

If by "few functions with OUT parameters" you mean that we need to have
one record type that is able to receive all possible CREATE TABLE
options, so that you can change them as a record in plpgsql, this
doesn't sound too good to me, for three reasons: a) it's going to
require too many types and functions (one per statement type); b)
cramming the stuff in pg_type.h / pg_proc.h is going to be a horrid
task; c) any change is going to require an initdb.

-- 
Alvaro Herrera

Re: Add CREATE support to event triggers

От

Craig Ringer

Дата:

09 января 2014 г., 01:41:25

On 01/09/2014 04:42 AM, Alvaro Herrera wrote:

> If there's a NULL element when expanding an object, the whole thing
> expands to empty.  For example, if no AUTHORIZATION
> clause is specified, "authorization" element is still there, but the
> "authorization_role" element within it is NULL, and so the whole
> AUTHORIZATION clause expands to empty and the resulting command contains
> no authorization clause.

I'd like to see this applied consistently to argument-less clauses like
IF NOT EXISTS too. So the same rules apply.

> IF NOT EXISTS is handled by defining it to either the string IF NOT
> EXISTS or to empty if no such clause was specified.

I'm not keen on this bit. It puts clauses of syntax into value strings
other than the special "output" key. Those keys aren't easily
distinguished from user data without clause specific knowledge. I'm not
keen on that.

Instead, can't we use your already proposed subclause structure?

{"authorization":{"authorization_role":"some guy",                 "output":"AUTHORIZATION
%i{authorization_role}"},"if_not_exists":{"output": "IF NOT EXISTS"},"name":"some schema","output":"CREATE SCHEMA
%{if_not_exists}%i{name} %{authorization}"}

i.e. "if_not_exists" becomes an object. All clauses are objects, all
non-object values are user data. (right?). If the clause is absent, the
"output" key is the empty string.

The issue with that (and with your original proposal) is that you can't
tell what these clauses are supposed to be if they're not present in the
original query. You can't *enable* "IF NOT EXISTS" without pulling
knowledge of that syntax from somewhere else.

Depending on the problem you intend to solve there, that might be fine.

If it isn't, then instead there just needs to be a key to flag such
clauses as present or not.

{"authorization":{"authorization_role":"some guy",                 "output":"AUTHORIZATION
%i{authorization_role}"},"if_not_exists":{"output": "IF NOT EXISTS"                  "present": true},"name":"some
schema","output":"CREATESCHEMA %{if_not_exists} %i{name} %{authorization}"}

Am I just over-complicating something simple here?

My reasoning is that it'd be good to be able to easily tell the
difference between *structure* and *user data* in these query trees and
do so without possibly version-specific and certainly
syntax/clause-specific knowledge about the meaning of every key of every
clause.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

09 января 2014 г., 03:27:31

Craig Ringer escribió:

> Instead, can't we use your already proposed subclause structure?
> 
> {"authorization":{"authorization_role":"some guy",
>                   "output":"AUTHORIZATION %i{authorization_role}"},
>  "if_not_exists": {"output": "IF NOT EXISTS"},
>  "name":"some schema",
>  "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"}
> 
> i.e. "if_not_exists" becomes an object. All clauses are objects, all
> non-object values are user data. (right?). If the clause is absent, the
> "output" key is the empty string.
> 
> The issue with that (and with your original proposal) is that you can't
> tell what these clauses are supposed to be if they're not present in the
> original query. You can't *enable* "IF NOT EXISTS" without pulling
> knowledge of that syntax from somewhere else.
> 
> Depending on the problem you intend to solve there, that might be fine.

Hmm.  This seems like a reasonable thing to do, except that I would like
the "output" to always be the constant, and have some other way to
enable the clause or disable it.  With your "present" boolean:
so

"if_not_exists": {"output": "IF NOT EXISTS",                 "present": true/false}

In fact, I'm now wondering whether this is a better idea than not
emitting anything when some element in the output expands to NULL; so it
would apply to "authorization" as well; if the command includes the
clause, it'd be
{"authorization":{"authorization_role":"some guy",                  "present": true,
"output":"AUTHORIZATION%i{authorization_role}"},
 

and if there wasn't anything, you'd have
{"authorization":{"authorization_role": null,                  "present": false,
"output":"AUTHORIZATION%i{authorization_role}"},
 

so if you want to turn it on and it wasn't, you need to change both the
present boolean and also set the authorization_role element; and if you
want to turn it off when it was present, just set present to false.

> Am I just over-complicating something simple here?

I think it's a fair point.

> My reasoning is that it'd be good to be able to easily tell the
> difference between *structure* and *user data* in these query trees and
> do so without possibly version-specific and certainly
> syntax/clause-specific knowledge about the meaning of every key of every
> clause.

Sounds reasonable.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

09 января 2014 г., 17:36:56

On Wed, Jan 8, 2014 at 10:27 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Craig Ringer escribió:
>> Instead, can't we use your already proposed subclause structure?
>>
>> {"authorization":{"authorization_role":"some guy",
>>                   "output":"AUTHORIZATION %i{authorization_role}"},
>>  "if_not_exists": {"output": "IF NOT EXISTS"},
>>  "name":"some schema",
>>  "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"}
>>
>> i.e. "if_not_exists" becomes an object. All clauses are objects, all
>> non-object values are user data. (right?). If the clause is absent, the
>> "output" key is the empty string.
>>
>> The issue with that (and with your original proposal) is that you can't
>> tell what these clauses are supposed to be if they're not present in the
>> original query. You can't *enable* "IF NOT EXISTS" without pulling
>> knowledge of that syntax from somewhere else.
>>
>> Depending on the problem you intend to solve there, that might be fine.
>
> Hmm.  This seems like a reasonable thing to do, except that I would like
> the "output" to always be the constant, and have some other way to
> enable the clause or disable it.  With your "present" boolean:
> so
>
> "if_not_exists": {"output": "IF NOT EXISTS",
>                   "present": true/false}

Why not:

"if_not_exists": true/false

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

09 января 2014 г., 17:58:53

Robert Haas escribió:
> On Wed, Jan 8, 2014 at 10:27 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:

> > Hmm.  This seems like a reasonable thing to do, except that I would like
> > the "output" to always be the constant, and have some other way to
> > enable the clause or disable it.  With your "present" boolean:
> > so
> >
> > "if_not_exists": {"output": "IF NOT EXISTS",
> >                   "present": true/false}
> 
> Why not:
> 
> "if_not_exists": true/false

Yeah, that's another option.  If we do this, though, the expansion
function would have to know that an "if_not_exist" element expands to IF
NOT EXISTS.  Maybe that's okay.  Right now, the expansion function is
pretty stupid, which is nice.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Jim Nasby

Дата:

09 января 2014 г., 22:17:41

On 1/9/14, 11:58 AM, Alvaro Herrera wrote:
> Robert Haas escribió:
>> On Wed, Jan 8, 2014 at 10:27 PM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>
>>> Hmm.  This seems like a reasonable thing to do, except that I would like
>>> the "output" to always be the constant, and have some other way to
>>> enable the clause or disable it.  With your "present" boolean:
>>> so
>>>
>>> "if_not_exists": {"output": "IF NOT EXISTS",
>>>                    "present": true/false}
>>
>> Why not:
>>
>> "if_not_exists": true/false
>
> Yeah, that's another option.  If we do this, though, the expansion
> function would have to know that an "if_not_exist" element expands to IF
> NOT EXISTS.  Maybe that's okay.  Right now, the expansion function is
> pretty stupid, which is nice.

Yeah, the source side of this will always have to understand the nuances of every command; it'd be really nice to not
burdenthe other side with that as well. The only downside I see is a larger JSON output, but meh.
 

Another advantage is if you really wanted to you could modify the output formatting in the JSON doc to do something
radicallydifferent if so inclined...
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: Add CREATE support to event triggers

От

Simon Riggs

Дата:

10 января 2014 г., 15:36:17

On 8 January 2014 20:42, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy";

Hmm, given in 9.3 it was OK to have only DROP event triggers, I think
it should be equally acceptable to have just CREATE, but without every
option on CREATE.  CREATE SCHEMA is easily the most complex thing here
and would be the command/event to deprioritise if we had any issues
getting this done/agreeing something for 9.4.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

10 января 2014 г., 15:48:13

On Thu, Jan 9, 2014 at 5:17 PM, Jim Nasby <jim@nasby.net> wrote:
> On 1/9/14, 11:58 AM, Alvaro Herrera wrote:
>> Robert Haas escribió:
>>>
>>> On Wed, Jan 8, 2014 at 10:27 PM, Alvaro Herrera
>>> <alvherre@2ndquadrant.com> wrote:
>>
>>
>>>> Hmm.  This seems like a reasonable thing to do, except that I would like
>>>> the "output" to always be the constant, and have some other way to
>>>> enable the clause or disable it.  With your "present" boolean:
>>>> so
>>>>
>>>> "if_not_exists": {"output": "IF NOT EXISTS",
>>>>                    "present": true/false}
>>>
>>>
>>> Why not:
>>>
>>> "if_not_exists": true/false
>>
>>
>> Yeah, that's another option.  If we do this, though, the expansion
>> function would have to know that an "if_not_exist" element expands to IF
>> NOT EXISTS.  Maybe that's okay.  Right now, the expansion function is
>> pretty stupid, which is nice.
>
> Yeah, the source side of this will always have to understand the nuances of
> every command; it'd be really nice to not burden the other side with that as
> well. The only downside I see is a larger JSON output, but meh.
>
> Another advantage is if you really wanted to you could modify the output
> formatting in the JSON doc to do something radically different if so
> inclined...

Yeah.  I wasn't necessarily objecting to the way Alvaro did it, just
asking why he did it that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

10 января 2014 г., 15:48:51

On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 8 January 2014 20:42, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
>> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy";
>
> Hmm, given in 9.3 it was OK to have only DROP event triggers, I think
> it should be equally acceptable to have just CREATE, but without every
> option on CREATE.  CREATE SCHEMA is easily the most complex thing here
> and would be the command/event to deprioritise if we had any issues
> getting this done/agreeing something for 9.4.

I don't know that I agree with that, but I guess we can cross that
bridge when we come to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Simon Riggs

Дата:

10 января 2014 г., 15:55:53

On 10 January 2014 15:48, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 8 January 2014 20:42, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>
>>> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy";
>>
>> Hmm, given in 9.3 it was OK to have only DROP event triggers, I think
>> it should be equally acceptable to have just CREATE, but without every
>> option on CREATE.  CREATE SCHEMA is easily the most complex thing here
>> and would be the command/event to deprioritise if we had any issues
>> getting this done/agreeing something for 9.4.
>
> I don't know that I agree with that, but I guess we can cross that
> bridge when we come to it.

We've come to it...

You would prefer either everything or nothing?? On what grounds?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

10 января 2014 г., 17:07:25

On Fri, Jan 10, 2014 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 10 January 2014 15:48, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 8 January 2014 20:42, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>>
>>>> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy";
>>>
>>> Hmm, given in 9.3 it was OK to have only DROP event triggers, I think
>>> it should be equally acceptable to have just CREATE, but without every
>>> option on CREATE.  CREATE SCHEMA is easily the most complex thing here
>>> and would be the command/event to deprioritise if we had any issues
>>> getting this done/agreeing something for 9.4.
>>
>> I don't know that I agree with that, but I guess we can cross that
>> bridge when we come to it.
>
> We've come to it...
>
> You would prefer either everything or nothing?? On what grounds?

I hardly think I need to justify that position.  That's project policy
and always has been.  When somebody implements 50% of a feature, or
worse yet 95% of a feature, it violates the POLA for users and doesn't
always subsequently get completed, leaving us with long-term warts
that are hard to eliminate.  It's perfectly fine to implement a
feature incrementally if the pieces are individually self-consistent
and ideally even useful, but deciding to support every command except
one because the last one is hard to implement doesn't seem like a
principled approach to anything.  It's not even obvious to me that
CREATE SCHEMA is all that much harder than anything else and Alvaro
has not said that that's the only thing he can't implement (or why) so
I think it's entirely premature to make the decision now about which
way to proceed - but, OK, sure, if you want to force the issue now,
then yeah, I think it's better to have everything or nothing than to
have support for only some things justified by nothing more than
implementation complexity.

Aside from the general issue, in this particular case, I have
previously and repeatedly expressed concerns about regression test
coverage and suggested a path that would guarantee thorough regression
testing but which would require that support be complete for
everything present in our regression tests.  Although there may be
some other plan for guaranteeing thorough regression testing not only
now but going forward, I have not seen it proposed here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Simon Riggs

Дата:

10 января 2014 г., 17:59:18

On 10 January 2014 17:07, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 10, 2014 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 10 January 2014 15:48, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> On 8 January 2014 20:42, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>>>
>>>>> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy";
>>>>
>>>> Hmm, given in 9.3 it was OK to have only DROP event triggers, I think
>>>> it should be equally acceptable to have just CREATE, but without every
>>>> option on CREATE.  CREATE SCHEMA is easily the most complex thing here
>>>> and would be the command/event to deprioritise if we had any issues
>>>> getting this done/agreeing something for 9.4.
>>>
>>> I don't know that I agree with that, but I guess we can cross that
>>> bridge when we come to it.
>>
>> We've come to it...
>>
>> You would prefer either everything or nothing?? On what grounds?
>
> I hardly think I need to justify that position.

Yeh, you do. Everybody does.

> That's project policy
> and always has been.  When somebody implements 50% of a feature, or
> worse yet 95% of a feature, it violates the POLA for users and doesn't
> always subsequently get completed, leaving us with long-term warts
> that are hard to eliminate.

So why was project policy violated when we released 9.3 with only DROP
event support? Surely that was a worse violation of POLA than my
suggestion?

It's not reasonable to do something yourself and then object when
others suggest doing the same thing.

After 3 years we need something useful. I think the "perfect being the
enemy of the good" argument applies here after this length of time.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

10 января 2014 г., 18:17:59

On Fri, Jan 10, 2014 at 12:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> That's project policy
>> and always has been.  When somebody implements 50% of a feature, or
>> worse yet 95% of a feature, it violates the POLA for users and doesn't
>> always subsequently get completed, leaving us with long-term warts
>> that are hard to eliminate.
>
> So why was project policy violated when we released 9.3 with only DROP
> event support? Surely that was a worse violation of POLA than my
> suggestion?

Well, obviously I didn't think so at the time, or I would have
objected.  I felt, and still feel, that implementing one kind of event
trigger (drop) does not necessarily require implementing another kind
(create).  I think that's clearly different from implementing either
one for only some object types.

"This event trigger will be called whenever an object is dropped" is a
reasonable contract with the user.  "This other event trigger will be
called whenever an object is created, unless it happens to be a
schema" is much less reasonable.

At least in my opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Simon Riggs

Дата:

10 января 2014 г., 21:40:24

On 10 January 2014 18:17, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 10, 2014 at 12:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> That's project policy
>>> and always has been.  When somebody implements 50% of a feature, or
>>> worse yet 95% of a feature, it violates the POLA for users and doesn't
>>> always subsequently get completed, leaving us with long-term warts
>>> that are hard to eliminate.
>>
>> So why was project policy violated when we released 9.3 with only DROP
>> event support? Surely that was a worse violation of POLA than my
>> suggestion?
>
> Well, obviously I didn't think so at the time, or I would have
> objected.  I felt, and still feel, that implementing one kind of event
> trigger (drop) does not necessarily require implementing another kind
> (create).  I think that's clearly different from implementing either
> one for only some object types.
>
> "This event trigger will be called whenever an object is dropped" is a
> reasonable contract with the user.  "This other event trigger will be
> called whenever an object is created, unless it happens to be a
> schema" is much less reasonable.
>
> At least in my opinion.

In the fullness of time, I agree that is not a restriction we should maintain.

Given that CREATE SCHEMA with multiple objects is less well used, its
a reasonable restriction to accept for one release, if the alternative
is to implement nothing at all of value. Especially since we are now
in the third year of development of this set of features, it is time
to reduce the scope to ensure delivery.

There may be other ways to ensure something of value is added, this
was just one suggestion.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

10 января 2014 г., 23:22:53

Can we please stop arguing over a problem I don't have?  I started with
CREATE SCHEMA because it is one of the easy cases, not because it was
the most difficult case: we only need to deparse the bits of it that
don't involve the objects within, because those are reported by the
event trigger as separate commands.  Each object gets its own creation
command, which works pretty nicely.  Of course, the deparsed version of
the command will not look very much like what was submitted by the user,
but that doesn't matter -- what does matter is that the objects created
by running the commands reported in the event trigger will be (or should
be) the same as those created by the original command.

I was showing CREATE SCHEMA as a way to discuss the fine details: how to
report identifiers that might need quoting, what to do with optional
clauses (AUTHORIZATION), etc.  I am past that now.

On the subject of testing the triggers, Robert Haas wrote:

> Here's one idea: create a contrib module that (somehow, via APIs to be
> invented) runs every DDL command that gets executed through the
> deparsing code, and then parses the result and executes *that* instead
> of the original command.  Then, add a build target that runs the
> regression test suite in that mode, and get the buildfarm configured
> to run that build target regularly on at least some machines.  That
> way, adding syntax to the regular regression test suite also serves to
> test that the deparsing logic for that syntax is working.  If we do
> this, there's still some maintenance burden associated with having DDL
> deparsing code, but at least our chances of noticing when we've failed
> to maintain it should be pretty good.

I gave this some more thought and hit a snag.  The problem here is that
by the time the event trigger runs, the original object has already been
created.  At that point, we can't simply replace the created objects
with objects that would hypothetically be created by a command trigger.

A couple of very hand-wavy ideas:

1. in the event trigger, DROP the original object and CREATE it as
reported by the creation_commands SRF.

2. Have ddl_command_start open a savepoint, and then roll it back in
ddl_command_end, then create the object again.  Not sure this is doable
because of the whole SPI nesting issue .. maybe with C-language event
trigger functions?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Jim Nasby

Дата:

11 января 2014 г., 00:14:55

On 1/10/14, 3:40 PM, Simon Riggs wrote:
> Given that CREATE SCHEMA with multiple objects is less well used, its
> a reasonable restriction to accept for one release, if the alternative
> is to implement nothing at all of value. Especially since we are now
> in the third year of development of this set of features, it is time
> to reduce the scope to ensure delivery.

ISTM that since you can nest JSON then CREATE SCHEMA just needs to support providing an array of other JSON
definitions,no? And it doesn't seem like it should be that hard to code that up.

That said, if it's a choice between getting a bunch of CREATE blah support or getting nothing because of CREATE SCHEMA
withobjects then as long as we have a scheme that supports the full CREATE SCHEMA feature we shouldn't hold up just
becauseit's not coded.

-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: Add CREATE support to event triggers

От

Jim Nasby

Дата:

11 января 2014 г., 00:18:20

On 1/10/14, 5:22 PM, Alvaro Herrera wrote:
>> Here's one idea: create a contrib module that (somehow, via APIs to be
>> >invented) runs every DDL command that gets executed through the
>> >deparsing code, and then parses the result and executes*that*  instead
>> >of the original command.  Then, add a build target that runs the
>> >regression test suite in that mode, and get the buildfarm configured
>> >to run that build target regularly on at least some machines.  That
>> >way, adding syntax to the regular regression test suite also serves to
>> >test that the deparsing logic for that syntax is working.  If we do
>> >this, there's still some maintenance burden associated with having DDL
>> >deparsing code, but at least our chances of noticing when we've failed
>> >to maintain it should be pretty good.
> I gave this some more thought and hit a snag.  The problem here is that
> by the time the event trigger runs, the original object has already been
> created.  At that point, we can't simply replace the created objects
> with objects that would hypothetically be created by a command trigger.
>
> A couple of very hand-wavy ideas:
>
> 1. in the event trigger, DROP the original object and CREATE it as
> reported by the creation_commands SRF.
>
> 2. Have ddl_command_start open a savepoint, and then roll it back in
> ddl_command_end, then create the object again.  Not sure this is doable
> because of the whole SPI nesting issue .. maybe with C-language event
> trigger functions?

What if we don't try and do this all in one shot? I'm thinking let the original DDL do it's thing while capturing the
re-parsedcommands in order somewhere. Dump those commands into a brand new database and use that for testing.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: Add CREATE support to event triggers

От

Simon Riggs

Дата:

11 января 2014 г., 09:26:38

On 10 January 2014 23:22, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> On the subject of testing the triggers, Robert Haas wrote:
>
>> Here's one idea: create a contrib module that (somehow, via APIs to be
>> invented) runs every DDL command that gets executed through the
>> deparsing code, and then parses the result and executes *that* instead
>> of the original command.  Then, add a build target that runs the
>> regression test suite in that mode, and get the buildfarm configured
>> to run that build target regularly on at least some machines.  That
>> way, adding syntax to the regular regression test suite also serves to
>> test that the deparsing logic for that syntax is working.  If we do
>> this, there's still some maintenance burden associated with having DDL
>> deparsing code, but at least our chances of noticing when we've failed
>> to maintain it should be pretty good.
>
> I gave this some more thought and hit a snag.  The problem here is that
> by the time the event trigger runs, the original object has already been
> created.  At that point, we can't simply replace the created objects
> with objects that would hypothetically be created by a command trigger.

Yes, all we are doing is firing a trigger that has access to the
information about the current object.

> A couple of very hand-wavy ideas:
>
> 1. in the event trigger, DROP the original object and CREATE it as
> reported by the creation_commands SRF.
>
> 2. Have ddl_command_start open a savepoint, and then roll it back in
> ddl_command_end, then create the object again.  Not sure this is doable
> because of the whole SPI nesting issue .. maybe with C-language event
> trigger functions?

We need to be clear what action we are testing exactly. Once we know
that we can come up with a test mechanism.

We can come up with various tests that test coverage but that may not
be enough. Mostly we're just splitting up fields from the DDL, like
object name into its own text field, as well as augmenting it with
database name. That's pretty simple and shouldn't require much
testing. This code seems likely to be prone to missing features much
more so than actual bugs. Missing a piece of useful information in the
JSON doesn't mean there is a bug.

If all the event trigger does is generate JSON, then all we need to do
is test that the trigger fired and generated well-formed JSON with the
right information content. That should be very simple and I would say
the simpler the better.
To do that we can run code to parse the JSON and produce formatted
text output like this...
Field    |   Value
---------+-----------
XX          afagaha
YY          aamnamna
The output can be derived from visual inspection, proving that we
generated the required JSON.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

13 января 2014 г., 16:18:43

Alvaro Herrera escribió:

> In an event trigger, the function pg_event_trigger_get_creation_commands()
> returns the following JSON blob:

After playing with this for a while, I realized something that must have
seemed quite obvious to those paying attention: what this function is,
is just a glorified sprintf() for JSON.  So I propose we take our
existing format(text) and use it to model a new format(json) function,
which will be useful to the project at hand and be of more general
applicability.

To make it a better fit, I have changed the spec slightly.  The format
string is now the "fmt" element in the topmost JSON.  This format string
can contain % escapes, which consist of:

* the literal % itself
* an element name, enclosed in braces { }.  The name can optionally be followed by a colon and a possibly-empty array
separator.
* a format specifier, which can be I (identifier), D (dotted name), or s (string)
* Alternatively, %% expands to a literal %, as usual.

For each such escape, the JSON object is searched using the element name
as key.  For identifiers, the element is expected to be a string, and
will be quoted per identifier quoting rules.  Dotted-names are used to
format possibly-qualified relation names and such; the element must be
an object with one, two or three string elements, each of which is
quoted per identifier rules, and output separated by periods.

Finally, for arrays we expand each element in the JSON array element,
and separate them with the separator specified in the {} part of the
format specifier.

For instance,
alvherre=# select format(json '{"fmt":"hello, %{who}s! This is %{name}I", "who":"world", "name":"a function"}');
     format               
 
------------------------------------------hello, world! This is "a function"

Elements can be objects, in which case they are expanded recursively: a
"fmt" element is looked up and expanded as described above.


I don't yet see a need for %L escapes (that is, literals that can expand
to a single-quoted value or to NULL), but if I see it I will add that
too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Pavel Stehule

Дата:

14 января 2014 г., 08:09:17

Hello

2014/1/13 Alvaro Herrera <alvherre@2ndquadrant.com>

Alvaro Herrera escribió:

> In an event trigger, the function pg_event_trigger_get_creation_commands()
> returns the following JSON blob:

After playing with this for a while, I realized something that must have
seemed quite obvious to those paying attention: what this function is,
is just a glorified sprintf() for JSON. So I propose we take our
existing format(text) and use it to model a new format(json) function,
which will be useful to the project at hand and be of more general
applicability.

To make it a better fit, I have changed the spec slightly. The format
string is now the "fmt" element in the topmost JSON. This format string
can contain % escapes, which consist of:

* the literal % itself
* an element name, enclosed in braces { }. The name can optionally be
followed by a colon and a possibly-empty array separator.
* a format specifier, which can be I (identifier), D (dotted name), or s
(string)
* Alternatively, %% expands to a literal %, as usual.

For each such escape, the JSON object is searched using the element name
as key. For identifiers, the element is expected to be a string, and
will be quoted per identifier quoting rules. Dotted-names are used to
format possibly-qualified relation names and such; the element must be
an object with one, two or three string elements, each of which is
quoted per identifier rules, and output separated by periods.

Finally, for arrays we expand each element in the JSON array element,
and separate them with the separator specified in the {} part of the
format specifier.

For instance,
alvherre=# select format(json '{"fmt":"hello, %{who}s! This is %{name}I", "who":"world", "name":"a function"}');
format
------------------------------------------
hello, world! This is "a function"

Elements can be objects, in which case they are expanded recursively: a
"fmt" element is looked up and expanded as described above.

I don't yet see a need for %L escapes (that is, literals that can expand
to a single-quoted value or to NULL), but if I see it I will add that
too.

I am not against to this idea, although I don't see a strong benefit. . Just special function can be better - it has minimal relation to variadic "function" format - and nested mixed format can be messy

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

14 января 2014 г., 20:06:02

On Fri, Jan 10, 2014 at 6:22 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> Here's one idea: create a contrib module that (somehow, via APIs to be
>> invented) runs every DDL command that gets executed through the
>> deparsing code, and then parses the result and executes *that* instead
>> of the original command.  Then, add a build target that runs the
>> regression test suite in that mode, and get the buildfarm configured
>> to run that build target regularly on at least some machines.  That
>> way, adding syntax to the regular regression test suite also serves to
>> test that the deparsing logic for that syntax is working.  If we do
>> this, there's still some maintenance burden associated with having DDL
>> deparsing code, but at least our chances of noticing when we've failed
>> to maintain it should be pretty good.
>
> I gave this some more thought and hit a snag.  The problem here is that
> by the time the event trigger runs, the original object has already been
> created.  At that point, we can't simply replace the created objects
> with objects that would hypothetically be created by a command trigger.

Hmm, so these triggers are firing after the corresponding actions have
been performed.  Yeah, that's tricky.  I don't immediately have an
idea how to come up with a comprehensive test framework for that,
although I do think that the way you're structuring the JSON blobs
makes this a whole lot less error-prone than the approaches that have
been explored previously.  The big question in my mind is still "if
somebody updates the grammar in gram.y, how do we make sure that they
notice that this stuff needs to be updated to match?".  Ideally the
answer is "if they don't, the regression tests fail".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Jim Nasby

Дата:

14 января 2014 г., 22:28:16

On 1/14/14, 2:05 PM, Robert Haas wrote:
> On Fri, Jan 10, 2014 at 6:22 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>>> Here's one idea: create a contrib module that (somehow, via APIs to be
>>> invented) runs every DDL command that gets executed through the
>>> deparsing code, and then parses the result and executes *that* instead
>>> of the original command.  Then, add a build target that runs the
>>> regression test suite in that mode, and get the buildfarm configured
>>> to run that build target regularly on at least some machines.  That
>>> way, adding syntax to the regular regression test suite also serves to
>>> test that the deparsing logic for that syntax is working.  If we do
>>> this, there's still some maintenance burden associated with having DDL
>>> deparsing code, but at least our chances of noticing when we've failed
>>> to maintain it should be pretty good.
>>
>> I gave this some more thought and hit a snag.  The problem here is that
>> by the time the event trigger runs, the original object has already been
>> created.  At that point, we can't simply replace the created objects
>> with objects that would hypothetically be created by a command trigger.
>
> Hmm, so these triggers are firing after the corresponding actions have
> been performed.  Yeah, that's tricky.  I don't immediately have an
> idea how to come up with a comprehensive test framework for that,
> although I do think that the way you're structuring the JSON blobs
> makes this a whole lot less error-prone than the approaches that have
> been explored previously.  The big question in my mind is still "if
> somebody updates the grammar in gram.y, how do we make sure that they
> notice that this stuff needs to be updated to match?".  Ideally the
> answer is "if they don't, the regression tests fail".

Can't we capture the JSON, use that to create a new database, and then test against that? Unfortunately it'd be easiest
todo that at a high level and not for individual commands, but it's a lot better than nothing...
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

15 января 2014 г., 05:17:39

Pavel Stehule escribió:

> 2014/1/13 Alvaro Herrera <alvherre@2ndquadrant.com>

> > After playing with this for a while, I realized something that must have
> > seemed quite obvious to those paying attention: what this function is,
> > is just a glorified sprintf() for JSON.  So I propose we take our
> > existing format(text) and use it to model a new format(json) function,
> > which will be useful to the project at hand and be of more general
> > applicability.

> I am not against to this idea, although I don't see a strong benefit. .
> Just special function can be better - it has minimal relation to variadic
> "function" format - and nested mixed format can be messy

Yeah.  I eventually realized that I need some very specialized format
specifiers here; I invented on %{}T specifier, for instance, which is
used to format types.  So I think this is better confined to expansion
of SQL commands rather than a generic string formatter.

So here's a patch implementing the ideas expressed in this thread.
There are two new SQL functions:

pg_event_trigger_get_creation_commands()
Can be called in a ddl_command_end event, and returns a JSON blob for
each object created in that command.

pg_event_trigger_expand_command()
Takes such a JSON blob and turns it back into an executable command.

The usefulness of this combination is that the user can edit the JSON
between those two calls, say by adding new columns or changing or
removing schema specifications, tablespaces, and so on.

One interesting bit I had to add was format_type_detailed().  This
function returns a type specification in minute detail: schema, type
name, typemod, array are all returned separately.  This might seem
overkill, but if we want to let the user mess with column definitions, I
think it's necessary.

Some things are uglier than I would like -- one reason is I stayed away
from using the JSON things too directly.  There are at least two patches
in that area, and some helpers might appear that help this patch.
However, at the moment I am not sure whether the end result would be
better or worse, and I don't want to make this patch depend on some
other patch which might or might not end up being applied.  In any case,
the JSON stuff is pretty localized so it should be reasonably easy to
rip out and replace.  The first half of deparse_utility.c is concerned
with a simple-minded mechanism to accumulate an object hierarchy until
it's time to convert it to proper JSON.  Perhaps the new JSON stuff will
make it possible to rip that all out.

The JSON parsing is done in event_trigger.c.  This code should probably
live elsewhere, but I again hesitate to put it in json.c or jsonfuncs.c,
at least until some discussion about its general applicability takes
place.

The second half of deparse_utility.c is concerned with actually
processing the parse nodes to construct objects.  There are several
cases missing (at the moment, only CREATE SCHEMA, CREATE TABLE, CREATE
INDEX and CREATE SEQUENCE are supported, and in each of them there are
some things missing).  This is code tedious to write but not difficult.

To see this stuff in action, an event trigger function such as this is
useful:

CREATE OR REPLACE FUNCTION snitch() RETURNS event_trigger LANGUAGE plpgsql AS $$
DECLARE
        r RECORD;
BEGIN
        FOR r IN SELECT * FROM pg_event_trigger_get_creation_commands()
        LOOP
                RAISE NOTICE 'JSON blob: %', r.command;
                RAISE NOTICE 'expanded: %', pg_event_trigger_expand_command(r.command::json);
        END LOOP;
END;
$$;

CREATE EVENT TRIGGER snitch ON ddl_command_end
when tag in ('create schema', 'create table', 'create index', 'create sequence')
EXECUTE PROCEDURE snitch();

Then execute commands to your liking.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Вложения

event-trigger-create-2.patch

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

23 января 2014 г., 12:57:22

On 2014-01-15 02:11:11 -0300, Alvaro Herrera wrote:
> Then execute commands to your liking.

So, I am giving a quick look, given that I very likely will have to
write a consumer for it...

* deparse_utility_command returns payload via parameter, not return value?
* functions beginning with an underscore formally are reserved, we shouldn't add new places using such a convention.
* I don't think dequote_jsonval is correct as is, IIRC that won't correctly handle unicode escapes and such. I think
json.cneeds to expose functionality for this.
 
* I wonder if expand_jsonval_identifier shouldn't truncate as well? It should get handled by the individual created
commands,right?
 
* So, if I read things correctly, identifiers in json are never downcased, is that correct? I.e. they are implicitly
quoted?
* Why must we not schema qualify system types (c.f. expand_jsonval_typename)? It seems to be perfectly sensible to me
tojust use pg_catalog there.
 
* It looks like pg_event_trigger_expand_command will misparse nested {, error out instead?
* I wonder if deparseColumnConstraint couldn't somehow share a bit more code with ruleutils.c's
pg_get_constraintdef_worker().
* I guess you know, but deparseColumnConstraint() doesn't handle foreign keys yet.
* Is tcop/ the correct location for deparse_utility.c? I wonder if it shouldn't be alongside ruleutils.c instead.
* shouldn't pg_event_trigger_get_creation_commands return "command" as json instead of text?

* Not your department, but a builtin json pretty printer would be really, really handy. Something like
CREATE FUNCTION json_prettify(j json)
RETURNS TEXT AS $$   import json   return json.dumps(json.loads(j), sort_keys=True, indent=4)
$$ LANGUAGE PLPYTHONU;makes the json so much more readable.

Some minimal tests:
* CREATE SEQUENCE errors out with:
NOTICE:  JSON blob: {"sequence":{"relation":"frakbar2","schema":"public"},"persistence":"","fmt":"CREATE
%{persistence}sSEQUENCE %{identity}D"}
 
ERROR:  non-existant element "identity" in JSON formatting object
*CREATE TABLE frakkbar2(id int); error out with:
postgres=# CREATE TABLE frakkbar2(id int);
NOTICE:  JSON blob: {"on_commit":{"present":false,"on_commit_value":null,"fmt":"ON COMMIT
%{on_commit_value}s"},"tablespace":{"present":false,"tablespace":null,"fmt":"TABLESPACE
%{tablespace}I"},"inherits":{"present":false,"parents":null,"fmt":"INHERITS(%{parents:,
}D)"},"table_elements":[{"collation":{"present":false,"fmt":"COLLATE
%{name}I"},"type":{"typmod":"","typename":"integer","is_system":true,"is_array":false},"name":"id","fmt":"%{name}I
%{type}T%{collation}s"}],"of_type":{"present":false,"of_type":null,"fmt":"OF
%{of_type}T"},"if_not_exists":"","identity":{"relation":"frakkbar2","schema":"public"},"persistence":"","fmt":"CREATE
%{persistence}sTABLE %{identity}D %{if_not_exists}s %{of_type}s (%{table_elements:, }s) %{inherits}s %{on_commit}s
%{tablespace}s"}
ERROR:  invalid NULL is_system flag in %T element
CONTEXT:  PL/pgSQL function snitch() line 8 at RAISE

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

23 января 2014 г., 18:58:41

Andres Freund escribió:

> * Why must we not schema qualify system types
>   (c.f. expand_jsonval_typename)? It seems to be perfectly sensible to
>   me to just use pg_catalog there.

So, the reason for doing things this way is to handle cases like
"varchar(10)" being turned into "character varying"; and that name
requires that the typename NOT be schema-qualified, otherwise it fails.
But thinking about this again, I don't see a reason why this can't be
returned simply as pg_catalog.varchar(10); this should work fine on the
receiving end as well, and give the same result.

The other cases I'm worried about are types like bit(1) vs. unadorned
bit vs. double-quoted "bit", and "char", etc.  I'm not sure I'm dealing
with them correctly right now.  So even if by the above paragraph I
could make the is_system thingy go away, I might still need it to cover
this case.

Thanks for the review, I will post an updated version later after fixing
the other issues you mentioned plus adding support for more commands.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Tom Lane

Дата:

23 января 2014 г., 20:21:04

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> So, the reason for doing things this way is to handle cases like
> "varchar(10)" being turned into "character varying"; and that name
> requires that the typename NOT be schema-qualified, otherwise it fails.
> But thinking about this again, I don't see a reason why this can't be
> returned simply as pg_catalog.varchar(10); this should work fine on the
> receiving end as well, and give the same result.

I think people would be unhappy if we changed the output of, say, pg_dump
that way.  But it's presumably not a problem for strings inside event
triggers.  Once upon a time, the typmods would have been an issue, but now
that we support them in generic typename syntax I think we're good.

> The other cases I'm worried about are types like bit(1) vs. unadorned
> bit vs. double-quoted "bit", and "char", etc.  I'm not sure I'm dealing
> with them correctly right now.  So even if by the above paragraph I
> could make the is_system thingy go away, I might still need it to cover
> this case.

Yeah, there are some weird cases there.  I think you can make them go away
if you always print the fully qualified type name, ie don't omit
"pg_catalog", and use pg_type.typname not any converted version (and don't
forget to double-quote anything that might be a reserved word).
But I've not looked closely at the code.
        regards, tom lane

Re: Add CREATE support to event triggers

От

Dimitri Fontaine

Дата:

30 января 2014 г., 13:53:43

Hi,

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> So here's a patch implementing the ideas expressed in this thread.
> There are two new SQL functions:

I spent some time reviewing the patch and tried to focus on a higher
level review, as I saw Andres already began with the lower level stuff.

The main things to keep in mind here are:
 - this patch enables running Event Triggers anytime a new object is   created, in a way that the user code is run once
theobject already   made it through the catalogs; 
 - the Event Trigger code has access to the full details about every   created object, so it's not tied to a command
butreally the fact   that an object just was created in the catalogs; 
      (it's important with serial and primary key sub-commands)
 - facilities are provided so that it's possible to easily build an SQL   command that if executed would create the
exactsame object again; 
 - the facilities around passing the created object details and   building a SQL command are made in such a way that
it'strivially   possible to "hack" away the captured objects properties before   producing again a new SQL command. 

After careful study and thinking, it appears to me that the proposed
patch addresses the whole range of features we expect here, and is both
flexible enough for the users and easy enough to maintain.

The event being fired once the objects are available in the catalogs
makes it possible for the code providing the details in the JSON format
to complete the parsetree with necessary information.

Current state of the patch is not ready for commit yet, independant of
code details some more high-level work needs to be done:
 - preliminary commit
   It might be a good idea to separate away some pre-requisites of this   patch and commit them separately: the OID
publishingparts and   allowing an Event Trigger to get fired after CREATE but without the   extra detailed JSON
formatedinformation might be good commits   already, and later add the much needed details about what did   happen. 
 - document the JSON format
   I vote for going with the proposed format, because it actually   allows to implement both the audit and replication
featureswe want,   with the capability of hacking schema, data types, SQL   representation etc; and because I couldn't
thinkof anything better   than what's proposed here ;-) 
 - other usual documentation
   I don't suppose I have to expand on what I mean here…
 - fill-in other commands
   Not all commands are supported in the submitted patch. I think once   we have a clear documentation on the general
JSONformating and how   to use it as a user, we need to include support for all CREATE   commands that we have. 
   I see nothing against extending when this work has to bo done until   after the CF, as long as it's fully done
beforebeta. After all it's   only about filling in minutia at this point. 
 - review the JSON producing code
   It might be possible to use more of the internal supports for JSON   now that the format is freezing.
 - regression tests
   The patch will need some. The simpler solution is to add a new   regression test entry and exercise all the CREATE
commandsin there,   in a specific schema, activating an event trigger that outputs the   JSON detailed information each
time(the snitch() example). 
   Best would be to have some "pretty" indented output of the JSON to   help with reviewing diffs, and I have to wonder
aboutJSON object   inner-ordering if we're going to do that. 
   No other ideas on this topic from me.

> The JSON parsing is done in event_trigger.c.  This code should probably
> live elsewhere, but I again hesitate to put it in json.c or jsonfuncs.c,
> at least until some discussion about its general applicability takes
> place.

I see that as useful enough if it can be made to work without the
special "fmt" fields somehow, with a nice default formatting ability.

In particular, being able to build some intermediate object with
json_agg then call the formating/expanding function on top of that might
be quite useful.

That said, I don't think we have enough time to attack this problem now,
I think it would be wiser to address your immediate problem separately
in your patch and clean it later (next release) with sharing code and
infrastructure and offering a more generally useful tool. At least we
will have some feedback about the Event Trigger specific context then.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

04 февраля 2014 г., 05:11:58

Hi,

Dimitri, thanks for the review.

Here's a new version of this patch.  I focused on a small set of
commands for now, with the intention of building something that could
serve as a useful foundation for an interesting percentage of use cases.
There are enough strange things going on in this code that I thought I
should go one step further in the foundations before I go much further
in supporting the rest of the commands, which will be mostly tedious
additions of extracting info from catalogs to immediately emit as JSON
blobs.

So this version supports the following commands:
   CREATE SCHEMA
   CREATE TABLE
   CREATE SEQUENCE
   CREATE INDEX
   CREATE VIEW

I have run into some issues, though:

1. certain types, particularly timestamp/timestamptz but really this
could happen for any type, have unusual typmod output behavior.  For
those one cannot just use the schema-qualified catalog names and then
append the typmod at the end; because what you end up is something like
   pg_catalog.timestamptz(4) with time zone
because, for whatever reason, the "with time zone" is part of typmod
output.  But this doesn't work at all for input.  I'm not sure how to
solve this.

2. I have been having object definitions be emitted complete; in
particular, sequences have OWNED BY clauses when they have an owning
column.  But this doesn't work with a SERIAL column, because we get
output like this:

alvherre=# CREATE  TABLE public.hijo  (b serial);
NOTICE:  expanded: CREATE  SEQUENCE public.hijo_b_seq INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START WITH
1CACHE 1 NO CYCLE OWNED BY public.hijo.b 
NOTICE:  expanded: CREATE  TABLE public.hijo  (b pg_catalog.int4 DEFAULT nextval('hijo_b_seq'::regclass) NOT NULL )

which is all nice, except that the sequence is using the column name as
owner before the column has been created in the first place.  Both these
command will, of course, fail, because both depend on the other to have
been executed first.  The tie is easy to break in this case: just don't
emit the OWNED BY clause .. but it makes me a bit nervous to be
hardcoding the decision of parts that might depend on others.  OTOH
pg_dump already knows how to split objects in constituent parts as
necessary; maybe it's not so bad.

3. It is possible to coerce ruleutils.c to emit always-qualified names
by using PushOverrideSearchPath() facility; but actually this doesn't
always work, because some places in namespace.c believe that
PG_CATALOG_NAMESPACE is always visible and so certain objects are not
qualified.  In particular, text columns using default collation will be
emitted as having collation "default" and not pg_catalog.default as I
would have initially expected.  Right now it doesn't seem like this is a
problem, but it is unusual.

Dimitri Fontaine wrote:

> After careful study and thinking, it appears to me that the proposed
> patch addresses the whole range of features we expect here, and is both
> flexible enough for the users and easy enough to maintain.

Excellent, thanks.

>   - preliminary commit
>
>     It might be a good idea to separate away some pre-requisites of this
>     patch and commit them separately: the OID publishing parts and
>     allowing an Event Trigger to get fired after CREATE but without the
>     extra detailed JSON formated information might be good commits
>     already, and later add the much needed details about what did
>     happen.

I agree that perhaps the OID publishing bits should be pushed prior to
the rest of the patch.  It's two minor changes anyway that only affect
CREATE TABLE AS and REFRESH MATVIEW as far as I remember; the rest of
the commands already return the OID of the affected object.

I'm not sure about pushing the bits to have a trigger to fire before
having the deparse utility stuff.  I guess it is useful because it will
provide classID/objectID/identity for all created objects, even if we
don't have the creation command.  But changing the API of
pg_get_creation_commands() in a later release might be problematic.

>   - document the JSON format
>
>     I vote for going with the proposed format, because it actually
>     allows to implement both the audit and replication features we want,
>     with the capability of hacking schema, data types, SQL
>     representation etc; and because I couldn't think of anything better
>     than what's proposed here ;-)

That's great to hear.  Since the previous patch I have added a %{}O
escape that is used to format operators.  I think that makes the set of
format specifiers mostly complete.

>   - review the JSON producing code
>
>     It might be possible to use more of the internal supports for JSON
>     now that the format is freezing.

Yeah.  I realized that the trick through row_to_json is likely to fail
when the objects have more than 1600 columns.  Not sure if this is a
problem in practice (haven't tried creating a 1600-column table yet, but
I think that wouldn't fail because table elements are handed as arrays).
Anyway if we want to use the new json_build facilities, we'd need to
serialize from the in-memory representation I'm using to the text
format, and from there to JSON.  Not real sure that that's an
improvement ...

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Вложения

event-trigger-create-4.patch

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

05 февраля 2014 г., 18:25:16

On Tue, Feb 4, 2014 at 12:11 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> I have run into some issues, though:
>
> 1. certain types, particularly timestamp/timestamptz but really this
> could happen for any type, have unusual typmod output behavior.  For
> those one cannot just use the schema-qualified catalog names and then
> append the typmod at the end; because what you end up is something like
>    pg_catalog.timestamptz(4) with time zone
> because, for whatever reason, the "with time zone" is part of typmod
> output.  But this doesn't work at all for input.  I'm not sure how to
> solve this.

How about doing whatever pg_dump does?

> 2. I have been having object definitions be emitted complete; in
> particular, sequences have OWNED BY clauses when they have an owning
> column.  But this doesn't work with a SERIAL column, because we get
> output like this:
>
> alvherre=# CREATE  TABLE public.hijo  (b serial);
> NOTICE:  expanded: CREATE  SEQUENCE public.hijo_b_seq INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START
WITH1 CACHE 1 NO CYCLE OWNED BY public.hijo.b

> NOTICE:  expanded: CREATE  TABLE public.hijo  (b pg_catalog.int4 DEFAULT nextval('hijo_b_seq'::regclass) NOT NULL )
>
> which is all nice, except that the sequence is using the column name as
> owner before the column has been created in the first place.  Both these
> command will, of course, fail, because both depend on the other to have
> been executed first.  The tie is easy to break in this case: just don't
> emit the OWNED BY clause .. but it makes me a bit nervous to be
> hardcoding the decision of parts that might depend on others.  OTOH
> pg_dump already knows how to split objects in constituent parts as
> necessary; maybe it's not so bad.

Well, the sequence can't depend on a table column that doesn't exist
yet, so if it's in effect doing what you've shown there, it's
"cheating" by virtue of knowing that nobody can observe the
intermediate state.  Strictly speaking, there's nothing "wrong" with
emitting those commands just as you have them there; they won't run,
but if what you want to do is log what's happened rather than replay
it, that's OK.  Producing output that is actually executable is a
strictly harder problem than producing output that accurately
describes what happened.  As you say, pg_dump already splits things
and getting executable output out of this facility will require the
same kinds of tricks here.  This gets back to my worry about
maintaining two or three copies of the code that solve many of the
same problems in quite different ways...

> 3. It is possible to coerce ruleutils.c to emit always-qualified names
> by using PushOverrideSearchPath() facility; but actually this doesn't
> always work, because some places in namespace.c believe that
> PG_CATALOG_NAMESPACE is always visible and so certain objects are not
> qualified.  In particular, text columns using default collation will be
> emitted as having collation "default" and not pg_catalog.default as I
> would have initially expected.  Right now it doesn't seem like this is a
> problem, but it is unusual.

We have a quote_all_identifiers flag.  We could have a
schema_qualify_all_identifiers flag, too.  Then again, why is the
behavior of schema-qualifying absolutely everything even desirable?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

05 февраля 2014 г., 19:42:55

Robert Haas escribió:
> On Tue, Feb 4, 2014 at 12:11 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > I have run into some issues, though:
> >
> > 1. certain types, particularly timestamp/timestamptz but really this
> > could happen for any type, have unusual typmod output behavior.  For
> > those one cannot just use the schema-qualified catalog names and then
> > append the typmod at the end; because what you end up is something like
> >    pg_catalog.timestamptz(4) with time zone
> > because, for whatever reason, the "with time zone" is part of typmod
> > output.  But this doesn't work at all for input.  I'm not sure how to
> > solve this.
> 
> How about doing whatever pg_dump does?

We use format_type() for that as far as I know.  What it does
differently is use undecorated names defined by the standard for some
types, which are never schema qualified and are never ambiguous because
they are reserved words that would require quoting if used by
user-defined type names.  We can't use that here: somewhere upthread we
noticed issues when using those which is why we're now trying to use
catalog names instead of those special names.  (I don't think it's
impossible to use such names: we just need to ensure we handle quoting
correctly for the funny cases such as "char" and "bit.)

One idea is to chop the typmod output string at the closing parens.
That seems to work well for the types that come with core code ... and
the rule with the funny string at the end after the parenthised part of
the typmod works only by type names with hardcoded productions in
gram.y, so there is no danger that outside code will have a problem with
that strategy.

(I verified PostGIS types with typmods just to see an example of
out-of-core code at work, and unsurprisingly it only emits stuff inside
parens.)

> > 2. I have been having object definitions be emitted complete; in
> > particular, sequences have OWNED BY clauses when they have an owning
> > column.  But this doesn't work with a SERIAL column, because we get
> > output like this:

> Well, the sequence can't depend on a table column that doesn't exist
> yet, so if it's in effect doing what you've shown there, it's
> "cheating" by virtue of knowing that nobody can observe the
> intermediate state.  Strictly speaking, there's nothing "wrong" with
> emitting those commands just as you have them there; they won't run,
> but if what you want to do is log what's happened rather than replay
> it, that's OK.  Producing output that is actually executable is a
> strictly harder problem than producing output that accurately
> describes what happened.  As you say, pg_dump already splits things
> and getting executable output out of this facility will require the
> same kinds of tricks here.

Yeah.  Moreover this will require that this new event trigger code is
able to handle (at least certain kinds of) ALTER, which expands this
patch in scope by a wide margin.

Producing executable commands is an important goal.

> This gets back to my worry about maintaining two or three copies of
> the code that solve many of the same problems in quite different
> ways...

Agreed.  It would be real good to be able to use this code for pg_dump
too, but it seems dubious.  The two environments seem just too different
to be able to reuse anything.  But if you see a way, by all means shout.

> > 3. It is possible to coerce ruleutils.c to emit always-qualified names
> > by using PushOverrideSearchPath() facility; but actually this doesn't
> > always work, because some places in namespace.c believe that
> > PG_CATALOG_NAMESPACE is always visible and so certain objects are not
> > qualified.  In particular, text columns using default collation will be
> > emitted as having collation "default" and not pg_catalog.default as I
> > would have initially expected.  Right now it doesn't seem like this is a
> > problem, but it is unusual.
> 
> We have a quote_all_identifiers flag.  We could have a
> schema_qualify_all_identifiers flag, too.

Actually the bit about collations was just a garden variety bug: turns
out I was using a %{}I specifier (and correspondingly only the collation
name as a string) instead of %{}D and get_catalog_object_by_oid() to
match.  I'm not yet sure if this new flag you suggest will really be
needed, but thanks for the idea.

> Then again, why is the behavior of schema-qualifying absolutely
> everything even desirable?

Well, someone could create a collation in another schema with the same
name as a system collation and the command would become ambiguous.  For
example, this command create table f (a text collate "POSIX");
creates a column whose collation is either public."POSIX" or the system
POSIX collation, depending on whether public appears before pg_catalog
in search_path.  Having someone create a POSIX collation in a schema
that appears earlier than pg_catalog is pretty far-fetched, but ISTM
that we really need to cover that scenario.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Tom Lane

Дата:

05 февраля 2014 г., 19:57:43

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Robert Haas escribi�:
>> How about doing whatever pg_dump does?

> We use format_type() for that as far as I know.  What it does
> differently is use undecorated names defined by the standard for some
> types, which are never schema qualified and are never ambiguous because
> they are reserved words that would require quoting if used by
> user-defined type names.  We can't use that here: somewhere upthread we
> noticed issues when using those which is why we're now trying to use
> catalog names instead of those special names.  (I don't think it's
> impossible to use such names: we just need to ensure we handle quoting
> correctly for the funny cases such as "char" and "bit.)

Yeah, but wouldn't that complexity also bubble into user code within the
event triggers?  Since there's no real need for SQL standard compliance
in this context, I think minimizing the number of weird formats is a
win.

> One idea is to chop the typmod output string at the closing parens.

+1.  The only reason timestamptypmodout works like that is that we're
trying to match the SQL standard's spelling of the type names, and
that committee apparently considers it an off day whenever they can't
invent some randomly-incompatible-with-everything syntax.
        regards, tom lane

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

06 февраля 2014 г., 02:58:07

On Wed, Feb 5, 2014 at 2:26 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Then again, why is the behavior of schema-qualifying absolutely
>> everything even desirable?
>
> Well, someone could create a collation in another schema with the same
> name as a system collation and the command would become ambiguous.  For
> example, this command
>   create table f (a text collate "POSIX");
> creates a column whose collation is either public."POSIX" or the system
> POSIX collation, depending on whether public appears before pg_catalog
> in search_path.  Having someone create a POSIX collation in a schema
> that appears earlier than pg_catalog is pretty far-fetched, but ISTM
> that we really need to cover that scenario.

Hmm, good point.  I guess we don't worry much about this with pg_dump
because we assume that we're restoring into an empty database (and if
not, the user gets to keep both pieces).  You're applying a higher
standard here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Tom Lane

Дата:

06 февраля 2014 г., 05:08:56

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Feb 5, 2014 at 2:26 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> Then again, why is the behavior of schema-qualifying absolutely
>>> everything even desirable?

>> Well, someone could create a collation in another schema with the same
>> name as a system collation and the command would become ambiguous.

> Hmm, good point.  I guess we don't worry much about this with pg_dump
> because we assume that we're restoring into an empty database (and if
> not, the user gets to keep both pieces).  You're applying a higher
> standard here.

Robert, that's just horsepucky.  pg_dump is very careful about schemas.
It's also careful to not schema-qualify names unnecessarily, which is an
intentional tradeoff to improve readability of the dump --- at the cost
that the dump might break if restored into a nonempty database with
conflicting objects.  In the case of data passed to event triggers,
there's a different tradeoff to be made: people will probably value
consistency over readability, so always-qualify is probably the right
choice here.  But in neither case are we being sloppy.
        regards, tom lane

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

06 февраля 2014 г., 12:38:31

On Thu, Feb 6, 2014 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Feb 5, 2014 at 2:26 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>>> Then again, why is the behavior of schema-qualifying absolutely
>>>> everything even desirable?
>
>>> Well, someone could create a collation in another schema with the same
>>> name as a system collation and the command would become ambiguous.
>
>> Hmm, good point.  I guess we don't worry much about this with pg_dump
>> because we assume that we're restoring into an empty database (and if
>> not, the user gets to keep both pieces).  You're applying a higher
>> standard here.
>
> Robert, that's just horsepucky.  pg_dump is very careful about schemas.
> It's also careful to not schema-qualify names unnecessarily, which is an
> intentional tradeoff to improve readability of the dump --- at the cost
> that the dump might break if restored into a nonempty database with
> conflicting objects.  In the case of data passed to event triggers,
> there's a different tradeoff to be made: people will probably value
> consistency over readability, so always-qualify is probably the right
> choice here.  But in neither case are we being sloppy.

I didn't mean to imply otherwise.  I know the pg_dump tries to avoid
excess schema-qualification for readability among other reasons; what
I meant was that Alvaro is applying a higher standard specifically in
regards to replayability.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

06 февраля 2014 г., 17:23:59

Tom Lane escribió:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:

> > One idea is to chop the typmod output string at the closing parens.
>
> +1.  The only reason timestamptypmodout works like that is that we're
> trying to match the SQL standard's spelling of the type names, and
> that committee apparently considers it an off day whenever they can't
> invent some randomly-incompatible-with-everything syntax.

Okay, I did it that way.

I also fixed the sequence OWNED BY problem simply by adding support for
ALTER SEQUENCE.  Of course, the intention is that all forms of CREATE
and ALTER are supported, but this one seems reasonable standalone
because CREATE TABLE uses it internally.

So in the attached patch I have fixed all the three problems I reported
for the previous version.

I also changed the EventTriggerStashCreatedObject() function in two
ways:

1. rename it to EventTriggerStashCommand().  This is because it will
   take ALTER commands as arguments, so the old name seemed a misnomer.
   (I guess I was initially thinking it'd only handle object creation,
   but obviously that will not work.)

2. Take ObjectType instead of ObjectClass.  I'm not really sure about
   this change; the old usage of OCLASS at the utility.c level seemed
   wrong (in particular, in the DefineStmt case it was having to
   translate the stmt->kind into ObjClass manually).  But on the other
   hand, the new usage of OBJECT_FOO requires a function to translate
   each enum value to the catalog that contains objects of that type
   (which is repetitive; the OCLASS_FOO code in dependency.c already has
   a table with that).  And also I had to add OBJECT_USER_MAPPING, which
   was missing.  I think I might end up reverting this bit, unless
   somebody sees a way to beautify the Objtype->catalog OID
   transformation.  (The bit about missing USER MAPPING stuff seems a
   bit troubling; EventTriggerSupportsObjectType wasn't aware of that
   object type.)

By way of illustration, here's the output a simple command using the
snitch() function previously posted.  You can see that the CREATE TABLE
command is expanded as three commands: CREATE SEQUENCE, CREATE TABLE,
ALTER SEQUENCE.

alvherre=# create unlogged table t1 (a serial);
NOTICE:  JSON blob: {
    "definition": [
        {
            "clause": "cache",
            "fmt": "CACHE %{value}s",
            "value": "1"
        },
        {
            "clause": "cycle",
            "fmt": "%{no}s CYCLE",
            "no": "NO"
        },
        {
            "clause": "increment_by",
            "fmt": "INCREMENT BY %{value}s",
            "value": "1"
        },
        {
            "clause": "minvalue",
            "fmt": "MINVALUE %{value}s",
            "value": "1"
        },
        {
            "clause": "maxvalue",
            "fmt": "MAXVALUE %{value}s",
            "value": "9223372036854775807"
        },
        {
            "clause": "start",
            "fmt": "START WITH %{value}s",
            "value": "1"
        },
        {
            "clause": "restart",
            "fmt": "RESTART %{value}s",
            "value": "1"
        }
    ],
    "fmt": "CREATE %{persistence}s SEQUENCE %{identity}D %{definition: }s",
    "identity": {
        "objname": "t1_a_seq",
        "schemaname": "public"
    },
    "persistence": ""
}
NOTICE:  expanded: CREATE  SEQUENCE public.t1_a_seq CACHE 1 NO CYCLE INCREMENT BY 1 MINVALUE 1 MAXVALUE
9223372036854775807START WITH 1 RESTART 1 
NOTICE:  JSON blob: {
    "fmt": "CREATE %{persistence}s TABLE %{identity}D %{if_not_exists}s (%{table_elements:, }s) %{inherits}s
%{on_commit}s%{tablespace}s",  
    "identity": {
        "objname": "t1",
        "schemaname": "public"
    },
    "if_not_exists": "",
    "inherits": {
        "fmt": "INHERITS (%{parents:, }D)",
        "parents": null,
        "present": false
    },
    "on_commit": {
        "fmt": "ON COMMIT %{on_commit_value}s",
        "on_commit_value": null,
        "present": false
    },
    "persistence": "UNLOGGED",
    "table_elements": [
        {
            "collation": {
                "fmt": "COLLATE %{name}D",
                "present": false
            },
            "coltype": {
                "is_array": false,
                "schemaname": "pg_catalog",
                "typename": "int4",
                "typmod": ""
            },
            "default": {
                "default": "nextval('t1_a_seq'::regclass)",
                "fmt": "DEFAULT %{default}s"
            },
            "fmt": "%{name}I %{coltype}T %{default}s %{not_null}s %{collation}s",
            "name": "a",
            "not_null": "NOT NULL",
            "type": "column"
        }
    ],
    "table_kind": "plain",
    "tablespace": {
        "fmt": "TABLESPACE %{tablespace}I",
        "present": false,
        "tablespace": null
    }
}
NOTICE:  expanded: CREATE UNLOGGED TABLE public.t1  (a pg_catalog.int4 DEFAULT nextval('t1_a_seq'::regclass) NOT NULL )

NOTICE:  JSON blob: {
    "definition": [
        {
            "clause": "owned",
            "fmt": "OWNED BY %{owner}D",
            "owner": {
                "attrname": "a",
                "objname": "t1",
                "schemaname": "public"
            }
        }
    ],
    "fmt": "ALTER SEQUENCE %{identity}D %{definition: }s",
    "identity": {
        "objname": "t1_a_seq",
        "schemaname": "public"
    }
}
NOTICE:  expanded: ALTER SEQUENCE public.t1_a_seq OWNED BY public.t1.a
CREATE TABLE

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Вложения

event-trigger-create-5.patch

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

13 марта 2014 г., 21:06:54

Alvaro Herrera escribió:

> I also fixed the sequence OWNED BY problem simply by adding support for
> ALTER SEQUENCE.  Of course, the intention is that all forms of CREATE
> and ALTER are supported, but this one seems reasonable standalone
> because CREATE TABLE uses it internally.

I have been hacking on this on and off.  This afternoon I discovered
that interval typmod output can also be pretty unusual.  Example:

create table a (a interval year to month);

For the column, we get this type spec (note the typmod):
           "coltype": {               "is_array": false,                "schemaname": "pg_catalog",
"typename":"interval",                "typmod": " year to month"           }, 
 

so the whole command output ends up being this:

NOTICE:  expanded: CREATE  TABLE  public.a (a pg_catalog."interval" year to month   )    WITH (oids=OFF)

However, this is not accepted on input:

alvherre=# CREATE  TABLE  public.a (a pg_catalog."interval" year to month   )    WITH (oids=OFF);
ERROR:  syntax error at or near "year"
LÍNEA 1: CREATE  TABLE  public.a (a pg_catalog."interval" year to mon...
        ^
 

I'm not too sure what to do about this yet.  I checked the catalogs and
gram.y, and it seems that interval is the only type that allows such
strange games to be played.  I would hate to be forced to add a kludge
specific to type interval, but that seems to be the only option.  (This
would involve checking the OID of the type in deparse_utility.c, and if
it's INTERVALOID, then omit the schema qualification and quoting on the
type name).

I have also been working on adding ALTER TABLE support.  So far it's
pretty simple; here is an example.  Note I run a single command which
includes a SERIAL column, and on output I get three commands (just like
a serial column on create table).

alvherre=# alter table tt add column b numeric, add column c serial, alter column a set default extract(epoch from
now());
NOTICE:  JSON blob: {   "definition": [       {           "clause": "cache",            "fmt": "CACHE %{value}s",
    "value": "1"       },        {           "clause": "cycle",            "fmt": "%{no}s CYCLE",            "no": "NO"
     },        {           "clause": "increment_by",            "fmt": "INCREMENT BY %{value}s",            "value":
"1"      },        {           "clause": "minvalue",            "fmt": "MINVALUE %{value}s",            "value": "1"
  },        {           "clause": "maxvalue",            "fmt": "MAXVALUE %{value}s",            "value":
"9223372036854775807"      },        {           "clause": "start",            "fmt": "START WITH %{value}s",
"value": "1"       },        {           "clause": "restart",            "fmt": "RESTART %{value}s",
"value":"1"       }   ],    "fmt": "CREATE %{persistence}s SEQUENCE %{identity}D %{definition: }s",    "identity": {
  "objname": "tt_c_seq",        "schemaname": "public"   },    "persistence": ""
 
}
NOTICE:  expanded: CREATE  SEQUENCE public.tt_c_seq CACHE 1 NO CYCLE INCREMENT BY 1 MINVALUE 1 MAXVALUE
9223372036854775807START WITH 1 RESTART 1
 
NOTICE:  JSON blob: {   "fmt": "ALTER TABLE %{identity}D %{subcmds:, }s",    "identity": {       "objname": "tt",
"schemaname": "public"   },    "subcmds": [       {           "definition": {               "collation": {
    "fmt": "COLLATE %{name}D",                    "present": false               },                "coltype": {
         "is_array": false,                    "schemaname": "pg_catalog",                    "typename": "numeric",
               "typmod": ""               },                "default": {                   "fmt": "DEFAULT
%{default}s",                   "present": false               },                "fmt": "%{name}I %{coltype}T
%{default}s%{not_null}s %{collation}s",                "name": "b",                "not_null": "",
"type":"column"           },            "fmt": "ADD COLUMN %{definition}s",            "type": "add column"       },
   {           "definition": {               "collation": {                   "fmt": "COLLATE %{name}D",
   "present": false               },                "coltype": {                   "is_array": false,
"schemaname": "pg_catalog",                    "typename": "int4",                    "typmod": ""               },
          "default": {                   "default": "pg_catalog.nextval('public.tt_c_seq'::pg_catalog.regclass)",
            "fmt": "DEFAULT %{default}s"               },                "fmt": "%{name}I %{coltype}T %{default}s
%{not_null}s%{collation}s",                "name": "c",                "not_null": "",                "type": "column"
        },            "fmt": "ADD COLUMN %{definition}s",            "type": "add column"       },        {
"column":"a",            "definition": "pg_catalog.date_part('epoch'::pg_catalog.text, pg_catalog.now())",
"fmt":"ALTER COLUMN %{column}I SET DEFAULT %{definition}s",            "type": "set default"       }   ]
 
}
NOTICE:  expanded: ALTER TABLE public.tt ADD COLUMN b pg_catalog."numeric"   , ADD COLUMN c pg_catalog.int4 DEFAULT
pg_catalog.nextval('public.tt_c_seq'::pg_catalog.regclass) , ALTER COLUMN a SET DEFAULT
pg_catalog.date_part('epoch'::pg_catalog.text,pg_catalog.now())
 
NOTICE:  JSON blob: {   "definition": [       {           "clause": "owned",            "fmt": "OWNED BY %{owner}D",
       "owner": {               "attrname": "c",                "objname": "tt",                "schemaname": "public"
        }       }   ],    "fmt": "ALTER SEQUENCE %{identity}D %{definition: }s",    "identity": {       "objname":
"tt_c_seq",       "schemaname": "public"   }
 
}
NOTICE:  expanded: ALTER SEQUENCE public.tt_c_seq OWNED BY public.tt.c
ALTER TABLE


Each subcommand is represented separately in a JSON array.  Each element
in the array has a "type" element indicating (broadly) what it's doing;
the "fmt" element has all the details.  So things like replication
systems might decide to replicate some part of the ALTER or not,
depending on the specific type.  (And, of course, they can easily decide
that replica XYZ must not replay the command because the table is not
supposed to exist there; or perhaps it belongs to a replication set that
is not the one the current node is origin for.)


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

14 марта 2014 г., 12:50:42

On Thu, Mar 13, 2014 at 5:06 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Alvaro Herrera escribió:
>
>> I also fixed the sequence OWNED BY problem simply by adding support for
>> ALTER SEQUENCE.  Of course, the intention is that all forms of CREATE
>> and ALTER are supported, but this one seems reasonable standalone
>> because CREATE TABLE uses it internally.
>
> I have been hacking on this on and off.  This afternoon I discovered
> that interval typmod output can also be pretty unusual.  Example:
>
> create table a (a interval year to month);
>
> For the column, we get this type spec (note the typmod):
>
>             "coltype": {
>                 "is_array": false,
>                 "schemaname": "pg_catalog",
>                 "typename": "interval",
>                 "typmod": " year to month"
>             },
>
> so the whole command output ends up being this:
>
> NOTICE:  expanded: CREATE  TABLE  public.a (a pg_catalog."interval" year to month   )    WITH (oids=OFF)
>
> However, this is not accepted on input:
>
> alvherre=# CREATE  TABLE  public.a (a pg_catalog."interval" year to month   )    WITH (oids=OFF);
> ERROR:  syntax error at or near "year"
> LÍNEA 1: CREATE  TABLE  public.a (a pg_catalog."interval" year to mon...
>                                                           ^
>
> I'm not too sure what to do about this yet.  I checked the catalogs and
> gram.y, and it seems that interval is the only type that allows such
> strange games to be played.  I would hate to be forced to add a kludge
> specific to type interval, but that seems to be the only option.  (This
> would involve checking the OID of the type in deparse_utility.c, and if
> it's INTERVALOID, then omit the schema qualification and quoting on the
> type name).
>
> I have also been working on adding ALTER TABLE support.  So far it's
> pretty simple; here is an example.  Note I run a single command which
> includes a SERIAL column, and on output I get three commands (just like
> a serial column on create table).
>
> alvherre=# alter table tt add column b numeric, add column c serial, alter column a set default extract(epoch from
now());
> NOTICE:  JSON blob: {
>     "definition": [
>         {
>             "clause": "cache",
>             "fmt": "CACHE %{value}s",
>             "value": "1"
>         },
>         {
>             "clause": "cycle",
>             "fmt": "%{no}s CYCLE",
>             "no": "NO"
>         },
>         {
>             "clause": "increment_by",
>             "fmt": "INCREMENT BY %{value}s",
>             "value": "1"
>         },
>         {
>             "clause": "minvalue",
>             "fmt": "MINVALUE %{value}s",
>             "value": "1"
>         },
>         {
>             "clause": "maxvalue",
>             "fmt": "MAXVALUE %{value}s",
>             "value": "9223372036854775807"
>         },
>         {
>             "clause": "start",
>             "fmt": "START WITH %{value}s",
>             "value": "1"
>         },
>         {
>             "clause": "restart",
>             "fmt": "RESTART %{value}s",
>             "value": "1"
>         }
>     ],
>     "fmt": "CREATE %{persistence}s SEQUENCE %{identity}D %{definition: }s",
>     "identity": {
>         "objname": "tt_c_seq",
>         "schemaname": "public"
>     },
>     "persistence": ""
> }

What does the colon-space in %{definition: }s mean?

In general, it seems like you're making good progress here, and I'm
definitely happier with this than with previous approaches, but I'm
still concerned about how maintainable it's going to be.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

13 июня 2014 г., 20:58:52

Here's a refreshed version of this patch.  I have split it up in a
largish number of pieces, which hopefully makes it easier to understand
what is going on.  First there are a number of trivial fixups which
mostly shouldn't require much of an explanation, or add functionality to
ruleutils that later patches need.  I also split builtins.h to create a
new ruleutils.h header, in which there are several prototypes for
functions in ruleutils.c.  (The reason for this is that a later patch
adds a bunch more prototypes which require other headers to be #included
by ruleutils.h, and it didn't seem a good idea to include those in
builtins.h.  This doesn't affect much -- only 7 .c files had to #include
the new file.)

The first really interesting patch is 0014, which introduces the concept
of a "command queue" in event triggers, where DDL commands are stored as
parse nodes plus supporting data.  At ddl_command_end, the user code can
call the new pg_event_trigger_get_creation_commands() function, which
causes the parse node to be deparsed and returned as a JSON blob.  This
patch also introduces a function to decode the JSON blob back into a
text string, possibly after being tweaked by the user code.  A bunch of
DDL commands are implemented here, such as CREATE SCHEMA, CREATE TABLE,
and the like -- the basics.

Then we have a bunch more patches which add support for more commands.
In these patches you can see how the deparsing code for a certain
command can be written.  Most are pretty simple, but interesting
examples such as CREATE FUNCTION are also included.  We also have some
commands that are not merely create, such as ALTER..RENAME.

Patch 23 is the next interesting one -- it adds support for ALTER TABLE.
This one is a lot more involved, because deparsing ALTER TABLE
subcommands requires adding a call to collect each subcommand in
tablecmds.c itself.  In order to implement this, I modified most
subcommand functions so that they return the OID of the affected object,
or the attribute number of the affected column.  For example if you add
a constraint, the OID of the constraint is returned.  This OID is
collected and added to the list of subcommands.  At deparse time, the
collected commands can be deparsed back into JSON blobs as usual.  Since
certain commands execute AlterTable internally, I had to add
"bracketing" calls to them, which set up a pseudo-ALTER TABLE context
onto which the subcommands can be added.  For instance, CREATE INDEX,
CREATE VIEW and CREATE TABLESPACE MOVE are known to do this.

... then a bunch more boring commands are implemented ...

Finally, patch 36 adds support for GRANT and REVOKE deparsing.  This one
is interesting because it's completely unlike the others.  Here we don't
have a single OID of an affected object.  I chose to support this by
exposing the InternalGrantStmt struct (in a new aclchk.h file), which we
can deparse easily into the familiar JSON format.  (Of course, in an
patch earlier in the series I also had to modify event triggers so that
grant/revoke fire them as well, which currently they do not.)


There are no docs yet.  This has been tested in the bidirectional
replication suite -- it works by saving the deparsed (SQL, not JSON)
command in a replicated table; the remote side executes the command when
it receives the tuple.  Commands that fail to deparse sanely cause very
visible errors.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Add CREATE support to event triggers

От

Jim Nasby

Дата:

15 июня 2014 г., 00:44:33

On 2/6/14, 11:20 AM, Alvaro Herrera wrote:
> NOTICE:  JSON blob: {
>      "definition": [
>          {
>              "clause": "owned",
>              "fmt": "OWNED BY %{owner}D",
>              "owner": {
>                  "attrname": "a",
>                  "objname": "t1",
>                  "schemaname": "public"
>              }
>          }
>      ],
>      "fmt": "ALTER SEQUENCE %{identity}D %{definition: }s",
>      "identity": {
>          "objname": "t1_a_seq",
>          "schemaname": "public"
>      }
> }
> NOTICE:  expanded: ALTER SEQUENCE public.t1_a_seq OWNED BY public.t1.a

Apologies if this has been discussed and I missed it, but shouldn't part of the JSON be a field that indicates what
commandis being run? It doesn't seem wise to conflate detecting what the command is with the overall format string.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

15 июня 2014 г., 04:17:26

Jim Nasby wrote:
> On 2/6/14, 11:20 AM, Alvaro Herrera wrote:
> >NOTICE:  JSON blob: {
> >     "definition": [
> >         {
> >             "clause": "owned",
> >             "fmt": "OWNED BY %{owner}D",
> >             "owner": {
> >                 "attrname": "a",
> >                 "objname": "t1",
> >                 "schemaname": "public"
> >             }
> >         }
> >     ],
> >     "fmt": "ALTER SEQUENCE %{identity}D %{definition: }s",
> >     "identity": {
> >         "objname": "t1_a_seq",
> >         "schemaname": "public"
> >     }
> >}
> >NOTICE:  expanded: ALTER SEQUENCE public.t1_a_seq OWNED BY public.t1.a
> 
> Apologies if this has been discussed and I missed it, but shouldn't part of the JSON be a field that indicates what
commandis being run? It doesn't seem wise to conflate detecting what the command is with the overall format string.
 

That's reported as a separate field by the
pg_event_trigger_creation_commands function.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

25 августа 2014 г., 20:15:43

On Sat, Jun 14, 2014 at 5:31 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Here's a refreshed version of this patch. I have split it up in a
largish number of pieces, which hopefully makes it easier to understand
what is going on.

Alvaro,

Could you confirm that the patches you just committed are 1, 3 and 6?

Regards,

--
Michael

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

25 августа 2014 г., 20:34:02

Michael Paquier wrote:
> On Sat, Jun 14, 2014 at 5:31 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> 
> > Here's a refreshed version of this patch.  I have split it up in a
> > largish number of pieces, which hopefully makes it easier to understand
> > what is going on.
> >
> Alvaro,
> 
> Could you confirm that the patches you just committed are 1, 3 and 6?

And 4.  Yes, they are.  I wanted to get trivial stuff out of the way
while I had some other trivial patch at hand.  I'm dealing with another
patch from the commitfest now, so I'm not posting a rebased version
right away, apologies.

How do people like this patch series?  It would be much easier for me to
submit a single patch, but I feel handing it in little pieces makes it
easier for reviewers.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

25 августа 2014 г., 23:10:22

On Tue, Aug 26, 2014 at 5:33 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Michael Paquier wrote:
> And 4.  Yes, they are.  I wanted to get trivial stuff out of the way
> while I had some other trivial patch at hand.  I'm dealing with another
> patch from the commitfest now, so I'm not posting a rebased version
> right away, apologies.
No problems. I imagine that most of the patches still apply.

> How do people like this patch series?  It would be much easier for me to
> submit a single patch, but I feel handing it in little pieces makes it
> easier for reviewers.
Well, I like the patch series for what it counts as long as you can
submit it as such. That's far easier to test and certainly helps in
spotting issues when kicking different code paths.
-- 
Michael

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

27 августа 2014 г., 04:10:44

On Tue, Aug 26, 2014 at 8:10 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Well, I like the patch series for what it counts as long as you can
> submit it as such. That's far easier to test and certainly helps in
> spotting issues when kicking different code paths.

So, for patch 2, which is a cosmetic change for
pg_event_trigger_dropped_objects:
=# select pg_event_trigger_dropped_objects();
ERROR:  0A000: pg_event_trigger_dropped_objects can only be called in
a sql_drop event trigger function
LOCATION:  pg_event_trigger_dropped_objects, event_trigger.c:1220
This drops "()" from the error message with the function name. I guess
that this is fine. But PG_FUNCNAME_MACRO is used nowhere except
elog.h, and can as well be NULL. So if that's really necessary
shouldn't we use FunctionCallInfo instead. It is not as well not that
bad to hardcode the function name in the error message as well IMO.

For patch 5:
+1 for this move. When working on Postgres-XC a couple of years back I
wondered why this distinction was not made. Wouldn't it make sense to
move as well the declaration of quote_all_identifiers to ruleutils.h.
That's a GUC and moving it out of builtins.h would make sense IMO.

Patch 8 needs a rebase (patch independent on 1~6 it seems):
1 out of 57 hunks FAILED -- saving rejects to file
src/backend/commands/tablecmds.c.rej
(Stripping trailing CRs from patch.)

Patch 9:
1) It needs a rebase, AlterTableMoveAllStmt has been renamed to
AlterTableMoveAllStmt by commit 3c4cf08
2) Table summarizing event trigger firing needs to be updated with the
new command supported (src/sgml/event-trigger.sgml)

Patch 10, similar problems as patch 9:
1) Needs a rebase
2) table summarizing supported commands should be updated.
You could directly group patches 9 and 10 in the final commit IMO.
GRANT/REVOKE would also be the first command that would be supported
by event triggers that is not of the type CREATE/DROP/ALTER, hence
once it is rebased I would like to do some testing with it (same with
patch 9 btw) and see how it reacts with the event sql_drop
particularly (it should error out but still).

Patch 11: applies with some hunks.
So... This patch introduces an operation performing doing reverse
engineering of format_type_internal... I think that this would be more
adapted with a couple of new routines in lsyscache.[c|h] instead:
- One for the type name
- One for typmodout
- One for is_array
- One for its namespace
TBH, I wanted those routines a couple of times when working on XC and
finished coding them at the end, but in XC only :)

Patch 12: patch applies correctly.
Form_pg_sequence is already exposed in sequence.h even if it is only
used in sequence.c, so yes it seems to be the correct way to do it
here assuming that we need this data to rebuild a DDL. Why is
ACL_UPDATE needed when checking permissions? This new routine only
reads the values and does not update it. And a confirmation: ACL_USAGE
is used to make this routine usable for PL languages in this case,
right?
I think that you should mention at the top of get_sequence_values that
it returns a palloc'd result, and that it is the responsibility of
caller to free it.

That's all I have for now.
Regards,
-- 
Michael

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

27 августа 2014 г., 04:14:50

On Wed, Aug 27, 2014 at 1:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Aug 26, 2014 at 8:10 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Well, I like the patch series for what it counts as long as you can
>> submit it as such. That's far easier to test and certainly helps in
>> spotting issues when kicking different code paths.
>
> So, for patch 2, which is a cosmetic change for
> pg_event_trigger_dropped_objects:
> =# select pg_event_trigger_dropped_objects();
> ERROR:  0A000: pg_event_trigger_dropped_objects can only be called in
> a sql_drop event trigger function
> LOCATION:  pg_event_trigger_dropped_objects, event_trigger.c:1220
> This drops "()" from the error message with the function name. I guess
> that this is fine. But PG_FUNCNAME_MACRO is used nowhere except
> elog.h, and can as well be NULL. So if that's really necessary
> shouldn't we use FunctionCallInfo instead. It is not as well not that
> bad to hardcode the function name in the error message as well IMO.
>
> For patch 5:
> +1 for this move. When working on Postgres-XC a couple of years back I
> wondered why this distinction was not made. Wouldn't it make sense to
> move as well the declaration of quote_all_identifiers to ruleutils.h.
> That's a GUC and moving it out of builtins.h would make sense IMO.
>
> Patch 8 needs a rebase (patch independent on 1~6 it seems):
> 1 out of 57 hunks FAILED -- saving rejects to file
> src/backend/commands/tablecmds.c.rej
> (Stripping trailing CRs from patch.)
>
> Patch 9:
> 1) It needs a rebase, AlterTableMoveAllStmt has been renamed to
> AlterTableMoveAllStmt by commit 3c4cf08
> 2) Table summarizing event trigger firing needs to be updated with the
> new command supported (src/sgml/event-trigger.sgml)
>
> Patch 10, similar problems as patch 9:
> 1) Needs a rebase
> 2) table summarizing supported commands should be updated.
> You could directly group patches 9 and 10 in the final commit IMO.
> GRANT/REVOKE would also be the first command that would be supported
> by event triggers that is not of the type CREATE/DROP/ALTER, hence
> once it is rebased I would like to do some testing with it (same with
> patch 9 btw) and see how it reacts with the event sql_drop
> particularly (it should error out but still).
>
> Patch 11: applies with some hunks.
> So... This patch introduces an operation performing doing reverse
> engineering of format_type_internal... I think that this would be more
> adapted with a couple of new routines in lsyscache.[c|h] instead:
> - One for the type name
> - One for typmodout
> - One for is_array
> - One for its namespace
> TBH, I wanted those routines a couple of times when working on XC and
> finished coding them at the end, but in XC only :)
>
> Patch 12: patch applies correctly.
> Form_pg_sequence is already exposed in sequence.h even if it is only
> used in sequence.c, so yes it seems to be the correct way to do it
> here assuming that we need this data to rebuild a DDL. Why is
> ACL_UPDATE needed when checking permissions? This new routine only
> reads the values and does not update it. And a confirmation: ACL_USAGE
> is used to make this routine usable for PL languages in this case,
> right?
> I think that you should mention at the top of get_sequence_values that
> it returns a palloc'd result, and that it is the responsibility of
> caller to free it.

And a last one before lunch, closing the review for all the basic things...
Patch 13: Could you explain why this is necessary?
+extern PGDLLIMPORT bool creating_extension;
It may make sense by looking at the core features (then why isn't it
with the core features?), but I am trying to review the patches in
order.
-- 
Michael

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

20 сентября 2014 г., 04:02:49

On Tue, Aug 26, 2014 at 11:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> And a last one before lunch, closing the review for all the basic things...
> Patch 13: Could you explain why this is necessary?
> +extern PGDLLIMPORT bool creating_extension;
> It may make sense by looking at the core features (then why isn't it
> with the core features?), but I am trying to review the patches in
> order.
Those patches have been reviewed up to number 14. Some of them could
be applied as-is as they are useful taken independently, but most of
them need a rebase, hence marking it as returned with feedback.
-- 
Michael

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

24 сентября 2014 г., 21:47:14

Michael Paquier wrote:
> On Tue, Aug 26, 2014 at 8:10 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > Well, I like the patch series for what it counts as long as you can
> > submit it as such. That's far easier to test and certainly helps in
> > spotting issues when kicking different code paths.
>
> So, for patch 2, which is a cosmetic change for
> pg_event_trigger_dropped_objects:
> =# select pg_event_trigger_dropped_objects();
> ERROR:  0A000: pg_event_trigger_dropped_objects can only be called in
> a sql_drop event trigger function
> LOCATION:  pg_event_trigger_dropped_objects, event_trigger.c:1220
> This drops "()" from the error message with the function name. I guess
> that this is fine. But PG_FUNCNAME_MACRO is used nowhere except
> elog.h, and can as well be NULL. So if that's really necessary
> shouldn't we use FunctionCallInfo instead. It is not as well not that
> bad to hardcode the function name in the error message as well IMO.

You're right.  Dropped this patch.

> For patch 5:
> +1 for this move. When working on Postgres-XC a couple of years back I
> wondered why this distinction was not made. Wouldn't it make sense to
> move as well the declaration of quote_all_identifiers to ruleutils.h.
> That's a GUC and moving it out of builtins.h would make sense IMO.

I had it that way initially, but decided against it, because it creates
too much churn; there are way too many .c files that depend on the
quoting stuff.  I don't want to repeat the htup_details.h disaster ..

> Patch 8 needs a rebase (patch independent on 1~6 it seems):
> 1 out of 57 hunks FAILED -- saving rejects to file
> src/backend/commands/tablecmds.c.rej
> (Stripping trailing CRs from patch.)

Fixed.

> Patch 9:
> 1) It needs a rebase, AlterTableMoveAllStmt has been renamed to
> AlterTableMoveAllStmt by commit 3c4cf08

Fixed.

> 2) Table summarizing event trigger firing needs to be updated with the
> new command supported (src/sgml/event-trigger.sgml)

Will add.

> Patch 10, similar problems as patch 9:
> 1) Needs a rebase

Done.

> 2) table summarizing supported commands should be updated.

Will add.

> You could directly group patches 9 and 10 in the final commit IMO.
> GRANT/REVOKE would also be the first command that would be supported
> by event triggers that is not of the type CREATE/DROP/ALTER, hence
> once it is rebased I would like to do some testing with it (same with
> patch 9 btw) and see how it reacts with the event sql_drop
> particularly (it should error out but still).

Actually, I think I would commit most of this stuff not in the same way
I'm submitting; I've split it up to ease review only.

> Patch 11: applies with some hunks.
> So... This patch introduces an operation performing doing reverse
> engineering of format_type_internal... I think that this would be more
> adapted with a couple of new routines in lsyscache.[c|h] instead:
> - One for the type name
> - One for typmodout
> - One for is_array
> - One for its namespace
> TBH, I wanted those routines a couple of times when working on XC and
> finished coding them at the end, but in XC only :)

Not real sure about this.  Being able to do the whole thing in one fell
swoop seems more sensible to me.  Did you need each of those things
separately, or together?

> Patch 12: patch applies correctly.
> Form_pg_sequence is already exposed in sequence.h even if it is only
> used in sequence.c, so yes it seems to be the correct way to do it
> here assuming that we need this data to rebuild a DDL. Why is
> ACL_UPDATE needed when checking permissions? This new routine only
> reads the values and does not update it. And a confirmation: ACL_USAGE
> is used to make this routine usable for PL languages in this case,
> right?

Hm, I guess I just copied it from some other routine.  Will fix.

> I think that you should mention at the top of get_sequence_values that
> it returns a palloc'd result, and that it is the responsibility of
> caller to free it.

Will add.


> And a last one before lunch, closing the review for all the basic things...
> Patch 13: Could you explain why this is necessary?
> +extern PGDLLIMPORT bool creating_extension;
> It may make sense by looking at the core features (then why isn't it
> with the core features?), but I am trying to review the patches in
> order.

It's needed in MSVC build; I merged it with the next patch, which
actually uses it.  The point is to detect whether a command is being
executed as part of an extension; we set a flag in the output for this.
In BDR, we emit only the CREATE EXTENSION command, not the individual
commands that the extension creates.  I would guess that most other
replication systems will want to do the same.

I attach a rebased version of this.  The patch structure is a bit
different; patch 4 (which used to be 14) introduces the infrastructure
for DDL deparsing to JSON, but no users of it; patch 5 introduces the
first few commands that actually produce deparse output.

I will add this patch series to the next commitfest.  Thanks for
reviewing.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

25 сентября 2014 г., 21:59:59

Actually here's a different split of these patches, which I hope makes
more sense.  My intention here is that patches 0001 to 0004 are simple
changes that can be pushed right away; they are not directly related to
the return-creation-command feature.  Patches 0005 to 0027 implement
that feature incrementally.  You can see in patch 0005 the DDL commands
that are still not implemented in deparse (they are the ones that have
an elog(ERROR) rather than a "command = NULL").  Patch 0006 adds calls
in ProcessUtilitySlow() to each command, so that the object(s) being
touched are added to the event trigger command stash.

Patches from 0007 to 0027 (excepting patch 0017) implement one or a
small number of commands in deparse.  Patch 0017 is necessary
infrastructure in ALTER TABLE to support deparsing that one.

My intention with the later patches is that they would all be pushed as
a single commit, i.e. the deparse support would be implemented for all
commands in a fell swoop rather than piecemeal -- except possibly patch
0017 (the ALTER TABLE infrastructure).  I split them up only for ease of
review.  Of course, before pushing we (I) need to implement deparsing
for all the remaining commands.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

25 сентября 2014 г., 23:57:41

On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote:
>  /* tid.c */
> diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h
> new file mode 100644
> index 0000000..520b066
> --- /dev/null
> +++ b/src/include/utils/ruleutils.h
> @@ -0,0 +1,34 @@
> +/*-------------------------------------------------------------------------
> + *
> + * ruleutils.h
> + *        Declarations for ruleutils.c
> + *
> + * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
> + *
> + * src/include/ruleutils.h
> + *
> + *-------------------------------------------------------------------------
> + */
> +#ifndef RULEUTILS_H
> +#define RULEUTILS_H
> +
> +#include "nodes/nodes.h"
> +#include "nodes/parsenodes.h"
> +#include "nodes/pg_list.h"
> +
> +
> +extern char *pg_get_indexdef_string(Oid indexrelid);
> +extern char *pg_get_indexdef_columns(Oid indexrelid, bool pretty);
> +
> +extern char *pg_get_constraintdef_string(Oid constraintId);
> +extern char *deparse_expression(Node *expr, List *dpcontext,
> +                   bool forceprefix, bool showimplicit);
> +extern List *deparse_context_for(const char *aliasname, Oid relid);
> +extern List *deparse_context_for_planstate(Node *planstate, List *ancestors,
> +                              List *rtable, List *rtable_names);
> +extern List *select_rtable_names_for_explain(List *rtable,
> +                                Bitmapset *rels_used);
> +extern char *generate_collation_name(Oid collid);
> +
> +#endif    /* RULEUTILS_H */
> -- 
> 1.9.1
> 

I wondered for a minute whether any of these are likely to cause
problems for code just including builtins.h - but I think there will be
sufficiently few callers for them to make that not much of a concern.

> From 5e3555058a1bbb93e25a3a104c26e6b96dce994d Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Date: Thu, 25 Sep 2014 16:34:50 -0300
> Subject: [PATCH 02/27] deparse/core: have RENAME return attribute number

Looks "harmless". I.e. +1.

> From 72c4d0ef875f9e9b0f3b424499738fd1bd946c32 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Date: Fri, 9 May 2014 18:32:23 -0400
> Subject: [PATCH 03/27] deparse/core: event triggers support GRANT/REVOKE

Hm. The docs don't mention whether these only work for database local
objects or also shared objects. Generally event trigger are only
triggered for the former... But the code doesn't seem to make a
difference?

> From 5bb35d2e51fe6c6e219fe5cf7ddbab5423e6be1b Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Date: Thu, 25 Sep 2014 15:44:44 -0300
> Subject: [PATCH 04/27] deparse/core: event triggers support COMMENT
> 
> ---
>  doc/src/sgml/event-trigger.sgml      |  8 +++++++-
>  src/backend/commands/comment.c       |  5 ++++-
>  src/backend/commands/event_trigger.c |  1 +
>  src/backend/tcop/utility.c           | 21 +++++++++++++++++----
>  src/include/commands/comment.h       |  2 +-
>  5 files changed, 30 insertions(+), 7 deletions(-)
> 
> diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml
> index 49b8384..39ecd94 100644
> --- a/doc/src/sgml/event-trigger.sgml
> +++ b/doc/src/sgml/event-trigger.sgml
> @@ -37,7 +37,7 @@
>     <para>
>       The <literal>ddl_command_start</> event occurs just before the
>       execution of a <literal>CREATE</>, <literal>ALTER</>, <literal>DROP</>,
> -     <literal>GRANT</> or <literal>REVOKE</>
> +     <literal>COMMENT</>, <literal>GRANT</> or <literal>REVOKE</>
>       command.  No check whether the affected object exists or doesn't exist is
>       performed before the event trigger fires.
>       As an exception, however, this event does not occur for
> @@ -281,6 +281,12 @@
>          <entry align="center"><literal>-</literal></entry>
>         </row>
>         <row>
> +        <entry align="left"><literal>COMMENT</literal></entry>
> +        <entry align="center"><literal>X</literal></entry>
> +        <entry align="center"><literal>X</literal></entry>
> +        <entry align="center"><literal>-</literal></entry>
> +       </row>
> +       <row>

Hm. No drop support because COMMENTs are odd and use odd NULL
semantics. Fair enough.

> From 098f5acabd774004dc5d9c750d55e7c9afa60238 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Date: Wed, 24 Sep 2014 15:53:04 -0300
> Subject: [PATCH 05/27] deparse: infrastructure needed for command deparsing


> ---
>  src/backend/catalog/objectaddress.c  | 115 +++++
>  src/backend/commands/event_trigger.c | 941 ++++++++++++++++++++++++++++++++++-
>  src/backend/tcop/Makefile            |   2 +-
>  src/backend/tcop/deparse_utility.c   | 877 ++++++++++++++++++++++++++++++++
>  src/backend/tcop/utility.c           |   2 +
>  src/backend/utils/adt/format_type.c  | 113 ++++-
>  src/include/catalog/objectaddress.h  |   2 +
>  src/include/catalog/pg_proc.h        |   4 +
>  src/include/commands/event_trigger.h |   3 +
>  src/include/commands/extension.h     |   2 +-
>  src/include/nodes/parsenodes.h       |   2 +
>  src/include/tcop/deparse_utility.h   |  60 +++
>  src/include/utils/builtins.h         |   5 +
>  13 files changed, 2117 insertions(+), 11 deletions(-)
>  create mode 100644 src/backend/tcop/deparse_utility.c
>  create mode 100644 src/include/tcop/deparse_utility.h
> 
> diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
> index b69b75b..a2445f1 100644
> --- a/src/backend/catalog/objectaddress.c
> +++ b/src/backend/catalog/objectaddress.c
> @@ -723,6 +723,121 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
>  }
>  
>  /*
> + * Return the OID of the catalog corresponding to the given object type
> + */
> +Oid
> +get_objtype_catalog_oid(ObjectType objtype)
> +{

> +        case OBJECT_ATTRIBUTE:
> +            catalog_id = TypeRelationId;    /* XXX? */
> +            break;

I'm indeed not clear why thats a meaningful choice?

> +/* XXX merge this with ObjectTypeMap? */

hm. I think that's obsolete by now?

> @@ -924,6 +929,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
>          case OBJECT_ATTRIBUTE:
>          case OBJECT_CAST:
>          case OBJECT_COLUMN:
> +        case OBJECT_COMPOSITE:

Is it actually correct that we return false here up to now? I.e. isn't
this a bug?

> +/*
> + * EventTriggerStashCommand
> + *         Save data about a simple DDL command that was just executed
> + */
> +void
> +EventTriggerStashCommand(Oid objectId, uint32 objectSubId, ObjectType objtype,
> +                         Node *parsetree)
> +{
> +    MemoryContext oldcxt;
> +    StashedCommand *stashed;
> +
> +    oldcxt = MemoryContextSwitchTo(currentEventTriggerState->cxt);
> +
> +    stashed = palloc(sizeof(StashedCommand));
> +
> +    stashed->type = SCT_Simple;

I'm not really sure why that subtype is called 'simple'?

> +Datum
> +pg_event_trigger_get_creation_commands(PG_FUNCTION_ARGS)
> +{
> +    ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> +    TupleDesc    tupdesc;
> +    Tuplestorestate *tupstore;
> +    MemoryContext per_query_ctx;
> +    MemoryContext oldcontext;
> +    ListCell   *lc;
> +
> +    /*
> +     * Protect this function from being called out of context
> +     */
> +    if (!currentEventTriggerState)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                 errmsg("%s can only be called in an event trigger function",
> +                        "pg_event_trigger_get_creation_commands()")));

I wonder wether we shouldn't introduce ERRCODE_WRONG_CONTEXT or
such. Seems useful here and in a fair number of other situations. But
not really required, just something that annoyed me before.

> +    foreach(lc, currentEventTriggerState->stash)
> +    {
> +        StashedCommand *cmd = lfirst(lc);
> +        char       *command;
> +
> +        /*
> +         * For IF NOT EXISTS commands that attempt to create an existing
> +         * object, the returned OID is Invalid; in those cases, return an empty
> +         * command instead of trying to soldier on.
> +         *
> +         * XXX an alternative would be to look up the Oid of the existing
> +         * object and run the deparse with that.  But since the parse tree
> +         * might be different from the one that created the object in the first
> +         * place, we might not end up in a consistent state anyway.
> +         */
> +        if (cmd->type == SCT_Simple &&
> +            !OidIsValid(cmd->d.simple.objectId))
> +            continue;

I'd replace the XXX by 'One might think that'. You've perfectly
explained why that's not a viable alternative.

> +        command = deparse_utility_command(cmd);
> +
> +        /*
> +         * Some parse trees return NULL when deparse is attempted; we don't
> +         * emit anything for them.
> +         */
> +        if (command != NULL)
> +        {
> +            Datum        values[9];
> +            bool        nulls[9];
> +            ObjectAddress addr;
> +            int            i = 0;
> +
> +            MemSet(nulls, 0, sizeof(nulls));
> +
> +            if (cmd->type == SCT_Simple)
> +            {
> +                Oid            classId;
> +                Oid            objId;
> +                uint32        objSubId;
> +                const char *tag;
> +                char       *identity;
> +                char       *type;
> +                char       *schema = NULL;
> +
> +                if (cmd->type == SCT_Simple)
> +                {
> +                    classId = get_objtype_catalog_oid(cmd->d.simple.objtype);
> +                    objId = cmd->d.simple.objectId;
> +                    objSubId = cmd->d.simple.objectSubId;
> +                }
> +
> +                tag = CreateCommandTag(cmd->parsetree);
> +                addr.classId = classId;
> +                addr.objectId = objId;
> +                addr.objectSubId = objSubId;
> +
> +                type = getObjectTypeDescription(&addr);
> +                identity = getObjectIdentity(&addr);
> +
> +                /*
> +                 * Obtain schema name, if any ("pg_temp" if a temp object)
> +                 */
> +                if (is_objectclass_supported(addr.classId))
> +                {
> +                    AttrNumber    nspAttnum;
> +
> +                    nspAttnum = get_object_attnum_namespace(addr.classId);
> +                    if (nspAttnum != InvalidAttrNumber)
> +                    {
> +                        Relation    catalog;
> +                        HeapTuple    objtup;
> +                        Oid            schema_oid;
> +                        bool        isnull;
> +
> +                        catalog = heap_open(addr.classId, AccessShareLock);
> +                        objtup = get_catalog_object_by_oid(catalog,
> +                                                           addr.objectId);
> +                        if (!HeapTupleIsValid(objtup))
> +                            elog(ERROR, "cache lookup failed for object %u/%u",
> +                                 addr.classId, addr.objectId);
> +                        schema_oid = heap_getattr(objtup, nspAttnum,
> +                                                  RelationGetDescr(catalog), &isnull);
> +                        if (isnull)
> +                            elog(ERROR, "invalid null namespace in object %u/%u/%d",
> +                                 addr.classId, addr.objectId, addr.objectSubId);
> +                        if (isAnyTempNamespace(schema_oid))
> +                            schema = pstrdup("pg_temp");
> +                        else
> +                            schema = get_namespace_name(schema_oid);
> +
> +                        heap_close(catalog, AccessShareLock);
> +                    }
> +                }
> +
> +                /* classid */
> +                values[i++] = ObjectIdGetDatum(addr.classId);
> +                /* objid */
> +                values[i++] = ObjectIdGetDatum(addr.objectId);
> +                /* objsubid */
> +                values[i++] = Int32GetDatum(addr.objectSubId);
> +                /* command tag */
> +                values[i++] = CStringGetTextDatum(tag);
> +                /* object_type */
> +                values[i++] = CStringGetTextDatum(type);
> +                /* schema */
> +                if (schema == NULL)
> +                    nulls[i++] = true;
> +                else
> +                    values[i++] = CStringGetTextDatum(schema);
> +                /* identity */
> +                values[i++] = CStringGetTextDatum(identity);
> +                /* in_extension */
> +                values[i++] = BoolGetDatum(cmd->in_extension);
> +                /* command */
> +                values[i++] = CStringGetTextDatum(command);
> +            }
> +
> +            tuplestore_putvalues(tupstore, tupdesc, values, nulls);

So: a) we only support SCT_Simple here. b) we add commands whether
they're supported or not? And afaics the only way to discern that is by
looking at 'schema'? If we're not skipping unsupported stuff, shouldn't
we at least set a column indicating that?

> +/* ************************* JSON STUFF FROM HERE ************************* *
> + *    Code below is used to decode blobs returned by deparse_utility_command    *
> + *                                                                            */
> +
> +/*
> + * Note we only support types that are valid in command representation from
> + * deparse_utility_command.
> + */
> +typedef enum
> +{
> +    JsonTypeArray,
> +    JsonTypeObject,
> +    JsonTypeString
> +} JsonType;

There indeed seems to be no existing enum/define for this. Odd.

> +typedef enum
> +{
> +    SpecTypename,
> +    SpecOperatorname,
> +    SpecDottedName,
> +    SpecString,
> +    SpecStringLiteral,
> +    SpecIdentifier
> +} convSpecifier;

Ok, these are all pretty specific to the usecase.

> +/*
> + * Extract the named json field, which must be of type string, from the given
> + * JSON datum, which must be of type object.  If the field doesn't exist,
> + * NULL is returned.  Otherwise the string value is returned.
> + */
> +static char *
> +expand_get_strval(Datum json, char *field_name)
> +{
> +    FunctionCallInfoData fcinfo;
> +    Datum        result;
> +    char       *value_str;
> +
> +    InitFunctionCallInfoData(fcinfo, NULL, 2, InvalidOid, NULL, NULL);
> +
> +    fcinfo.arg[0] = json;
> +    fcinfo.argnull[0] = false;
> +    fcinfo.arg[1] = CStringGetTextDatum(field_name);
> +    fcinfo.argnull[1] = false;
> +
> +    result = (*json_object_field_text) (&fcinfo);

I've not looked it up, but is it actually legal to call functions
without flinfo setup? It seems nicer to revamp this to use
FunctionCallInfoInvoke() rather than doing it ourselves.

> +/*
> + * Given a JSON value, return its type.
> + *
> + * We return both a JsonType (for easy control flow), and a string name (for
> + * error reporting).
> + */
> +static JsonType
> +jsonval_get_type(Datum jsonval, char **typename)
> +{
> +    JsonType    json_elt_type;
> +    Datum        paramtype_datum;
> +    char       *paramtype;
> +
> +    paramtype_datum = DirectFunctionCall1(json_typeof, jsonval);
> +    paramtype = TextDatumGetCString(paramtype_datum);

So, here you suddenly use a different method of invoking functions?

> +    if (strcmp(paramtype, "array") == 0)
> +        json_elt_type = JsonTypeArray;
> +    else if (strcmp(paramtype, "object") == 0)
> +        json_elt_type = JsonTypeObject;
> +    else if (strcmp(paramtype, "string") == 0)
> +        json_elt_type = JsonTypeString;
> +    else
> +        /* XXX improve this; need to specify array index or param name */
> +        elog(ERROR, "unexpected JSON element type %s",
> +             paramtype);

What about that XXX? Seems like the current usage allows that to be
something for the future?


> +/*
> + * dequote_jsonval
> + *        Take a string value extracted from a JSON object, and return a copy of it
> + *        with the quoting removed.
> + *
> + * Another alternative to this would be to run the extraction routine again,
> + * using the "_text" variant which returns the value without quotes; but this
> + * is expensive, and moreover it complicates the logic a lot because not all
> + * values are extracted in the same way (some are extracted using
> + * json_object_field, others using json_array_element).  Dequoting the object
> + * already at hand is a lot easier.
> + */
> +static char *
> +dequote_jsonval(char *jsonval)
> +{
> +    char       *result;
> +    int            inputlen = strlen(jsonval);
> +    int            i;
> +    int            j = 0;
> +
> +    result = palloc(strlen(jsonval) + 1);
> +
> +    /* skip the start and end quotes right away */
> +    for (i = 1; i < inputlen - 1; i++)
> +    {
> +        if (jsonval[i] == '\\')
> +        {
> +            i++;
> +
> +            /* This uses same logic as json.c */
> +            switch (jsonval[i])
> +            {
> +                case 'b':
> +                    result[j++] = '\b';
> +                    continue;
> +                case 'f':
> +                    result[j++] = '\f';
> +                    continue;
> +                case 'n':
> +                    result[j++] = '\n';
> +                    continue;
> +                case 'r':
> +                    result[j++] = '\r';
> +                    continue;
> +                case 't':
> +                    result[j++] = '\t';
> +                    continue;
> +                case '"':
> +                case '\\':
> +                case '/':
> +                    break;
> +                default:
> +                    /* XXX: ERROR? */
> +                    break;
> +            }
> +        }
> +
> +        result[j++] = jsonval[i];
> +    }
> +    result[j] = '\0';
> +
> +    return result;
> +}

This looks like something that really should live at least partially in json.c?

> +/*
> + * Expand a json value as a dotted-name.  The value must be of type object
> + * and must contain elements "schemaname" (optional), "objname" (mandatory),
> + * "attrname" (optional).
> + *
> + * XXX do we need a "catalogname" as well?
> + */

I'd replace the XXX by 'One day we might need "catalogname" as well, but
no current ..."

> +static void
> +expand_jsonval_dottedname(StringInfo buf, Datum jsonval)
> +{
> +    char       *schema;
> +    char       *objname;
> +    char       *attrname;
> +    const char *qschema;
> +    const char *qname;
> +
> +    schema = expand_get_strval(jsonval, "schemaname");
> +    objname = expand_get_strval(jsonval, "objname");
> +    if (objname == NULL)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("invalid NULL object name in %%D element")));
> +    qname = quote_identifier(objname);
> +    if (schema == NULL)
> +    {
> +        appendStringInfo(buf, "%s", qname);

*appendStringInfoString().

> +    }
> +    else
> +    {
> +        qschema = quote_identifier(schema);
> +        appendStringInfo(buf, "%s.%s",
> +                         qschema, qname);
> +        if (qschema != schema)
> +            pfree((char *) qschema);
> +        pfree(schema);
> +    }
> +
> +    attrname = expand_get_strval(jsonval, "attrname");
> +    if (attrname)
> +    {
> +        const char *qattr;
> +
> +        qattr = quote_identifier(attrname);
> +        appendStringInfo(buf, ".%s", qattr);
> +        if (qattr != attrname)
> +            pfree((char *) qattr);
> +        pfree(attrname);
> +    }

Hm. Does specifying schema, object and attrname result ins oemthing
valid with this? Might not ever be required, but in that case it should
probably error out

> +/*
> + * expand a json value as a type name.
> + */
> +static void
> +expand_jsonval_typename(StringInfo buf, Datum jsonval)
> +{
> +    char       *schema = NULL;
> +    char       *typename;
> +    char       *typmodstr;
> +    bool        array_isnull;
> +    bool        is_array;
> +
> +    typename = expand_get_strval(jsonval, "typename");
> +    if (typename == NULL)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("invalid NULL type name in %%T element")));
> +    typmodstr = expand_get_strval(jsonval, "typmod");    /* OK if null */
> +    is_array = expand_get_boolval(jsonval, "is_array", &array_isnull);
> +    schema = expand_get_strval(jsonval, "schemaname");
> +
> +    /*
> +     * If schema is NULL, then don't schema qualify, but do quote the type
> +     * name; if the schema is empty instead, then we don't quote the type name.
> +     * This is our (admittedly quite ugly) way of dealing with type names that
> +     * might require special treatment.
> +     */

Indeed, really quite ugly. What's it needed for? Maybe reference the
function that needs it to explain.

> +
> +/*
> + * Expand a json value as an operator name
> + */
> +static void
> +expand_jsonval_operator(StringInfo buf, Datum jsonval)
> +{
> +    char       *schema = NULL;
> +    char       *operator;
> +
> +    operator = expand_get_strval(jsonval, "objname");
> +    if (operator == NULL)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("invalid NULL operator name in %%O element")));
> +    schema = expand_get_strval(jsonval, "schemaname");
> +
> +    /* schema might be NULL or empty */

Why can it be empty here?

> +    if (schema == NULL || schema[0] == '\0')
> +        appendStringInfo(buf, "%s", operator);
> +    else
> +        appendStringInfo(buf, "%s.%s",
> +                         quote_identifier(schema),
> +                         operator);
> +}

> +/*
> + * Expand a json value as a string.  The value must be of type string or of
> + * type object, in which case it must contain a "fmt" element which will be
> + * recursively expanded; also, if the object contains an element "present"
> + * and it is set to false, the expansion is the empty string.
> + */
> +static void
> +expand_jsonval_string(StringInfo buf, Datum jsonval, JsonType json_elt_type)
> +{
> +    if (json_elt_type == JsonTypeString)
> +    {
> +        char       *str;
> +        char       *unquoted;
> +
> +        str = TextDatumGetCString(jsonval);
> +        unquoted = dequote_jsonval(str);
> +        appendStringInfo(buf, "%s", unquoted);
> +        pfree(str);
> +        pfree(unquoted);
> +    }
> +    else if (json_elt_type == JsonTypeObject)
> +    {
> +        bool        present;
> +        bool        isnull;
> +        present = expand_get_boolval(jsonval, "present", &isnull);

Nitpick, but I find isnull slightly confusing as a name here.

> +        if (isnull || present)
> +        {
> +            Datum        inner;
> +            char       *str;
> +
> +            inner = DirectFunctionCall1(pg_event_trigger_expand_command,
> +                                        jsonval);
> +            str = TextDatumGetCString(inner);
> +
> +            appendStringInfoString(buf, str);
> +            pfree(DatumGetPointer(inner));
> +            pfree(str);
> +        }
> +    }
> +}

> +/*
> + * Expand a json value as a string literal
> + */
> +static void
> +expand_jsonval_strlit(StringInfo buf, Datum jsonval)
> +{
> +    char   *str;
> +    char   *unquoted;
> +    StringInfoData dqdelim;
> +    static const char dqsuffixes[] = "_XYZZYX_";
> +    int         dqnextchar = 0;
> +
> +    /* obtain the string, and remove the JSON quotes and stuff */
> +    str = TextDatumGetCString(jsonval);
> +    unquoted = dequote_jsonval(str);
> +
> +    /* easy case: if there are no ' and no \, just use a single quote */
> +    if (strchr(unquoted, '\'') == NULL &&
> +        strchr(unquoted, '\\') == NULL)
> +    {
> +        appendStringInfo(buf, "'%s'", unquoted);
> +        return;
> +    }
> +
> +    /* Find a useful dollar-quote delimiter */
> +    initStringInfo(&dqdelim);
> +    appendStringInfoString(&dqdelim, "$");
> +    while (strstr(unquoted, dqdelim.data) != NULL)
> +    {
> +        appendStringInfoChar(&dqdelim, dqsuffixes[dqnextchar++]);
> +        dqnextchar %= sizeof(dqsuffixes) - 1;
> +    }
> +    /* add trailing $ */
> +    appendStringInfoChar(&dqdelim, '$');
> +
> +    /* And finally produce the quoted literal into the output StringInfo */
> +    appendStringInfo(buf, "%s%s%s", dqdelim.data, unquoted, dqdelim.data);
> +}

Ugh. Do we really need this logic here? Doesn't that already exist
somewhere?

> +/*
> + * Expand one json element according to rules.
> + */

What are rules?

> +static void
> +expand_one_element(StringInfo buf, char *param,
> +                   Datum jsonval, char *valtype, JsonType json_elt_type,
> +                   convSpecifier specifier)
> +{
> +    /*
> +     * Validate the parameter type.  If dotted-name was specified, then a JSON
> +     * object element is expected; if an identifier was specified, then a JSON
> +     * string is expected.    If a string was specified, then either a JSON
> +     * object or a string is expected.
> +     */
> +    if (specifier == SpecDottedName && json_elt_type != JsonTypeObject)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("expected JSON object for %%D element \"%s\", got %s",
> +                        param, valtype)));
> +    if (specifier == SpecTypename && json_elt_type != JsonTypeObject)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("expected JSON object for %%T element \"%s\", got %s",
> +                        param, valtype)));
> +    if (specifier == SpecOperatorname && json_elt_type != JsonTypeObject)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("expected JSON object for %%O element \"%s\", got %s",
> +                        param, valtype)));
> +    if (specifier == SpecIdentifier && json_elt_type != JsonTypeString)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("expected JSON string for %%I element \"%s\", got %s",
> +                        param, valtype)));
> +    if (specifier == SpecStringLiteral && json_elt_type != JsonTypeString)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("expected JSON string for %%L element \"%s\", got %s",
> +                        param, valtype)));
> +    if (specifier == SpecString &&
> +        json_elt_type != JsonTypeString && json_elt_type != JsonTypeObject)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("expected JSON string or object for %%s element \"%s\", got %s",
> +                        param, valtype)));

Isn't this essentially a duplication of the switch below?

> +    switch (specifier)
> +    {
> +        case SpecIdentifier:
> +            expand_jsonval_identifier(buf, jsonval);
> +            break;
> +
> +        case SpecDottedName:
> +            expand_jsonval_dottedname(buf, jsonval);
> +            break;
> +
> +        case SpecString:
> +            expand_jsonval_string(buf, jsonval, json_elt_type);
> +            break;
> +
> +        case SpecStringLiteral:
> +            expand_jsonval_strlit(buf, jsonval);
> +            break;
> +
> +        case SpecTypename:
> +            expand_jsonval_typename(buf, jsonval);
> +            break;
> +
> +        case SpecOperatorname:
> +            expand_jsonval_operator(buf, jsonval);
> +            break;
> +    }
> +}
> +
> +/*
> + * Expand one JSON array element according to rules.
> + */

The generic "rules" again...

> +#define ADVANCE_PARSE_POINTER(ptr,end_ptr) \
> +    do { \
> +        if (++(ptr) >= (end_ptr)) \
> +            ereport(ERROR, \
> +                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE), \
> +                     errmsg("unterminated format specifier"))); \
> +    } while (0)
> +
> +/*------
> + * Returns a formatted string from a JSON object.
> + *
> + * The starting point is the element named "fmt" (which must be a string).
> + * This format string may contain zero or more %-escapes, which consist of an
> + * element name enclosed in { }, possibly followed by a conversion modifier,
> + * followed by a conversion specifier.    Possible conversion specifiers are:
> + *
> + * %        expand to a literal %.
> + * I        expand as a single, non-qualified identifier
> + * D        expand as a possibly-qualified identifier
> + * T        expand as a type name
> + * O        expand as an operator name
> + * L        expand as a string literal (quote using single quotes)
> + * s        expand as a simple string (no quoting)
> + *
> + * The element name may have an optional separator specification preceded
> + * by a colon.    Its presence indicates that the element is expected to be
> + * an array; the specified separator is used to join the array elements.
> + *
> + * XXX the current implementation works fine, but is likely to be slow: for
> + * each element found in the fmt string we parse the JSON object once.    It
> + * might be better to use jsonapi.h directly so that we build a hash or tree of
> + * elements and their values once before parsing the fmt string, and later scan
> + * fmt using the tree.
> + *------
> + */
> +Datum
> +pg_event_trigger_expand_command(PG_FUNCTION_ARGS)
> +{
> +    text       *json = PG_GETARG_TEXT_P(0);
> +    char       *fmt_str;
> +    int            fmt_len;
> +    const char *cp;
> +    const char *start_ptr;
> +    const char *end_ptr;
> +    StringInfoData str;
> +
> +    fmt_str = expand_get_strval(PointerGetDatum(json), "fmt");
> +    if (fmt_str == NULL)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("invalid NULL format string")));
> +    fmt_len = strlen(fmt_str);
> +
> +    start_ptr = fmt_str;
> +    end_ptr = start_ptr + fmt_len;
> +    initStringInfo(&str);
> +
> +    for (cp = start_ptr; cp < end_ptr; cp++)
> +    {

I haven't checked, but I'm not sure the code here handles multibyte
stuff correctly. IIRC it's legal for multibyte characters to appear
directly in json?

> +        /*
> +         * Obtain the element to be expanded.  Note we cannot use
> +         * DirectFunctionCall here, because the element might not exist.
> +         */

"and the return value thus could be NULL."


> +/*
> + * Append a string parameter to a tree.
> + *
> + * Note: we don't pstrdup the source string.  Caller must ensure the
> + * source string lives long enough.
> + */
> +static void

Why this difference to most other functions?

> +/* Allocate a new object parameter */
> +static ObjElem *
> +new_object_object(char *name, ObjTree *value)
> +{
> +    ObjElem    *param;
> +
> +    param = palloc0(sizeof(ObjElem));
> +
> +    param->name = name ? pstrdup(name) : NULL;
> +    param->objtype = ObjTypeObject;
> +    param->obj_value = value;    /* XXX not duped */

Aha? Why? Because it's freshly allocated anyhow? In which case the XXX
doesn't seem warranted?

> +/*
> + * Create a JSON blob from our ad-hoc representation.
> + *
> + * Note this leaks some memory; caller is responsible for later clean up.
> + *
> + * XXX this implementation will fail if there are more JSON objects in the tree
> + * than the maximum number of columns in a heap tuple.  To fix we would first call
> + * construct_md_array and then json_object.
> + */

I have no idea what that XXX is supposed to say to me. Note that there's
no construct_md_array in the patchset.
> +static char *
> +jsonize_objtree(ObjTree *tree)
> +{

> +    slist_foreach(iter, &tree->params)
> +    {
> +
> +        nulls[i - 1] = false;
> +        switch (object->objtype)
> +        {
> +            case ObjTypeNull:
> +                nulls[i - 1] = true;
> +                break;
> +            case ObjTypeBool:
> +                values[i - 1] = BoolGetDatum(object->bool_value);
> +                break;
> +            case ObjTypeString:
> +                values[i - 1] = CStringGetTextDatum(object->str_value);
> +                break;
> +            case ObjTypeArray:
> +                {
> +                    ArrayType  *arrayt;
> +                    Datum       *arrvals;
> +                    Datum        jsonary;
> +                    ListCell   *cell;
> +                    int            length = list_length(object->array_value);
> +                    int            j;
> +
> +                    /*
> +                     * Arrays are stored as Lists up to this point, with each
> +                     * element being a ObjElem; we need to construct an
> +                     * ArrayType with them to turn the whole thing into a JSON
> +                     * array.
> +                     */
> +                    j = 0;
> +                    arrvals = palloc(sizeof(Datum) * length);
> +                    foreach(cell, object->array_value)
> +                    {
> +                        ObjElem    *elem = lfirst(cell);
> +
> +                        switch (elem->objtype)
> +                        {
> +                            case ObjTypeString:
> +                                arrvals[j] =
> +                                    /*
> +                                     * XXX need quotes around the value.  This
> +                                     * needs to be handled differently because
> +                                     * it will fail for values of anything but
> +                                     * trivial complexity.
> +                                     */
> +                                    CStringGetTextDatum(psprintf("\"%s\"",
> +                                                                 elem->str_value));
> +                                break;

Didn't you earlier add a function doing most of this? That XXX looks
like something that definitely needs to be addressed before the commit.


> From ea8997de828931e954e1a52821ffc834370859d2 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Date: Thu, 25 Sep 2014 15:50:37 -0300
> Subject: [PATCH 06/27] deparse: sprinkle EventTriggerStashCommand() calls

Seems like a odd separate commit ;). What's the point of not squashing
it?


XXX: I'm now too tired to continue honestly. Man, that's a fair amount
of code... Will try to continue it. Once I've made it through once I
hopefully also give some sensible higher level comments.

Greetings,

Andres Freund

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

26 сентября 2014 г., 02:08:59

On Fri, Sep 26, 2014 at 6:59 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Actually here's a different split of these patches, which I hope makes
> more sense.  My intention here is that patches 0001 to 0004 are simple
> changes that can be pushed right away; they are not directly related to
> the return-creation-command feature.  Patches 0005 to 0027 implement
> that feature incrementally.  You can see in patch 0005 the DDL commands
> that are still not implemented in deparse (they are the ones that have
> an elog(ERROR) rather than a "command = NULL").  Patch 0006 adds calls
> in ProcessUtilitySlow() to each command, so that the object(s) being
> touched are added to the event trigger command stash.
>
> Patches from 0007 to 0027 (excepting patch 0017) implement one or a
> small number of commands in deparse.  Patch 0017 is necessary
> infrastructure in ALTER TABLE to support deparsing that one.
>
> My intention with the later patches is that they would all be pushed as
> a single commit, i.e. the deparse support would be implemented for all
> commands in a fell swoop rather than piecemeal -- except possibly patch
> 0017 (the ALTER TABLE infrastructure).  I split them up only for ease of
> review.  Of course, before pushing we (I) need to implement deparsing
> for all the remaining commands.

Thanks for the update. This stuff is still on my TODO list and I was
planning to look at it at some extent today btw :) Andres has already
sent some comments (20140925235724.GH16581@awork2.anarazel.de) though,
I'll try to not overlap with what has already been mentioned.

Patch 1: I still like this patch as it gives a clear separation of the
built-in functions and the sub-functions of ruleutils.c that are
completely independent. Have you considered adding the external
declaration of quote_all_identifiers as well? It is true that this
impacts extensions (some of my stuff as well), but my point is to bite
the bullet and make the separation cleaner between builtins.h and
ruleutils.h. Attached is a patch that can be applied on top of patch 1
doing so... Feel free to discard for the potential breakage this would
create though.

Patch 2:
1) Declarations of renameatt are inconsistent:
@tablecmds.c:
-renameatt(RenameStmt *stmt)
+renameatt(RenameStmt *stmt, int32 *objsubid)
@tablecmds.h:
-extern Oid     renameatt(RenameStmt *stmt);
+extern Oid     renameatt(RenameStmt *stmt, int *attnum);
Looking at this code, for correctness renameatt should use everywhere
"int *attnum" and ExecRenameStmt should use "int32 *subobjid".
2) Just a smallish picky formatting remark, I would have done that on
a single line:
+       attnum =
+               renameatt_internal(relid,
3) in ExecRenameStmt, I think that you should set subobjid for
aggregate, collations, fdw, etc. There is an access to ObjectAddress
so that's easy to set using address.objectSubId.
4) This comment is misleading, as for an attribute what is returned is
actually an attribute number:
+ * Return value is the OID of the renamed object.  The objectSubId, if any,
+ * is returned in objsubid.
  */
5) Perhaps it is worth mentioning at the top of renameatt that the
attribute number is returned as well?

Patch 3: Looks fine, a bit of testing is showing up that this works as expected:
=# CREATE ROLE foo;
CREATE ROLE
=# CREATE TABLE aa (a int);
CREATE TABLE
=# CREATE OR REPLACE FUNCTION abort_grant()
RETURNS event_trigger AS $$
BEGIN
RAISE NOTICE 'Event: %', tg_event;
RAISE EXCEPTION 'Execution of command % forbidden', tg_tag;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
=# CREATE EVENT TRIGGER abort_grant_trigger ON ddl_command_start
WHEN TAG IN ('GRANT', 'REVOKE') EXECUTE PROCEDURE abort_grant();
CREATE EVENT TRIGGER
=# GRANT SELECT ON aa TO foo;
NOTICE:  00000: Event: ddl_command_start
LOCATION:  exec_stmt_raise, pl_exec.c:3068
ERROR:  P0001: Execution of command GRANT forbidden
LOCATION:  exec_stmt_raise, pl_exec.c:3068
=# DROP EVENT TRIGGER abort_grant_trigger;
DROP EVENT TRIGGER
=# CREATE EVENT TRIGGER abort_grant_trigger ON ddl_command_end
WHEN TAG IN ('GRANT', 'REVOKE') EXECUTE PROCEDURE abort_grant();
CREATE EVENT TRIGGER
=# REVOKE SELECT ON aa FROM foo;
NOTICE:  00000: Event: ddl_command_end
LOCATION:  exec_stmt_raise, pl_exec.c:3068
ERROR:  P0001: Execution of command REVOKE forbidden
LOCATION:  exec_stmt_raise, pl_exec.c:3068

Patch 4: Shouldn't int32 be used instead of uint32 in the declaration
of CommentObject? And yes, adding support for only ddl_command_start
and ddl_command_end is enough. Let's not play with dropped objects in
this area... Similarly to the test above:
=# comment on table aa is 'test';
NOTICE:  00000: Event: ddl_command_start
LOCATION:  exec_stmt_raise, pl_exec.c:3068
ERROR:  P0001: Execution of command COMMENT forbidden
LOCATION:  exec_stmt_raise, pl_exec.c:3068
Regards,
--
Michael

Вложения

20140926_ruleutils_quote.patch

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

08 октября 2014 г., 21:26:14

About patch 1, which I just pushed, there were two reviews:

Andres Freund wrote:
> On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote:
> > diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h
> > new file mode 100644
> > index 0000000..520b066
> > --- /dev/null
> > +++ b/src/include/utils/ruleutils.h

> I wondered for a minute whether any of these are likely to cause
> problems for code just including builtins.h - but I think there will be
> sufficiently few callers for them to make that not much of a concern.

Great.

Michael Paquier wrote:

> Patch 1: I still like this patch as it gives a clear separation of the
> built-in functions and the sub-functions of ruleutils.c that are
> completely independent. Have you considered adding the external
> declaration of quote_all_identifiers as well? It is true that this
> impacts extensions (some of my stuff as well), but my point is to bite
> the bullet and make the separation cleaner between builtins.h and
> ruleutils.h. Attached is a patch that can be applied on top of patch 1
> doing so... Feel free to discard for the potential breakage this would
> create though.

Yes, I did consider moving the quoting function definitions and the
global out of builtins.h.  It is much more disruptive, however, because
it affects a lot of external code not only our own; and given the looks
I got after people had to mess with adding the htup_details.h header to
external projects due to the refactoring I did to htup.h in an earlier
release, I'm not sure about it.

However, if I were to do it, I would instead create a quote.h file and
would also add the quote_literal_cstr() prototype to it, perhaps even
move the implementations from their current ruleutils.c location to
quote.c.  (Why is half the stuff in ruleutils.c rather than quote.c is
beyond me.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

08 октября 2014 г., 23:29:06

<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Thu, Oct 9, 2014 at 6:26 AM, Alvaro
Herrera<span dir="ltr"><<a href="mailto:alvherre@2ndquadrant.com"
target="_blank">alvherre@2ndquadrant.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex">About patch 1, which I just pushed, there were two reviews:<br /><span
class=""><br/> Andres Freund wrote:<br /> > On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote:<br /></span><span
class="">>> diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h<br /> > > new file
mode100644<br /> > > index 0000000..520b066<br /> > > --- /dev/null<br /> > > +++
b/src/include/utils/ruleutils.h<br/><br /></span><span class="">> I wondered for a minute whether any of these are
likelyto cause<br /> > problems for code just including builtins.h - but I think there will be<br /> >
sufficientlyfew callers for them to make that not much of a concern.<br /><br /></span>Great.<br /><span class=""><br
/>Michael Paquier wrote:<br /><br /> > Patch 1: I still like this patch as it gives a clear separation of the<br />
>built-in functions and the sub-functions of ruleutils.c that are<br /> > completely independent. Have you
consideredadding the external<br /> > declaration of quote_all_identifiers as well? It is true that this<br /> >
impactsextensions (some of my stuff as well), but my point is to bite<br /> > the bullet and make the separation
cleanerbetween builtins.h and<br /> > ruleutils.h. Attached is a patch that can be applied on top of patch 1<br />
>doing so... Feel free to discard for the potential breakage this would<br /> > create though.<br /><br
/></span>Yes,I did consider moving the quoting function definitions and the<br /> global out of builtins.h.  It is much
moredisruptive, however, because<br /> it affects a lot of external code not only our own; and given the looks<br /> I
gotafter people had to mess with adding the htup_details.h header to<br /> external projects due to the refactoring I
didto htup.h in an earlier<br /> release, I'm not sure about it.<br /><br /> However, if I were to do it, I would
insteadcreate a quote.h file and<br /> would also add the quote_literal_cstr() prototype to it, perhaps even<br /> move
theimplementations from their current ruleutils.c location to<br /> quote.c.  (Why is half the stuff in ruleutils.c
ratherthan quote.c is<br /> beyond me.)<br /></blockquote>Yes, an extra header file would be cleaner. Now if we are
worriedabout creating much incompatibilities with existing extension (IMO that's not something core should worry much
about),then it is better not to do it. Now, if I can vote on it, I think it is worth doing in order to have builtins.h
onlycontain only Datum foo(PG_FUNCTION_ARGS), and move all the quote-related things in quote.c.<br /></div>-- <br
/>Michael<br/></div></div>

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

11 октября 2014 г., 13:32:24

On Thu, Oct 9, 2014 at 8:29 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Oct 9, 2014 at 6:26 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>> However, if I were to do it, I would instead create a quote.h file and
>> would also add the quote_literal_cstr() prototype to it, perhaps even
>> move the implementations from their current ruleutils.c location to
>> quote.c.  (Why is half the stuff in ruleutils.c rather than quote.c is
>> beyond me.)
>
> Yes, an extra header file would be cleaner. Now if we are worried about
> creating much incompatibilities with existing extension (IMO that's not
> something core should worry much about), then it is better not to do it.
> Now, if I can vote on it, I think it is worth doing in order to have
> builtins.h only contain only Datum foo(PG_FUNCTION_ARGS), and move all the
> quote-related things in quote.c.
The attached patch implements this idea, creating a new header quote.h
in include/utils that groups all the quote-related functions. This
makes the code more organized.
Regards,
--
Michael

Вложения

20141011_quote_h_refactor.patch

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

13 октября 2014 г., 03:49:16

Alvaro Herrera wrote:
> Actually here's a different split of these patches, which I hope makes
> more sense.  My intention here is that patches 0001 to 0004 are simple
> changes that can be pushed right away; they are not directly related to
> the return-creation-command feature.  Patches 0005 to 0027 implement
> that feature incrementally.  You can see in patch 0005 the DDL commands
> that are still not implemented in deparse (they are the ones that have
> an elog(ERROR) rather than a "command = NULL").  Patch 0006 adds calls
> in ProcessUtilitySlow() to each command, so that the object(s) being
> touched are added to the event trigger command stash.
>
> Patches from 0007 to 0027 (excepting patch 0017) implement one or a
> small number of commands in deparse.  Patch 0017 is necessary
> infrastructure in ALTER TABLE to support deparsing that one.
>
> My intention with the later patches is that they would all be pushed as
> a single commit, i.e. the deparse support would be implemented for all
> commands in a fell swoop rather than piecemeal -- except possibly patch
> 0017 (the ALTER TABLE infrastructure).  I split them up only for ease of
> review.  Of course, before pushing we (I) need to implement deparsing
> for all the remaining commands.

Here's a new version of this series.  The main change is that I've
changed deparse_utility.c to generate JSON, and the code that was in
commands/event_trigger.c to decode that JSON, so that it uses the new
Jsonb API instead.  In addition, I've moved the new code that was in
commands/event_trigger.c to utils/adt/ddl_json.c.  (The only entry point
of the new file is the SQL-callable pg_event_trigger_expand_command()
function, and its purpose is to expand a JSON object emitted by the
deparse_utility.c code back into a plain text SQL command.)

I have also cleaned up the code per comments from Michael Paquier and
Andres Freund:

* the GRANT support for event triggers now correctly ignores global
objects.

* COMMENT ON .. IS NULL no longer causes a crash

* renameatt() and ExecRenameStmt are consistent in returning the
objSubId as an "int" (not int32).  This is what is used as objectSubId
in ObjectAddress, which is what we're using this value for.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

16 октября 2014 г., 07:01:17

On Mon, Oct 13, 2014 at 12:45 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Here's a new version of this series.  The main change is that I've
> changed deparse_utility.c to generate JSON, and the code that was in
> commands/event_trigger.c to decode that JSON, so that it uses the new
> Jsonb API instead.  In addition, I've moved the new code that was in
> commands/event_trigger.c to utils/adt/ddl_json.c.  (The only entry point
> of the new file is the SQL-callable pg_event_trigger_expand_command()
> function, and its purpose is to expand a JSON object emitted by the
> deparse_utility.c code back into a plain text SQL command.)
> I have also cleaned up the code per comments from Michael Paquier and
> Andres Freund:
>
> * the GRANT support for event triggers now correctly ignores global
> objects.
>
> * COMMENT ON .. IS NULL no longer causes a crash
Why would this not fail? Patch 3 in this set is identical to the last
one. I tested that and it worked well...

> * renameatt() and ExecRenameStmt are consistent in returning the
> objSubId as an "int" (not int32).  This is what is used as objectSubId
> in ObjectAddress, which is what we're using this value for.

In patch 1, ExecRenameStmt returns objsubid for an attribute name when
rename is done on an attribute. Now could you clarify why we skip this
list of objects even if their sub-object ID is available with
address.objectSubId?               case OBJECT_AGGREGATE:               case OBJECT_COLLATION:               case
OBJECT_CONVERSION:              case OBJECT_EVENT_TRIGGER:               case OBJECT_FDW:               case
OBJECT_FOREIGN_SERVER:              case OBJECT_FUNCTION:               case OBJECT_OPCLASS:               case
OBJECT_OPFAMILY:              case OBJECT_LANGUAGE:               case OBJECT_TSCONFIGURATION:               case
OBJECT_TSDICTIONARY:              case OBJECT_TSPARSER:               case OBJECT_TSTEMPLATE:

Patch 2 fails on make check for the tests privileges and foreign_data
by showing up double amount warnings for some object types:
*** /Users/ioltas/git/postgres/src/test/regress/expected/privileges.out
Fri Oct 10 14:34:10 2014
--- /Users/ioltas/git/postgres/src/test/regress/results/privileges.outThu Oct 16 15:47:42 2014
***************
*** 118,123 ****
--- 118,124 ---- ERROR:  permission denied for relation atest2 GRANT ALL ON atest1 TO PUBLIC; -- fail WARNING:  no
privilegeswere granted for "atest1"

+ WARNING:  no privileges were granted for "atest1" -- checks in subquery, both ok SELECT * FROM atest1 WHERE ( b IN (
SELECTcol1 FROM atest2 ) );  a | b

EventTriggerSupportsGrantObjectType is fine to remove
ACL_OBJECT_DATABASE and ACL_OBJECT_TABLESPACE from the list of
supported objects. That's as well in line with the current firing
matrix. I think that it would be appropriate to add a comment on top
of this function.

Patch 3 looks good.
Regards,
-- 
Michael

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

28 октября 2014 г., 07:41:14

On Thu, Oct 16, 2014 at 4:01 PM, Michael Paquier <michael.paquier@gmail.com> wrote:

On Mon, Oct 13, 2014 at 12:45 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Here's a new version of this series.

Here is some input on patch 4:

1) Use of OBJECT_ATTRIBUTE:

+               case OBJECT_ATTRIBUTE:
+                       catalog_id = TypeRelationId;    /* XXX? */
+                       break;

I think that the XXX mark could be removed, using TypeRelationId is correct IMO as OBJECT ATTRIBUTE is used when managing an attribute object, which is a custom type used for CREATE/ALTER TYPE.

2) This patch is showing up many warnings, among them:
event_trigger.c:1460:20: note: uninitialized use occurs here
                                addr.classId = classId;
                                               ^~~~~~~
event_trigger.c:1446:21: note: initialize the variable 'objSubId' to silence this warning
                                uint32          objSubId;
                                                        ^
                                                         = 0

Or:
deparse_utility.c:301:1: warning: unused function 'append_object_object' [-Wunused-function]
append_object_object(ObjTree *tree, char *name, ObjTree *value)
^

In the 2nd case though I imagine that those functions in deparse_utility.c are used afterwards... There are a couple of other warning related to SCT_Simple but that's normal as long as 17 or 24 are not combined with it.
3) What's that actually?
+/* XXX merge this with ObjectTypeMap? */
static event_trigger_support_data event_trigger_support[] = {

We may be able to do something smarter with event_trigger_support[], but as this consists in a mapping of subcommands associated with CREATE/DROP/ALTER and is only a reverse engineering operation of somewhat AlterObjectTypeCommandTag or CreateCommandTag, I am not sure how you could merge that. Some input perhaps?
4)
+/*
+ * EventTriggerStashCommand
+ * Save data about a simple DDL command that was just executed
+ */

Shouldn't this be "single" instead of "simple"?

5) I think that SCT_Simple should be renamed as SCT_Single

+typedef enum StashedCommandType
+{
+ SCT_Simple,
+} StashedCommandType;

This comment holds as well for deparse_simple_command.

+               command = deparse_utility_command(cmd);
+
+               /*
+                * Some parse trees return NULL when deparse is attempted; we don't
+                * emit anything for them.
+                */
+               if (command != NULL)
+               {

Small detail, but you may here just use a continue to make the code a bit more readable after deparsing the command.
7) pg_event_trigger_get_creation_commands is modified as well in patches 17 and 24. You may as well use an enum on cmd->type.

8) Rejoining a comment done earlier by Andres, it would be nice to have ERRCODE_WRONG_CONTEXT (unrelated to this patch). ERRCODE_FEATURE_NOT_SUPPORTED seems rather a different error type...
9) Those declarations are not needed in event_trigger.c:
+#include "utils/json.h"
+#include "utils/jsonb.h"

10) Would you mind explaining what means "fmt"?
+ * Allocate a new object tree to store parameter values -- varargs version.
+ *
+ * The "fmt" argument is used to append as a "fmt" element in the output blob.

11) deparse_utility.c mentions here and there JSON objects, but what is created are JSONB objects. I'd rather be clear here.

12) Already mentioned before, but this reverse engineering machine for types would be more welcome as a set of functions in lsyscache (one for namespace, one for type name and one for is_array). For typemodstr the need is different as it uses printTypmod.

+void
+format_type_detailed(Oid type_oid, int32 typemod,
+                                        Oid *nspid, char **typname, char **typemodstr,
+                                        bool *is_array)

13) This change seems unrelated to this patch...
-       int                     type = 0;
+       JsonbIteratorToken type = WJB_DONE;
        JsonbValue      v;
        int                     level = 0;
        bool            redo_switch = false;
@@ -454,7 +454,7 @@ JsonbToCString(StringInfo out, JsonbContainer *in, int estimated_len)
                                first = false;
                                break;
                        default:
-                               elog(ERROR, "unknown flag of jsonb iterator");
+                               elog(ERROR, "unknown jsonb iterator token type");

14) This could already be pushed as a separate commit:

-extern bool creating_extension;
+extern PGDLLIMPORT bool creating_extension;

A couple of comments: this patch introduces a basic infrastructure able to do the following set of operations:

- Obtention of parse tree using StashedCommand

- Reorganization of parse tree to become an ObjTree, with boolean, array
- Reorganization of ObjTree to a JsonB value

I am actually a bit doubtful about why we actually need this intermediate ObjTree step... What would be wrong in manipulating a JSON(B) parsed tree, that is first empty, and pushing key/value objects to it when processing each command? Something moving toward in this direction is that ObjTree has some logic to manipulate booleans, null values, etc. This results in some duplication with what jsonb and json can actually do when creating when manipulating strings, as well as the extra processing like objtree_to_jsonb_element. ddl_json.c (should be ddl_jsonb.c?) would as well take more its sense as it directly manipulates JSONB containers.

Regards,

--
Michael

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

28 октября 2014 г., 08:30:56

Hi Michael, thanks for the review.

Michael Paquier wrote:
> On Thu, Oct 16, 2014 at 4:01 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
> 
> > On Mon, Oct 13, 2014 at 12:45 PM, Alvaro Herrera
> > <alvherre@2ndquadrant.com> wrote:
> > > Here's a new version of this series.
> 
>  Here is some input on patch 4:
> 
> 1) Use of OBJECT_ATTRIBUTE:
> +               case OBJECT_ATTRIBUTE:
> +                       catalog_id = TypeRelationId;    /* XXX? */
> +                       break;
> I think that the XXX mark could be removed, using TypeRelationId is correct
> IMO as OBJECT ATTRIBUTE is used when managing an attribute object, which is
> a custom type used for CREATE/ALTER TYPE.

Agreed.

> 2) This patch is showing up many warnings, among them:

Will check.

> 3) What's that actually?
> +/* XXX merge this with ObjectTypeMap? */
>  static event_trigger_support_data event_trigger_support[] = {
> We may be able to do something smarter with event_trigger_support[], but as
> this consists in a mapping of subcommands associated with CREATE/DROP/ALTER
> and is only a reverse engineering operation of somewhat
> AlterObjectTypeCommandTag  or CreateCommandTag, I am not sure how you could
> merge that. Some input perhaps?

ObjectTypeMap is part of the patch that handles DROP for replication;
see my other patch in commitfest.  I am also not sure how to merge all
this stuff; with these patches, we would have three "do event triggers
support this object type" functions, so I was thinking in having maybe a
text file from which these functions are generated from a perl script or
something.  But for now I think it's okay to keep things like this.
That comment is only there to remind me that some cleanup might be in
order.

> 4)
> +/*
> + * EventTriggerStashCommand
> + *             Save data about a simple DDL command that was just executed
> + */
> Shouldn't this be "single" instead of "simple"?

In an older version it was "basic".  Not wedded to "simple", but I don't
think "single" is the right thing.  A later patch in the series
introduces type Grant, and there's also type AlterTable.  The idea
behind Simple is to include command types that do not require special
handling; but all these commands are single commands.

> 6)
> +               command = deparse_utility_command(cmd);
> +
> +               /*
> +                * Some parse trees return NULL when deparse is attempted;
> we don't
> +                * emit anything for them.
> +                */
> +               if (command != NULL)
> +               {
> Small detail, but you may here just use a continue to make the code a bit
> more readable after deparsing the command.

Will check.

> 9) Those declarations are not needed in event_trigger.c:
> +#include "utils/json.h"
> +#include "utils/jsonb.h"

Will check. I split ddl_json.c at the last minute and I may have
forgotten to remove these.

> 10) Would you mind explaining what means "fmt"?
> + * Allocate a new object tree to store parameter values -- varargs version.
> + *
> + * The "fmt" argument is used to append as a "fmt" element in the output
> blob.

"fmt" is equivalent to sprintf and friends' fmt argument.  I guess this
commands needs to be expanded a bit.

> 11) deparse_utility.c mentions here and there JSON objects, but what is
> created are JSONB objects. I'd rather be clear here.

Good point.

> 12) Already mentioned before, but this reverse engineering machine for
> types would be more welcome as a set of functions in lsyscache (one for
> namespace, one for type name and one for is_array). For typemodstr the need
> is different as it uses printTypmod.
> +void
> +format_type_detailed(Oid type_oid, int32 typemod,
> +                                        Oid *nspid, char **typname, char
> **typemodstr,
> +                                        bool *is_array)

I am unsure about this.  Other things that require many properties of
the same object do a single lookup and return all of them in a single
call, rather than repeated calls.

> 13) This change seems unrelated to this patch...
> -       int                     type = 0;
> +       JsonbIteratorToken type = WJB_DONE;
>         JsonbValue      v;
>         int                     level = 0;
>         bool            redo_switch = false;
> @@ -454,7 +454,7 @@ JsonbToCString(StringInfo out, JsonbContainer *in, int
> estimated_len)
>                                 first = false;
>                                 break;
>                         default:
> -                               elog(ERROR, "unknown flag of jsonb
> iterator");
> +                               elog(ERROR, "unknown jsonb iterator token
> type");

Yes, sorry.  I was trying to figure out how to use the jsonb stuff and
I found this error message was quite unclear.  In general, jsonb code
seems to have random warts ...

> A couple of comments: this patch introduces a basic infrastructure able to
> do the following set of operations:
> - Obtention of parse tree using StashedCommand
> - Reorganization of parse tree to become an ObjTree, with boolean, array
> - Reorganization of ObjTree to a JsonB value
> I am actually a bit doubtful about why we actually need this intermediate
> ObjTree step... What would be wrong in manipulating a JSON(B) parsed tree,
> that is first empty, and pushing key/value objects to it when processing
> each command? Something moving toward in this direction is that ObjTree has
> some logic to manipulate booleans, null values, etc. This results in some
> duplication with what jsonb and json can actually do when creating when
> manipulating strings, as well as the extra processing like
> objtree_to_jsonb_element. ddl_json.c (should be ddl_jsonb.c?) would as well
> take more its sense as it directly manipulates JSONB containers.

Uhm.  Obviously we didn't have jsonb when I started this and we do have
them now, so I could perhaps see about updating the patch to do things
this way; but I'm not totally sold on that idea, as my ObjTree stuff is
a lot easier to manage and the jsonb API is pretty ugly.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

28 октября 2014 г., 10:00:16

On 2014-10-28 05:30:43 -0300, Alvaro Herrera wrote:
> > A couple of comments: this patch introduces a basic infrastructure able to
> > do the following set of operations:
> > - Obtention of parse tree using StashedCommand
> > - Reorganization of parse tree to become an ObjTree, with boolean, array
> > - Reorganization of ObjTree to a JsonB value
> > I am actually a bit doubtful about why we actually need this intermediate
> > ObjTree step... What would be wrong in manipulating a JSON(B) parsed tree,
> > that is first empty, and pushing key/value objects to it when processing
> > each command? Something moving toward in this direction is that ObjTree has
> > some logic to manipulate booleans, null values, etc. This results in some
> > duplication with what jsonb and json can actually do when creating when
> > manipulating strings, as well as the extra processing like
> > objtree_to_jsonb_element. ddl_json.c (should be ddl_jsonb.c?) would as well
> > take more its sense as it directly manipulates JSONB containers.
> 
> Uhm.  Obviously we didn't have jsonb when I started this and we do have
> them now, so I could perhaps see about updating the patch to do things
> this way; but I'm not totally sold on that idea, as my ObjTree stuff is
> a lot easier to manage and the jsonb API is pretty ugly.

I looked at this as well, and I think trying to do so would not result
in readable code.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

29 октября 2014 г., 17:41:04

On Tue, Oct 28, 2014 at 6:00 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Uhm.  Obviously we didn't have jsonb when I started this and we do have
>> them now, so I could perhaps see about updating the patch to do things
>> this way; but I'm not totally sold on that idea, as my ObjTree stuff is
>> a lot easier to manage and the jsonb API is pretty ugly.
>
> I looked at this as well, and I think trying to do so would not result
> in readable code.

That doesn't speak very well of jsonb.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

31 октября 2014 г., 02:28:00

On Thu, Oct 30, 2014 at 2:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Oct 28, 2014 at 6:00 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Uhm. Obviously we didn't have jsonb when I started this and we do have
>> them now, so I could perhaps see about updating the patch to do things
>> this way; but I'm not totally sold on that idea, as my ObjTree stuff is
>> a lot easier to manage and the jsonb API is pretty ugly.
>
> I looked at this as well, and I think trying to do so would not result
> in readable code.

That doesn't speak very well of jsonb. :-(

Just did the same and I played a bit with the APIs. And I am getting the impression that the jsonb API is currently focused on the fact of deparsing and parsing Jsonb strings to/from containers but there is no real interface that allows to easily manipulate the containers where the values are located. So, what I think is missing is really a friendly interface to manipulate JsonbContainers directly, and I think that we are not far from it with something like this set, roughly:

- Initialization of an empty container

- Set of APIs to directly push a value to a container (boolean, array, null, string, numeric or other jsonb object)

- Initialization of JsonbValue objects

With this basic set of APIs patch 4 could for example use JsonbToCString to then convert the JSONB bucket back to a string it sends to client. Note as well that there is already findJsonbValueFromContainer present to get back a value in a container.

In short, my point is: instead of re-creating the wheel like what this series of patch is trying to do with ObjTree, I think that it would be more fruitful to have a more solid in-place JSONB infrastructure that allows to directly manipulate JSONB objects. This feature as well as future extensions could benefit from that.

Feel free to comment.

Regards,

Michael

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

05 ноября 2014 г., 03:31:46

On Fri, Oct 31, 2014 at 11:27 AM, Michael Paquier <michael.paquier@gmail.com> wrote:

So, what I think is missing is really a friendly interface to manipulate JsonbContainers directly, and I think that we are not far from it with something like this set, roughly:
- Initialization of an empty container
- Set of APIs to directly push a value to a container (boolean, array, null, string, numeric or other jsonb object)
- Initialization of JsonbValue objects

Here are more thoughts among those lines looking at the current state of the patch 4 that introduces the infrastructure of the whole feature. This would make possible in-memory manipulation of jsonb containers without relying on a 3rd-part set of APIs like what this patch is doing with ObjTree to deparse the DDL parse trees.

1) Set of constructor functions for JsonbValue: null, bool, string, array, JSONB object for nested values. Note that keys for can be used as Jsonb string objects

2) Lookup functions for values in a JsonbContainer. Patch 4 is introducing that with find_string_in_jsonbcontainer and find_bool_in_jsonbcontainer. We may as well extend it to be able to look for another Jsonb object for nested searches for example.

3) Functions to push JsonbValue within a container, using a key and a value. This is where most of the work would be necessary, for bool, null, string, Jsonb object and numeric.

This infrastructure would allow in-memory manipulation of jsonb containers. Containers that can then be easily be manipulated to be changed back to strings and for value lookups using key strings.

Regards,

Michael

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

07 ноября 2014 г., 15:48:44

Michael Paquier wrote:

> Here are more thoughts among those lines looking at the current state of
> the patch 4 that introduces the infrastructure of the whole feature. This
> would make possible in-memory manipulation of jsonb containers without
> relying on a 3rd-part set of APIs like what this patch is doing with
> ObjTree to deparse the DDL parse trees.

Thanks for the thoughts.  I have to say that I have no intention of
reworking the jsonb code.  If somebody else wants to do the legwork and
add that API as you suggest, I'm happy to remove all the ObjTree stuff
from this patch.  I don't expect this to happen too soon, though, so I
would instead consider committing this patch based on ObjTree.  Later,
when somebody comes around to reworking jsonb, we can rip ObjTree out.

> This infrastructure would allow in-memory manipulation of jsonb containers.
> Containers that can then be easily be manipulated to be changed back to
> strings and for value lookups using key strings.

Honestly, I had hoped that the jsonb code would have already included
this kind of functionality.  I wasn't too happy when I discovered that I
needed to keep the ObjTree crap.  But I don't want to do that work
myself.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

08 ноября 2014 г., 02:41:26

On Fri, Nov 7, 2014 at 10:45 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Michael Paquier wrote:
>
>> Here are more thoughts among those lines looking at the current state of
>> the patch 4 that introduces the infrastructure of the whole feature. This
>> would make possible in-memory manipulation of jsonb containers without
>> relying on a 3rd-part set of APIs like what this patch is doing with
>> ObjTree to deparse the DDL parse trees.
>
> Thanks for the thoughts.  I have to say that I have no intention of
> reworking the jsonb code.  If somebody else wants to do the legwork and
> add that API as you suggest, I'm happy to remove all the ObjTree stuff
> from this patch.  I don't expect this to happen too soon, though, so I
> would instead consider committing this patch based on ObjTree.  Later,
> when somebody comes around to reworking jsonb, we can rip ObjTree out.
>
>> This infrastructure would allow in-memory manipulation of jsonb containers.
>> Containers that can then be easily be manipulated to be changed back to
>> strings and for value lookups using key strings.
>
> Honestly, I had hoped that the jsonb code would have already included
> this kind of functionality.  I wasn't too happy when I discovered that I
> needed to keep the ObjTree crap.  But I don't want to do that work
> myself.

If we're going to have infrastructure for this in core, we really
ought to make the effort to make it general instead of not.

I still think this whole idea is a mess.  It adds what looks to be a
LOT of code that, at least as I understand it, we have no compelling
way to regression test and which will likely receive very limited
testing from users to support a feature that is not in core, may never
be, and which I strongly suspect may be too clever by half.  Once
you've committed it, you're going to move onto other things and leave
it to everyone who comes after to try to maintain it.  I bet we'll
still be running into bugs and half-implemented features five years
from now, and maybe in ten.  Ramming through special-purpose
infrastructure instead of generalizing it is merely icing on the cake,
but it's still moving in the wrong direction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

08 ноября 2014 г., 09:37:30

On 2014-11-07 21:41:17 -0500, Robert Haas wrote:
> On Fri, Nov 7, 2014 at 10:45 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Michael Paquier wrote:
> >
> >> Here are more thoughts among those lines looking at the current state of
> >> the patch 4 that introduces the infrastructure of the whole feature. This
> >> would make possible in-memory manipulation of jsonb containers without
> >> relying on a 3rd-part set of APIs like what this patch is doing with
> >> ObjTree to deparse the DDL parse trees.
> >
> > Thanks for the thoughts.  I have to say that I have no intention of
> > reworking the jsonb code.  If somebody else wants to do the legwork and
> > add that API as you suggest, I'm happy to remove all the ObjTree stuff
> > from this patch.  I don't expect this to happen too soon, though, so I
> > would instead consider committing this patch based on ObjTree.  Later,
> > when somebody comes around to reworking jsonb, we can rip ObjTree out.
> >
> >> This infrastructure would allow in-memory manipulation of jsonb containers.
> >> Containers that can then be easily be manipulated to be changed back to
> >> strings and for value lookups using key strings.
> >
> > Honestly, I had hoped that the jsonb code would have already included
> > this kind of functionality.  I wasn't too happy when I discovered that I
> > needed to keep the ObjTree crap.  But I don't want to do that work
> > myself.
> 
> If we're going to have infrastructure for this in core, we really
> ought to make the effort to make it general instead of not.
> 
> I still think this whole idea is a mess.  It adds what looks to be a
> LOT of code that, at least as I understand it, we have no compelling
> way to regression test

I don't understand why this is particularly difficult to regresssion
test. It actually is comparatively simple?

> and which will likely receive very limited
> testing from users to support a feature that is not in core,

Just like half of the features you worked on yourself lately? Why is
this an argument?

Being able to replicate DDL is a feature wish that has been around
pretty much since the inception of trigger based replication
solution. It's not some current fancy. And the json stuff only got there
because people wanted some way to manipulate the names in the replicated
- which this abstraction provides them with.

> may never be, and which I strongly suspect may be too clever by half.
> Once you've committed it, you're going to move onto other things and
> leave it to everyone who comes after to try to maintain it.  I bet
> we'll still be running into bugs and half-implemented features five
> years from now, and maybe in ten.  Ramming through special-purpose
> infrastructure instead of generalizing it is merely icing on the cake,
> but it's still moving in the wrong direction.

You're just as much to blame for not writing a general json abstraction
layer for EXPLAIN. I'd say that's not much blame, but still, there's
really not much difference there.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

08 ноября 2014 г., 13:53:03

On Sat, Nov 8, 2014 at 12:45 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Michael Paquier wrote:
>
>> Here are more thoughts among those lines looking at the current state of
>> the patch 4 that introduces the infrastructure of the whole feature. This
>> would make possible in-memory manipulation of jsonb containers without
>> relying on a 3rd-part set of APIs like what this patch is doing with
>> ObjTree to deparse the DDL parse trees.
>
> Thanks for the thoughts.  I have to say that I have no intention of
> reworking the jsonb code.  If somebody else wants to do the legwork and
> add that API as you suggest, I'm happy to remove all the ObjTree stuff
> from this patch.  I don't expect this to happen too soon, though, so I
> would instead consider committing this patch based on ObjTree.  Later,
> when somebody comes around to reworking jsonb, we can rip ObjTree out.
The thing freaking me out in this case is when would that really
happen? Maybe years from now, and perhaps at that point we would
regret to not have put in place the infrastructure that we knew we
could have done.

>> This infrastructure would allow in-memory manipulation of jsonb containers.
>> Containers that can then be easily be manipulated to be changed back to
>> strings and for value lookups using key strings.
>
> Honestly, I had hoped that the jsonb code would have already included
> this kind of functionality.  I wasn't too happy when I discovered that I
> needed to keep the ObjTree crap.  But I don't want to do that work
> myself.
I can't blame you for that.

In any case, if this code goes in as-is (I am against it at this point
but I am just giving my opinion as a reviewer here), I think that at
least the following things could be done with a minimal effort:
- Provide the set of constructor functions for JsonbValue
- Move the jsonb APIs (find_*_in_jsonbcontainer) doing value lookups
using keys from ddl_json.c to jsonb_util.c, rename them to something
more consistent and complete the set for the other object types.
- Add a TODO item in the wiki, and a TODO item in where ObjTree is defined.
Thanks,
-- 
Michael

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

08 ноября 2014 г., 15:42:23

On Sat, Nov 8, 2014 at 4:37 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> I don't understand why this is particularly difficult to regresssion
> test. It actually is comparatively simple?

If it is, great.  I previously wrote this email:

http://www.postgresql.org/message-id/CA+TgmoZ=vZriJMxLkqi_V0jg4k4LEAPmwUSC6RWXS5MquXUJNA@mail.gmail.com

Alvaro came up with a way of addressing the second point I raised
there, which I'm quite pleased about, but AFAIK there's been no
progress on the first one.  Maybe I missed something?

>> and which will likely receive very limited
>> testing from users to support a feature that is not in core,
>
> Just like half of the features you worked on yourself lately? Why is
> this an argument?

Because the stuff I'm adding doesn't break every time someone adds a
new DDL command, something that we do regularly.

If it were a question of writing this code once and being done with
it, that would be unobjectionable in my view.  But it isn't.
Practically every change to gram.y is going to require a corresponding
change to this stuff.  As far as I can see, nobody except me has
commented on the burden that places on everyone who may wish to add
syntax support for a new construct in the future, which might mean
that I'm worrying about something that isn't worth worrying about, but
what I think is more likely is that nobody's worrying about it right
now because they haven't had to do it yet.

Just to illustrate the point, consider the CREATE TABLE name OF type
syntax that Peter added a few years ago.   That patch
(e7b3349a8ad7afaad565c573fbd65fb46af6abbe) had the following impact on
gram.y:
src/backend/parser/gram.y |   56 ++++++++++++++++++++++++++++++++++-----------1 file changed, 43 insertions(+), 13
deletions(-)

Now let's have a look at what impact it has on the deparsing code.
Patch 6 has deparse_ColumnDef_Typed from lines 134 to 193.  There's 3
or so lines in deparseTableElements that decide whether to call it.
Patch 8 has more handling for this case, lines 439 to 443 and 463 to
490.  So, if this feature had been committed before TABLE OF, it would
have needed about 100 lines of net new code to handle this case -
exclusive of refactoring.  The actual size of the patch would probably
have been modestly larger than that, because some code would need to
be reindented when it got iffed out, and quite possibly some
rearrangement would have been needed.  But even ignoring all that, the
deparse footprint of the patch would have been MORE THAN TWICE the
parser footprint.

I think that's going to be typical; and I think the deparse code is
going to be significantly more labor-intensive to write than bison
productions are.  Do you really think that's not going to be a burden?

> Being able to replicate DDL is a feature wish that has been around
> pretty much since the inception of trigger based replication
> solution. It's not some current fancy. And the json stuff only got there
> because people wanted some way to manipulate the names in the replicated
> - which this abstraction provides them with.

I understand that being able to replicate DDL is an important need,
and there may be no better way to do it than this.  But that doesn't
mean that this code is going to be bug-free or easy to maintain.

>> may never be, and which I strongly suspect may be too clever by half.
>> Once you've committed it, you're going to move onto other things and
>> leave it to everyone who comes after to try to maintain it.  I bet
>> we'll still be running into bugs and half-implemented features five
>> years from now, and maybe in ten.  Ramming through special-purpose
>> infrastructure instead of generalizing it is merely icing on the cake,
>> but it's still moving in the wrong direction.
>
> You're just as much to blame for not writing a general json abstraction
> layer for EXPLAIN. I'd say that's not much blame, but still, there's
> really not much difference there.

Well, this patch is series is at least an order of magnitude larger,
and it's apparently doing significantly more complex stuff with JSON,
because the explain.c patch just writes it out into a StringInfo.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Tom Lane

Дата:

08 ноября 2014 г., 16:52:57

Robert Haas <robertmhaas@gmail.com> writes:
> If it were a question of writing this code once and being done with
> it, that would be unobjectionable in my view.  But it isn't.
> Practically every change to gram.y is going to require a corresponding
> change to this stuff.  As far as I can see, nobody except me has
> commented on the burden that places on everyone who may wish to add
> syntax support for a new construct in the future, which might mean
> that I'm worrying about something that isn't worth worrying about, but
> what I think is more likely is that nobody's worrying about it right
> now because they haven't had to do it yet.

I haven't been paying much attention to this thread, but I concur with
Robert that adding yet another set of overhead requirements to any
addition of new SQL is not a good thing.

> Just to illustrate the point, consider the CREATE TABLE name OF type
> syntax that Peter added a few years ago.   That patch
> (e7b3349a8ad7afaad565c573fbd65fb46af6abbe) had the following impact on
> gram.y:

This analysis is kind of cheating, because adding new syntax hasn't been
only a matter of touching gram.y for a very long time.  You invariably
have to touch pg_dump, and you have to touch ruleutils.c unless it's
strictly a DDL-command change.  But people are used to those, and the
value of keeping pg_dump working is clear to everybody.  Adding a similar
level of burden to support a feature with a narrow use-case seems like
a nonstarter from here.  ESPECIALLY if we also have to manually add
regression test cases because there's no easy way to test it directly.
        regards, tom lane

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

08 ноября 2014 г., 16:56:07

On 2014-11-08 11:52:43 -0500, Tom Lane wrote:
> Adding a similar
> level of burden to support a feature with a narrow use-case seems like
> a nonstarter from here.

I don't understand this statement. In my experience the lack of a usable
replication solution that allows temporary tables and major version
differences is one of the most, if not *the* most, frequent criticisms
of postgres I hear. How is this a narrow use case?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Tom Lane

Дата:

08 ноября 2014 г., 17:07:51

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-11-08 11:52:43 -0500, Tom Lane wrote:
>> Adding a similar
>> level of burden to support a feature with a narrow use-case seems like
>> a nonstarter from here.

> I don't understand this statement. In my experience the lack of a usable
> replication solution that allows temporary tables and major version
> differences is one of the most, if not *the* most, frequent criticisms
> of postgres I hear. How is this a narrow use case?

[ shrug... ]  I don't personally give a damn about logical replication,
especially not logical replication implemented in this fashion.  It looks
large and rickety (ie full of opportunities for bugs) and I would never
trust data I cared about to it.

Or in short: AFAICS you're not building the next WAL-shipping replication
solution, you're building the next Slony, and Slony never has and never
will be more than a niche use-case.  Putting half of it into core wouldn't
fix that, it would just put a lot more maintenance burden on core
developers.  Core developers are entitled to push back on such proposals.
        regards, tom lane

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

08 ноября 2014 г., 17:20:11

On 2014-11-08 12:07:41 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-11-08 11:52:43 -0500, Tom Lane wrote:
> >> Adding a similar
> >> level of burden to support a feature with a narrow use-case seems like
> >> a nonstarter from here.
> 
> > I don't understand this statement. In my experience the lack of a usable
> > replication solution that allows temporary tables and major version
> > differences is one of the most, if not *the* most, frequent criticisms
> > of postgres I hear. How is this a narrow use case?
> 
> [ shrug... ]  I don't personally give a damn about logical replication,
> especially not logical replication implemented in this fashion.

"In this fashion" meaning ddl replication via event triggers? If you
have an actual suggestion how to do it better I'm all ears. So far
nobody has come up with anything.

> Or in short: AFAICS you're not building the next WAL-shipping replication
> solution, you're building the next Slony, and Slony never has and never
> will be more than a niche use-case.

A good number of the sites out there use either londiste or slony. Not
because they like it, but because there's no other alternative.

I'd love to simply say that we can make WAL based replication work
across versions, platforms and subsets of relations in PG
clusters. Since that seems quite unrealistic people have to go different
ways.

> Putting half of it into core wouldn't fix that, it would just put a
> lot more maintenance burden on core developers.

Imo stuff that can't be done sanely outside core needs to be put into
core if it's actually desired by many users. And working DDL replication
for logical replication solutions surely is.

> Core developers are entitled to push back on such proposals.

I'm not saying "core developers" (whover that is) aren't allowed to do
so. But just because they think something is (too) invasive doesn't make
it a niche application.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

08 ноября 2014 г., 17:30:37

On Sat, Nov 8, 2014 at 12:20 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Putting half of it into core wouldn't fix that, it would just put a
>> lot more maintenance burden on core developers.
>
> Imo stuff that can't be done sanely outside core needs to be put into
> core if it's actually desired by many users. And working DDL replication
> for logical replication solutions surely is.

I don't buy it.  This patch series is *all about* transferring the
maintenance burden of this feature from the BDR developers to the core
project.  There's nothing to keep you from exposing the parse trees to
C functions that can live in an extension, and you can do all of this
deparsing there.  Nobody will stop you, and when it breaks (not if)
you can fix it in your code.  The overhead of deparsing new types of
parse nodes can be born by you.  The only benefit of pushing it into
core is that some other logical replication solution could also take
advantage of that, but I know of nobody with any plans to do such a
thing.  On the flip side, the *cost* of pushing it into core is that
it now becomes the PostgreSQL's community problem to update the
deparsing code every time the grammar changes.  That cost-benefit
trade-off does not look favorable to me, especially given the absence
of any kind of robust testing framework.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

08 ноября 2014 г., 18:05:50

On 2014-11-08 12:30:29 -0500, Robert Haas wrote:
> On Sat, Nov 8, 2014 at 12:20 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Putting half of it into core wouldn't fix that, it would just put a
> >> lot more maintenance burden on core developers.
> >
> > Imo stuff that can't be done sanely outside core needs to be put into
> > core if it's actually desired by many users. And working DDL replication
> > for logical replication solutions surely is.
> 
> I don't buy it.  This patch series is *all about* transferring the
> maintenance burden of this feature from the BDR developers to the core
> project.

What? Not at all. It's *much* less work to do these kind of things out
of core. As you probably have experienced more than once. If it were
possible to do this entirely in a extension I'm pretty sure nobody would
have bothered.

> There's nothing to keep you from exposing the parse trees to
> C functions that can live in an extension, and you can do all of this
> deparsing there.

Not really. There's some core functions that need to be touched. Like
most of the stuff in patches 1,2,3,5,16 does.

We could just integrate those parts, and be done with it. But would that
actually be a good thing for the community? Then slony needs to do it
and potentially others as well? Then auditing can't use it? Then
potential schema tracking solutions can't use it?

> Nobody will stop you, and when it breaks (not if)
> you can fix it in your code. The overhead of deparsing new types of
> parse nodes can be born by you.  The only benefit of pushing it into
> core is that some other logical replication solution could also take
> advantage of that, but I know of nobody with any plans to do such a
> thing.

There've been people for a long while asking about triggers on catalogs
for that purpose. IIRC Jan was one of them.

> On the flip side, the *cost* of pushing it into core is that
> it now becomes the PostgreSQL's community problem to update the
> deparsing code every time the grammar changes.  That cost-benefit
> trade-off does not look favorable to me, especially given the absence
> of any kind of robust testing framework.

I agree that there should be testing in this.

But I think you're quite overstating the effort of maintaining
this. Looking at gram.y between the stamping of 9.3 devel and 9.4 devel
for commits that'd require DDL deparsing changes:
* ALTER TABLE .. ALTER CONSTRAINT: About 7 lines for the deparse code, out of a ~350 line patch
* REFRESH MATERIALIZED VIEW ... CONCURRENTLY: About 1 line for deparse, out of ~700
* WITH CHECK OPTION for views: About 3 lines for deparse, out of ~1300
* REPLICA IDENTITY: About 24 lines for deparse, out of ~950
* ALTER TABLESPACE MOVE: About 20 lines for deparse, out of ~340. Although that patch was essentially scrapped
afterwards.And rewritten differently.

* CREATE TABLESPACE ... WITH ...:  Not supported by event triggers right now as it's a global object.

That's really not a whole lot.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

08 ноября 2014 г., 21:35:49

On 2014-11-08 10:42:15 -0500, Robert Haas wrote:
> On Sat, Nov 8, 2014 at 4:37 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I don't understand why this is particularly difficult to regresssion
> > test. It actually is comparatively simple?
> 
> If it is, great.  I previously wrote this email:
> 
> http://www.postgresql.org/message-id/CA+TgmoZ=vZriJMxLkqi_V0jg4k4LEAPmwUSC6RWXS5MquXUJNA@mail.gmail.com
> 
> Alvaro came up with a way of addressing the second point I raised
> there, which I'm quite pleased about, but AFAIK there's been no
> progress on the first one.  Maybe I missed something?

I unfortunately don't think so. And that sounds like a completely
reasonable criticism.

> Just to illustrate the point, consider the CREATE TABLE name OF type
> syntax that Peter added a few years ago.   That patch
> (e7b3349a8ad7afaad565c573fbd65fb46af6abbe) had the following impact on
> gram.y:
> 
>  src/backend/parser/gram.y |   56 ++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 43 insertions(+), 13 deletions(-)

> Now let's have a look at what impact it has on the deparsing code.
> Patch 6 has deparse_ColumnDef_Typed from lines 134 to 193.  There's 3
> or so lines in deparseTableElements that decide whether to call it.
> Patch 8 has more handling for this case, lines 439 to 443 and 463 to
> 490.  So, if this feature had been committed before TABLE OF, it would
> have needed about 100 lines of net new code to handle this case -
> exclusive of refactoring.  The actual size of the patch would probably
> have been modestly larger than that, because some code would need to
> be reindented when it got iffed out, and quite possibly some
> rearrangement would have been needed.  But even ignoring all that, the
> deparse footprint of the patch would have been MORE THAN TWICE the
> parser footprint.

Well, you disregarded the related costs of adjusting pg_dump et al, as
Tom mentioned, that's a significant part. And yes, there's some
additions that aren't entirely trivial to add. But the majority of
additions are pretty simple.

> I think that's going to be typical; and I think the deparse code is
> going to be significantly more labor-intensive to write than bison
> productions are.  Do you really think that's not going to be a burden?

I've looked into a fair number of cases and almost all are vastly
simpler than this. Most of the DDL changes that have been done lately
are things like adding IF NOT EXISTS somewhere; expanding existing
syntax for new types of objects (ALTER ... RENAME for foreign servers,
wrappers; LABEL for new types).

I think given the complexity of newly added features the overhead of
adding deparsing code isn't all that high.

> > Being able to replicate DDL is a feature wish that has been around
> > pretty much since the inception of trigger based replication
> > solution. It's not some current fancy. And the json stuff only got there
> > because people wanted some way to manipulate the names in the replicated
> > - which this abstraction provides them with.
> 
> I understand that being able to replicate DDL is an important need,
> and there may be no better way to do it than this.  But that doesn't
> mean that this code is going to be bug-free or easy to maintain.

Agreed. There's definitely no free lunch (here). This has been discussed
more than once, and so far I've read anything superior that also has a
chance of handling ALTER.

> >> may never be, and which I strongly suspect may be too clever by half.
> >> Once you've committed it, you're going to move onto other things and
> >> leave it to everyone who comes after to try to maintain it.  I bet
> >> we'll still be running into bugs and half-implemented features five
> >> years from now, and maybe in ten.  Ramming through special-purpose
> >> infrastructure instead of generalizing it is merely icing on the cake,
> >> but it's still moving in the wrong direction.
> >
> > You're just as much to blame for not writing a general json abstraction
> > layer for EXPLAIN. I'd say that's not much blame, but still, there's
> > really not much difference there.
> 
> Well, this patch is series is at least an order of magnitude larger,
> and it's apparently doing significantly more complex stuff with JSON,
> because the explain.c patch just writes it out into a StringInfo.

And this code builds up the data in memory and then calls into the json
code to build the json value. And to parse json it uses the functions
underlying the SQL accessors.

There's one function it should either not need anymore (dequote_jsonval)
by using json_object_field_text instead of json_object_field or by
exposing dequote_jsonval's functionality from json.c.

I think the json angle here is a red herring. Sure, a nicer API could
save some FunctionCallInfoData boilerplate, but that's pretty much it.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

08 ноября 2014 г., 22:49:39

On Sat, Nov 8, 2014 at 1:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> There's nothing to keep you from exposing the parse trees to
>> C functions that can live in an extension, and you can do all of this
>> deparsing there.
>
> Not really. There's some core functions that need to be touched. Like
> most of the stuff in patches 1,2,3,5,16 does.

Patch 1 is fine.  We've done similar things in the past (cf.
c504513f83a9ee8dce4a719746ca73102cae9f13,
82b1b213cad3a69cf5f3dfaa81687c14366960fc).  I'd just commit that.

Patch 2 adds support for GRANT and REVOKE to the event trigger
mechanism.  I wonder if it's a bad idea to make the
ddl_command_start/end events fire for DCL.  We discussed a lot of
these issues when this patch originally went in, and I think it'd be
worth revisiting that discussion.

Patch 3 is the same kind of idea as patch 2, only for COMMENT.

Patch 5 depends on patch 4, which does a bunch of things.  I *think*
the upshot of patch 5 is that we're not currently firing event
triggers in some situations where we should, in which case +1 for
fixing that.  It would help if there were a real commit message,
and/or some more contents, and I think it could be more completely
disentangled from patch 4.

Patch 16 again contains almost no comments and no description of its
specific purpose, but it appears to be similar to patch 1, so probably
mostly uncontroversial.

> We could just integrate those parts, and be done with it. But would that
> actually be a good thing for the community? Then slony needs to do it
> and potentially others as well? Then auditing can't use it? Then
> potential schema tracking solutions can't use it?

Do you think Slony is really going to use this?  I guess we can let
the Slony guys speak for themselves, but I've been skeptical since day
one that this is the best way to do DDL replication, and I still am.
There are lots of ways that a replicated DDL statement can fail on the
replicas, and what are you going to do then?  It's too late to show
the user the error message, so you can throw it in a log someplace and
hope that somebody notices, but that's it.  It makes a lot more sense
to me to use some kind of a tool that applies the DDL in a coordinated
fashion on all nodes - or even just do it manually, since it might
very well be desirable to take the lock on different nodes at widely
different times, separated by a switchover.  I certainly think there's
a use-case for what you're trying to do here, but I don't think it'll
be right for everyone.

Certainly, if the Slony guys - or some other team building an
out-of-core replication solutions says, hey, we really want this in
core, that would considerably strengthen the argument for putting it
there.  But I haven't heard anyone say that yet - unlike logical
decoding, were we did have other people expressing clear interest in
using it.

> There've been people for a long while asking about triggers on catalogs
> for that purpose. IIRC Jan was one of them.

My impression, based on something Christopher Brown said a few years
ago, is that Slony's DDL trigger needs are largely satisfied by the
existing event trigger stuff.  It would be helpful to get confirmation
as to whether that's the case.

>> On the flip side, the *cost* of pushing it into core is that
>> it now becomes the PostgreSQL's community problem to update the
>> deparsing code every time the grammar changes.  That cost-benefit
>> trade-off does not look favorable to me, especially given the absence
>> of any kind of robust testing framework.
>
> I agree that there should be testing in this.
>
> But I think you're quite overstating the effort of maintaining
> this. Looking at gram.y between the stamping of 9.3 devel and 9.4 devel
> for commits that'd require DDL deparsing changes:
> * ALTER TABLE .. ALTER CONSTRAINT: About 7 lines for the deparse code,
>   out of a ~350 line patch
> * REFRESH MATERIALIZED VIEW ... CONCURRENTLY: About 1 line for deparse,
>   out of ~700
> * WITH CHECK OPTION for views: About 3 lines for deparse, out of ~1300
> * REPLICA IDENTITY: About 24 lines for deparse, out of ~950
> * ALTER TABLESPACE MOVE: About 20 lines for deparse, out of
>   ~340. Although that patch was essentially scrapped afterwards. And
>   rewritten differently.
> * CREATE TABLESPACE ... WITH ...:  Not supported by event triggers right
>   now as it's a global object.
>
> That's really not a whole lot.

Those are pretty minor syntax patches, though.  I think it's more
helpful to look at things like the row-level security stuff, or
materialized views per se, or DDL support for collations.  In any
case, the additional coding burden concerns me less than the
additional testing burden - but there's another email further down
that's more to that specific point, so I'll stop here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

08 ноября 2014 г., 23:24:23

On 2014-11-08 17:49:30 -0500, Robert Haas wrote:
> Patch 2 adds support for GRANT and REVOKE to the event trigger
> mechanism.  I wonder if it's a bad idea to make the
> ddl_command_start/end events fire for DCL.  We discussed a lot of
> these issues when this patch originally went in, and I think it'd be
> worth revisiting that discussion.

Well, it doesn't generally support it for all GRANT statement, but just
for ones that are database local. I think that mostly skirts the
problems from the last round of discussion. But I only vaguely remember
them.

> Patch 5 depends on patch 4, which does a bunch of things.  I *think*
> the upshot of patch 5 is that we're not currently firing event
> triggers in some situations where we should, in which case +1 for
> fixing that.  It would help if there were a real commit message,
> and/or some more contents, and I think it could be more completely
> disentangled from patch 4.

> > We could just integrate those parts, and be done with it. But would that
> > actually be a good thing for the community? Then slony needs to do it
> > and potentially others as well? Then auditing can't use it? Then
> > potential schema tracking solutions can't use it?
> 
> Do you think Slony is really going to use this?

There was a fair amount noise about it at one of the past cluster
hackers thingies.

> I guess we can let
> the Slony guys speak for themselves, but I've been skeptical since day
> one that this is the best way to do DDL replication, and I still am.

Well, I've yet to hear anything that's realistic otherwise.

> There are lots of ways that a replicated DDL statement can fail on the
> replicas, and what are you going to do then?  It's too late to show
> the user the error message, so you can throw it in a log someplace and
> hope that somebody notices, but that's it.

Sure. And absolutely the same is true for DML. And the lack of DDL
integration makes it happen really rather frequently...

> It makes a lot more sense
> to me to use some kind of a tool that applies the DDL in a coordinated
> fashion on all nodes - or even just do it manually, since it might
> very well be desirable to take the lock on different nodes at widely
> different times, separated by a switchover.  I certainly think there's
> a use-case for what you're trying to do here, but I don't think it'll
> be right for everyone.

I agree that it's not the right.

> Certainly, if the Slony guys - or some other team building an
> out-of-core replication solutions says, hey, we really want this in
> core, that would considerably strengthen the argument for putting it
> there.  But I haven't heard anyone say that yet - unlike logical
> decoding, were we did have other people expressing clear interest in
> using it.

As I said, there was clear interest at at least two of the cluster
hackers meetings...

> > There've been people for a long while asking about triggers on catalogs
> > for that purpose. IIRC Jan was one of them.
> 
> My impression, based on something Christopher Brown said a few years
> ago, is that Slony's DDL trigger needs are largely satisfied by the
> existing event trigger stuff.  It would be helpful to get confirmation
> as to whether that's the case.

Oh, that's contrary to what I remember, but yes, it'd be interesting to
hear about htat.

> >> On the flip side, the *cost* of pushing it into core is that
> >> it now becomes the PostgreSQL's community problem to update the
> >> deparsing code every time the grammar changes.  That cost-benefit
> >> trade-off does not look favorable to me, especially given the absence
> >> of any kind of robust testing framework.
> >
> > I agree that there should be testing in this.
> >
> > But I think you're quite overstating the effort of maintaining
> > this. Looking at gram.y between the stamping of 9.3 devel and 9.4 devel
> > for commits that'd require DDL deparsing changes:
> > * ALTER TABLE .. ALTER CONSTRAINT: About 7 lines for the deparse code,
> >   out of a ~350 line patch
> > * REFRESH MATERIALIZED VIEW ... CONCURRENTLY: About 1 line for deparse,
> >   out of ~700
> > * WITH CHECK OPTION for views: About 3 lines for deparse, out of ~1300
> > * REPLICA IDENTITY: About 24 lines for deparse, out of ~950
> > * ALTER TABLESPACE MOVE: About 20 lines for deparse, out of
> >   ~340. Although that patch was essentially scrapped afterwards. And
> >   rewritten differently.
> > * CREATE TABLESPACE ... WITH ...:  Not supported by event triggers right
> >   now as it's a global object.
> >
> > That's really not a whole lot.
> 
> Those are pretty minor syntax patches, though.

Sure, but these are all the ones from the stamping of 9.3devel to
9.4devel. I wanted to look at a release cycle, and that seemed the
easiest way to do it. I didn't choose that cycle, because I knew it was
"light" on ddl, I chose it because it was the last complete one.

> I think it's more helpful to look at things like the row-level
> security stuff, or materialized views per se, or DDL support for
> collations.

I can't imagine that writing the relatively straight forward stuff that
deparsing requires is a noticeable part of a patch like RLS. The actual
coding in anything bigger really is the minor part - especially the
relatively trivial bits. And for prototyping you can just leave it out,
just as frequently done today for pg_dump (which usually is noticeably
more complex).
I can't imagine the MATERIALIZED VIEW stuff to be much different. The
CREATE MATERIALIZED VIEW stuff is basically CREATE VIEW with three extra
clauses. I really can't see that mattering that much in the course of a
4.5k line patch.

The same with collations. The first patch adding collations to
table/index columns, just needs to add print the collation name - that's
it. The patch for the collation DDL support needs to add CREATE
COLLATION (~27 lines, including newlines and function header). DROP is
handled generically. Most of ALTER is as well, and SET SCHEMA can't be
very hard.

Greetings,

Andres Freund

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

09 ноября 2014 г., 00:05:51

On Sat, Nov 8, 2014 at 4:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > I don't understand why this is particularly difficult to regresssion
>> > test. It actually is comparatively simple?
>>
>> If it is, great.  I previously wrote this email:
>>
>> http://www.postgresql.org/message-id/CA+TgmoZ=vZriJMxLkqi_V0jg4k4LEAPmwUSC6RWXS5MquXUJNA@mail.gmail.com
>>
>> Alvaro came up with a way of addressing the second point I raised
>> there, which I'm quite pleased about, but AFAIK there's been no
>> progress on the first one.  Maybe I missed something?
>
> I unfortunately don't think so. And that sounds like a completely
> reasonable criticism.

I'm glad you agree.  If you can find a way to address that point, I
can live with the rest of it.  I don't think it's dumb to be concerned
about features that increase the cost of adding more features.  But
what really concerns me is that that code won't be well-tested, and if
there are cases missing or somebody forgets to do it altogether, it's
very likely that we won't notice.  That seems like a huge problem from
where I sit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

09 ноября 2014 г., 01:03:37

On Sat, Nov 8, 2014 at 6:24 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-11-08 17:49:30 -0500, Robert Haas wrote:
>> Patch 2 adds support for GRANT and REVOKE to the event trigger
>> mechanism.  I wonder if it's a bad idea to make the
>> ddl_command_start/end events fire for DCL.  We discussed a lot of
>> these issues when this patch originally went in, and I think it'd be
>> worth revisiting that discussion.
>
> Well, it doesn't generally support it for all GRANT statement, but just
> for ones that are database local. I think that mostly skirts the
> problems from the last round of discussion. But I only vaguely remember
> them.

The issue I was alluding to was terminological: it's not clear that
GRANT and REVOKE should be called DDL rather than DCL, although we do
have precedent in some of the logging settings.

The other issue I remember is that if you have a separate event
trigger for GRANT/REVOKE, you can expose fields like operation/object
type/permission/target.  That's much harder to do for an event trigger
that's very broad.

But anyway, I think it would be worth going back and looking at the
previous discussion.

> As I said, there was clear interest at at least two of the cluster
> hackers meetings...

We need to get some better data here.

But again, my core concern is that we have no good way to test this
code for bugs, including of omission, without which I think we will be
very sad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Christopher Browne

Дата:

10 ноября 2014 г., 22:37:51

On 8 November 2014 17:49, Robert Haas <robertmhaas@gmail.com> wrote:
> > We could just integrate those parts, and be done with it. But would that
> > actually be a good thing for the community? Then slony needs to do it
> > and potentially others as well? Then auditing can't use it? Then
> > potential schema tracking solutions can't use it?
>
> Do you think Slony is really going to use this? I guess we can let
> the Slony guys speak for themselves, but I've been skeptical since day
> one that this is the best way to do DDL replication, and I still am.
> There are lots of ways that a replicated DDL statement can fail on the
> replicas, and what are you going to do then? It's too late to show
> the user the error message, so you can throw it in a log someplace and
> hope that somebody notices, but that's it. It makes a lot more sense
> to me to use some kind of a tool that applies the DDL in a coordinated
> fashion on all nodes - or even just do it manually, since it might
> very well be desirable to take the lock on different nodes at widely
> different times, separated by a switchover. I certainly think there's
> a use-case for what you're trying to do here, but I don't think it'll
> be right for everyone.
>
> Certainly, if the Slony guys - or some other team building an
> out-of-core replication solutions says, hey, we really want this in
> core, that would considerably strengthen the argument for putting it
> there. But I haven't heard anyone say that yet - unlike logical
> decoding, were we did have other people expressing clear interest in
> using it.

> > There've been people for a long while asking about triggers on catalogs
> > for that purpose. IIRC Jan was one of them.
>
> My impression, based on something Christopher Brown said a few years
> ago, is that Slony's DDL trigger needs are largely satisfied by the
> existing event trigger stuff. It would be helpful to get confirmation
> as to whether that's the case.

I'm not sure that a replication system that intends to do partial replication
(e.g. - being selective of what objects are to be replicated) will necessarily
want to use the CREATE event triggers to capture creates.

Several cases pop up with different answers:
a) I certainly don't want to replicate temporary tables
b) I almost certainly don't want to replicate unlogged tables
c) For "more ordinary" tables, I'm not sure I want to extend Slony

to detect them and add them automatically, because there

are annoying sub-cases

c.1) If I'm working on data conversion, I may create not totally

temporary tables that are nonetheless not worthy to replicate.

(I'm working on such right now)

Long and short: it seems likely that I'd frequently NOT want all new tables
added to replication, at least not all of them, all the time.

What would seem valuable, to me, would be to have a CREATE event
trigger that lets me know the OID and/or fully qualified name of the new
object so that perhaps the replication system:

a) Has some kind of rule system to detect if it wants to replicate it,

b) Logs the change so a human might know later that there's new stuff
that probably ought to be replicated

c) Perhaps a human might put replication into a new "suggestive"
mode, a bit akin to Slony's "EXECUTE SCRIPT", but where the human

essentially says, "Here, I'm running DDL against this connection for a
while, and I'd be grateful if Postgres told Slony to capture all the new
tables and sequences and replicated them."

There are kind of two approaches:

a) Just capture the OIDs, and have replication go back later and grab

the table definition once the dust clears on the master

b) We need to capture ALL the DDL, whether CREATE or ALTER, and

forward it, altered to have fully qualified names on everything so that

we don't need to duplicate all the "set search_path" requests and
such.

I suppose there's also a third...

c) Have a capability to put an event trigger function in place that makes
DDL requests fail.

That's more useful than you'd think; if, by default, we make them fail,

and with an error messages such as

"DDL request failed as it was not submitted using slonik DDL TOOL"

then we have protection against uncontrolled application of DDL.

DDL TOOL would switch off the "fail trigger", possibly trying to
capture the DDL, or perhaps just capturing the statements passed
to it so they get passed everywhere. (That heads back to a) and b);

what should get captured...)

I'm not sure that all of that is totally internally coherent, but I hope there
are some ideas worth thinking about.

--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: Add CREATE support to event triggers

От

Petr Jelinek

Дата:

10 ноября 2014 г., 22:57:14

On 10/11/14 23:37, Christopher Browne wrote:
> On 8 November 2014 17:49, Robert Haas <robertmhaas@gmail.com
>  >
>  > My impression, based on something Christopher Brown said a few years
>  > ago, is that Slony's DDL trigger needs are largely satisfied by the
>  > existing event trigger stuff.  It would be helpful to get confirmation
>  > as to whether that's the case.
>
> I'm not sure that a replication system that intends to do partial
> replication
> (e.g. - being selective of what objects are to be replicated) will
> necessarily
> want to use the CREATE event triggers to capture creates.
>
> Several cases pop up with different answers:
> a) I certainly don't want to replicate temporary tables
> b) I almost certainly don't want to replicate unlogged tables
> c) For "more ordinary" tables, I'm not sure I want to extend Slony
>      to detect them and add them automatically, because there
>      are annoying sub-cases
>
>     c.1) If I'm working on data conversion, I may create not totally
>           temporary tables that are nonetheless not worthy to replicate.
>           (I'm working on such right now)
>
> Long and short: it seems likely that I'd frequently NOT want all new tables
> added to replication, at least not all of them, all the time.

I don't see how this is problem with using CREATE event triggers, just 
put logic in your trigger that handles this, you get the object identity 
of the object that is being created/altered so you can get any info 
about it you wish and you can easily filter however you want.

> There are kind of two approaches:
>
> a) Just capture the OIDs, and have replication go back later and grab
> the table definition once the dust clears on the master
>
> b) We need to capture ALL the DDL, whether CREATE or ALTER, and
> forward it, altered to have fully qualified names on everything so that
> we don't need to duplicate all the "set search_path" requests and
> such.
>

This is basically what this patch gives you (actually both the canonized 
command and the identity)?

> I suppose there's also a third...
>
> c) Have a capability to put an event trigger function in place that makes
> DDL requests fail.
>
> That's more useful than you'd think; if, by default, we make them fail,
> and with an error messages such as
>    "DDL request failed as it was not submitted using slonik DDL TOOL"
>

You can do that already, it's even the example in the event trigger 
documentation.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

10 ноября 2014 г., 22:59:00

On 2014-11-10 17:37:40 -0500, Christopher Browne wrote:
> On 8 November 2014 17:49, Robert Haas <robertmhaas@gmail.com> wrote:
> > > We could just integrate those parts, and be done with it. But would that
> > > actually be a good thing for the community? Then slony needs to do it
> > > and potentially others as well? Then auditing can't use it? Then
> > > potential schema tracking solutions can't use it?
> >
> > Do you think Slony is really going to use this?  I guess we can let
> > the Slony guys speak for themselves, but I've been skeptical since day
> > one that this is the best way to do DDL replication, and I still am.
> > There are lots of ways that a replicated DDL statement can fail on the
> > replicas, and what are you going to do then?  It's too late to show
> > the user the error message, so you can throw it in a log someplace and
> > hope that somebody notices, but that's it.  It makes a lot more sense
> > to me to use some kind of a tool that applies the DDL in a coordinated
> > fashion on all nodes - or even just do it manually, since it might
> > very well be desirable to take the lock on different nodes at widely
> > different times, separated by a switchover.  I certainly think there's
> > a use-case for what you're trying to do here, but I don't think it'll
> > be right for everyone.
> >
> > Certainly, if the Slony guys - or some other team building an
> > out-of-core replication solutions says, hey, we really want this in
> > core, that would considerably strengthen the argument for putting it
> > there.  But I haven't heard anyone say that yet - unlike logical
> > decoding, were we did have other people expressing clear interest in
> > using it.
> 
> > > There've been people for a long while asking about triggers on catalogs
> > > for that purpose. IIRC Jan was one of them.
> >
> > My impression, based on something Christopher Brown said a few years
> > ago, is that Slony's DDL trigger needs are largely satisfied by the
> > existing event trigger stuff.  It would be helpful to get confirmation
> > as to whether that's the case.
> 
> I'm not sure that a replication system that intends to do partial
> replication
> (e.g. - being selective of what objects are to be replicated) will
> necessarily
> want to use the CREATE event triggers to capture creates.
> 
> Several cases pop up with different answers:
> a) I certainly don't want to replicate temporary tables
> b) I almost certainly don't want to replicate unlogged tables

Those are quite easy to recognize and skip.

> c) For "more ordinary" tables, I'm not sure I want to extend Slony
>     to detect them and add them automatically, because there
>     are annoying sub-cases
> 
>    c.1) If I'm working on data conversion, I may create not totally
>          temporary tables that are nonetheless not worthy to replicate.
>          (I'm working on such right now)

Sure. you might not want to do it automatically all the time - but I
think it's a very useful default mode. Once you can replicate CREATEs
per se, it's easy to add logic (in a couple lines of plpgsql or
whatever) to only do so in a certain schema or similar.

But the main reason all this is interesting isn't so much CREATE
itself. But that it can be (and Alvaro has mostly done it!) for ALTER as
well. And there it imo becomes really interesting. Because you can quite
easily check whether the affected relation is being replicated you can
just emit the DDL when that's the case. And that makes DDL in a
logically replicated setup *much* easier.

> Long and short: it seems likely that I'd frequently NOT want all new tables
> added to replication, at least not all of them, all the time.

Agreed. That's quite possible with the design here - you get the
creation commands and can decide whether you want to do anything with
them. You're not forced to insert them into your replication queue or
whatever you're using for that.

> What would seem valuable, to me, would be to have a CREATE event
> trigger that lets me know the OID and/or fully qualified name of the new
> object so that perhaps the replication system:
> 
> a) Has some kind of rule system to detect if it wants to replicate it,

Sure.

> b) Logs the change so a human might know later that there's new stuff
> that probably ought to be replicated

Sure.

> c) Perhaps a human might put replication into a new "suggestive"
> mode, a bit akin to Slony's "EXECUTE SCRIPT", but where the human
> essentially says, "Here, I'm running DDL against this connection for a
> while, and I'd be grateful if Postgres told Slony to capture all the new
> tables and sequences and replicated them."

Sure.

Some of that already is possible with the current event triggers - and
all of it would be possible with the suggested functionality here.

An old version of bdr, employing the functionality presented here, had
the following (simplified) event trigger:

CREATE OR REPLACE FUNCTION bdr.queue_commands()RETURNS event_trigger
LANGUAGE plpgsql
AS $function$
DECLARE   r RECORD;
BEGIN   -- don't recursively log ddl commands   IF pg_replication_identifier_is_replaying() THEN      RETURN;   END
IF;
   IF current_setting('bdr.skip_ddl_replication')::boolean THEN       RETURN;   END IF;
   FOR r IN SELECT * FROM pg_event_trigger_get_creation_commands()   LOOP       /* ignore temporary objects */       IF
r.schema= 'pg_temp' THEN           CONTINUE;       END IF;
 
       /* ignore objects that are part of an extension */       IF r.in_extension THEN           CONTINUE;       END
IF;
       INSERT INTO bdr.bdr_queued_commands(           lsn, queued_at,           command_tag, command, executed       )
    VALUES (           pg_current_xlog_location(),           NOW(),           r.command_tag,
pg_catalog.pg_event_trigger_expand_command(r.command),          'false'       );
 
       IF r.command_tag = 'CREATE TABLE' and r.object_type = 'table' THEN           EXECUTE 'CREATE TRIGGER
truncate_triggerAFTER TRUNCATE ON ' ||               r.identity ||               ' FOR EACH STATEMENT EXECUTE PROCEDURE
bdr.queue_truncate()';      END IF;   END LOOP;
 
END;
$function$;

It seems to me that'd pretty much allow all of your wishes above?

> There are kind of two approaches:
> 
> a) Just capture the OIDs, and have replication go back later and grab
> the table definition once the dust clears on the master

That's problematic imo if there's further changes to the table
definition - not exactly a infrequent thing.

> b) We need to capture ALL the DDL, whether CREATE or ALTER, and
> forward it, altered to have fully qualified names on everything so that
> we don't need to duplicate all the "set search_path" requests and
> such.

That's essentially where this patch is going. The submission is only
CREATE, but once the design is agreed upon, ALTER is coming. The patch
we're currently using for BDR has a good chunk of ALTER support.

> I suppose there's also a third...
> 
> c) Have a capability to put an event trigger function in place that makes
> DDL requests fail.

That's actually already quite possible.

CREATE OR REPLACE FUNCTION prevent_ddl_outside_slonik()RETURNS event_trigger
LANGUAGE plpgsql
AS $function$
BEGIN   IF current_setting('slony.inside_slonik')::boolean THEN       RETURN;   END IF;
   RAISE ERROR 'hey there, use slonik!!!';

END $function$;

What's missing is that you probably want to look into the object to see
whether it's temporary, unlogged, et al.

> That's more useful than you'd think; if, by default, we make them fail,
> and with an error messages such as
>   "DDL request failed as it was not submitted using slonik DDL TOOL"
> 
> then we have protection against uncontrolled application of DDL.
> 
> DDL TOOL would switch off the "fail trigger", possibly trying to
> capture the DDL, or perhaps just capturing the statements passed
> to it so they get passed everywhere.   (That heads back to a) and b);
> what should get captured...)
> 
> I'm not sure that all of that is totally internally coherent, but I
> hope there are some ideas worth thinking about.

I think it's actually quite coherent - and at least partially mirrors
the thoughts that have gone into this...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Jim Nasby

Дата:

11 ноября 2014 г., 02:02:32

On 11/10/14, 4:58 PM, Andres Freund wrote:
> But the main reason all this is interesting isn't so much CREATE
> itself. But that it can be (and Alvaro has mostly done it!) for ALTER as
> well. And there it imo becomes really interesting. Because you can quite
> easily check whether the affected relation is being replicated you can
> just emit the DDL when that's the case. And that makes DDL in a
> logically replicated setup*much*  easier.

+1. Adding columns is a PITA, you have to manually ensure you do it on all slaves first.

Drop is somewhat worse, because you have to do it on the master first, opposite of the (more usual) case of adding a
column.

RENAME is a complete disaster.

Handing scripts to your replication system to execute isn't a very good alternative either; it assumes that you
actuallyhave a script (bad assumption with ORMs), and that you have a reasonable way to get that script to wherever you
runyour replication system.

I will also weigh in that there are a LOT of cases that binary replication doesn't cover. I find it interesting that
priorto creating built in replication, the community stance was "We won't build that because there's too many different
usecases", but now some folks are saying that everyone should just use streaming rep and be done with it. :P

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

12 ноября 2014 г., 21:36:38

On Mon, Nov 10, 2014 at 9:02 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> +1. Adding columns is a PITA, you have to manually ensure you do it on all
> slaves first.
>
> Drop is somewhat worse, because you have to do it on the master first,
> opposite of the (more usual) case of adding a column.
>
> RENAME is a complete disaster.
>
> Handing scripts to your replication system to execute isn't a very good
> alternative either; it assumes that you actually have a script (bad
> assumption with ORMs), and that you have a reasonable way to get that script
> to wherever you run your replication system.

I don't disagree with any of that, but running the command on the
master and then propagating it to the slaves where it may succeed or
fail - and if it fails, you won't know unless you're watching the logs
on those machines, and, oh by the way, replication will also be broken
- is not good either.  We would never have shipped physical
replication solution with that kind of limitation.  What has made
streaming replication so popular and successful with PostgreSQL users
over the last five years is that, while it's a bit of a pain to get
set up, once you have it set up, it is rock-solid.  If there were a
series of legal SQL commands that you could execute without error on a
cluster of servers connected by streaming replication such that, when
you got done, replication was broken, our users would scream bloody
murder, or just stop using PostgreSQL.  I think the approach to DDL
replication that Alvaro, Andres, et al. are proposing here is
absolutely fine - even praiseworthy - as an out-of-core solution that
users can adopt if they are willing to accept the associated risks, as
many users probably will be.  But you wouldn't convince me to run it
on any production system for which I was responsible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

12 ноября 2014 г., 21:58:27

On 2014-11-12 16:36:30 -0500, Robert Haas wrote:
> On Mon, Nov 10, 2014 at 9:02 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> > +1. Adding columns is a PITA, you have to manually ensure you do it on all
> > slaves first.
> >
> > Drop is somewhat worse, because you have to do it on the master first,
> > opposite of the (more usual) case of adding a column.
> >
> > RENAME is a complete disaster.
> >
> > Handing scripts to your replication system to execute isn't a very good
> > alternative either; it assumes that you actually have a script (bad
> > assumption with ORMs), and that you have a reasonable way to get that script
> > to wherever you run your replication system.
> 
> I don't disagree with any of that, but running the command on the
> master and then propagating it to the slaves where it may succeed or
> fail - and if it fails, you won't know unless you're watching the logs
> on those machines, and, oh by the way, replication will also be broken
> - is not good either.

That's already the situation today with all the logical replication
solutions. They *constantly* break in the field. Most commonly because
of DDL differences.

I don't understand why you think it's likely for logical replication to
break due to this? You mean because deparse yielded a invalid statement?
In a normal single master setup there really shouldn't be scenarios
where that happens? Except bugs - but as you know we had more than in
HS/SR as well?

Or are you worried about stuff like ALTER TABLE ... USING()? I think
that's the replication solution's job to take care of/prevent.

For multimaster the situation is more complex, I agree, but I don't
think in core stuff needs to solve that for now?

We are thinking about extending 2PC to be usable across logical
decoding. That's a relatively simple patch. Then it's possible to do the
DDL on the primary, ship it to the standby, apply it there, and only
afterwards commit the prepared xact if that was successfull.  That's
quite cool - but somewhat in the remit of the replication solution.

> I think the approach to DDL
> replication that Alvaro, Andres, et al. are proposing here is
> absolutely fine - even praiseworthy - as an out-of-core solution that
> users can adopt if they are willing to accept the associated risks, as
> many users probably will be.  But you wouldn't convince me to run it
> on any production system for which I was responsible.

The solution here doesn't force you to do that, does it? It's something
that can be used by more than replication solution?

I just don't see the alternative you're proposing? I've so far not even
seen a credible *sketch* of an alternative design that also can handle
ALTER.  The only current alternatives are 1) the user inserts some
events into the queue manually. If they depend on any local state you're
screwed. If they have syntax errors they're often screwed. 2). The user
does all actions on the standby first. Then on the primary. That's hard
for ALTER ADD COLUMN and similar, and just about impossible for renaming
things.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

13 ноября 2014 г., 12:17:40

On Wed, Nov 12, 2014 at 4:58 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> That's already the situation today with all the logical replication
> solutions. They *constantly* break in the field. Most commonly because
> of DDL differences.

Right.  And that's why it's cool that logical decoding can operate
through DDL differences.  The apply side might not be able to cope
with what pops out, but that's not logical decoding's fault, and
different apply-sides can adopt different policies as to how to deal
with whatever problems crop up.

> I don't understand why you think it's likely for logical replication to
> break due to this? You mean because deparse yielded a invalid statement?
> In a normal single master setup there really shouldn't be scenarios
> where that happens? Except bugs - but as you know we had more than in
> HS/SR as well?

I don't know exactly what you mean by "a normal single master setup".
Surely the point of logical decoding is that the replica might not be
identical to the master.  And if it isn't, then a command that
succeeded on the master might fail on the standby - for example,
because an object by that name already exists there, or because a type
doesn't exist there.  (Even if you replicate a CREATE EXTENSION
command, there's no guarantee that the .so exists on the target.) Then
what?

This is basically the same problem as multi-master replication
conflicts, except with DDL.  Resolving replication conflicts is not a
very easy thing to get right even if you're only concerned about the
rows in the tables.  It's probably harder if you're worried about the
schema, too.

> We are thinking about extending 2PC to be usable across logical
> decoding. That's a relatively simple patch. Then it's possible to do the
> DDL on the primary, ship it to the standby, apply it there, and only
> afterwards commit the prepared xact if that was successfull.  That's
> quite cool - but somewhat in the remit of the replication solution.

That would certainly make the user aware of a quite a few kinds of
errors that might otherwise go undetected.

> The solution here doesn't force you to do that, does it? It's something
> that can be used by more than replication solution?

In theory, yes.

> I just don't see the alternative you're proposing? I've so far not even
> seen a credible *sketch* of an alternative design that also can handle
> ALTER.  The only current alternatives are 1) the user inserts some
> events into the queue manually. If they depend on any local state you're
> screwed. If they have syntax errors they're often screwed. 2). The user
> does all actions on the standby first. Then on the primary. That's hard
> for ALTER ADD COLUMN and similar, and just about impossible for renaming
> things.

It's a really hard problem.

I don't think it's possible to make statement-based replication
no-fail.  Physical replication is basically no-fail because it just
says, hey, go write these bytes into this page, and we can pretty much
always do that. But statement-based logical replication means
basically executing arbitrary chunks of code all over the backend, and
there is just no way to guarantee that code won't throw an error.  So
the best we can do is to hope that those errors will get reported back
to the user, which is going to require some kind of distributed
transaction.  Your idea to just run the replicated DDL statements on
the standby before committing on the master is one approach to that
problem, and probably the simplest one, but not the only one - one can
imagine something that resembles true clustering, for example.

By the way, the fact that you're planning to do log-based replication
of DML and trigger-based replication of DDL scares the crap out of me.
I'm not sure how that's going to work at all if the two are
interleaved in the same transaction.  Also, relying on triggers for
replication is generally not awesome, because it increases
administrative complexity.  Event triggers are probably better in that
regard than ordinary triggers, because they're database-wide, but I
don't think they solve the problem completely.  But the thing that
scares me even more is that the DDL replication is not only
trigger-based, but statement-based.  Why don't we do logical
replication by recording all of the SQL statements that run on the
master and re-executing them on the standby?  Well, because we all
know that there will be plenty of important cases where that doesn't
yield the same results on both servers.  There's no intrinsic reason
why that shouldn't also be a problem for DDL replication, and indeed
it is.  This patch set is trying to patch around that by finding a way
to emit a revised DDL statement that is guaranteed to do exactly the
same thing on both machines, and it's probably possible to do that in
most cases, but it's probably not possible to do that in all cases.

To be clear, none of this is a reason to reject this patch set; I've
explained my reasons for being unhappy with it elsewhere, and I think
they are valid, but these are not them.  These are just ruminations on
the difficulty of doing truly robust DDL replication.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

13 ноября 2014 г., 12:45:33

On 2014-11-13 07:17:32 -0500, Robert Haas wrote:
> On Wed, Nov 12, 2014 at 4:58 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > That's already the situation today with all the logical replication
> > solutions. They *constantly* break in the field. Most commonly because
> > of DDL differences.
> 
> Right.  And that's why it's cool that logical decoding can operate
> through DDL differences.  The apply side might not be able to cope
> with what pops out, but that's not logical decoding's fault, and
> different apply-sides can adopt different policies as to how to deal
> with whatever problems crop up.

I think pretty much all of the solutions just say "oops, you're on your
own". And I can't blame them for that. Once there's a schema difference
and it causes problem there's really not much that can be done.

> > I don't understand why you think it's likely for logical replication to
> > break due to this? You mean because deparse yielded a invalid statement?
> > In a normal single master setup there really shouldn't be scenarios
> > where that happens? Except bugs - but as you know we had more than in
> > HS/SR as well?
> 
> I don't know exactly what you mean by "a normal single master setup".
> Surely the point of logical decoding is that the replica might not be
> identical to the master.

I actually think that that's not primary the point if you talk about
individual objects. The majority of objects will be exactly the same on
all nodes. If you actually want to have differening objects on the nodes
you'll have to opt out/in (depending on your solution) of ddl
replication for those objects.

> And if it isn't, then a command that
> succeeded on the master might fail on the standby - for example,
> because an object by that name already exists there, or because a type
> doesn't exist there.  (Even if you replicate a CREATE EXTENSION
> command, there's no guarantee that the .so exists on the target.) Then
> what?

Sure. There's reasons logical replication isn't always a win. But I
don't see why that's a reason not to make it as robust as possible.

Btw, the .so problem exists for wal shipping as as well.

> This is basically the same problem as multi-master replication
> conflicts, except with DDL.  Resolving replication conflicts is not a
> very easy thing to get right even if you're only concerned about the
> rows in the tables.  It's probably harder if you're worried about the
> schema, too.

I don't think it's a sane thing to do multimaster with differing schemas
for individual relations, except maybe additional nonunique indexes.

> > We are thinking about extending 2PC to be usable across logical
> > decoding. That's a relatively simple patch. Then it's possible to do the
> > DDL on the primary, ship it to the standby, apply it there, and only
> > afterwards commit the prepared xact if that was successfull.  That's
> > quite cool - but somewhat in the remit of the replication solution.
> 
> That would certainly make the user aware of a quite a few kinds of
> errors that might otherwise go undetected.

It is (or rather would be) a generally quite cool feature imo ;)

> > The solution here doesn't force you to do that, does it? It's something
> > that can be used by more than replication solution?
> 
> In theory, yes.

What's the practical point here?

> > I just don't see the alternative you're proposing? I've so far not even
> > seen a credible *sketch* of an alternative design that also can handle
> > ALTER.  The only current alternatives are 1) the user inserts some
> > events into the queue manually. If they depend on any local state you're
> > screwed. If they have syntax errors they're often screwed. 2). The user
> > does all actions on the standby first. Then on the primary. That's hard
> > for ALTER ADD COLUMN and similar, and just about impossible for renaming
> > things.
> 
> It's a really hard problem.
> 
> I don't think it's possible to make statement-based replication
> no-fail.

I think generally logical replication has more failure cases than
physical ones. Which you seem to agree with.

> Physical replication is basically no-fail because it just
> says, hey, go write these bytes into this page, and we can pretty much
> always do that. But statement-based logical replication means
> basically executing arbitrary chunks of code all over the backend, and
> there is just no way to guarantee that code won't throw an error.  So
> the best we can do is to hope that those errors will get reported back
> to the user, which is going to require some kind of distributed
> transaction.  Your idea to just run the replicated DDL statements on
> the standby before committing on the master is one approach to that
> problem, and probably the simplest one, but not the only one - one can
> imagine something that resembles true clustering, for example.

I think that's generally not what people need for primary/standby
cases (of subsets of tables). In practice there aren't many failures
like that besides schema differences. And there's just many usecases
that aren't doable with physical replication, so we can't advise people
doing that.

> By the way, the fact that you're planning to do log-based replication
> of DML and trigger-based replication of DDL scares the crap out of me.
> I'm not sure how that's going to work at all if the two are
> interleaved in the same transaction.

Maybe that's based on a misunderstanding. All the event trigger does is
insert a event into a (local) queue. That's then shipped to the other
side using the same replication mechanisms as used for rows. Then, when
receiving changes in that ddl queue the standby performs those actions
before continuing with the replay.
That makes the interleaving on the standby to be pretty much the same as
on the primary.

> Also, relying on triggers for
> replication is generally not awesome, because it increases
> administrative complexity.  Event triggers are probably better in that
> regard than ordinary triggers, because they're database-wide, but I
> don't think they solve the problem completely.

Maybe. It's not that hard to make the DDL solution record dependencies
against the event triggers to prohibit the user from dropping them or
such if that's your worry.

We can easily (additionally) define "implicit" event triggers that are
declared using a hook. Then some replication extension can just force
them to be applied.

> But the thing that
> scares me even more is that the DDL replication is not only
> trigger-based, but statement-based.  Why don't we do logical
> replication by recording all of the SQL statements that run on the
> master and re-executing them on the standby?  Well, because we all
> know that there will be plenty of important cases where that doesn't
> yield the same results on both servers.  There's no intrinsic reason
> why that shouldn't also be a problem for DDL replication, and indeed
> it is.  This patch set is trying to patch around that by finding a way
> to emit a revised DDL statement that is guaranteed to do exactly the
> same thing on both machines, and it's probably possible to do that in
> most cases, but it's probably not possible to do that in all cases.

Yes, it's not trivial. And I think there's some commands where you might
not want to try but either scream ERROR or just rereplicate the whole
relation or such.

I very strongly feel that we (as postgres devs) have a *much* higher
chance of recognizing these cases than either some random users (that
write slonik scripts or similar) or some replication solution authors
that aren't closely involved with -hackers.

> These are just ruminations on the difficulty of doing truly robust DDL
> replication.

Right, it's far from easy. But imo that's an argument for providing the
tools to do it as robust as we can in core.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

14 ноября 2014 г., 17:39:05

On Thu, Nov 13, 2014 at 7:45 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Right.  And that's why it's cool that logical decoding can operate
>> through DDL differences.  The apply side might not be able to cope
>> with what pops out, but that's not logical decoding's fault, and
>> different apply-sides can adopt different policies as to how to deal
>> with whatever problems crop up.
>
> I think pretty much all of the solutions just say "oops, you're on your
> own". And I can't blame them for that. Once there's a schema difference
> and it causes problem there's really not much that can be done.

Agreed.

>> I don't know exactly what you mean by "a normal single master setup".
>> Surely the point of logical decoding is that the replica might not be
>> identical to the master.
>
> I actually think that that's not primary the point if you talk about
> individual objects. The majority of objects will be exactly the same on
> all nodes. If you actually want to have differening objects on the nodes
> you'll have to opt out/in (depending on your solution) of ddl
> replication for those objects.

I'm not saying it's the primary point; I'm just saying that it can happen.

>> And if it isn't, then a command that
>> succeeded on the master might fail on the standby - for example,
>> because an object by that name already exists there, or because a type
>> doesn't exist there.  (Even if you replicate a CREATE EXTENSION
>> command, there's no guarantee that the .so exists on the target.) Then
>> what?
>
> Sure. There's reasons logical replication isn't always a win. But I
> don't see why that's a reason not to make it as robust as possible.

I agree.

> Btw, the .so problem exists for wal shipping as as well.

Not in the same way.  You might not be able to access the data on the
standby if the .so defining the datatype isn't present, but the
replication itself doesn't care.

>> This is basically the same problem as multi-master replication
>> conflicts, except with DDL.  Resolving replication conflicts is not a
>> very easy thing to get right even if you're only concerned about the
>> rows in the tables.  It's probably harder if you're worried about the
>> schema, too.
>
> I don't think it's a sane thing to do multimaster with differing schemas
> for individual relations, except maybe additional nonunique indexes.

But that's exactly what you ARE doing, isn't it?  I mean, if you
replicate in only one direction, nothing keeps someone from modifying
things on the replica independently of BDR, and if you replicate in
both directions, well that's multi-master.

>> > The solution here doesn't force you to do that, does it? It's something
>> > that can be used by more than replication solution?
>>
>> In theory, yes.
>
> What's the practical point here?

I am quite doubtful about whether there will ever be a second, working
implementation, so I see all of this code - and the maintenance effort
associated with it - as something that will really only benefit BDR.
I understand that you don't see it that way, and I'm not saying that
you are offering anything in bad faith, but it looks to me like even
with all of this very substantial amount of infrastructure, you're
still going to need a big pile of additional code inside BDR to
actually make it work, and I don't hear anyone else offering to
develop something similar.

> I think generally logical replication has more failure cases than
> physical ones. Which you seem to agree with.

Yes, I agree with that.

>> Physical replication is basically no-fail because it just
>> says, hey, go write these bytes into this page, and we can pretty much
>> always do that. But statement-based logical replication means
>> basically executing arbitrary chunks of code all over the backend, and
>> there is just no way to guarantee that code won't throw an error.  So
>> the best we can do is to hope that those errors will get reported back
>> to the user, which is going to require some kind of distributed
>> transaction.  Your idea to just run the replicated DDL statements on
>> the standby before committing on the master is one approach to that
>> problem, and probably the simplest one, but not the only one - one can
>> imagine something that resembles true clustering, for example.
>
> I think that's generally not what people need for primary/standby
> cases (of subsets of tables). In practice there aren't many failures
> like that besides schema differences. And there's just many usecases
> that aren't doable with physical replication, so we can't advise people
> doing that.

I can't really follow this, especially the first sentence.

>> By the way, the fact that you're planning to do log-based replication
>> of DML and trigger-based replication of DDL scares the crap out of me.
>> I'm not sure how that's going to work at all if the two are
>> interleaved in the same transaction.
>
> Maybe that's based on a misunderstanding. All the event trigger does is
> insert a event into a (local) queue. That's then shipped to the other
> side using the same replication mechanisms as used for rows. Then, when
> receiving changes in that ddl queue the standby performs those actions
> before continuing with the replay.
> That makes the interleaving on the standby to be pretty much the same as
> on the primary.

OK.  But that's also not something that I can really imagine us ever
adopting in core.  If we were going to do DDL replication in core, I
have to believe we'd find a way to put all of the information in the
WAL stream, not use triggers.

> Yes, it's not trivial. And I think there's some commands where you might
> not want to try but either scream ERROR or just rereplicate the whole
> relation or such.
>
> I very strongly feel that we (as postgres devs) have a *much* higher
> chance of recognizing these cases than either some random users (that
> write slonik scripts or similar) or some replication solution authors
> that aren't closely involved with -hackers.

It's clearly the job of the replication solution authors to figure
these details out.  I'm not going to get into the business of passing
judgement on the relative competence of different groups of
replication developers.  There's no question that 2ndQuadrant is doing
great, exciting work, but we should not tilt the playing field in a
way that crowds other people out, even new players that haven't yet
written their first line of code.

> Right, it's far from easy. But imo that's an argument for providing the
> tools to do it as robust as we can in core.

I agree with that on general principle, but I do not have confidence
in this patch set as the best foundation for DDL replication.  You're
talking about layering a lot of other complexity on top of the
deparsing stuff in this patch to produce a usable solution, and even
then I am not really convinced that it's going to be the final
solution to this problem.  Maybe it will, but I think there are enough
chinks in the armor to be really, really cautious about being certain
this is the right road to go down.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Andres Freund

Дата:

14 ноября 2014 г., 18:18:52

On 2014-11-14 12:38:52 -0500, Robert Haas wrote:
> >> This is basically the same problem as multi-master replication
> >> conflicts, except with DDL.  Resolving replication conflicts is not a
> >> very easy thing to get right even if you're only concerned about the
> >> rows in the tables.  It's probably harder if you're worried about the
> >> schema, too.
> >
> > I don't think it's a sane thing to do multimaster with differing schemas
> > for individual relations, except maybe additional nonunique indexes.
> 
> But that's exactly what you ARE doing, isn't it?  I mean, if you
> replicate in only one direction, nothing keeps someone from modifying
> things on the replica independently of BDR, and if you replicate in
> both directions, well that's multi-master.

Meh. By that definition any logical replication solution does multi
master replication. Either you tell your users that that's not allowed,
or you just prevent it by technical means. Absolutely the same is true
for table contents.

FWIW, in BDR we *do* prevent schemas from being modified independently
on different nodes (unless you set the 'running with scissors' guc).

> I am quite doubtful about whether there will ever be a second, working
> implementation, so I see all of this code - and the maintenance effort
> associated with it - as something that will really only benefit BDR.
> I understand that you don't see it that way, and I'm not saying that
> you are offering anything in bad faith, but it looks to me like even
> with all of this very substantial amount of infrastructure, you're
> still going to need a big pile of additional code inside BDR to
> actually make it work, and I don't hear anyone else offering to
> develop something similar.

I don't know what to say about this. I don't see any other realistic way
to perform DDL replication in logical rep, and nearly all my
conversations with users have indicated that they want that.

I think it's a good idea to structure independent features in a way that
other solutions can reuse them. But I sure as hell can't force them to
use it - especially as there's unfortunately not too much development
going on in the existing logical replication solutions for postgres.

Btw, I really think there's quite legitimate use cases for this besides
logical replication solutions - e.g. schema tracking is quite a sensible
use case.

> >> By the way, the fact that you're planning to do log-based replication
> >> of DML and trigger-based replication of DDL scares the crap out of me.
> >> I'm not sure how that's going to work at all if the two are
> >> interleaved in the same transaction.
> >
> > Maybe that's based on a misunderstanding. All the event trigger does is
> > insert a event into a (local) queue. That's then shipped to the other
> > side using the same replication mechanisms as used for rows. Then, when
> > receiving changes in that ddl queue the standby performs those actions
> > before continuing with the replay.
> > That makes the interleaving on the standby to be pretty much the same as
> > on the primary.
> 
> OK.  But that's also not something that I can really imagine us ever
> adopting in core.

Well, that bit really depends on what the user (e.g. a replication
solution, or a schema tracking feature) needs. There's certain things
that you can quite easily do as part of core (e.g. insert something into
the WAL stream), that you just can't as external code.

I don't think there's any external replication solution that won't have
some form of internal queue to manipulate its internal state. For an
external solution such a queue currently pretty much has to be some
table, but for an eventual in core solution it could be done
differently.

> If we were going to do DDL replication in core, I have to believe we'd
> find a way to put all of the information in the WAL stream, not use
> triggers.

I agree that we might want to represent the transport to standbys
differently for something in core. That there's many different ways the
deparsing output could be used imo is a good reason why that part of the
mechanism isn't part of this submission.

I don't really understand the arguments against triggers in general
though. We're already using them quite extensively - I don't see why DDL
replication has to meet some completely different bar than say foreign
key checks or deferred uniqueness checks. We easily can add 'implicit'
event triggers, that aren't defined inside the catalog if we feel like
it. I'm just not sure we really would need/want to.

> > Yes, it's not trivial. And I think there's some commands where you might
> > not want to try but either scream ERROR or just rereplicate the whole
> > relation or such.
> >
> > I very strongly feel that we (as postgres devs) have a *much* higher
> > chance of recognizing these cases than either some random users (that
> > write slonik scripts or similar) or some replication solution authors
> > that aren't closely involved with -hackers.
> 
> It's clearly the job of the replication solution authors to figure
> these details out.  I'm not going to get into the business of passing
> judgement on the relative competence of different groups of
> replication developers.

It's not a question of relative competence. It's a question of the focus
of that competence. There's just not many reasons a replication solution
developer needs to have detailed knowledge about postgres' internal
parsetrees for DDL statements. It's not like extensive documentation
about their meaning exists.

> There's no question that 2ndQuadrant is doing
> great, exciting work, but we should not tilt the playing field in a
> way that crowds other people out, even new players that haven't yet
> written their first line of code.

Huh? I'm not seing how I'm doing that in any way.

> > Right, it's far from easy. But imo that's an argument for providing the
> > tools to do it as robust as we can in core.
> 
> I agree with that on general principle, but I do not have confidence
> in this patch set as the best foundation for DDL replication.

Can you please hint at some other workable design? I don't think there
really is anything else.

How should DDL be replicated in a logical replication solution but via
DDL? It's not like just replicating the catalog rows or anything like
that is something even remotely sane.  As stated before, everything the
user has to do manually is *much* more error prone than what's presented
here.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

15 ноября 2014 г., 12:55:20

On Fri, Nov 14, 2014 at 1:18 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> I think it's a good idea to structure independent features in a way that
> other solutions can reuse them. But I sure as hell can't force them to
> use it - especially as there's unfortunately not too much development
> going on in the existing logical replication solutions for postgres.
>
> Btw, I really think there's quite legitimate use cases for this besides
> logical replication solutions - e.g. schema tracking is quite a sensible
> use case.

Well, as I already said, despite my doubts about the general utility
of this feature, I'm willing to see us take it IF we have a testing
framework that will reliably catch bugs, including bugs of omission.
Without that, I'm very confident it's going to be a maintenance
nightmare, and I believe you admitted yourself that that concern was
reasonable.

> Can you please hint at some other workable design? I don't think there
> really is anything else.

I think this really depends on what you mean by "anything else".  Any
DDL replication solution is necessarily going to involve the following
steps:

1. Define some set of primitives such that any imaginable DDL
operation can be expressed as a series of those primitives.
2. Find a way to capture those events as they happen.
3. Serialize them into some wire format and transport that format to
the replica.
4. Apply them, possibly coordinating in some way with the master so
that the user's original request fails if the apply fails.

There are meaningful choices at every step.  You're proposing that the
primitives should be "anything that can be expressed as a complete SQL
command against a single object" (I think - what are you going to emit
for an ALL IN SCHEMA op - that thing itself, or a similar operation
against each object in the schema?); that the capture mechanism should
be an event trigger that inserts into a queue table; that the
serialization format should be a JSON language designed to allow
reassembly of the corresponding SQL statement; and that the apply
coordination mechanism should be 2PC.  But none of that is written in
stone.

As far as deciding what primitives to use, I think the principal
decision to be made here is as to the appropriate level of
granularity.  For example, CREATE TABLE foo (a int DEFAULT 1, b int,
CHECK (b > 42)) could be emitted as a single event saying that a table
was created.  But it could also be emitted as create-table (foo),
add-column (foo, a, int), add-column (foo, b, int), add-column-default
(a, 1), add-check-constraint (foo, b > 42).  The appeal of a more
fine-grained set of primitives is that there might be fewer of them,
and that each of them might be simpler; one of the things that makes
physical replication so reliable is that its primitives are very
simple and thus easy to verify.

The possible event-capture mechanisms seem to be to have either (a)
event trigger or (b) a hook function in some yet-to-be-defined place
or (c) core code which will either (i) write each event to a table,
(ii) write each event directly into WAL, or perhaps (iii) write it
someplace else (file? in-memory queue?  network socket?).

There are lots of possible serialization formats.

Coordinating with the master could involve 2PC, as you propose; or
trying to somehow validate that a given series of events is a valid
state transformation based on the starting state on the standby before
doing the operation on the master; or the use of a distributed
transaction coordinated by something like PG-XC's global transaction
manager; or you can skip it and hope for the best.

In addition to the decisions above, you can try to prevent failures by
restricting certain changes from happening, or you can let users
change what they like and hope for the best.  Different solutions can
have different mechanisms for controlling which objects are under
replication and which changes are not; or even allowing some
individual DDL statements to opt out of replication while forcing
others to participate.  Administratively, solutions can be built to
facilitate easy replication of an entire database to another node, or
more specific applications like sharding, where creating a table on a
master node creates child tables on a bunch of slave nodes, but
they're not all identical, because we're partitioning the data so that
only some of it will be on each node - thus the constraints and so on
will be different.

BDR has one set of answers to all of these questions, and despite my
worries about a few points here and there, they are not stupid
answers.  But they are not the ONLY answers either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Bruce Momjian

Дата:

27 ноября 2014 г., 01:17:09

On Sat, Nov  8, 2014 at 05:56:00PM +0100, Andres Freund wrote:
> On 2014-11-08 11:52:43 -0500, Tom Lane wrote:
> > Adding a similar
> > level of burden to support a feature with a narrow use-case seems like
> > a nonstarter from here.
> 
> I don't understand this statement. In my experience the lack of a usable
> replication solution that allows temporary tables and major version
> differences is one of the most, if not *the* most, frequent criticisms
> of postgres I hear. How is this a narrow use case?

How would replicating DDL handle cases where the master and slave
servers have different major versions and the DDL is only supported by
the Postgres version on the master server?  If it would fail, does this
limit the idea that logical replication allows major version-different
replication?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Add CREATE support to event triggers

От

Alvaro Herrera

Дата:

27 ноября 2014 г., 01:43:12

Bruce Momjian wrote:
> On Sat, Nov  8, 2014 at 05:56:00PM +0100, Andres Freund wrote:
> > On 2014-11-08 11:52:43 -0500, Tom Lane wrote:
> > > Adding a similar level of burden to support a feature with a
> > > narrow use-case seems like a nonstarter from here.
> > 
> > I don't understand this statement. In my experience the lack of a
> > usable replication solution that allows temporary tables and major
> > version differences is one of the most, if not *the* most, frequent
> > criticisms of postgres I hear. How is this a narrow use case?
> 
> How would replicating DDL handle cases where the master and slave
> servers have different major versions and the DDL is only supported by
> the Postgres version on the master server?

Normally you would replicate between an older master and a newer
replica, so this shouldn't be an issue.  I find it unlikely that we
would de-support some syntax that works in an older version: it would
break pg_dump, for one thing.

In other words I view cross-version replication as a mechanism to
upgrade, not something that you would use permanently.  Once you
finish upgrading, promote the newer server and ditch the old master.

> If it would fail, does this limit the idea that logical replication
> allows major version-different replication?

Not in my view, at least.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Add CREATE support to event triggers

От

Stephen Frost

Дата:

27 ноября 2014 г., 02:01:18

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> Bruce Momjian wrote:
> > How would replicating DDL handle cases where the master and slave
> > servers have different major versions and the DDL is only supported by
> > the Postgres version on the master server?
>
> Normally you would replicate between an older master and a newer
> replica, so this shouldn't be an issue.  I find it unlikely that we
> would de-support some syntax that works in an older version: it would
> break pg_dump, for one thing.

While I tend to agree with you that it's not something we're likely to
do, how would it break pg_dump?  We have plenty of version-specific
logic in pg_dump and we could certainly generate a different syntax in
a newer version than we did in an older version, if the newer server was
expecting something different.  We've always held that you should use
the version of pg_dump which match the server you are moving *to*, after
all.

> In other words I view cross-version replication as a mechanism to
> upgrade, not something that you would use permanently.  Once you
> finish upgrading, promote the newer server and ditch the old master.

I agree with this also.

> > If it would fail, does this limit the idea that logical replication
> > allows major version-different replication?
>
> Not in my view, at least.

I'm all for having logical replication be a way to do major version
different replication (particularly for the purposes of upgrades), but
it shouldn't mean we can never de-support a given syntax.

As one example, we've discussed a few times removing certain table-level
privileges on the basis that they practically mean you own the table.
Perhaps that'll never actually happen, but if it does, logical
replication would need to deal with it.
Thanks,
    Stephen

Re: Add CREATE support to event triggers

От

Bruce Momjian

Дата:

27 ноября 2014 г., 13:23:54

On Wed, Nov 26, 2014 at 09:01:13PM -0500, Stephen Frost wrote:
> * Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> > Bruce Momjian wrote:
> > > How would replicating DDL handle cases where the master and slave
> > > servers have different major versions and the DDL is only supported by
> > > the Postgres version on the master server?
> > 
> > Normally you would replicate between an older master and a newer
> > replica, so this shouldn't be an issue.  I find it unlikely that we
> > would de-support some syntax that works in an older version: it would
> > break pg_dump, for one thing.

I like the idea of older master/new replica, but what about
bidirectional replication?

Would the create/alter/drop WAL locical structure remain consistent
across major versions, or would the code have to read at least one older
version?  Would we limit it to one?

> While I tend to agree with you that it's not something we're likely to
> do, how would it break pg_dump?  We have plenty of version-specific
> logic in pg_dump and we could certainly generate a different syntax in
> a newer version than we did in an older version, if the newer server was
> expecting something different.  We've always held that you should use
> the version of pg_dump which match the server you are moving *to*, after
> all.

pg_upgrade avoids having to deal with major version changes by
leveraging pg_dump.  Is it possible to have the new replica use the new
pg_dump to connect to the old master to get a CREATE command that it can
execute?  Would that avoid having to ship CREATE syntax?  What it
wouldn't help with is ALTER and DROP though.  (We do have ALTER but I
think only for inheritance cases.)

> > In other words I view cross-version replication as a mechanism to
> > upgrade, not something that you would use permanently.  Once you
> > finish upgrading, promote the newer server and ditch the old master.
> 
> I agree with this also.

> > > If it would fail, does this limit the idea that logical replication
> > > allows major version-different replication?
> > 
> > Not in my view, at least.
> 
> I'm all for having logical replication be a way to do major version
> different replication (particularly for the purposes of upgrades), but
> it shouldn't mean we can never de-support a given syntax.
> 
> As one example, we've discussed a few times removing certain table-level
> privileges on the basis that they practically mean you own the table.
> Perhaps that'll never actually happen, but if it does, logical
> replication would need to deal with it.

Should we just tell the user not to modify the database schema during
this period?  Should we have a server mode which disables DDL?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Add CREATE support to event triggers

От

Robert Haas

Дата:

02 декабря 2014 г., 16:50:00

> Normally you would replicate between an older master and a newer
> replica, so this shouldn't be an issue.  I find it unlikely that we
> would de-support some syntax that works in an older version: it would
> break pg_dump, for one thing.

The most common way in which we break forward-compatibility is by
reserving additional keywords.  Logical replication can deal with this
by quoting all identifiers, so it's not a big issue.

It's not the only possible issue; it is certainly conceivable that we
could change something in the server and then try to change pg_dump to
compensate.  I think we wouldn't do that with anything very big, but
suppose spgist were supplanted by a new and better access method
tsigps.  Well, we might figure that there are few enough people
accustomed to the current syntax that we can get away with hacking the
pg_dump output, and yes, anyone with an existing dump from an older
system will have problems, but there won't be many of them, and they
can always adjust the dump by hand.  If we ever decided to do such a
thing, whatever syntax transformation we did in pg_dump would have to
be mimicked by any logical replication solution.

I think it's legitimate to say that this could be a problem, but I
think it probably won't be a problem very often, and I think when it
is a problem it probably won't be a very big problem, because we
already have good reasons not to break the ability to restore older
dumps on newer servers more often than absolutely necessary.

One of the somewhat disappointing things about this is that we're
talking about adding a lot of new deparsing code to the server that
probably, in some sense, duplicates pg_dump.  Perhaps in an ideal
world there would be a way to avoid that, but in the world we really
live in, there probably isn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add CREATE support to event triggers

От

Michael Paquier

Дата:

08 декабря 2014 г., 00:26:44

On Thu, Nov 27, 2014 at 10:16 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Sat, Nov  8, 2014 at 05:56:00PM +0100, Andres Freund wrote:
>> On 2014-11-08 11:52:43 -0500, Tom Lane wrote:
>> > Adding a similar
>> > level of burden to support a feature with a narrow use-case seems like
>> > a nonstarter from here.
>>
>> I don't understand this statement. In my experience the lack of a usable
>> replication solution that allows temporary tables and major version
>> differences is one of the most, if not *the* most, frequent criticisms
>> of postgres I hear. How is this a narrow use case?
>
> How would replicating DDL handle cases where the master and slave
> servers have different major versions and the DDL is only supported by
> the Postgres version on the master server?  If it would fail, does this
> limit the idea that logical replication allows major version-different
> replication?
Marking this patch as "Returned with feedback". Even with the
more-fundamental arguments of putting such replication solution
in-core or not (I am skeptical as well btw), on a
code-base-discussion-only I don't think that this patch is acceptable
as-is without more structure of jsonb to do on-memory manipulation of
containers (aka remove ObjTree stuff).
Regards,
-- 
Michael

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Add CREATE support to event triggers

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения