Discussion: Add CREATE support to event triggers
Hello,

Attached you can find a very-much-WIP patch to add CREATE info support for event triggers (normalized commands). This patch builds mainly on two things:

1. Dimitri's "DDL rewrite" patch he submitted way back, in http://www.postgresql.org/message-id/m2zk1j9c44.fsf@2ndQuadrant.fr

I borrowed the whole ddl_rewrite.c code, and tweaked it a bit. There are several things still wrong with it and which will need to be fixed before a final patch can even be contemplated; but there are some questions that require a consensus answer before I go and fix it all, because what it will look like will depend on said answers.

2. The ideas we used to build DROP support. Mainly, the interesting thing here is the fact that we use a SRF to report, at ddl_command_end, all the objects that were created during execution of that command. We do this by collecting them in a list in some raw form somewhere during ProcessUtility, and then spitting them out if the SRF is called. I think the general idea is sound, although of course I admit there might be bugs in the implementation.

Note this patch doesn't try to add any kind of ALTER support. I think this is fine in principle, because we agreed that we would attack each kind of command separately (divide to conquer and all that); but there is a slight problem for some kind of objects that are represented partly as ALTER state during creation; for example creating a table with a sequence uses ALTER SEQ/OWNED BY internally at some point. There might be other cases I'm missing, also. (The REFRESH command is nominally also supported.)

Now about the questions I mentioned above:

a) It doesn't work to reverse-parse the statement nodes in all cases; there are several unfixable bugs if we only do that. In order to create always-correct statements, we need access to the catalogs for the created objects. But if we are doing catalog access, then it seems to me that we can do away with the statement parse nodes completely and just reconstruct the objects from catalog information. Shall we go that route?

b) What's the best design of the SRF output? This patch proposes two columns, object identity and create statement. Is there use for anything else? Class/object OIDs perhaps, schema OIDs for objects types that have it? I don't see any immediate need to that info, but perhaps someone does.

c) The current patch stashes all objects in a list, whenever there's an event trigger function. But perhaps some users want to have event triggers and not have any use for the CREATE statements. So one idea is to require users that want the objects reported to call a special function in a ddl_command_start event trigger which enables collection; if it's not called, objects are not reported. This eliminates the performance impact for people not using it, but then maybe it will be surprising for people that call the SRF and find that it always returns empty.

d) There's a new routine uncookConstraintOrDefault. This takes a raw expression, runs transformExpr() on it, and then deparses it (possibly setting up a deparse context based on some relation). This is a somewhat funny thing to be doing, so if there are other ideas on how to handle this, I'm all ears.

This patch doesn't include doc changes or regression tests. Those will be added in a later version.
For now, you can see this code in action by installing an event trigger like this:

CREATE OR REPLACE FUNCTION snitch() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
  r RECORD;
BEGIN
  FOR r IN SELECT * FROM pg_event_trigger_get_normalized_commands()
  LOOP
    RAISE NOTICE 'object created: id %, statement %',
      r.identity, r.command;
  END LOOP;
END;
$$;

CREATE EVENT TRIGGER snitch ON ddl_command_end EXECUTE PROCEDURE snitch();

And then running the DDL of your preference. Be warned that there are many rough edges, so some objects might be incompletely reported (or bogusly so, in particular with regards to objects in search_path but not in the first position of it); but if you see any crashes, please let me know.

Also, many commands are not supported yet by the ddl_rewrite.c code and they return "unsupported FOOBAR". I didn't want to go to the effort of writing code for them that might end up being ripped out if we wanted to go the route of using only syscache to build CREATE statements. That part is pretty mechanical, and not much code anyway. I just left whatever Dimitri already had in his patch.

I would have continued working some more on this patch before submitting, except that I'm going on vacation for a few days starting tomorrow and we have a rule that no complex patches can go in at CF4 without having been discussed in previous commitfests. My intention is that the patch that arrives at CF4 is free of design surprises and ready for a final review before commit, so if there's any disagreement with the direction this is going, please speak up now.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
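To make question (c) above concrete, here is a minimal sketch of the opt-in idea: collection stays off unless a ddl_command_start trigger switches it on. The enabling function shown here is invented purely for illustration; the patch does not define it.

CREATE OR REPLACE FUNCTION enable_create_collection() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- hypothetical switch; without it, the ddl_command_end SRF would return no rows
  PERFORM pg_event_trigger_enable_command_collection();
END;
$$;

CREATE EVENT TRIGGER collect_creates ON ddl_command_start
  EXECUTE PROCEDURE enable_create_collection();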
On Fri, Nov 8, 2013 at 10:33 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Attached you can find a very-much-WIP patch to add CREATE info support > for event triggers (normalized commands). This patch builds mainly on > two things: > > 1. Dimitri's "DDL rewrite" patch he submitted way back, in > http://www.postgresql.org/message-id/m2zk1j9c44.fsf@2ndQuadrant.fr > > I borrowed the whole ddl_rewrite.c code, and tweaked it a bit. There > are several things still wrong with it and which will need to be fixed > before a final patch can even be contemplated; but there are some > questions that require a consensus answer before I go and fix it all, > because what it will look like will depend on said answers. I'm still unhappy with this whole concept. It adds a significant maintenance burden that must be carried by everyone who adds new DDL syntax in the future, and it's easy to imagine this area of the code ending up very poorly tested and rife with bugs. After all, event triggers, as nifty as they are, are only going to be used by a small minority of users. And new DDL features are typically going to be things that are fairly obscure, so again they will only be used by a small minority of users. I think we need to avoid the situation where we have bugs that can only get found when those minorities happen to intersect. If we're going to have DDL rewrite code, then we need a way of making sure that it gets tested in a comprehensive way on a regular basis. Here's one idea: create a contrib module that (somehow, via APIs to be invented) runs every DDL command that gets executed through the deparsing code, and then parses the result and executes *that* instead of the original command. Then, add a build target that runs the regression test suite in that mode, and get the buildfarm configured to run that build target regularly on at least some machines. That way, adding syntax to the regular regression test suite also serves to test that the deparsing logic for that syntax is working. If we do this, there's still some maintenance burden associated with having DDL deparsing code, but at least our chances of noticing when we've failed to maintain it should be pretty good. The other thing that bothers me here is that, while a normalized command string sounds great in theory, as soon as you want to allow (for example) mapping schema A on node 1 to schema B on node 2, the wheels come off: you'll have to deparse that normalized command string so you can change out the schema name and then reassemble it back into a command string again. So we're going to parse the user input, then deparse it, hand over the results to the application code, which will then parse it, modify that, and deparse it again. At every step of that process, any mistake will lead to subtle bugs in the resulting system. Larry Wall once wrote (approximately) that a good programming language makes simple things simple and hard things possible; I think this design fails the second prong of that test. Now, I guess I can live with that if it's what everyone else wants, but I don't have a great feeling about the long-term utility of it. Exposing the data from the DDL statement in some structured way - like what we've done with drops, or a JSON blob, or something like that, feels much more robust and useful than a normalized command string to me. If the normalized command string can easily be built up from that data, then you don't really need to expose the command string separately. 
If it can't, then you're not doing a good job exposing the data in a usable form. Saying "well, people can always parse the normalized command string" is a cop-out. Parsing SQL is *hard*; the only external project I know of that parses our SQL syntax well is pgpool, and that's because they copy our parser wholesale, surely not the sort of solution we want to foist off on event trigger authors.

Finally, I'm very skeptical of the word "normalized". To me, that sounds like an alias for "modifying the command string in unspecified ways that big brother thinks will be useful to event trigger authors". Color me skeptical. What if somebody doesn't want their command string normalized? What if they want it normalized in a way that's different from the way that we've chosen to normalize it? I fear that this whole direction amounts to "we don't know how to design a real API so let's just do surgery on the command string and call whatever pops out the API". Maybe that's harsh, but if it is I don't know why. The normalization steps we build into this process constitute assumptions about how the feature will be used, and they back the user into using that feature in just that way and no other.

> 2. The ideas we used to build DROP support. Mainly, the interesting > thing here is the fact that we use a SRF to report, at > ddl_command_end, all the objects that were created during execution > of that command. We do this by collecting them in a list in some raw > form somewhere during ProcessUtility, and then spitting them out if > the SRF is called. I think the general idea is sound, although of > course I admit there might be bugs in the implementation.

Agreed.

> Note this patch doesn't try to add any kind of ALTER support. I think > this is fine in principle, because we agreed that we would attack each > kind of command separately (divide to conquer and all that);

Also agreed.

> but there > is a slight problem for some kind of objects that are represented partly > as ALTER state during creation; for example creating a table with a > sequence uses ALTER SEQ/OWNED BY internally at some point. There might > be other cases I'm missing, also. (The REFRESH command is nominally > also supported.)

There are lots of places in the DDL code where we pass around constructed parse trees as a substitute for real argument lists. I expect that many of those places will eventually get refactored away, so it's important that this feature does not end up relying on accidents of the current code structure. For example, an AlterTableStmt can actually do a whole bunch of different things in a single statement: SOME of those are handled by a loop in ProcessUtilitySlow() and OTHERS are handled internally by AlterTable. I'm pretty well convinced that that division of labor is a bad design, and I think it's important that this feature doesn't make that dubious design decision into documented behavior.

> Now about the questions I mentioned above: > > a) It doesn't work to reverse-parse the statement nodes in all cases; > there are several unfixable bugs if we only do that. In order to create > always-correct statements, we need access to the catalogs for the > created objects. But if we are doing catalog access, then it seems to > me that we can do away with the statement parse nodes completely and > just reconstruct the objects from catalog information. Shall we go that > route?

That works well for CREATE and is definitely appealing in some ways; it probably means needing a whole lot of what's in pg_dump in the backend also.
Of course, converting the pg_dump code to a library that can be linked into either a client or the server would make a lot of people happy. Making pg_dump much dumber (at least as regards future versions) and relying on new backend code to serve the same purpose would perhaps be reasonable as well, although I know Tom is against it. But having two copies of that code doesn't sound very good; and we'd need some way of thoroughly testing the one living in the backend. > c) The current patch stashes all objects in a list, whenever there's an > event trigger function. But perhaps some users want to have event > triggers and not have any use for the CREATE statements. So one idea is > to require users that want the objects reported to call a special > function in a ddl_command_start event trigger which enables collection; > if it's not called, objects are not reported. This eliminates > performance impact for people not using it, but then maybe it will be > surprising for people that call the SRF and find that it always returns > empty. This seems like premature optimization to me, but I think I lost the last iteration of this argument. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
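To make the schema-remapping concern above concrete (an illustration only, not code from any patch): rewriting schema names by munging the normalized command string directly is deceptively easy to get wrong.

SELECT regexp_replace(
    'CREATE TABLE schema_a.t (note text DEFAULT ''moved from schema_a'')',
    '\mschema_a\M', 'schema_b', 'g');
-- The word-boundary replacement also rewrites the DEFAULT string literal,
-- silently changing user data; only a real SQL parser (or a structured
-- representation of the command) can tell the two occurrences apart.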
On Fri, Nov 8, 2013 at 10:33 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> Hello, > > Attached you can find a very-much-WIP patch to add CREATE info support > for event triggers (normalized commands). This patch builds mainly on > two things: > > 1. Dimitri's "DDL rewrite" patch he submitted way back, in > http://www.postgresql.org/message-id/m2zk1j9c44.fsf@2ndQuadrant.fr > > I borrowed the whole ddl_rewrite.c code, and tweaked it a bit. There > are several things still wrong with it and which will need to be fixed > before a final patch can even be contemplated; but there are some > questions that require a consensus answer before I go and fix it all, > because what it will look like will depend on said answers.

I have tried this out; the patch applies fine. Note that it induces (modulo my environment) a failure in "make check". The "opr_sanity" test fails.

postgres@cbbrowne ~/p/s/t/regress> diff expected/opr_sanity.out results/opr_sanity.out
348,350c348,351
< oid | proname
< -----+---------
< (0 rows)
---
> oid | proname
> ------+------------------------------------------
> 3567 | pg_event_trigger_get_normalized_commands
> (1 row)

That's a minor problem; the trouble there is that the new function is not yet documented. Not a concern at this stage.

> 2. The ideas we used to build DROP support. Mainly, the interesting > thing here is the fact that we use a SRF to report, at > ddl_command_end, all the objects that were created during execution > of that command. We do this by collecting them in a list in some raw > form somewhere during ProcessUtility, and then spitting them out if > the SRF is called. I think the general idea is sound, although of > course I admit there might be bugs in the implementation. > > Note this patch doesn't try to add any kind of ALTER support. I think > this is fine in principle, because we agreed that we would attack each > kind of command separately (divide to conquer and all that); but there > is a slight problem for some kind of objects that are represented partly > as ALTER state during creation; for example creating a table with a > sequence uses ALTER SEQ/OWNED BY internally at some point. There might > be other cases I'm missing, also. (The REFRESH command is nominally > also supported.)

I imagine that the things we create in earlier stages may help with later stages, so it's worth *some* planning so we can hope not to build bits now that push later enhancements into corners that they can't get out of. But I'm not disagreeing at all.

> Now about the questions I mentioned above: > > a) It doesn't work to reverse-parse the statement nodes in all cases; > there are several unfixable bugs if we only do that. In order to create > always-correct statements, we need access to the catalogs for the > created objects. But if we are doing catalog access, then it seems to > me that we can do away with the statement parse nodes completely and > just reconstruct the objects from catalog information. Shall we go that > route?

Here's a case where it doesn't work.
testevent@localhost-> create schema foo;
CREATE SCHEMA
testevent@localhost-> create domain foo.bar integer;
CREATE DOMAIN
testevent@localhost-> CREATE OR REPLACE FUNCTION snitch() RETURNS event_trigger LANGUAGE plpgsql AS $$
testevent$# DECLARE
testevent$# r RECORD;
testevent$# BEGIN
testevent$# FOR r IN SELECT * FROM pg_event_trigger_get_normalized_commands()
testevent$# LOOP
testevent$# RAISE NOTICE 'object created: id %, statement %',
testevent$# r.identity, r.command;
testevent$# END LOOP;
testevent$# END;
testevent$# $$;
CREATE FUNCTION
testevent@localhost-> CREATE EVENT TRIGGER snitch ON ddl_command_end EXECUTE PROCEDURE snitch();
CREATE EVENT TRIGGER
testevent@localhost-> set search_path to public, foo;
SET
testevent@localhost-> create table foo.foo2 (acolumn bar);
NOTICE: object created: id foo.foo2, statement CREATE TABLE foo.foo2 (acolumn bar)
CREATE TABLE

The trouble is that you have only normalized the table name. The domain, bar, needs its name normalized as well.

> b) What's the best design of the SRF output? This patch proposes two > columns, object identity and create statement. Is there use for > anything else? Class/object OIDs perhaps, schema OIDs for objects types > that have it? I don't see any immediate need to that info, but perhaps > someone does.

Probably an object type is needed as well, to know if it's a table or a domain or a sequence or whatever.

I suspect that what will be needed to make it all usable is some sort of "structured" form. That is in keeping with Robert Haas' discomfort with the normalized form.

My "minor" gripe is that you haven't normalized enough (e.g. - it should be CREATE TABLE foo.foo2 (acolumn foo.bar), capturing the normalization of data types that are referenced).

But Robert's quite right that users may want more than just to capture that literally; they may want to modify it, for instance, by shifting to another schema. And it will be no fun at all if you have to construct an SQL parser in order to change it.

The "big change" in Slony 2.2 was essentially to solve that problem; we had been capturing logged updates as near-raw SQL, and discovered that we really needed to capture it in a form that allows restructuring it without needing to reparse it. It seems to me that the same need applies here. The answer we came up with in Slony was to change from "SQL fragment" to "array of attribute names and attribute values." I expect that is quite unsuitable for CREATE events, being too "flat" a representation.

> c) The current patch stashes all objects in a list, whenever there's an > event trigger function. But perhaps some users want to have event > triggers and not have any use for the CREATE statements. So one idea is > to require users that want the objects reported to call a special > function in a ddl_command_start event trigger which enables collection; > if it's not called, objects are not reported. This eliminates > performance impact for people not using it, but then maybe it will be > surprising for people that call the SRF and find that it always returns > empty.

Hmm. I'm not sure I follow what the issue is here (I think Robert has the same problem; if that's so, then it's probably worth a separate explanation/discussion).

I think there are actually more options here than just worrying about CREATE... I can see caring/not caring about CREATE events. Also caring/not caring about different sorts of objects.
Thus, an event trigger might either filter on event type (CREATE versus DROP versus ALTER vs ...), as well as on object type (TABLE vs VIEW vs SCHEMA vs DOMAIN vs ...). I could see that being either part of the trigger definition or as an attribute established within the trigger. Thus...

create event trigger foo snitch
  on ddl_command_end
  on create, drop, alter
  for table, view
  execute procedure snitch();

And having internals

create or replace function snitch() returns event_trigger
language plpgsql as $$
declare
  r record;
begin
  for r in select * from pg_event_trigger_get_normalized_commands loop
    raise notice 'action: % object type % objectname % object id % normalized statement %',
      r.type, r.identity, r.id, r.command;
  end loop
end;
$$;

And I suspect that there needs to be another value returned (not necessarily by the same function) that is some form of the parse tree. If the parse tree is already available in memory on the C side of things, then making it accessible by calling a function that allows "walking" the tree and invoking methods on the elements is a perfectly reasonable idea. Ideally, there's something that can be usefully called that is written in pl/pgsql; mandating that we punt to C to do arbitrary magic doesn't seem nice.

(If that's not reading coherently, well, blame it on my mayor being on crack! ;-) )

> d) There's a new routine uncookConstraintOrDefault. This takes a raw > expression, runs transformExpr() on it, and then deparses it (possibly > setting up a deparse context based on some relation). This is a > somewhat funny thing to be doing, so if there are other ideas on how to > handle this, I'm all ears.

Is that perhaps some of the "magic" to address my perhaps incoherent bit about "punt to C"? It looks a bit small to be that...

-- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Thu, Nov 21, 2013 at 2:36 AM, Christopher Browne <cbbrowne@gmail.com> wrote: >> b) What's the best design of the SRF output? This patch proposes two >> columns, object identity and create statement. Is there use for >> anything else? Class/object OIDs perhaps, schema OIDs for objects types >> that have it? I don't see any immediate need to that info, but perhaps >> someone does. > > Probably an object type is needed as well, to know if it's a table or > a domain or a sequence or whatever. > > I suspect that what will be needed to make it all usable is some sort of > "structured" form. That is in keeping with Robert Haas' discomfort with > the normalized form. > > My "minor" gripe is that you haven't normalized enough (e.g. - it should be > CREATE TABLE foo.foo2 (acolumn foo.bar), capturing the normalization of > data types that are referenced). > > But Robert's quite right that users may want more than just to capture that > literally; they may want to modify it, for instance, by shifting to another > schema. And it will be no fun at all if you have to construct an SQL parser > in order to change it. It's certainly much easier to transform a structured representation into a valid SQL command string than it is to do the inverse. You may be interested in an extension that I'm working on for a client, which provides relation_create, relation_alter, and relation_drop event triggers for 9.3: https://bitbucket.org/malloclabs/pg_schema_triggers I decided to create a composite type for each event, which can be accessed from within the event trigger by calling a special function. For example, the relation_alter event supplies the relation Oid and the "old" and "new" pg_class rows. It's easy to then examine the old vs. new rows and determine what has changed. Kind regards, Andrew Tipton
Robert Haas escribió:

> The other thing that bothers me here is that, while a normalized > command string sounds great in theory, as soon as you want to allow > (for example) mapping schema A on node 1 to schema B on node 2, the > wheels come off: you'll have to deparse that normalized command string > so you can change out the schema name and then reassemble it back into > a command string again. So we're going to parse the user input, then > deparse it, hand over the results to the application code, which will > then parse it, modify that, and deparse it again.

I have considered several ideas on this front, but most of them turn out to be useless or too cumbersome to use. What seems most adequate is to build a command string containing certain patterns, and an array of replacement values for such patterns; each pattern corresponds to one element that somebody might want modified in the command. As a trivial example, a command such as

CREATE TABLE foo (bar INTEGER);

would return a string like

CREATE TABLE ${table_schema}.${table_name} (bar INTEGER);

and the replacement array would be

{table_schema => "public", table_name => "foo"}

If we additionally provide a function to expand the replacements in the string, we would have the base functionality of a normalized command string. If somebody wants to move the table to some other schema, they can simply modify the array to suit their taste, and again expand using the provided function; this doesn't require parsing SQL. It's likely that there are lots of fine details that need exploring before this is a fully workable idea -- I have just started work on it, so please bear with me.

I think this is basically what you call "a JSON blob".

> Finally, I'm very skeptical of the word "normalized". To me, that > sounds like an alias for "modifying the command string in unspecified > ways that big brother thinks will be useful to event trigger authors". > Color me skeptical. What if somebody doesn't want their command > string normalized? What if they want it normalized in a way that's > different from the way that we've chosen to normalize it? I fear that > this whole direction amounts to "we don't know how to design a real > API so let's just do surgery on the command string and call whatever > pops out the API".

You might criticize the example above by saying that I haven't considered using a JSON array for the list of table elements; in a sense, I would be being Big Brother and deciding that you (as the user) don't need to mess up with the column/constraints list in a table you're creating. I thought about it and wasn't sure if there was a need to implement that bit in the first iteration of this implementation. One neat thing about this string+replaceables idea is that we can later change what replaceable elements the string has, thus providing more functionality (thus, for example, perhaps the column list can be altered in v2 that was a "constant" in v1), without breaking existing users of the v1.

> > but there > > is a slight problem for some kind of objects that are represented partly > > as ALTER state during creation; for example creating a table with a > > sequence uses ALTER SEQ/OWNED BY internally at some point. There might > > be other cases I'm missing, also. (The REFRESH command is nominally > > also supported.) > > There are lots of places in the DDL code where we pass around > constructed parse trees as a substitute for real argument lists.
> I expect that many of those places will eventually get refactored away, > so it's important that this feature does not end up relying on > accidents of the current code structure. For example, an > AlterTableStmt can actually do a whole bunch of different things in a > single statement: SOME of those are handled by a loop in > ProcessUtilitySlow() and OTHERS are handled internally by AlterTable. > I'm pretty well convinced that that division of labor is a bad design, > and I think it's important that this feature doesn't make that dubious > design decision into documented behavior.

Yeah, the submitted patch took care of these elements by invoking the appropriate collection function at all the right places. Most of it happened right in ProcessUtilitySlow, but other bits were elsewhere (for instance, sub-objects created in a complex CREATE SCHEMA command). I mentioned the ALTER SEQUENCE example above because that happens in a code path that wasn't even close to the rest of the stuff.

> > Now about the questions I mentioned above: > > > > a) It doesn't work to reverse-parse the statement nodes in all cases; > > there are several unfixable bugs if we only do that. In order to create > > always-correct statements, we need access to the catalogs for the > > created objects. But if we are doing catalog access, then it seems to > > me that we can do away with the statement parse nodes completely and > > just reconstruct the objects from catalog information. Shall we go that > > route? > > That works well for CREATE and is definitely appealing in some ways; > it probably means needing a whole lot of what's in pg_dump in the > backend also. Of course, converting the pg_dump code to a library > that can be linked into either a client or the server would make a lot > of people happy. Making pg_dump much dumber (at least as regards > future versions) and relying on new backend code to serve the same > purpose would perhaps be reasonable as well, although I know Tom is > against it. But having two copies of that code doesn't sound very > good; and we'd need some way of thoroughly testing the one living in > the backend.

Agreed on requiring thorough testing.

> > c) The current patch stashes all objects in a list, whenever there's an > > event trigger function. But perhaps some users want to have event > > triggers and not have any use for the CREATE statements. So one idea is > > to require users that want the objects reported to call a special > > function in a ddl_command_start event trigger which enables collection; > > if it's not called, objects are not reported. This eliminates > > performance impact for people not using it, but then maybe it will be > > surprising for people that call the SRF and find that it always returns > > empty. > > This seems like premature optimization to me, but I think I lost the > last iteration of this argument.

I don't think you did; we ended up using a design that neither requires explicit user action nor causes a performance hit. I wasn't sure about this, but thought I would mention it just in case. Maybe we will end up doing the same here and have a create_sql_objects event or something like that --- not real sure of this part yet, as there's a lot of infrastructure to design and code first.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
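For what it's worth, the string-plus-replacement-values idea sketched above can be prototyped in a few lines of plpgsql. This is only an illustration of the concept under the ${...} notation used in the previous message (expand_template is an invented name, not the proposed backend function):

CREATE OR REPLACE FUNCTION expand_template(tmpl text, repl json)
RETURNS text LANGUAGE plpgsql AS $$
DECLARE
  k text;
BEGIN
  -- substitute each ${key} in the template with the corresponding value
  FOR k IN SELECT json_object_keys(repl) LOOP
    tmpl := replace(tmpl, '${' || k || '}', repl ->> k);
  END LOOP;
  RETURN tmpl;
END;
$$;

-- A trigger that wants the table in another schema just changes the array
-- element and re-expands, with no SQL parsing involved:
SELECT expand_template('CREATE TABLE ${table_schema}.${table_name} (bar INTEGER)',
                       '{"table_schema": "audit", "table_name": "foo"}');
-- yields: CREATE TABLE audit.foo (bar INTEGER)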
On Fri, Jan 3, 2014 at 9:05 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> The other thing that bothers me here is that, while a normalized >> command string sounds great in theory, as soon as you want to allow >> (for example) mapping schema A on node 1 to schema B on node 2, the >> wheels come off: you'll have to deparse that normalized command string >> so you can change out the schema name and then reassemble it back into >> a command string again. So we're going to parse the user input, then >> deparse it, hand over the results to the application code, which will >> then parse it, modify that, and deparse it again. > > I have considered several ideas on this front, but most of them turn out > to be useless or too cumbersome to use. What seems most adequate is to > build a command string containing certain patterns, and an array of > replacement values for such patterns; each pattern corresponds to one > element that somebody might want modified in the command. As a trivial > example, a command such as > > CREATE TABLE foo (bar INTEGER); > > would return a string like > CREATE TABLE ${table_schema}.${table_name} (bar INTEGER); > > and the replacement array would be > {table_schema => "public", table_name => "foo"} > > If we additionally provide a function to expand the replacements in the > string, we would have the base funcionality of a normalized command > string. If somebody wants to move the table to some other schema, they > can simply modify the array to suit their taste, and again expand using > the provided function; this doesn't require parsing SQL. It's likely > that there are lots of fine details that need exploring before this is a > fully workable idea -- I have just started work on it, so please bear > with me. > > I think this is basically what you call "a JSON blob". I think this direction has some potential. I'm not sure it's right in detail. The exact scheme you propose above won't work if you want to leave out the schema name altogether, and more generally it's not going to help very much with anything other than substituting in identifiers. What if you want to add a column called satellite_id to every table that gets created, for example? What if you want to make the tables UNLOGGED? I don't see how that kind of things is going to work at all cleanly. What I can imagine that might work is that you get a JSON blob for a create table statement and then you have a function make_a_create_table_statement(json) that will turn it back into SQL. You can pass a modified blob (adding, removing, or changing the schema_name or any other element) instead and it will do its thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas escribió: > I think this direction has some potential. I'm not sure it's right in > detail. The exact scheme you propose above won't work if you want to > leave out the schema name altogether, and more generally it's not > going to help very much with anything other than substituting in > identifiers. What if you want to add a column called satellite_id to > every table that gets created, for example? What if you want to make > the tables UNLOGGED? I don't see how that kind of things is going to > work at all cleanly. Thanks for the discussion. I am building some basic infrastructure to make this possible, and will explore ideas to cover these oversights (not posting anything concrete yet because I expect several iterations to crash and burn before I have something sensible to post). > What I can imagine that might work is that you get a JSON blob for a > create table statement and then you have a function > make_a_create_table_statement(json) that will turn it back into SQL. > You can pass a modified blob (adding, removing, or changing the > schema_name or any other element) instead and it will do its thing. I agree, except that I would prefer not to have one function for each DDL statement type; instead we would have a single function that knows to expand arbitrary strings using arbitrary JSON parameter objects, and let ddl_rewrite.c (or whatever we call that file) deal with all the ugliness. One function per statement would increase the maintenance cost more, I think (three extra pg_proc entries every time you add a new object type? Ugh.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 6, 2014 at 1:17 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I agree, except that I would prefer not to have one function for each > DDL statement type; instead we would have a single function that knows > to expand arbitrary strings using arbitrary JSON parameter objects, and > let ddl_rewrite.c (or whatever we call that file) deal with all the > ugliness. Yeah, that might work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Alvaro Herrera escribió:

> Robert Haas escribió: > > > I think this direction has some potential. I'm not sure it's right in > > detail. The exact scheme you propose above won't work if you want to > > leave out the schema name altogether, and more generally it's not > > going to help very much with anything other than substituting in > > identifiers. What if you want to add a column called satellite_id to > > every table that gets created, for example? What if you want to make > > the tables UNLOGGED? I don't see how that kind of things is going to > > work at all cleanly. > > Thanks for the discussion. I am building some basic infrastructure to > make this possible, and will explore ideas to cover these oversights > (not posting anything concrete yet because I expect several iterations > to crash and burn before I have something sensible to post).

Here's a working example. Suppose the user runs

CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy";

In an event trigger, the function pg_event_trigger_get_creation_commands() returns the following JSON blob:

{"authorization":{"authorization_role":"some guy",
                  "output":"AUTHORIZATION %i{authorization_role}"},
 "if_not_exists":"IF NOT EXISTS",
 "name":"some schema",
 "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"}

wherein I have chosen to have a JSON element with the hardcoded name of "output" which is what needs to be expanded; for each %{} parameter found in it, there is an equally-named element in the JSON blob. This can be a string, a NULL, or another JSON object.

If it's a string, it expands to that value; if it's an object, recursively an "output" element is expanded in the same way, and the expanded string is used.

If there's a NULL element when expanding an object, the whole thing expands to empty. For example, if no AUTHORIZATION clause is specified, "authorization" element is still there, but the "authorization_role" element within it is NULL, and so the whole AUTHORIZATION clause expands to empty and the resulting command contains no authorization clause. This is useful to support the case that someone doesn't have an AUTHORIZATION clause in the CREATE SCHEMA command, and the event trigger injects one simply by setting the authorization_role to some role name.

IF NOT EXISTS is handled by defining it to either the string IF NOT EXISTS or to empty if no such clause was specified.

The user can modify elements in the JSON to get a different version of the command. (I reckon the "output" can also be modified, but this is probably a bad idea in most/all cases. I don't think there's a need to prohibit this explicitly.) Also, someone might define "if_not_exists" to something completely unrelated, but that would be their own fault. (Maybe we can have some cross-check that the if_not_exists element in JSON cannot be anything other than "IF NOT EXISTS" or the empty string; and that the "output" element remains the same at expansion time as it was at generation time. Perhaps we should even hide the "output" element from the user completely and only add them to the JSON at time of expansion. Not sure it's worth the trouble.)

There is another function, pg_event_trigger_expand_creation_command(json), which will expand the above JSON blob and return the following text:

CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy"

Note the identifiers are properly quoted (there are quotes in the JSON blob, but they correspond to JSON's own delimiters).
I have defined an 'i' modifier to have %i{} elements, which means that the element is an identifier which might need quoting. I have also defined a %d{} modifier that means to use the element to expand a possibly-qualified dotted name. (There would be no "output" element in this case.) This is to support the case where you have

CREATE TABLE public.foo

which results in

{"table_name":{"schema":"public",
               "relname":"foo"}}

and you want to edit the "table_name" element in the root JSON and set the schema to something else (perhaps NULL), so in the event trigger after expansion you can end up with "CREATE TABLE foo" or "CREATE TABLE private.foo" or whatever.

Most likely there are some more rules that will need to be created, but so far this looks sensible.

I'm going to play some more with the %d{} stuff, and also with the idea of representing table elements such as columns and constraints as an array. In the meantime please let me know whether this makes sense.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
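To see how the pieces above fit together, here is a rough plpgsql sketch of the expansion step. It is not the C implementation of pg_event_trigger_expand_creation_command(); it leans on json_typeof() purely for brevity and leaves out the %d{} dotted-name modifier.

CREATE OR REPLACE FUNCTION expand_creation_command(cmd json)
RETURNS text LANGUAGE plpgsql AS $$
DECLARE
  fmt text := cmd ->> 'output';
  m   text[];
  key text;
  val json;
  rep text;
BEGIN
  -- walk every %{name} / %i{name} parameter found in the "output" string
  FOR m IN SELECT regexp_matches(fmt, '%(i?)\{(\w+)\}', 'g') LOOP
    key := m[2];
    val := cmd -> key;
    IF val IS NULL OR json_typeof(val) = 'null' THEN
      rep := '';                              -- NULL element expands to empty
    ELSIF json_typeof(val) = 'object' THEN
      rep := expand_creation_command(val);    -- recurse into the sub-clause
    ELSIF m[1] = 'i' THEN
      rep := quote_ident(cmd ->> key);        -- identifier, quote if needed
    ELSE
      rep := cmd ->> key;                     -- plain string element
    END IF;
    fmt := replace(fmt, '%' || m[1] || '{' || key || '}', rep);
  END LOOP;
  RETURN fmt;
END;
$$;

Fed the CREATE SCHEMA blob above, this returns CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy" (modulo extra whitespace where a clause expanded to empty).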
Hello

I don't like this direction. What we can do with JSON from plpgsql? More, JSON is not too robust format against some future changes.

Regards

Pavel
Pavel Stehule escribió: > Hello > > I don't like this direction. What we can do with JSON from plpgsql? We have plenty of JSON functions and operators in SQL, and more to come soon. Is that not enough? > More, JSON is not too robust format against some future changes. Not sure what you mean. This JSON is generated and consumed by our own code, so we only need to concern ourselves with making sure that we can consume (in the expansion function) what we generated in the previous phase. There is no transmission to the exterior. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
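For example, the operators that already exist are enough to pull individual pieces out of the blob from the earlier message without parsing any SQL (illustration only):

SELECT cmd ->> 'name'                                  AS schema_name,
       cmd -> 'authorization' ->> 'authorization_role' AS authorization_role
FROM (SELECT '{"name": "some schema",
               "authorization": {"authorization_role": "some guy"}}'::json) AS t(cmd);
-- schema_name = "some schema", authorization_role = "some guy"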
CC to hackers restored.

Pavel Stehule escribió:

> On 8.1.2014 23:17 "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote: > > > > Pavel Stehule escribió: > > > Hello > > > > > > I don't like this direction. What we can do with JSON from plpgsql? > > > > We have plenty of JSON functions and operators in SQL, and more to come > > soon. Is that not enough? > > No, is not. Im sure. It is wrong a request to parse system internal data, > that is available in structured form. You create string that should be > parsed same time. > > Few functions with OUT parameters are beter than any semistructured string.

That was shot down, for good reasons: we assume that the users of this are going to want to modify the command before considering it final. Maybe they want to add a column to each table being created, or they want to change the tablespace if the table name ends with "_big", or they want to change the schema in which it is created. This JSON representation lets you receive the table creation data in a well-known JSON schema; you can tweak individual elements without having to parse the SQL command. And when you're done tweaking, there's a function that lets you produce the SQL command that corresponds to the original with the tweaks you just did.

(Please note that, thus far, this facility DOES NOT let you change the table that was created, at least not directly: these event triggers are run AFTER the creation command has completed. You can tweak the command that would be sent to a remote server in a replication swarm, for example. Or create a mirror table for audit purposes. Or perhaps even generate an ALTER TABLE command for the new table.)

If by "few functions with OUT parameters" you mean that we need to have one record type that is able to receive all possible CREATE TABLE options, so that you can change them as a record in plpgsql, this doesn't sound too good to me, for three reasons: a) it's going to require too many types and functions (one per statement type); b) cramming the stuff in pg_type.h / pg_proc.h is going to be a horrid task; c) any change is going to require an initdb.

-- Alvaro Herrera
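A hypothetical end-to-end use, following the table-name-ends-with-"_big" example above. The element names ("name", "tablespace") and the result column are invented for illustration, and jsonb_set() is borrowed from a much later release purely to keep the sketch short; this is not how the patch's API is defined.

CREATE OR REPLACE FUNCTION relocate_big_tables() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
  r   record;
  cmd jsonb;
BEGIN
  FOR r IN SELECT * FROM pg_event_trigger_get_creation_commands() LOOP
    cmd := r.command::jsonb;
    IF right(cmd ->> 'name', 4) = '_big' THEN
      -- tweak one element instead of doing surgery on SQL text
      cmd := jsonb_set(cmd, '{tablespace}', '"big_ts"');
    END IF;
    RAISE NOTICE 'command for the replica: %',
      pg_event_trigger_expand_creation_command(cmd::json);
  END LOOP;
END;
$$;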
On 01/09/2014 04:42 AM, Alvaro Herrera wrote:

> If there's a NULL element when expanding an object, the whole thing > expands to empty. For example, if no AUTHORIZATION > clause is specified, "authorization" element is still there, but the > "authorization_role" element within it is NULL, and so the whole > AUTHORIZATION clause expands to empty and the resulting command contains > no authorization clause.

I'd like to see this applied consistently to argument-less clauses like IF NOT EXISTS too. So the same rules apply.

> IF NOT EXISTS is handled by defining it to either the string IF NOT > EXISTS or to empty if no such clause was specified.

I'm not keen on this bit. It puts clauses of syntax into value strings other than the special "output" key. Those keys aren't easily distinguished from user data without clause specific knowledge. I'm not keen on that.

Instead, can't we use your already proposed subclause structure?

{"authorization":{"authorization_role":"some guy",
                  "output":"AUTHORIZATION %i{authorization_role}"},
 "if_not_exists": {"output": "IF NOT EXISTS"},
 "name":"some schema",
 "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"}

i.e. "if_not_exists" becomes an object. All clauses are objects, all non-object values are user data. (right?). If the clause is absent, the "output" key is the empty string.

The issue with that (and with your original proposal) is that you can't tell what these clauses are supposed to be if they're not present in the original query. You can't *enable* "IF NOT EXISTS" without pulling knowledge of that syntax from somewhere else.

Depending on the problem you intend to solve there, that might be fine. If it isn't, then instead there just needs to be a key to flag such clauses as present or not.

{"authorization":{"authorization_role":"some guy",
                  "output":"AUTHORIZATION %i{authorization_role}"},
 "if_not_exists": {"output": "IF NOT EXISTS", "present": true},
 "name":"some schema",
 "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"}

Am I just over-complicating something simple here?

My reasoning is that it'd be good to be able to easily tell the difference between *structure* and *user data* in these query trees and do so without possibly version-specific and certainly syntax/clause-specific knowledge about the meaning of every key of every clause.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer escribió:

> Instead, can't we use your already proposed subclause structure? > > {"authorization":{"authorization_role":"some guy", > "output":"AUTHORIZATION %i{authorization_role}"}, > "if_not_exists": {"output": "IF NOT EXISTS"}, > "name":"some schema", > "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"} > > i.e. "if_not_exists" becomes an object. All clauses are objects, all > non-object values are user data. (right?). If the clause is absent, the > "output" key is the empty string. > > The issue with that (and with your original proposal) is that you can't > tell what these clauses are supposed to be if they're not present in the > original query. You can't *enable* "IF NOT EXISTS" without pulling > knowledge of that syntax from somewhere else. > > Depending on the problem you intend to solve there, that might be fine.

Hmm. This seems like a reasonable thing to do, except that I would like the "output" to always be the constant, and have some other way to enable the clause or disable it. With your "present" boolean: so

"if_not_exists": {"output": "IF NOT EXISTS", "present": true/false}

In fact, I'm now wondering whether this is a better idea than not emitting anything when some element in the output expands to NULL; so it would apply to "authorization" as well; if the command includes the clause, it'd be

{"authorization":{"authorization_role":"some guy", "present": true, "output":"AUTHORIZATION %i{authorization_role}"},

and if there wasn't anything, you'd have

{"authorization":{"authorization_role": null, "present": false, "output":"AUTHORIZATION %i{authorization_role}"},

so if you want to turn it on and it wasn't, you need to change both the present boolean and also set the authorization_role element; and if you want to turn it off when it was present, just set present to false.

> Am I just over-complicating something simple here?

I think it's a fair point.

> My reasoning is that it'd be good to be able to easily tell the > difference between *structure* and *user data* in these query trees and > do so without possibly version-specific and certainly > syntax/clause-specific knowledge about the meaning of every key of every > clause.

Sounds reasonable.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 8, 2014 at 10:27 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Craig Ringer escribió: >> Instead, can't we use your already proposed subclause structure? >> >> {"authorization":{"authorization_role":"some guy", >> "output":"AUTHORIZATION %i{authorization_role}"}, >> "if_not_exists": {"output": "IF NOT EXISTS"}, >> "name":"some schema", >> "output":"CREATE SCHEMA %{if_not_exists} %i{name} %{authorization}"} >> >> i.e. "if_not_exists" becomes an object. All clauses are objects, all >> non-object values are user data. (right?). If the clause is absent, the >> "output" key is the empty string. >> >> The issue with that (and with your original proposal) is that you can't >> tell what these clauses are supposed to be if they're not present in the >> original query. You can't *enable* "IF NOT EXISTS" without pulling >> knowledge of that syntax from somewhere else. >> >> Depending on the problem you intend to solve there, that might be fine. > > Hmm. This seems like a reasonable thing to do, except that I would like > the "output" to always be the constant, and have some other way to > enable the clause or disable it. With your "present" boolean: > so > > "if_not_exists": {"output": "IF NOT EXISTS", > "present": true/false} Why not: "if_not_exists": true/false -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas escribió: > On Wed, Jan 8, 2014 at 10:27 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > Hmm. This seems like a reasonable thing to do, except that I would like > > the "output" to always be the constant, and have some other way to > > enable the clause or disable it. With your "present" boolean: > > so > > > > "if_not_exists": {"output": "IF NOT EXISTS", > > "present": true/false} > > Why not: > > "if_not_exists": true/false Yeah, that's another option. If we do this, though, the expansion function would have to know that an "if_not_exist" element expands to IF NOT EXISTS. Maybe that's okay. Right now, the expansion function is pretty stupid, which is nice. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 1/9/14, 11:58 AM, Alvaro Herrera wrote:

> Robert Haas escribió: >> On Wed, Jan 8, 2014 at 10:27 PM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: > >>> Hmm. This seems like a reasonable thing to do, except that I would like >>> the "output" to always be the constant, and have some other way to >>> enable the clause or disable it. With your "present" boolean: >>> so >>> >>> "if_not_exists": {"output": "IF NOT EXISTS", >>> "present": true/false} >> >> Why not: >> >> "if_not_exists": true/false > > Yeah, that's another option. If we do this, though, the expansion > function would have to know that an "if_not_exist" element expands to IF > NOT EXISTS. Maybe that's okay. Right now, the expansion function is > pretty stupid, which is nice.

Yeah, the source side of this will always have to understand the nuances of every command; it'd be really nice to not burden the other side with that as well. The only downside I see is a larger JSON output, but meh.

Another advantage is if you really wanted to you could modify the output formatting in the JSON doc to do something radically different if so inclined...

-- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
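To spell out the trade-off (illustration only, not code from the patch): with the bare boolean, the otherwise generic expansion step needs per-clause knowledge of what the flag stands for, along the lines of

SELECT CASE WHEN (cmd -> 'if_not_exists')::text = 'true'
            THEN 'IF NOT EXISTS'
            ELSE '' END AS expansion
FROM (SELECT '{"if_not_exists": true}'::json) AS t(cmd);

whereas with {"output": "IF NOT EXISTS", "present": true/false} the clause carries its own text and the generic %{...} substitution keeps working unchanged.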
On 8 January 2014 20:42, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy";

Hmm, given in 9.3 it was OK to have only DROP event triggers, I think it should be equally acceptable to have just CREATE, but without every option on CREATE. CREATE SCHEMA is easily the most complex thing here and would be the command/event to deprioritise if we had any issues getting this done/agreeing something for 9.4.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 9, 2014 at 5:17 PM, Jim Nasby <jim@nasby.net> wrote: > On 1/9/14, 11:58 AM, Alvaro Herrera wrote: >> Robert Haas escribió: >>> >>> On Wed, Jan 8, 2014 at 10:27 PM, Alvaro Herrera >>> <alvherre@2ndquadrant.com> wrote: >> >> >>>> Hmm. This seems like a reasonable thing to do, except that I would like >>>> the "output" to always be the constant, and have some other way to >>>> enable the clause or disable it. With your "present" boolean: >>>> so >>>> >>>> "if_not_exists": {"output": "IF NOT EXISTS", >>>> "present": true/false} >>> >>> >>> Why not: >>> >>> "if_not_exists": true/false >> >> >> Yeah, that's another option. If we do this, though, the expansion >> function would have to know that an "if_not_exist" element expands to IF >> NOT EXISTS. Maybe that's okay. Right now, the expansion function is >> pretty stupid, which is nice. > > Yeah, the source side of this will always have to understand the nuances of > every command; it'd be really nice to not burden the other side with that as > well. The only downside I see is a larger JSON output, but meh. > > Another advantage is if you really wanted to you could modify the output > formatting in the JSON doc to do something radically different if so > inclined... Yeah. I wasn't necessarily objecting to the way Alvaro did it, just asking why he did it that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 8 January 2014 20:42, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > >> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy"; > > Hmm, given in 9.3 it was OK to have only DROP event triggers, I think > it should be equally acceptable to have just CREATE, but without every > option on CREATE. CREATE SCHEMA is easily the most complex thing here > and would be the command/event to deprioritise if we had any issues > getting this done/agreeing something for 9.4. I don't know that I agree with that, but I guess we can cross that bridge when we come to it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 10 January 2014 15:48, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 8 January 2014 20:42, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> >>> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy"; >> >> Hmm, given in 9.3 it was OK to have only DROP event triggers, I think >> it should be equally acceptable to have just CREATE, but without every >> option on CREATE. CREATE SCHEMA is easily the most complex thing here >> and would be the command/event to deprioritise if we had any issues >> getting this done/agreeing something for 9.4. > > I don't know that I agree with that, but I guess we can cross that > bridge when we come to it. We've come to it... You would prefer either everything or nothing?? On what grounds? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 10, 2014 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 10 January 2014 15:48, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> On 8 January 2014 20:42, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>> >>>> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy"; >>> >>> Hmm, given in 9.3 it was OK to have only DROP event triggers, I think >>> it should be equally acceptable to have just CREATE, but without every >>> option on CREATE. CREATE SCHEMA is easily the most complex thing here >>> and would be the command/event to deprioritise if we had any issues >>> getting this done/agreeing something for 9.4. >> >> I don't know that I agree with that, but I guess we can cross that >> bridge when we come to it. > > We've come to it... > > You would prefer either everything or nothing?? On what grounds? I hardly think I need to justify that position. That's project policy and always has been. When somebody implements 50% of a feature, or worse yet 95% of a feature, it violates the POLA for users and doesn't always subsequently get completed, leaving us with long-term warts that are hard to eliminate. It's perfectly fine to implement a feature incrementally if the pieces are individually self-consistent and ideally even useful, but deciding to support every command except one because the last one is hard to implement doesn't seem like a principled approach to anything. It's not even obvious to me that CREATE SCHEMA is all that much harder than anything else and Alvaro has not said that that's the only thing he can't implement (or why) so I think it's entirely premature to make the decision now about which way to proceed - but, OK, sure, if you want to force the issue now, then yeah, I think it's better to have everything or nothing than to have support for only some things justified by nothing more than implementation complexity. Aside from the general issue, in this particular case, I have previously and repeatedly expressed concerns about regression test coverage and suggested a path that would guarantee thorough regression testing but which would require that support be complete for everything present in our regression tests. Although there may be some other plan for guaranteeing thorough regression testing not only now but going forward, I have not seen it proposed here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 10 January 2014 17:07, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jan 10, 2014 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 10 January 2014 15:48, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>>> On 8 January 2014 20:42, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>>> >>>>> CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy"; >>>> >>>> Hmm, given in 9.3 it was OK to have only DROP event triggers, I think >>>> it should be equally acceptable to have just CREATE, but without every >>>> option on CREATE. CREATE SCHEMA is easily the most complex thing here >>>> and would be the command/event to deprioritise if we had any issues >>>> getting this done/agreeing something for 9.4. >>> >>> I don't know that I agree with that, but I guess we can cross that >>> bridge when we come to it. >> >> We've come to it... >> >> You would prefer either everything or nothing?? On what grounds? > > I hardly think I need to justify that position. Yeh, you do. Everybody does. > That's project policy > and always has been. When somebody implements 50% of a feature, or > worse yet 95% of a feature, it violates the POLA for users and doesn't > always subsequently get completed, leaving us with long-term warts > that are hard to eliminate. So why was project policy violated when we released 9.3 with only DROP event support? Surely that was a worse violation of POLA than my suggestion? It's not reasonable to do something yourself and then object when others suggest doing the same thing. After 3 years we need something useful. I think the "perfect being the enemy of the good" argument applies here after this length of time. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 10, 2014 at 12:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> That's project policy >> and always has been. When somebody implements 50% of a feature, or >> worse yet 95% of a feature, it violates the POLA for users and doesn't >> always subsequently get completed, leaving us with long-term warts >> that are hard to eliminate. > > So why was project policy violated when we released 9.3 with only DROP > event support? Surely that was a worse violation of POLA than my > suggestion? Well, obviously I didn't think so at the time, or I would have objected. I felt, and still feel, that implementing one kind of event trigger (drop) does not necessarily require implementing another kind (create). I think that's clearly different from implementing either one for only some object types. "This event trigger will be called whenever an object is dropped" is a reasonable contract with the user. "This other event trigger will be called whenever an object is created, unless it happens to be a schema" is much less reasonable. At least in my opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 10 January 2014 18:17, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jan 10, 2014 at 12:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> That's project policy >>> and always has been. When somebody implements 50% of a feature, or >>> worse yet 95% of a feature, it violates the POLA for users and doesn't >>> always subsequently get completed, leaving us with long-term warts >>> that are hard to eliminate. >> >> So why was project policy violated when we released 9.3 with only DROP >> event support? Surely that was a worse violation of POLA than my >> suggestion? > > Well, obviously I didn't think so at the time, or I would have > objected. I felt, and still feel, that implementing one kind of event > trigger (drop) does not necessarily require implementing another kind > (create). I think that's clearly different from implementing either > one for only some object types. > > "This event trigger will be called whenever an object is dropped" is a > reasonable contract with the user. "This other event trigger will be > called whenever an object is created, unless it happens to be a > schema" is much less reasonable. > > At least in my opinion. In the fullness of time, I agree that is not a restriction we should maintain. Given that CREATE SCHEMA with multiple objects is less well used, it's a reasonable restriction to accept for one release, if the alternative is to implement nothing at all of value. Especially since we are now in the third year of development of this set of features, it is time to reduce the scope to ensure delivery. There may be other ways to ensure something of value is added, this was just one suggestion. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Can we please stop arguing over a problem I don't have? I started with CREATE SCHEMA because it is one of the easy cases, not because it was the most difficult case: we only need to deparse the bits of it that don't involve the objects within, because those are reported by the event trigger as separate commands. Each object gets its own creation command, which works pretty nicely. Of course, the deparsed version of the command will not look very much like what was submitted by the user, but that doesn't matter -- what does matter is that the objects created by running the commands reported in the event trigger will be (or should be) the same as those created by the original command. I was showing CREATE SCHEMA as a way to discuss the fine details: how to report identifiers that might need quoting, what to do with optional clauses (AUTHORIZATION), etc. I am past that now. On the subject of testing the triggers, Robert Haas wrote: > Here's one idea: create a contrib module that (somehow, via APIs to be > invented) runs every DDL command that gets executed through the > deparsing code, and then parses the result and executes *that* instead > of the original command. Then, add a build target that runs the > regression test suite in that mode, and get the buildfarm configured > to run that build target regularly on at least some machines. That > way, adding syntax to the regular regression test suite also serves to > test that the deparsing logic for that syntax is working. If we do > this, there's still some maintenance burden associated with having DDL > deparsing code, but at least our chances of noticing when we've failed > to maintain it should be pretty good. I gave this some more thought and hit a snag. The problem here is that by the time the event trigger runs, the original object has already been created. At that point, we can't simply replace the created objects with objects that would hypothetically be created by a command trigger. A couple of very hand-wavy ideas: 1. in the event trigger, DROP the original object and CREATE it as reported by the creation_commands SRF. 2. Have ddl_command_start open a savepoint, and then roll it back in ddl_command_end, then create the object again. Not sure this is doable because of the whole SPI nesting issue .. maybe with C-language event trigger functions? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 1/10/14, 3:40 PM, Simon Riggs wrote: > Given that CREATE SCHEMA with multiple objects is less well used, it's > a reasonable restriction to accept for one release, if the alternative > is to implement nothing at all of value. Especially since we are now > in the third year of development of this set of features, it is time > to reduce the scope to ensure delivery. ISTM that since you can nest JSON then CREATE SCHEMA just needs to support providing an array of other JSON definitions, no? And it doesn't seem like it should be that hard to code that up. That said, if it's a choice between getting a bunch of CREATE blah support or getting nothing because of CREATE SCHEMA with objects then as long as we have a scheme that supports the full CREATE SCHEMA feature we shouldn't hold up just because it's not coded. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
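The nesting Jim describes might look roughly like this. It is purely a sketch of the shape, not the format the patch uses (Alvaro reports contained objects as separate commands instead), and the element names are made up for illustration:

SELECT '{"fmt": "CREATE SCHEMA %{name}I",
         "name": "some schema",
         "elements": [
           {"fmt": "CREATE SEQUENCE %{identity}D",
            "identity": {"schema": "some schema", "relation": "seq1"}},
           {"fmt": "CREATE TABLE %{identity}D (id integer)",
            "identity": {"schema": "some schema", "relation": "t1"}}
         ]}'::json AS nested_create_schema;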
On 1/10/14, 5:22 PM, Alvaro Herrera wrote: >> Here's one idea: create a contrib module that (somehow, via APIs to be >> invented) runs every DDL command that gets executed through the >> deparsing code, and then parses the result and executes *that* instead >> of the original command. Then, add a build target that runs the >> regression test suite in that mode, and get the buildfarm configured >> to run that build target regularly on at least some machines. That >> way, adding syntax to the regular regression test suite also serves to >> test that the deparsing logic for that syntax is working. If we do >> this, there's still some maintenance burden associated with having DDL >> deparsing code, but at least our chances of noticing when we've failed >> to maintain it should be pretty good. > I gave this some more thought and hit a snag. The problem here is that > by the time the event trigger runs, the original object has already been > created. At that point, we can't simply replace the created objects > with objects that would hypothetically be created by a command trigger. > > A couple of very hand-wavy ideas: > > 1. in the event trigger, DROP the original object and CREATE it as > reported by the creation_commands SRF. > > 2. Have ddl_command_start open a savepoint, and then roll it back in > ddl_command_end, then create the object again. Not sure this is doable > because of the whole SPI nesting issue .. maybe with C-language event > trigger functions? What if we don't try and do this all in one shot? I'm thinking let the original DDL do its thing while capturing the re-parsed commands in order somewhere. Dump those commands into a brand new database and use that for testing. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On 10 January 2014 23:22, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On the subject of testing the triggers, Robert Haas wrote: > >> Here's one idea: create a contrib module that (somehow, via APIs to be >> invented) runs every DDL command that gets executed through the >> deparsing code, and then parses the result and executes *that* instead >> of the original command. Then, add a build target that runs the >> regression test suite in that mode, and get the buildfarm configured >> to run that build target regularly on at least some machines. That >> way, adding syntax to the regular regression test suite also serves to >> test that the deparsing logic for that syntax is working. If we do >> this, there's still some maintenance burden associated with having DDL >> deparsing code, but at least our chances of noticing when we've failed >> to maintain it should be pretty good. > > I gave this some more thought and hit a snag. The problem here is that > by the time the event trigger runs, the original object has already been > created. At that point, we can't simply replace the created objects > with objects that would hypothetically be created by a command trigger. Yes, all we are doing is firing a trigger that has access to the information about the current object. > A couple of very hand-wavy ideas: > > 1. in the event trigger, DROP the original object and CREATE it as > reported by the creation_commands SRF. > > 2. Have ddl_command_start open a savepoint, and then roll it back in > ddl_command_end, then create the object again. Not sure this is doable > because of the whole SPI nesting issue .. maybe with C-language event trigger functions? We need to be clear what action we are testing exactly. Once we know that we can come up with a test mechanism. We can come up with various tests that test coverage but that may not be enough. Mostly we're just splitting up fields from the DDL, like object name into its own text field, as well as augmenting it with database name. That's pretty simple and shouldn't require much testing. This code seems likely to be prone to missing features much more so than actual bugs. Missing a piece of useful information in the JSON doesn't mean there is a bug. If all the event trigger does is generate JSON, then all we need to do is test that the trigger fired and generated well-formed JSON with the right information content. That should be very simple and I would say the simpler the better. To do that we can run code to parse the JSON and produce formatted text output like this...
Field     | Value
---------+-----------
XX         afagaha
YY         aamnamna
The output can be derived from visual inspection, proving that we generated the required JSON. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
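A cheap way to get the kind of field/value listing Simon sketches is the existing json_each_text() function; the blob below is a made-up fragment, used only to show the idea:

SELECT key AS field, value
  FROM json_each_text('{"name": "some schema", "if_not_exists": true}');
--      field      |    value
-- ----------------+-------------
--  name           | some schema
--  if_not_exists  | true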
Alvaro Herrera escribió: > In an event trigger, the function pg_event_trigger_get_creation_commands() > returns the following JSON blob: After playing with this for a while, I realized something that must have seemed quite obvious to those paying attention: what this function is, is just a glorified sprintf() for JSON. So I propose we take our existing format(text) and use it to model a new format(json) function, which will be useful to the project at hand and be of more general applicability. To make it a better fit, I have changed the spec slightly. The format string is now the "fmt" element in the topmost JSON. This format string can contain % escapes, which consist of: * the literal % itself * an element name, enclosed in braces { }. The name can optionally be followed by a colon and a possibly-empty array separator. * a format specifier, which can be I (identifier), D (dotted name), or s (string) * Alternatively, %% expands to a literal %, as usual. For each such escape, the JSON object is searched using the element name as key. For identifiers, the element is expected to be a string, and will be quoted per identifier quoting rules. Dotted-names are used to format possibly-qualified relation names and such; the element must be an object with one, two or three string elements, each of which is quoted per identifier rules, and output separated by periods. Finally, for arrays we expand each element in the JSON array element, and separate them with the separator specified in the {} part of the format specifier. For instance, alvherre=# select format(json '{"fmt":"hello, %{who}s! This is %{name}I", "who":"world", "name":"a function"}'); format ------------------------------------------hello, world! This is "a function" Elements can be objects, in which case they are expanded recursively: a "fmt" element is looked up and expanded as described above. I don't yet see a need for %L escapes (that is, literals that can expand to a single-quoted value or to NULL), but if I see it I will add that too. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
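For concreteness, here is a minimal PL/pgSQL sketch of the expansion rules just described, written against the 9.4 json accessors (json_typeof and friends). The function name jsonfmt_expand is made up; array separators, the "present" flag and error checking are omitted; and the dotted-name case assumes the "schema"/"relation" keys used in later examples in this thread. The real implementation in the patch is C code inside the backend, not this.

CREATE OR REPLACE FUNCTION jsonfmt_expand(obj json) RETURNS text
LANGUAGE plpgsql AS $$
DECLARE
    result text := obj ->> 'fmt';   -- the format string lives in the topmost object
    m      text[];
    elem   text;
    spec   text;
    val    text;
BEGIN
    -- scan the format string for %{element}I, %{element}D and %{element}s escapes
    FOR m IN SELECT regexp_matches(obj ->> 'fmt', '%\{(\w+)\}(I|D|s)', 'g')
    LOOP
        elem := m[1];
        spec := m[2];
        IF spec = 'I' THEN
            -- identifier: quote per identifier rules
            val := quote_ident(obj ->> elem);
        ELSIF spec = 'D' THEN
            -- dotted name: assumes "schema"/"relation" keys as in later examples;
            -- concat_ws skips a NULL (absent) schema
            val := concat_ws('.',
                             quote_ident(obj -> elem ->> 'schema'),
                             quote_ident(obj -> elem ->> 'relation'));
        ELSE
            -- %{...}s: plain string, or a nested object expanded recursively via its own "fmt"
            IF json_typeof(obj -> elem) = 'object' THEN
                val := jsonfmt_expand(obj -> elem);
            ELSE
                val := obj ->> elem;
            END IF;
        END IF;
        result := replace(result, '%{' || elem || '}' || spec, val);
    END LOOP;
    RETURN replace(result, '%%', '%');
END;
$$;

SELECT jsonfmt_expand('{"fmt": "CREATE SCHEMA %{if_not_exists}s %{name}I",
                        "if_not_exists": "IF NOT EXISTS",
                        "name": "some schema"}');
-- CREATE SCHEMA IF NOT EXISTS "some schema"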
Hello
2014/1/13 Alvaro Herrera <alvherre@2ndquadrant.com>:
> Alvaro Herrera escribió:
> > In an event trigger, the function pg_event_trigger_get_creation_commands()
> > returns the following JSON blob:
> After playing with this for a while, I realized something that must have
> seemed quite obvious to those paying attention: what this function is,
> is just a glorified sprintf() for JSON. So I propose we take our
> existing format(text) and use it to model a new format(json) function,
> which will be useful to the project at hand and be of more general
> applicability.
> To make it a better fit, I have changed the spec slightly. The format
> string is now the "fmt" element in the topmost JSON. This format string
> can contain % escapes, which consist of:
> * the literal % itself
> * an element name, enclosed in braces { }. The name can optionally be
> followed by a colon and a possibly-empty array separator.
> * a format specifier, which can be I (identifier), D (dotted name), or s
> (string)
> * Alternatively, %% expands to a literal %, as usual.
> For each such escape, the JSON object is searched using the element name
> as key. For identifiers, the element is expected to be a string, and
> will be quoted per identifier quoting rules. Dotted-names are used to
> format possibly-qualified relation names and such; the element must be
> an object with one, two or three string elements, each of which is
> quoted per identifier rules, and output separated by periods.
> Finally, for arrays we expand each element in the JSON array element,
> and separate them with the separator specified in the {} part of the
> format specifier.
> For instance,
> alvherre=# select format(json '{"fmt":"hello, %{who}s! This is %{name}I", "who":"world", "name":"a function"}');
> format
> ------------------------------------------
> hello, world! This is "a function"
> Elements can be objects, in which case they are expanded recursively: a
> "fmt" element is looked up and expanded as described above.
> I don't yet see a need for %L escapes (that is, literals that can expand
> to a single-quoted value or to NULL), but if I see it I will add that
> too.
I am not against this idea, although I don't see a strong benefit. A dedicated special-purpose function might be better: this has minimal relation to the variadic format() function, and a nested mixed format can get messy.
On Fri, Jan 10, 2014 at 6:22 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Here's one idea: create a contrib module that (somehow, via APIs to be >> invented) runs every DDL command that gets executed through the >> deparsing code, and then parses the result and executes *that* instead >> of the original command. Then, add a build target that runs the >> regression test suite in that mode, and get the buildfarm configured >> to run that build target regularly on at least some machines. That >> way, adding syntax to the regular regression test suite also serves to >> test that the deparsing logic for that syntax is working. If we do >> this, there's still some maintenance burden associated with having DDL >> deparsing code, but at least our chances of noticing when we've failed >> to maintain it should be pretty good. > > I gave this some more thought and hit a snag. The problem here is that > by the time the event trigger runs, the original object has already been > created. At that point, we can't simply replace the created objects > with objects that would hypothetically be created by a command trigger. Hmm, so these triggers are firing after the corresponding actions have been performed. Yeah, that's tricky. I don't immediately have an idea how to come up with a comprehensive test framework for that, although I do think that the way you're structuring the JSON blobs makes this a whole lot less error-prone than the approaches that have been explored previously. The big question in my mind is still "if somebody updates the grammar in gram.y, how do we make sure that they notice that this stuff needs to be updated to match?". Ideally the answer is "if they don't, the regression tests fail". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/14/14, 2:05 PM, Robert Haas wrote: > On Fri, Jan 10, 2014 at 6:22 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >>> Here's one idea: create a contrib module that (somehow, via APIs to be >>> invented) runs every DDL command that gets executed through the >>> deparsing code, and then parses the result and executes *that* instead >>> of the original command. Then, add a build target that runs the >>> regression test suite in that mode, and get the buildfarm configured >>> to run that build target regularly on at least some machines. That >>> way, adding syntax to the regular regression test suite also serves to >>> test that the deparsing logic for that syntax is working. If we do >>> this, there's still some maintenance burden associated with having DDL >>> deparsing code, but at least our chances of noticing when we've failed >>> to maintain it should be pretty good. >> >> I gave this some more thought and hit a snag. The problem here is that >> by the time the event trigger runs, the original object has already been >> created. At that point, we can't simply replace the created objects >> with objects that would hypothetically be created by a command trigger. > > Hmm, so these triggers are firing after the corresponding actions have > been performed. Yeah, that's tricky. I don't immediately have an > idea how to come up with a comprehensive test framework for that, > although I do think that the way you're structuring the JSON blobs > makes this a whole lot less error-prone than the approaches that have > been explored previously. The big question in my mind is still "if > somebody updates the grammar in gram.y, how do we make sure that they > notice that this stuff needs to be updated to match?". Ideally the > answer is "if they don't, the regression tests fail". Can't we capture the JSON, use that to create a new database, and then test against that? Unfortunately it'd be easiest to do that at a high level and not for individual commands, but it's a lot better than nothing... -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Pavel Stehule escribió: > 2014/1/13 Alvaro Herrera <alvherre@2ndquadrant.com> > > After playing with this for a while, I realized something that must have > > seemed quite obvious to those paying attention: what this function is, > > is just a glorified sprintf() for JSON. So I propose we take our > > existing format(text) and use it to model a new format(json) function, > > which will be useful to the project at hand and be of more general > > applicability. > I am not against to this idea, although I don't see a strong benefit. . > Just special function can be better - it has minimal relation to variadic > "function" format - and nested mixed format can be messy Yeah. I eventually realized that I need some very specialized format specifiers here; I invented on %{}T specifier, for instance, which is used to format types. So I think this is better confined to expansion of SQL commands rather than a generic string formatter. So here's a patch implementing the ideas expressed in this thread. There are two new SQL functions: pg_event_trigger_get_creation_commands() Can be called in a ddl_command_end event, and returns a JSON blob for each object created in that command. pg_event_trigger_expand_command() Takes such a JSON blob and turns it back into an executable command. The usefulness of this combination is that the user can edit the JSON between those two calls, say by adding new columns or changing or removing schema specifications, tablespaces, and so on. One interesting bit I had to add was format_type_detailed(). This function returns a type specification in minute detail: schema, type name, typemod, array are all returned separately. This might seem overkill, but if we want to let the user mess with column definitions, I think it's necessary. Some things are uglier than I would like -- one reason is I stayed away from using the JSON things too directly. There are at least two patches in that area, and some helpers might appear that help this patch. However, at the moment I am not sure whether the end result would be better or worse, and I don't want to make this patch depend on some other patch which might or might not end up being applied. In any case, the JSON stuff is pretty localized so it should be reasonably easy to rip out and replace. The first half of deparse_utility.c is concerned with a simple-minded mechanism to accumulate an object hierarchy until it's time to convert it to proper JSON. Perhaps the new JSON stuff will make it possible to rip that all out. The JSON parsing is done in event_trigger.c. This code should probably live elsewhere, but I again hesitate to put it in json.c or jsonfuncs.c, at least until some discussion about its general applicability takes place. The second half of deparse_utility.c is concerned with actually processing the parse nodes to construct objects. There are several cases missing (at the moment, only CREATE SCHEMA, CREATE TABLE, CREATE INDEX and CREATE SEQUENCE are supported, and in each of them there are some things missing). This is code tedious to write but not difficult. 
To see this stuff in action, an event trigger function such as this is useful: CREATE OR REPLACE FUNCTION snitch() RETURNS event_trigger LANGUAGE plpgsql AS $$ DECLARE r RECORD; BEGIN FOR r IN SELECT * FROM pg_event_trigger_get_creation_commands() LOOP RAISE NOTICE 'JSON blob: %', r.command; RAISE NOTICE 'expanded: %', pg_event_trigger_expand_command(r.command::json); END LOOP; END; $$; CREATE EVENT TRIGGER snitch ON ddl_command_end when tag in ('create schema', 'create table', 'create index', 'create sequence') EXECUTE PROCEDURE snitch(); Then execute commands to your liking. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
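A variant of the snitch() trigger above that stashes the expanded commands in a table rather than raising notices is roughly what the "replay into a fresh database" testing idea discussed upthread needs. The table and function names below are illustrative only, and the two pg_event_trigger_* functions are the ones proposed by this WIP patch, not released API:

CREATE TABLE captured_ddl (id serial PRIMARY KEY, command text);

CREATE OR REPLACE FUNCTION capture_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_get_creation_commands() LOOP
        -- store the executable text; the raw JSON could be kept instead
        INSERT INTO captured_ddl (command)
            VALUES (pg_event_trigger_expand_command(r.command::json));
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER capture_ddl ON ddl_command_end
    WHEN tag IN ('create schema', 'create table', 'create index', 'create sequence')
    EXECUTE PROCEDURE capture_ddl();

The contents of captured_ddl can then be fed to psql against an empty database and the resulting catalogs compared with the original.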
On 2014-01-15 02:11:11 -0300, Alvaro Herrera wrote: > Then execute commands to your liking. So, I am giving a quick look, given that I very likely will have to write a consumer for it... * deparse_utility_command returns payload via parameter, not return value? * functions beginning with an underscore formally are reserved, we shouldn't add new places using such a convention. * I don't think dequote_jsonval is correct as is, IIRC that won't correctly handle unicode escapes and such. I think json.cneeds to expose functionality for this. * I wonder if expand_jsonval_identifier shouldn't truncate as well? It should get handled by the individual created commands,right? * So, if I read things correctly, identifiers in json are never downcased, is that correct? I.e. they are implicitly quoted? * Why must we not schema qualify system types (c.f. expand_jsonval_typename)? It seems to be perfectly sensible to me tojust use pg_catalog there. * It looks like pg_event_trigger_expand_command will misparse nested {, error out instead? * I wonder if deparseColumnConstraint couldn't somehow share a bit more code with ruleutils.c's pg_get_constraintdef_worker(). * I guess you know, but deparseColumnConstraint() doesn't handle foreign keys yet. * Is tcop/ the correct location for deparse_utility.c? I wonder if it shouldn't be alongside ruleutils.c instead. * shouldn't pg_event_trigger_get_creation_commands return "command" as json instead of text? * Not your department, but a builtin json pretty printer would be really, really handy. Something like CREATE FUNCTION json_prettify(j json) RETURNS TEXT AS $$ import json return json.dumps(json.loads(j), sort_keys=True, indent=4) $$ LANGUAGE PLPYTHONU;makes the json so much more readable. Some minimal tests: * CREATE SEQUENCE errors out with: NOTICE: JSON blob: {"sequence":{"relation":"frakbar2","schema":"public"},"persistence":"","fmt":"CREATE %{persistence}sSEQUENCE %{identity}D"} ERROR: non-existant element "identity" in JSON formatting object *CREATE TABLE frakkbar2(id int); error out with: postgres=# CREATE TABLE frakkbar2(id int); NOTICE: JSON blob: {"on_commit":{"present":false,"on_commit_value":null,"fmt":"ON COMMIT %{on_commit_value}s"},"tablespace":{"present":false,"tablespace":null,"fmt":"TABLESPACE %{tablespace}I"},"inherits":{"present":false,"parents":null,"fmt":"INHERITS(%{parents:, }D)"},"table_elements":[{"collation":{"present":false,"fmt":"COLLATE %{name}I"},"type":{"typmod":"","typename":"integer","is_system":true,"is_array":false},"name":"id","fmt":"%{name}I %{type}T%{collation}s"}],"of_type":{"present":false,"of_type":null,"fmt":"OF %{of_type}T"},"if_not_exists":"","identity":{"relation":"frakkbar2","schema":"public"},"persistence":"","fmt":"CREATE %{persistence}sTABLE %{identity}D %{if_not_exists}s %{of_type}s (%{table_elements:, }s) %{inherits}s %{on_commit}s %{tablespace}s"} ERROR: invalid NULL is_system flag in %T element CONTEXT: PL/pgSQL function snitch() line 8 at RAISE Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund escribió: > * Why must we not schema qualify system types > (c.f. expand_jsonval_typename)? It seems to be perfectly sensible to > me to just use pg_catalog there. So, the reason for doing things this way is to handle cases like "varchar(10)" being turned into "character varying"; and that name requires that the typename NOT be schema-qualified, otherwise it fails. But thinking about this again, I don't see a reason why this can't be returned simply as pg_catalog.varchar(10); this should work fine on the receiving end as well, and give the same result. The other cases I'm worried about are types like bit(1) vs. unadorned bit vs. double-quoted "bit", and "char", etc. I'm not sure I'm dealing with them correctly right now. So even if by the above paragraph I could make the is_system thingy go away, I might still need it to cover this case. Thanks for the review, I will post an updated version later after fixing the other issues you mentioned plus adding support for more commands. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > So, the reason for doing things this way is to handle cases like > "varchar(10)" being turned into "character varying"; and that name > requires that the typename NOT be schema-qualified, otherwise it fails. > But thinking about this again, I don't see a reason why this can't be > returned simply as pg_catalog.varchar(10); this should work fine on the > receiving end as well, and give the same result. I think people would be unhappy if we changed the output of, say, pg_dump that way. But it's presumably not a problem for strings inside event triggers. Once upon a time, the typmods would have been an issue, but now that we support them in generic typename syntax I think we're good. > The other cases I'm worried about are types like bit(1) vs. unadorned > bit vs. double-quoted "bit", and "char", etc. I'm not sure I'm dealing > with them correctly right now. So even if by the above paragraph I > could make the is_system thingy go away, I might still need it to cover > this case. Yeah, there are some weird cases there. I think you can make them go away if you always print the fully qualified type name, ie don't omit "pg_catalog", and use pg_type.typname not any converted version (and don't forget to double-quote anything that might be a reserved word). But I've not looked closely at the code. regards, tom lane
Hi, Alvaro Herrera <alvherre@2ndquadrant.com> writes: > So here's a patch implementing the ideas expressed in this thread. > There are two new SQL functions: I spent some time reviewing the patch and tried to focus on a higher level review, as I saw Andres already began with the lower level stuff. The main things to keep in mind here are: - this patch enables running Event Triggers anytime a new object is created, in a way that the user code is run once the object already made it through the catalogs; - the Event Trigger code has access to the full details about every created object, so it's not tied to a command but really the fact that an object just was created in the catalogs; (it's important with serial and primary key sub-commands) - facilities are provided so that it's possible to easily build an SQL command that if executed would create the exact same object again; - the facilities around passing the created object details and building a SQL command are made in such a way that it's trivially possible to "hack" away the captured objects properties before producing again a new SQL command. After careful study and thinking, it appears to me that the proposed patch addresses the whole range of features we expect here, and is both flexible enough for the users and easy enough to maintain. The event being fired once the objects are available in the catalogs makes it possible for the code providing the details in the JSON format to complete the parsetree with necessary information. The current state of the patch is not ready for commit yet; independent of code details, some more high-level work needs to be done: - preliminary commit It might be a good idea to separate away some pre-requisites of this patch and commit them separately: the OID publishing parts and allowing an Event Trigger to get fired after CREATE but without the extra detailed JSON formatted information might be good commits already, and later add the much needed details about what did happen. - document the JSON format I vote for going with the proposed format, because it actually allows us to implement both the audit and replication features we want, with the capability of hacking schema, data types, SQL representation etc; and because I couldn't think of anything better than what's proposed here ;-) - other usual documentation I don't suppose I have to expand on what I mean here… - fill-in other commands Not all commands are supported in the submitted patch. I think once we have clear documentation on the general JSON formatting and how to use it as a user, we need to include support for all CREATE commands that we have. I see nothing against extending when this work has to be done until after the CF, as long as it's fully done before beta. After all it's only about filling in minutia at this point. - review the JSON producing code It might be possible to use more of the internal support for JSON now that the format is freezing. - regression tests The patch will need some. The simpler solution is to add a new regression test entry and exercise all the CREATE commands in there, in a specific schema, activating an event trigger that outputs the JSON detailed information each time (the snitch() example). Best would be to have some "pretty" indented output of the JSON to help with reviewing diffs, and I have to wonder about JSON object inner-ordering if we're going to do that. No other ideas on this topic from me. > The JSON parsing is done in event_trigger.c.
This code should probably > live elsewhere, but I again hesitate to put it in json.c or jsonfuncs.c, > at least until some discussion about its general applicability takes > place. I see that as useful enough if it can be made to work without the special "fmt" fields somehow, with a nice default formatting ability. In particular, being able to build some intermediate object with json_agg then call the formating/expanding function on top of that might be quite useful. That said, I don't think we have enough time to attack this problem now, I think it would be wiser to address your immediate problem separately in your patch and clean it later (next release) with sharing code and infrastructure and offering a more generally useful tool. At least we will have some feedback about the Event Trigger specific context then. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Hi, Dimitri, thanks for the review. Here's a new version of this patch. I focused on a small set of commands for now, with the intention of building something that could serve as a useful foundation for an interesting percentage of use cases. There are enough strange things going on in this code that I thought I should go one step further in the foundations before I go much further in supporting the rest of the commands, which will be mostly tedious additions of extracting info from catalogs to immediately emit as JSON blobs. So this version supports the following commands: CREATE SCHEMA CREATE TABLE CREATE SEQUENCE CREATE INDEX CREATE VIEW I have run into some issues, though: 1. certain types, particularly timestamp/timestamptz but really this could happen for any type, have unusual typmod output behavior. For those one cannot just use the schema-qualified catalog names and then append the typmod at the end; because what you end up is something like pg_catalog.timestamptz(4) with time zone because, for whatever reason, the "with time zone" is part of typmod output. But this doesn't work at all for input. I'm not sure how to solve this. 2. I have been having object definitions be emitted complete; in particular, sequences have OWNED BY clauses when they have an owning column. But this doesn't work with a SERIAL column, because we get output like this: alvherre=# CREATE TABLE public.hijo (b serial); NOTICE: expanded: CREATE SEQUENCE public.hijo_b_seq INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START WITH 1CACHE 1 NO CYCLE OWNED BY public.hijo.b NOTICE: expanded: CREATE TABLE public.hijo (b pg_catalog.int4 DEFAULT nextval('hijo_b_seq'::regclass) NOT NULL ) which is all nice, except that the sequence is using the column name as owner before the column has been created in the first place. Both these command will, of course, fail, because both depend on the other to have been executed first. The tie is easy to break in this case: just don't emit the OWNED BY clause .. but it makes me a bit nervous to be hardcoding the decision of parts that might depend on others. OTOH pg_dump already knows how to split objects in constituent parts as necessary; maybe it's not so bad. 3. It is possible to coerce ruleutils.c to emit always-qualified names by using PushOverrideSearchPath() facility; but actually this doesn't always work, because some places in namespace.c believe that PG_CATALOG_NAMESPACE is always visible and so certain objects are not qualified. In particular, text columns using default collation will be emitted as having collation "default" and not pg_catalog.default as I would have initially expected. Right now it doesn't seem like this is a problem, but it is unusual. Dimitri Fontaine wrote: > After careful study and thinking, it appears to me that the proposed > patch addresses the whole range of features we expect here, and is both > flexible enough for the users and easy enough to maintain. Excellent, thanks. > - preliminary commit > > It might be a good idea to separate away some pre-requisites of this > patch and commit them separately: the OID publishing parts and > allowing an Event Trigger to get fired after CREATE but without the > extra detailed JSON formated information might be good commits > already, and later add the much needed details about what did > happen. I agree that perhaps the OID publishing bits should be pushed prior to the rest of the patch. 
It's two minor changes anyway that only affect CREATE TABLE AS and REFRESH MATVIEW as far as I remember; the rest of the commands already return the OID of the affected object. I'm not sure about pushing the bits to have a trigger to fire before having the deparse utility stuff. I guess it is useful because it will provide classID/objectID/identity for all created objects, even if we don't have the creation command. But changing the API of pg_get_creation_commands() in a later release might be problematic. > - document the JSON format > > I vote for going with the proposed format, because it actually > allows to implement both the audit and replication features we want, > with the capability of hacking schema, data types, SQL > representation etc; and because I couldn't think of anything better > than what's proposed here ;-) That's great to hear. Since the previous patch I have added a %{}O escape that is used to format operators. I think that makes the set of format specifiers mostly complete. > - review the JSON producing code > > It might be possible to use more of the internal supports for JSON > now that the format is freezing. Yeah. I realized that the trick through row_to_json is likely to fail when the objects have more than 1600 columns. Not sure if this is a problem in practice (haven't tried creating a 1600-column table yet, but I think that wouldn't fail because table elements are handed as arrays). Anyway if we want to use the new json_build facilities, we'd need to serialize from the in-memory representation I'm using to the text format, and from there to JSON. Not real sure that that's an improvement ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Feb 4, 2014 at 12:11 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I have run into some issues, though: > > 1. certain types, particularly timestamp/timestamptz but really this > could happen for any type, have unusual typmod output behavior. For > those one cannot just use the schema-qualified catalog names and then > append the typmod at the end; because what you end up is something like > pg_catalog.timestamptz(4) with time zone > because, for whatever reason, the "with time zone" is part of typmod > output. But this doesn't work at all for input. I'm not sure how to > solve this. How about doing whatever pg_dump does? > 2. I have been having object definitions be emitted complete; in > particular, sequences have OWNED BY clauses when they have an owning > column. But this doesn't work with a SERIAL column, because we get > output like this: > > alvherre=# CREATE TABLE public.hijo (b serial); > NOTICE: expanded: CREATE SEQUENCE public.hijo_b_seq INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START WITH1 CACHE 1 NO CYCLE OWNED BY public.hijo.b > NOTICE: expanded: CREATE TABLE public.hijo (b pg_catalog.int4 DEFAULT nextval('hijo_b_seq'::regclass) NOT NULL ) > > which is all nice, except that the sequence is using the column name as > owner before the column has been created in the first place. Both these > command will, of course, fail, because both depend on the other to have > been executed first. The tie is easy to break in this case: just don't > emit the OWNED BY clause .. but it makes me a bit nervous to be > hardcoding the decision of parts that might depend on others. OTOH > pg_dump already knows how to split objects in constituent parts as > necessary; maybe it's not so bad. Well, the sequence can't depend on a table column that doesn't exist yet, so if it's in effect doing what you've shown there, it's "cheating" by virtue of knowing that nobody can observe the intermediate state. Strictly speaking, there's nothing "wrong" with emitting those commands just as you have them there; they won't run, but if what you want to do is log what's happened rather than replay it, that's OK. Producing output that is actually executable is a strictly harder problem than producing output that accurately describes what happened. As you say, pg_dump already splits things and getting executable output out of this facility will require the same kinds of tricks here. This gets back to my worry about maintaining two or three copies of the code that solve many of the same problems in quite different ways... > 3. It is possible to coerce ruleutils.c to emit always-qualified names > by using PushOverrideSearchPath() facility; but actually this doesn't > always work, because some places in namespace.c believe that > PG_CATALOG_NAMESPACE is always visible and so certain objects are not > qualified. In particular, text columns using default collation will be > emitted as having collation "default" and not pg_catalog.default as I > would have initially expected. Right now it doesn't seem like this is a > problem, but it is unusual. We have a quote_all_identifiers flag. We could have a schema_qualify_all_identifiers flag, too. Then again, why is the behavior of schema-qualifying absolutely everything even desirable? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas escribió: > On Tue, Feb 4, 2014 at 12:11 AM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > I have run into some issues, though: > > > > 1. certain types, particularly timestamp/timestamptz but really this > > could happen for any type, have unusual typmod output behavior. For > > those one cannot just use the schema-qualified catalog names and then > > append the typmod at the end; because what you end up is something like > > pg_catalog.timestamptz(4) with time zone > > because, for whatever reason, the "with time zone" is part of typmod > > output. But this doesn't work at all for input. I'm not sure how to > > solve this. > > How about doing whatever pg_dump does? We use format_type() for that as far as I know. What it does differently is use undecorated names defined by the standard for some types, which are never schema qualified and are never ambiguous because they are reserved words that would require quoting if used by user-defined type names. We can't use that here: somewhere upthread we noticed issues when using those which is why we're now trying to use catalog names instead of those special names. (I don't think it's impossible to use such names: we just need to ensure we handle quoting correctly for the funny cases such as "char" and "bit.) One idea is to chop the typmod output string at the closing parens. That seems to work well for the types that come with core code ... and the rule with the funny string at the end after the parenthised part of the typmod works only by type names with hardcoded productions in gram.y, so there is no danger that outside code will have a problem with that strategy. (I verified PostGIS types with typmods just to see an example of out-of-core code at work, and unsurprisingly it only emits stuff inside parens.) > > 2. I have been having object definitions be emitted complete; in > > particular, sequences have OWNED BY clauses when they have an owning > > column. But this doesn't work with a SERIAL column, because we get > > output like this: > Well, the sequence can't depend on a table column that doesn't exist > yet, so if it's in effect doing what you've shown there, it's > "cheating" by virtue of knowing that nobody can observe the > intermediate state. Strictly speaking, there's nothing "wrong" with > emitting those commands just as you have them there; they won't run, > but if what you want to do is log what's happened rather than replay > it, that's OK. Producing output that is actually executable is a > strictly harder problem than producing output that accurately > describes what happened. As you say, pg_dump already splits things > and getting executable output out of this facility will require the > same kinds of tricks here. Yeah. Moreover this will require that this new event trigger code is able to handle (at least certain kinds of) ALTER, which expands this patch in scope by a wide margin. Producing executable commands is an important goal. > This gets back to my worry about maintaining two or three copies of > the code that solve many of the same problems in quite different > ways... Agreed. It would be real good to be able to use this code for pg_dump too, but it seems dubious. The two environments seem just too different to be able to reuse anything. But if you see a way, by all means shout. > > 3. 
It is possible to coerce ruleutils.c to emit always-qualified names > > by using PushOverrideSearchPath() facility; but actually this doesn't > > always work, because some places in namespace.c believe that > > PG_CATALOG_NAMESPACE is always visible and so certain objects are not > > qualified. In particular, text columns using default collation will be > > emitted as having collation "default" and not pg_catalog.default as I > > would have initially expected. Right now it doesn't seem like this is a > > problem, but it is unusual. > > We have a quote_all_identifiers flag. We could have a > schema_qualify_all_identifiers flag, too. Actually the bit about collations was just a garden variety bug: turns out I was using a %{}I specifier (and correspondingly only the collation name as a string) instead of %{}D and get_catalog_object_by_oid() to match. I'm not yet sure if this new flag you suggest will really be needed, but thanks for the idea. > Then again, why is the behavior of schema-qualifying absolutely > everything even desirable? Well, someone could create a collation in another schema with the same name as a system collation and the command would become ambiguous. For example, this command create table f (a text collate "POSIX"); creates a column whose collation is either public."POSIX" or the system POSIX collation, depending on whether public appears before pg_catalog in search_path. Having someone create a POSIX collation in a schema that appears earlier than pg_catalog is pretty far-fetched, but ISTM that we really need to cover that scenario. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
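To make the collation example concrete, here is a sketch of the ambiguity (the app schema name is made up; the point is only that the unqualified spelling depends on search_path):

CREATE SCHEMA app;
CREATE COLLATION app."POSIX" FROM pg_catalog."POSIX";
SET search_path = app, public, pg_catalog;

CREATE TABLE f (a text COLLATE "POSIX");               -- resolves to app."POSIX"
CREATE TABLE g (a text COLLATE pg_catalog."POSIX");    -- always the system collation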
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Robert Haas escribió: >> How about doing whatever pg_dump does? > We use format_type() for that as far as I know. What it does > differently is use undecorated names defined by the standard for some > types, which are never schema qualified and are never ambiguous because > they are reserved words that would require quoting if used by > user-defined type names. We can't use that here: somewhere upthread we > noticed issues when using those which is why we're now trying to use > catalog names instead of those special names. (I don't think it's > impossible to use such names: we just need to ensure we handle quoting > correctly for the funny cases such as "char" and "bit.) Yeah, but wouldn't that complexity also bubble into user code within the event triggers? Since there's no real need for SQL standard compliance in this context, I think minimizing the number of weird formats is a win. > One idea is to chop the typmod output string at the closing parens. +1. The only reason timestamptypmodout works like that is that we're trying to match the SQL standard's spelling of the type names, and that committee apparently considers it an off day whenever they can't invent some randomly-incompatible-with-everything syntax. regards, tom lane
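The behavior under discussion can be seen directly from SQL; the output shown in the comments is what these calls should print, and it illustrates why chopping the typmod output at the closing paren leaves something that can be appended to pg_catalog.timestamptz:

SELECT format_type('timestamptz'::regtype, 4) AS standard_spelling,
       timestamptztypmodout(4)                AS typmod_output;
--       standard_spelling       |   typmod_output
-- ------------------------------+---------------------
--  timestamp(4) with time zone  | (4) with time zone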
On Wed, Feb 5, 2014 at 2:26 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Then again, why is the behavior of schema-qualifying absolutely >> everything even desirable? > > Well, someone could create a collation in another schema with the same > name as a system collation and the command would become ambiguous. For > example, this command > create table f (a text collate "POSIX"); > creates a column whose collation is either public."POSIX" or the system > POSIX collation, depending on whether public appears before pg_catalog > in search_path. Having someone create a POSIX collation in a schema > that appears earlier than pg_catalog is pretty far-fetched, but ISTM > that we really need to cover that scenario. Hmm, good point. I guess we don't worry much about this with pg_dump because we assume that we're restoring into an empty database (and if not, the user gets to keep both pieces). You're applying a higher standard here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Feb 5, 2014 at 2:26 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>> Then again, why is the behavior of schema-qualifying absolutely >>> everything even desirable? >> Well, someone could create a collation in another schema with the same >> name as a system collation and the command would become ambiguous. > Hmm, good point. I guess we don't worry much about this with pg_dump > because we assume that we're restoring into an empty database (and if > not, the user gets to keep both pieces). You're applying a higher > standard here. Robert, that's just horsepucky. pg_dump is very careful about schemas. It's also careful to not schema-qualify names unnecessarily, which is an intentional tradeoff to improve readability of the dump --- at the cost that the dump might break if restored into a nonempty database with conflicting objects. In the case of data passed to event triggers, there's a different tradeoff to be made: people will probably value consistency over readability, so always-qualify is probably the right choice here. But in neither case are we being sloppy. regards, tom lane
On Thu, Feb 6, 2014 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Feb 5, 2014 at 2:26 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>>> Then again, why is the behavior of schema-qualifying absolutely >>>> everything even desirable? > >>> Well, someone could create a collation in another schema with the same >>> name as a system collation and the command would become ambiguous. > >> Hmm, good point. I guess we don't worry much about this with pg_dump >> because we assume that we're restoring into an empty database (and if >> not, the user gets to keep both pieces). You're applying a higher >> standard here. > > Robert, that's just horsepucky. pg_dump is very careful about schemas. > It's also careful to not schema-qualify names unnecessarily, which is an > intentional tradeoff to improve readability of the dump --- at the cost > that the dump might break if restored into a nonempty database with > conflicting objects. In the case of data passed to event triggers, > there's a different tradeoff to be made: people will probably value > consistency over readability, so always-qualify is probably the right > choice here. But in neither case are we being sloppy. I didn't mean to imply otherwise. I know the pg_dump tries to avoid excess schema-qualification for readability among other reasons; what I meant was that Alvaro is applying a higher standard specifically in regards to replayability. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tom Lane escribió: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > One idea is to chop the typmod output string at the closing parens. > > +1. The only reason timestamptypmodout works like that is that we're > trying to match the SQL standard's spelling of the type names, and > that committee apparently considers it an off day whenever they can't > invent some randomly-incompatible-with-everything syntax. Okay, I did it that way. I also fixed the sequence OWNED BY problem simply by adding support for ALTER SEQUENCE. Of course, the intention is that all forms of CREATE and ALTER are supported, but this one seems reasonable standalone because CREATE TABLE uses it internally. So in the attached patch I have fixed all three problems I reported for the previous version. I also changed the EventTriggerStashCreatedObject() function in two ways: 1. rename it to EventTriggerStashCommand(). This is because it will take ALTER commands as arguments, so the old name seemed a misnomer. (I guess I was initially thinking it'd only handle object creation, but obviously that will not work.) 2. Take ObjectType instead of ObjectClass. I'm not really sure about this change; the old usage of OCLASS at the utility.c level seemed wrong (in particular, in the DefineStmt case it was having to translate the stmt->kind into ObjClass manually). But on the other hand, the new usage of OBJECT_FOO requires a function to translate each enum value to the catalog that contains objects of that type (which is repetitive; the OCLASS_FOO code in dependency.c already has a table with that). And also I had to add OBJECT_USER_MAPPING, which was missing. I think I might end up reverting this bit, unless somebody sees a way to beautify the Objtype->catalog OID transformation. (The bit about missing USER MAPPING stuff seems a bit troubling; EventTriggerSupportsObjectType wasn't aware of that object type.) By way of illustration, here's the output of a simple command using the snitch() function previously posted. You can see that the CREATE TABLE command is expanded as three commands: CREATE SEQUENCE, CREATE TABLE, ALTER SEQUENCE.
alvherre=# create unlogged table t1 (a serial); NOTICE: JSON blob: { "definition": [ { "clause": "cache", "fmt": "CACHE %{value}s", "value": "1" }, { "clause": "cycle", "fmt": "%{no}s CYCLE", "no": "NO" }, { "clause": "increment_by", "fmt": "INCREMENT BY %{value}s", "value": "1" }, { "clause": "minvalue", "fmt": "MINVALUE %{value}s", "value": "1" }, { "clause": "maxvalue", "fmt": "MAXVALUE %{value}s", "value": "9223372036854775807" }, { "clause": "start", "fmt": "START WITH %{value}s", "value": "1" }, { "clause": "restart", "fmt": "RESTART %{value}s", "value": "1" } ], "fmt": "CREATE %{persistence}s SEQUENCE %{identity}D %{definition: }s", "identity": { "objname": "t1_a_seq", "schemaname": "public" }, "persistence": "" } NOTICE: expanded: CREATE SEQUENCE public.t1_a_seq CACHE 1 NO CYCLE INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807START WITH 1 RESTART 1 NOTICE: JSON blob: { "fmt": "CREATE %{persistence}s TABLE %{identity}D %{if_not_exists}s (%{table_elements:, }s) %{inherits}s %{on_commit}s%{tablespace}s", "identity": { "objname": "t1", "schemaname": "public" }, "if_not_exists": "", "inherits": { "fmt": "INHERITS (%{parents:, }D)", "parents": null, "present": false }, "on_commit": { "fmt": "ON COMMIT %{on_commit_value}s", "on_commit_value": null, "present": false }, "persistence": "UNLOGGED", "table_elements": [ { "collation": { "fmt": "COLLATE %{name}D", "present": false }, "coltype": { "is_array": false, "schemaname": "pg_catalog", "typename": "int4", "typmod": "" }, "default": { "default": "nextval('t1_a_seq'::regclass)", "fmt": "DEFAULT %{default}s" }, "fmt": "%{name}I %{coltype}T %{default}s %{not_null}s %{collation}s", "name": "a", "not_null": "NOT NULL", "type": "column" } ], "table_kind": "plain", "tablespace": { "fmt": "TABLESPACE %{tablespace}I", "present": false, "tablespace": null } } NOTICE: expanded: CREATE UNLOGGED TABLE public.t1 (a pg_catalog.int4 DEFAULT nextval('t1_a_seq'::regclass) NOT NULL ) NOTICE: JSON blob: { "definition": [ { "clause": "owned", "fmt": "OWNED BY %{owner}D", "owner": { "attrname": "a", "objname": "t1", "schemaname": "public" } } ], "fmt": "ALTER SEQUENCE %{identity}D %{definition: }s", "identity": { "objname": "t1_a_seq", "schemaname": "public" } } NOTICE: expanded: ALTER SEQUENCE public.t1_a_seq OWNED BY public.t1.a CREATE TABLE -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
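Since the blobs are plain json, user code can pick them apart with the usual operators before re-expanding them; a small sketch against a trimmed-down copy of the CREATE SEQUENCE blob above:

-- Pull the schema-qualified identity and the fmt string out of the blob.
SELECT j->'identity'->>'schemaname' AS schemaname,
       j->'identity'->>'objname'    AS objname,
       j->>'fmt'                    AS fmt
  FROM (VALUES ('{"fmt": "CREATE %{persistence}s SEQUENCE %{identity}D %{definition: }s",
                  "identity": {"objname": "t1_a_seq", "schemaname": "public"},
                  "persistence": ""}'::json)) AS t(j);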
Attachments
Alvaro Herrera escribió: > I also fixed the sequence OWNED BY problem simply by adding support for > ALTER SEQUENCE. Of course, the intention is that all forms of CREATE > and ALTER are supported, but this one seems reasonable standalone > because CREATE TABLE uses it internally. I have been hacking on this on and off. This afternoon I discovered that interval typmod output can also be pretty unusual. Example: create table a (a interval year to month); For the column, we get this type spec (note the typmod): "coltype": { "is_array": false, "schemaname": "pg_catalog", "typename":"interval", "typmod": " year to month" }, so the whole command output ends up being this: NOTICE: expanded: CREATE TABLE public.a (a pg_catalog."interval" year to month ) WITH (oids=OFF) However, this is not accepted on input: alvherre=# CREATE TABLE public.a (a pg_catalog."interval" year to month ) WITH (oids=OFF); ERROR: syntax error at or near "year" LÍNEA 1: CREATE TABLE public.a (a pg_catalog."interval" year to mon... ^ I'm not too sure what to do about this yet. I checked the catalogs and gram.y, and it seems that interval is the only type that allows such strange games to be played. I would hate to be forced to add a kludge specific to type interval, but that seems to be the only option. (This would involve checking the OID of the type in deparse_utility.c, and if it's INTERVALOID, then omit the schema qualification and quoting on the type name). I have also been working on adding ALTER TABLE support. So far it's pretty simple; here is an example. Note I run a single command which includes a SERIAL column, and on output I get three commands (just like a serial column on create table). alvherre=# alter table tt add column b numeric, add column c serial, alter column a set default extract(epoch from now()); NOTICE: JSON blob: { "definition": [ { "clause": "cache", "fmt": "CACHE %{value}s", "value": "1" }, { "clause": "cycle", "fmt": "%{no}s CYCLE", "no": "NO" }, { "clause": "increment_by", "fmt": "INCREMENT BY %{value}s", "value": "1" }, { "clause": "minvalue", "fmt": "MINVALUE %{value}s", "value": "1" }, { "clause": "maxvalue", "fmt": "MAXVALUE %{value}s", "value": "9223372036854775807" }, { "clause": "start", "fmt": "START WITH %{value}s", "value": "1" }, { "clause": "restart", "fmt": "RESTART %{value}s", "value":"1" } ], "fmt": "CREATE %{persistence}s SEQUENCE %{identity}D %{definition: }s", "identity": { "objname": "tt_c_seq", "schemaname": "public" }, "persistence": "" } NOTICE: expanded: CREATE SEQUENCE public.tt_c_seq CACHE 1 NO CYCLE INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807START WITH 1 RESTART 1 NOTICE: JSON blob: { "fmt": "ALTER TABLE %{identity}D %{subcmds:, }s", "identity": { "objname": "tt", "schemaname": "public" }, "subcmds": [ { "definition": { "collation": { "fmt": "COLLATE %{name}D", "present": false }, "coltype": { "is_array": false, "schemaname": "pg_catalog", "typename": "numeric", "typmod": "" }, "default": { "fmt": "DEFAULT %{default}s", "present": false }, "fmt": "%{name}I %{coltype}T %{default}s%{not_null}s %{collation}s", "name": "b", "not_null": "", "type":"column" }, "fmt": "ADD COLUMN %{definition}s", "type": "add column" }, { "definition": { "collation": { "fmt": "COLLATE %{name}D", "present": false }, "coltype": { "is_array": false, "schemaname": "pg_catalog", "typename": "int4", "typmod": "" }, "default": { "default": "pg_catalog.nextval('public.tt_c_seq'::pg_catalog.regclass)", "fmt": "DEFAULT %{default}s" }, "fmt": "%{name}I %{coltype}T %{default}s 
%{not_null}s%{collation}s", "name": "c", "not_null": "", "type": "column" }, "fmt": "ADD COLUMN %{definition}s", "type": "add column" }, { "column":"a", "definition": "pg_catalog.date_part('epoch'::pg_catalog.text, pg_catalog.now())", "fmt":"ALTER COLUMN %{column}I SET DEFAULT %{definition}s", "type": "set default" } ] } NOTICE: expanded: ALTER TABLE public.tt ADD COLUMN b pg_catalog."numeric" , ADD COLUMN c pg_catalog.int4 DEFAULT pg_catalog.nextval('public.tt_c_seq'::pg_catalog.regclass) , ALTER COLUMN a SET DEFAULT pg_catalog.date_part('epoch'::pg_catalog.text,pg_catalog.now()) NOTICE: JSON blob: { "definition": [ { "clause": "owned", "fmt": "OWNED BY %{owner}D", "owner": { "attrname": "c", "objname": "tt", "schemaname": "public" } } ], "fmt": "ALTER SEQUENCE %{identity}D %{definition: }s", "identity": { "objname": "tt_c_seq", "schemaname": "public" } } NOTICE: expanded: ALTER SEQUENCE public.tt_c_seq OWNED BY public.tt.c ALTER TABLE Each subcommand is represented separately in a JSON array. Each element in the array has a "type" element indicating (broadly) what it's doing; the "fmt" element has all the details. So things like replication systems might decide to replicate some part of the ALTER or not, depending on the specific type. (And, of course, they can easily decide that replica XYZ must not replay the command because the table is not supposed to exist there; or perhaps it belongs to a replication set that is not the one the current node is origin for.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
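To sketch that kind of filtering (purely illustrative; the blob below is trimmed down to the subcommand tags shown above):

-- A replication consumer could walk the "subcmds" array and keep or drop
-- individual subcommands based on their "type" tag.
WITH blob AS (
    SELECT '{"fmt": "ALTER TABLE %{identity}D %{subcmds:, }s",
             "subcmds": [
               {"type": "add column",  "fmt": "ADD COLUMN %{definition}s"},
               {"type": "set default", "fmt": "ALTER COLUMN %{column}I SET DEFAULT %{definition}s"}
             ]}'::json AS j
)
SELECT sub->>'type' AS subcmd_type, sub->>'fmt' AS fmt
  FROM blob, json_array_elements(j->'subcmds') AS sub
 WHERE sub->>'type' <> 'set default';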
On Thu, Mar 13, 2014 at 5:06 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Alvaro Herrera escribió: > >> I also fixed the sequence OWNED BY problem simply by adding support for >> ALTER SEQUENCE. Of course, the intention is that all forms of CREATE >> and ALTER are supported, but this one seems reasonable standalone >> because CREATE TABLE uses it internally. > > I have been hacking on this on and off. This afternoon I discovered > that interval typmod output can also be pretty unusual. Example: > > create table a (a interval year to month); > > For the column, we get this type spec (note the typmod): > > "coltype": { > "is_array": false, > "schemaname": "pg_catalog", > "typename": "interval", > "typmod": " year to month" > }, > > so the whole command output ends up being this: > > NOTICE: expanded: CREATE TABLE public.a (a pg_catalog."interval" year to month ) WITH (oids=OFF) > > However, this is not accepted on input: > > alvherre=# CREATE TABLE public.a (a pg_catalog."interval" year to month ) WITH (oids=OFF); > ERROR: syntax error at or near "year" > LÍNEA 1: CREATE TABLE public.a (a pg_catalog."interval" year to mon... > ^ > > I'm not too sure what to do about this yet. I checked the catalogs and > gram.y, and it seems that interval is the only type that allows such > strange games to be played. I would hate to be forced to add a kludge > specific to type interval, but that seems to be the only option. (This > would involve checking the OID of the type in deparse_utility.c, and if > it's INTERVALOID, then omit the schema qualification and quoting on the > type name). > > I have also been working on adding ALTER TABLE support. So far it's > pretty simple; here is an example. Note I run a single command which > includes a SERIAL column, and on output I get three commands (just like > a serial column on create table). > > alvherre=# alter table tt add column b numeric, add column c serial, alter column a set default extract(epoch from now()); > NOTICE: JSON blob: { > "definition": [ > { > "clause": "cache", > "fmt": "CACHE %{value}s", > "value": "1" > }, > { > "clause": "cycle", > "fmt": "%{no}s CYCLE", > "no": "NO" > }, > { > "clause": "increment_by", > "fmt": "INCREMENT BY %{value}s", > "value": "1" > }, > { > "clause": "minvalue", > "fmt": "MINVALUE %{value}s", > "value": "1" > }, > { > "clause": "maxvalue", > "fmt": "MAXVALUE %{value}s", > "value": "9223372036854775807" > }, > { > "clause": "start", > "fmt": "START WITH %{value}s", > "value": "1" > }, > { > "clause": "restart", > "fmt": "RESTART %{value}s", > "value": "1" > } > ], > "fmt": "CREATE %{persistence}s SEQUENCE %{identity}D %{definition: }s", > "identity": { > "objname": "tt_c_seq", > "schemaname": "public" > }, > "persistence": "" > } What does the colon-space in %{definition: }s mean? In general, it seems like you're making good progress here, and I'm definitely happier with this than with previous approaches, but I'm still concerned about how maintainable it's going to be. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Here's a refreshed version of this patch. I have split it up in a largish number of pieces, which hopefully makes it easier to understand what is going on. First there are a number of trivial fixups which mostly shouldn't require much of an explanation, or add functionality to ruleutils that later patches need. I also split builtins.h to create a new ruleutils.h header, in which there are several prototypes for functions in ruleutils.c. (The reason for this is that a later patch adds a bunch more prototypes which require other headers to be #included by ruleutils.h, and it didn't seem a good idea to include those in builtins.h. This doesn't affect much -- only 7 .c files had to #include the new file.) The first really interesting patch is 0014, which introduces the concept of a "command queue" in event triggers, where DDL commands are stored as parse nodes plus supporting data. At ddl_command_end, the user code can call the new pg_event_trigger_get_creation_commands() function, which causes the parse node to be deparsed and returned as a JSON blob. This patch also introduces a function to decode the JSON blob back into a text string, possibly after being tweaked by the user code. A bunch of DDL commands are implemented here, such as CREATE SCHEMA, CREATE TABLE, and the like -- the basics. Then we have a bunch more patches which add support for more commands. In these patches you can see how the deparsing code for a certain command can be written. Most are pretty simple, but interesting examples such as CREATE FUNCTION are also included. We also have some commands that are not merely create, such as ALTER..RENAME. Patch 23 is the next interesting one -- it adds support for ALTER TABLE. This one is a lot more involved, because deparsing ALTER TABLE subcommands requires adding a call to collect each subcommand in tablecmds.c itself. In order to implement this, I modified most subcommand functions so that they return the OID of the affected object, or the attribute number of the affected column. For example if you add a constraint, the OID of the constraint is returned. This OID is collected and added to the list of subcommands. At deparse time, the collected commands can be deparsed back into JSON blobs as usual. Since certain commands execute AlterTable internally, I had to add "bracketing" calls to them, which set up a pseudo-ALTER TABLE context onto which the subcommands can be added. For instance, CREATE INDEX, CREATE VIEW and ALTER TABLESPACE MOVE are known to do this. ... then a bunch more boring commands are implemented ... Finally, patch 36 adds support for GRANT and REVOKE deparsing. This one is interesting because it's completely unlike the others. Here we don't have a single OID of an affected object. I chose to support this by exposing the InternalGrantStmt struct (in a new aclchk.h file), which we can deparse easily into the familiar JSON format. (Of course, in a patch earlier in the series I also had to modify event triggers so that grant/revoke fire them as well, which currently they do not.) There are no docs yet. This has been tested in the bidirectional replication suite -- it works by saving the deparsed (SQL, not JSON) command in a replicated table; the remote side executes the command when it receives the tuple. Commands that fail to deparse sanely cause very visible errors. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
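For reference, the BDR-style usage described above looks roughly like this at the SQL level. This is only a sketch: the audit table and trigger names are made up, the "command" output column name is an assumption about the SRF's result, and pg_event_trigger_expand_command() is assumed to be the JSON-to-text decoding function mentioned above.

CREATE TABLE ddl_audit (queued_at timestamptz DEFAULT now(), command text);

CREATE OR REPLACE FUNCTION queue_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_get_creation_commands()
    LOOP
        -- Expand the JSON blob back into executable SQL and queue it; a
        -- replica would pick the row up and execute the command text.
        INSERT INTO ddl_audit (command)
        VALUES (pg_event_trigger_expand_command(r.command));
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER queue_ddl ON ddl_command_end
    EXECUTE PROCEDURE queue_ddl();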
Attachments
- 0001-gram.y-more-psprintf.patch
- 0002-core-use-PG_FUNCNAME_MACRO-to-avoid-stale-name.patch
- 0003-deparse-core-return-OID-in-CREATE-TABLE-AS.patch
- 0004-deparse-core-return-OID-of-matview-in-REFRESH.patch
- 0005-deparse-core-split-builtins.h-to-new-ruleutils.h.patch
- 0006-deparse-core-return-the-base-type-OID-not-the-array-.patch
- 0008-deparse-core-have-ALTER-TABLE-return-OIDs-and-col-of.patch
- 0009-core-support-ALTER-TABLESPACE-MOVE-in-event-trigs.patch
- 0010-deparse-core-event-triggers-support-GRANT-REVOKE.patch
- 0011-deparse-core-add-format_type_detailed.patch
- 0012-deparse-core-add-get_sequence_values.patch
- 0013-deparse-core-PGDLLIMPORT-creating_extension.patch
- 0014-deparse-Initial-support-for-JSON-command-deparsing.patch.gz
- 0015-deparse-support-CREATE-TYPE-AS-RANGE.patch
- 0016-deparse-Add-support-for-CREATE-EXTENSION.patch
- 0017-deparse-add-support-for-CREATE-RULE.patch
- 0018-deparse-support-ALTER-TYPE-ADD-VALUE-for-enums.patch
- 0019-deparse-add-support-for-ALTER-THING-RENAME.patch
- 0020-deparse-support-CREATE-DOMAIN.patch
- 0021-deparse-core-enable-deparse-of-function-defaults-exp.patch
- 0022-deparse-deparse-CREATE-FUNCTION.patch
- 0023-deparse-initial-support-for-ALTER-TABLE.patch
- 0024-deparse-Support-CREATE-OPERATOR-FAMILY.patch
- 0025-deparse-Support-CREATE-CONVERSION.patch
- 0026-deparse-Support-CREATE-OPERATOR-via-DefineStmt.patch
- 0027-deparse-Support-CREATE-COLLATION-via-DefineStmt.patch
- 0028-deparse-Support-CREATE-TEXT-SEARCH-TEMPLATE-via-Defi.patch
- 0029-deparse-Support-CREATE-TEXT-SEARCH-PARSER-via-Define.patch
- 0030-deparse-Support-CREATE-TEXT-SEARCH-DICTIONARY-via-De.patch
- 0031-deparse-Support-CREATE-TYPE-via-DefineStmt.patch
- 0032-deparse-Initial-support-for-CREATE-TEXT-SEARCH-CONFI.patch
- 0033-deparse-Support-CREATE-AGGREGATE.patch
- 0034-deparse-support-ALTER-THING-OWNER-TO.patch
- 0035-deparse-Support-ALTER-EXTENSION-UPDATE-TO.patch
- 0036-deparse-support-GRANT-REVOKE.patch
On 2/6/14, 11:20 AM, Alvaro Herrera wrote: > NOTICE: JSON blob: { > "definition": [ > { > "clause": "owned", > "fmt": "OWNED BY %{owner}D", > "owner": { > "attrname": "a", > "objname": "t1", > "schemaname": "public" > } > } > ], > "fmt": "ALTER SEQUENCE %{identity}D %{definition: }s", > "identity": { > "objname": "t1_a_seq", > "schemaname": "public" > } > } > NOTICE: expanded: ALTER SEQUENCE public.t1_a_seq OWNED BY public.t1.a Apologies if this has been discussed and I missed it, but shouldn't part of the JSON be a field that indicates what command is being run? It doesn't seem wise to conflate detecting what the command is with the overall format string. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Jim Nasby wrote: > On 2/6/14, 11:20 AM, Alvaro Herrera wrote: > >NOTICE: JSON blob: { > > "definition": [ > > { > > "clause": "owned", > > "fmt": "OWNED BY %{owner}D", > > "owner": { > > "attrname": "a", > > "objname": "t1", > > "schemaname": "public" > > } > > } > > ], > > "fmt": "ALTER SEQUENCE %{identity}D %{definition: }s", > > "identity": { > > "objname": "t1_a_seq", > > "schemaname": "public" > > } > >} > >NOTICE: expanded: ALTER SEQUENCE public.t1_a_seq OWNED BY public.t1.a > > Apologies if this has been discussed and I missed it, but shouldn't part of the JSON be a field that indicates what command is being run? It doesn't seem wise to conflate detecting what the command is with the overall format string. That's reported as a separate field by the pg_event_trigger_creation_commands function. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Jun 14, 2014 at 5:31 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Here's a refreshed version of this patch. I have split it up in a
> largish number of pieces, which hopefully makes it easier to understand
> what is going on.
Alvaro,
Could you confirm that the patches you just committed are 1, 3 and 6?
Regards,
--
Michael
Michael Paquier wrote: > On Sat, Jun 14, 2014 at 5:31 AM, Alvaro Herrera <alvherre@2ndquadrant.com> > wrote: > > > Here's a refreshed version of this patch. I have split it up in a > > largish number of pieces, which hopefully makes it easier to understand > > what is going on. > > > Alvaro, > > Could you confirm that the patches you just committed are 1, 3 and 6? And 4. Yes, they are. I wanted to get trivial stuff out of the way while I had some other trivial patch at hand. I'm dealing with another patch from the commitfest now, so I'm not posting a rebased version right away, apologies. How do people like this patch series? It would be much easier for me to submit a single patch, but I feel handing it in little pieces makes it easier for reviewers. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 26, 2014 at 5:33 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: > And 4. Yes, they are. I wanted to get trivial stuff out of the way > while I had some other trivial patch at hand. I'm dealing with another > patch from the commitfest now, so I'm not posting a rebased version > right away, apologies. No problems. I imagine that most of the patches still apply. > How do people like this patch series? It would be much easier for me to > submit a single patch, but I feel handing it in little pieces makes it > easier for reviewers. Well, I like the patch series for what it counts as long as you can submit it as such. That's far easier to test and certainly helps in spotting issues when kicking different code paths. -- Michael
On Tue, Aug 26, 2014 at 8:10 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > Well, I like the patch series for what it counts as long as you can > submit it as such. That's far easier to test and certainly helps in > spotting issues when kicking different code paths. So, for patch 2, which is a cosmetic change for pg_event_trigger_dropped_objects: =# select pg_event_trigger_dropped_objects(); ERROR: 0A000: pg_event_trigger_dropped_objects can only be called in a sql_drop event trigger function LOCATION: pg_event_trigger_dropped_objects, event_trigger.c:1220 This drops "()" from the error message with the function name. I guess that this is fine. But PG_FUNCNAME_MACRO is used nowhere except elog.h, and can as well be NULL. So if that's really necessary shouldn't we use FunctionCallInfo instead. It is also not that bad to hardcode the function name in the error message IMO. For patch 5: +1 for this move. When working on Postgres-XC a couple of years back I wondered why this distinction was not made. Wouldn't it make sense to move as well the declaration of quote_all_identifiers to ruleutils.h. That's a GUC and moving it out of builtins.h would make sense IMO. Patch 8 needs a rebase (patch independent on 1~6 it seems): 1 out of 57 hunks FAILED -- saving rejects to file src/backend/commands/tablecmds.c.rej (Stripping trailing CRs from patch.) Patch 9: 1) It needs a rebase, AlterTableSpaceMoveStmt has been renamed to AlterTableMoveAllStmt by commit 3c4cf08 2) Table summarizing event trigger firing needs to be updated with the new command supported (src/sgml/event-trigger.sgml) Patch 10, similar problems as patch 9: 1) Needs a rebase 2) table summarizing supported commands should be updated. You could directly group patches 9 and 10 in the final commit IMO. GRANT/REVOKE would also be the first command that would be supported by event triggers that is not of the type CREATE/DROP/ALTER, hence once it is rebased I would like to do some testing with it (same with patch 9 btw) and see how it reacts with the event sql_drop particularly (it should error out but still). Patch 11: applies with some hunks. So... This patch introduces an operation performing reverse engineering of format_type_internal... I think that this would be more adapted with a couple of new routines in lsyscache.[c|h] instead: - One for the type name - One for typmodout - One for is_array - One for its namespace TBH, I wanted those routines a couple of times when working on XC and finished coding them at the end, but in XC only :) Patch 12: patch applies correctly. Form_pg_sequence is already exposed in sequence.h even if it is only used in sequence.c, so yes it seems to be the correct way to do it here assuming that we need this data to rebuild a DDL. Why is ACL_UPDATE needed when checking permissions? This new routine only reads the values and does not update them. And a confirmation: ACL_USAGE is used to make this routine usable for PL languages in this case, right? I think that you should mention at the top of get_sequence_values that it returns a palloc'd result, and that it is the responsibility of caller to free it. That's all I have for now. Regards, -- Michael
On Wed, Aug 27, 2014 at 1:10 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Aug 26, 2014 at 8:10 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> Well, I like the patch series for what it counts as long as you can >> submit it as such. That's far easier to test and certainly helps in >> spotting issues when kicking different code paths. > > So, for patch 2, which is a cosmetic change for > pg_event_trigger_dropped_objects: > =# select pg_event_trigger_dropped_objects(); > ERROR: 0A000: pg_event_trigger_dropped_objects can only be called in > a sql_drop event trigger function > LOCATION: pg_event_trigger_dropped_objects, event_trigger.c:1220 > This drops "()" from the error message with the function name. I guess > that this is fine. But PG_FUNCNAME_MACRO is used nowhere except > elog.h, and can as well be NULL. So if that's really necessary > shouldn't we use FunctionCallInfo instead. It is not as well not that > bad to hardcode the function name in the error message as well IMO. > > For patch 5: > +1 for this move. When working on Postgres-XC a couple of years back I > wondered why this distinction was not made. Wouldn't it make sense to > move as well the declaration of quote_all_identifiers to ruleutils.h. > That's a GUC and moving it out of builtins.h would make sense IMO. > > Patch 8 needs a rebase (patch independent on 1~6 it seems): > 1 out of 57 hunks FAILED -- saving rejects to file > src/backend/commands/tablecmds.c.rej > (Stripping trailing CRs from patch.) > > Patch 9: > 1) It needs a rebase, AlterTableMoveAllStmt has been renamed to > AlterTableMoveAllStmt by commit 3c4cf08 > 2) Table summarizing event trigger firing needs to be updated with the > new command supported (src/sgml/event-trigger.sgml) > > Patch 10, similar problems as patch 9: > 1) Needs a rebase > 2) table summarizing supported commands should be updated. > You could directly group patches 9 and 10 in the final commit IMO. > GRANT/REVOKE would also be the first command that would be supported > by event triggers that is not of the type CREATE/DROP/ALTER, hence > once it is rebased I would like to do some testing with it (same with > patch 9 btw) and see how it reacts with the event sql_drop > particularly (it should error out but still). > > Patch 11: applies with some hunks. > So... This patch introduces an operation performing doing reverse > engineering of format_type_internal... I think that this would be more > adapted with a couple of new routines in lsyscache.[c|h] instead: > - One for the type name > - One for typmodout > - One for is_array > - One for its namespace > TBH, I wanted those routines a couple of times when working on XC and > finished coding them at the end, but in XC only :) > > Patch 12: patch applies correctly. > Form_pg_sequence is already exposed in sequence.h even if it is only > used in sequence.c, so yes it seems to be the correct way to do it > here assuming that we need this data to rebuild a DDL. Why is > ACL_UPDATE needed when checking permissions? This new routine only > reads the values and does not update it. And a confirmation: ACL_USAGE > is used to make this routine usable for PL languages in this case, > right? > I think that you should mention at the top of get_sequence_values that > it returns a palloc'd result, and that it is the responsibility of > caller to free it. And a last one before lunch, closing the review for all the basic things... Patch 13: Could you explain why this is necessary? 
+extern PGDLLIMPORT bool creating_extension; It may make sense by looking at the core features (then why isn't it with the core features?), but I am trying to review the patches in order. -- Michael
On Tue, Aug 26, 2014 at 11:14 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > And a last one before lunch, closing the review for all the basic things... > Patch 13: Could you explain why this is necessary? > +extern PGDLLIMPORT bool creating_extension; > It may make sense by looking at the core features (then why isn't it > with the core features?), but I am trying to review the patches in > order. Those patches have been reviewed up to number 14. Some of them could be applied as-is as they are useful taken independently, but most of them need a rebase, hence marking it as returned with feedback. -- Michael
Michael Paquier wrote: > On Tue, Aug 26, 2014 at 8:10 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: > > Well, I like the patch series for what it counts as long as you can > > submit it as such. That's far easier to test and certainly helps in > > spotting issues when kicking different code paths. > > So, for patch 2, which is a cosmetic change for > pg_event_trigger_dropped_objects: > =# select pg_event_trigger_dropped_objects(); > ERROR: 0A000: pg_event_trigger_dropped_objects can only be called in > a sql_drop event trigger function > LOCATION: pg_event_trigger_dropped_objects, event_trigger.c:1220 > This drops "()" from the error message with the function name. I guess > that this is fine. But PG_FUNCNAME_MACRO is used nowhere except > elog.h, and can as well be NULL. So if that's really necessary > shouldn't we use FunctionCallInfo instead. It is not as well not that > bad to hardcode the function name in the error message as well IMO. You're right. Dropped this patch. > For patch 5: > +1 for this move. When working on Postgres-XC a couple of years back I > wondered why this distinction was not made. Wouldn't it make sense to > move as well the declaration of quote_all_identifiers to ruleutils.h. > That's a GUC and moving it out of builtins.h would make sense IMO. I had it that way initially, but decided against it, because it creates too much churn; there are way too many .c files that depend on the quoting stuff. I don't want to repeat the htup_details.h disaster .. > Patch 8 needs a rebase (patch independent on 1~6 it seems): > 1 out of 57 hunks FAILED -- saving rejects to file > src/backend/commands/tablecmds.c.rej > (Stripping trailing CRs from patch.) Fixed. > Patch 9: > 1) It needs a rebase, AlterTableMoveAllStmt has been renamed to > AlterTableMoveAllStmt by commit 3c4cf08 Fixed. > 2) Table summarizing event trigger firing needs to be updated with the > new command supported (src/sgml/event-trigger.sgml) Will add. > Patch 10, similar problems as patch 9: > 1) Needs a rebase Done. > 2) table summarizing supported commands should be updated. Will add. > You could directly group patches 9 and 10 in the final commit IMO. > GRANT/REVOKE would also be the first command that would be supported > by event triggers that is not of the type CREATE/DROP/ALTER, hence > once it is rebased I would like to do some testing with it (same with > patch 9 btw) and see how it reacts with the event sql_drop > particularly (it should error out but still). Actually, I think I would commit most of this stuff not in the same way I'm submitting; I've split it up to ease review only. > Patch 11: applies with some hunks. > So... This patch introduces an operation performing doing reverse > engineering of format_type_internal... I think that this would be more > adapted with a couple of new routines in lsyscache.[c|h] instead: > - One for the type name > - One for typmodout > - One for is_array > - One for its namespace > TBH, I wanted those routines a couple of times when working on XC and > finished coding them at the end, but in XC only :) Not real sure about this. Being able to do the whole thing in one fell swoop seems more sensible to me. Did you need each of those things separately, or together? > Patch 12: patch applies correctly. > Form_pg_sequence is already exposed in sequence.h even if it is only > used in sequence.c, so yes it seems to be the correct way to do it > here assuming that we need this data to rebuild a DDL. Why is > ACL_UPDATE needed when checking permissions? 
This new routine only > reads the values and does not update it. And a confirmation: ACL_USAGE > is used to make this routine usable for PL languages in this case, > right? Hm, I guess I just copied it from some other routine. Will fix. > I think that you should mention at the top of get_sequence_values that > it returns a palloc'd result, and that it is the responsibility of > caller to free it. Will add. > And a last one before lunch, closing the review for all the basic things... > Patch 13: Could you explain why this is necessary? > +extern PGDLLIMPORT bool creating_extension; > It may make sense by looking at the core features (then why isn't it > with the core features?), but I am trying to review the patches in > order. It's needed in MSVC build; I merged it with the next patch, which actually uses it. The point is to detect whether a command is being executed as part of an extension; we set a flag in the output for this. In BDR, we emit only the CREATE EXTENSION command, not the individual commands that the extension creates. I would guess that most other replication systems will want to do the same. I attach a rebased version of this. The patch structure is a bit different; patch 4 (which used to be 14) introduces the infrastructure for DDL deparsing to JSON, but no users of it; patch 5 introduces the first few commands that actually produce deparse output. I will add this patch series to the next commitfest. Thanks for reviewing. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
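Concretely, a consumer that wants the behavior described above (replicate the CREATE EXTENSION itself, not the objects its script creates) would just filter on that flag. The column name in_extension follows the tuple built in the deparse patch and should be treated as an assumption until there are docs; the query is only meaningful inside a ddl_command_end trigger function:

-- Skip commands that were generated while running an extension script;
-- replaying the CREATE EXTENSION on the other side recreates those objects.
SELECT * FROM pg_event_trigger_get_creation_commands()
 WHERE NOT in_extension;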
Attachments
- 0001-deparse-core-split-builtins.h-to-new-ruleutils.h.patch
- 0002-deparse-core-have-ALTER-TABLE-return-OIDs-and-col-of.patch
- 0003-deparse-core-event-triggers-support-GRANT-REVOKE.patch
- 0004-deparse-infrastructure-needed-for-command-deparsing.patch
- 0005-deparse-initial-set-of-supported-commands.patch
- 0006-deparse-Support-CREATE-TYPE-AS-RANGE.patch
- 0007-deparse-Support-CREATE-EXTENSION.patch
- 0008-deparse-Support-CREATE-RULE.patch
- 0009-deparse-Support-ALTER-TYPE-ADD-VALUE-enums.patch
- 0010-deparse-Support-for-ALTER-OBJECT-RENAME.patch
- 0011-deparse-Support-CREATE-DOMAIN.patch
- 0012-deparse-Support-CREATE-FUNCTION.patch
- 0013-deparse-Support-ALTER-TABLE.patch
- 0014-deparse-Support-CREATE-VIEW.patch
- 0015-deparse-Support-CREATE-OPERATOR-FAMILY.patch
- 0016-deparse-Support-CREATE-CONVERSION.patch
- 0017-deparse-Support-CREATE-OPERATOR-via-DefineStmt.patch
- 0018-deparse-Support-CREATE-COLLATION-via-DefineStmt.patch
- 0019-deparse-Support-CREATE-TEXT-SEARCH-TEMPLATE-via-Defi.patch
- 0020-deparse-Support-CREATE-TEXT-SEARCH-PARSER-via-Define.patch
- 0021-deparse-Support-CREATE-TEXT-SEARCH-DICTIONARY-via-De.patch
- 0022-deparse-Support-CREATE-TYPE-via-DefineStmt.patch
- 0023-deparse-Support-CREATE-TEXT-SEARCH-CONFIGURATION.patch
- 0024-deparse-Support-CREATE-AGGREGATE.patch
- 0025-deparse-support-ALTER-THING-OWNER-TO.patch
- 0026-deparse-Support-ALTER-EXTENSION-UPDATE-TO.patch
- 0027-deparse-Support-GRANT-REVOKE.patch
- 0028-deparse-Support-ALTER-FUNCTION.patch
- 0029-deparse-Support-COMMENT-ON.patch
- 0030-deparse-Support-ALTER-TABLE-ALL-IN-TABLESPACE.patch
Actually here's a different split of these patches, which I hope makes more sense. My intention here is that patches 0001 to 0004 are simple changes that can be pushed right away; they are not directly related to the return-creation-command feature. Patches 0005 to 0027 implement that feature incrementally. You can see in patch 0005 the DDL commands that are still not implemented in deparse (they are the ones that have an elog(ERROR) rather than a "command = NULL"). Patch 0006 adds calls in ProcessUtilitySlow() to each command, so that the object(s) being touched are added to the event trigger command stash. Patches from 0007 to 0027 (excepting patch 0017) implement one or a small number of commands in deparse. Patch 0017 is necessary infrastructure in ALTER TABLE to support deparsing that one. My intention with the later patches is that they would all be pushed as a single commit, i.e. the deparse support would be implemented for all commands in one fell swoop rather than piecemeal -- except possibly patch 0017 (the ALTER TABLE infrastructure). I split them up only for ease of review. Of course, before pushing we (I) need to implement deparsing for all the remaining commands. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachments
- 0001-deparse-core-split-builtins.h-to-new-ruleutils.h.patch
- 0002-deparse-core-have-RENAME-return-attribute-number.patch
- 0003-deparse-core-event-triggers-support-GRANT-REVOKE.patch
- 0004-deparse-core-event-triggers-support-COMMENT.patch
- 0005-deparse-infrastructure-needed-for-command-deparsing.patch
- 0006-deparse-sprinkle-EventTriggerStashCommand-calls.patch
- 0007-deparse-Support-CREATE-TYPE-AS.patch
- 0008-deparse-Support-CREATE-TYPE-AS-ENUM.patch
- 0009-deparse-Support-CREATE-SCHEMA-TABLE-SEQUENCE-INDEX-T.patch
- 0010-deparse-Support-CREATE-TYPE-AS-RANGE.patch
- 0011-deparse-Support-CREATE-EXTENSION.patch
- 0012-deparse-Support-CREATE-RULE.patch
- 0013-deparse-Support-ALTER-TYPE-ADD-VALUE-enums.patch
- 0014-deparse-Support-for-ALTER-OBJECT-RENAME.patch
- 0015-deparse-Support-CREATE-DOMAIN.patch
- 0016-deparse-Support-CREATE-FUNCTION.patch
- 0017-deparse-core-have-ALTER-TABLE-return-OIDs-and-col-of.patch
- 0018-deparse-Support-ALTER-TABLE.patch
- 0019-deparse-Support-CREATE-VIEW.patch
- 0020-deparse-Support-CREATE-OPERATOR-FAMILY.patch
- 0021-deparse-Support-CREATE-CONVERSION.patch
- 0022-deparse-Support-DefineStmt-commands.patch
- 0023-deparse-support-ALTER-THING-OWNER-TO.patch
- 0024-deparse-Support-ALTER-EXTENSION-UPDATE-TO.patch
- 0025-deparse-Support-GRANT-REVOKE.patch
- 0026-deparse-Support-ALTER-FUNCTION.patch
- 0027-deparse-support-COMMENT-ON.patch
On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote: > /* tid.c */ > diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h > new file mode 100644 > index 0000000..520b066 > --- /dev/null > +++ b/src/include/utils/ruleutils.h > @@ -0,0 +1,34 @@ > +/*------------------------------------------------------------------------- > + * > + * ruleutils.h > + * Declarations for ruleutils.c > + * > + * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group > + * Portions Copyright (c) 1994, Regents of the University of California > + * > + * src/include/ruleutils.h > + * > + *------------------------------------------------------------------------- > + */ > +#ifndef RULEUTILS_H > +#define RULEUTILS_H > + > +#include "nodes/nodes.h" > +#include "nodes/parsenodes.h" > +#include "nodes/pg_list.h" > + > + > +extern char *pg_get_indexdef_string(Oid indexrelid); > +extern char *pg_get_indexdef_columns(Oid indexrelid, bool pretty); > + > +extern char *pg_get_constraintdef_string(Oid constraintId); > +extern char *deparse_expression(Node *expr, List *dpcontext, > + bool forceprefix, bool showimplicit); > +extern List *deparse_context_for(const char *aliasname, Oid relid); > +extern List *deparse_context_for_planstate(Node *planstate, List *ancestors, > + List *rtable, List *rtable_names); > +extern List *select_rtable_names_for_explain(List *rtable, > + Bitmapset *rels_used); > +extern char *generate_collation_name(Oid collid); > + > +#endif /* RULEUTILS_H */ > -- > 1.9.1 > I wondered for a minute whether any of these are likely to cause problems for code just including builtins.h - but I think there will be sufficiently few callers for them to make that not much of a concern. > From 5e3555058a1bbb93e25a3a104c26e6b96dce994d Mon Sep 17 00:00:00 2001 > From: Alvaro Herrera <alvherre@alvh.no-ip.org> > Date: Thu, 25 Sep 2014 16:34:50 -0300 > Subject: [PATCH 02/27] deparse/core: have RENAME return attribute number Looks "harmless". I.e. +1. > From 72c4d0ef875f9e9b0f3b424499738fd1bd946c32 Mon Sep 17 00:00:00 2001 > From: Alvaro Herrera <alvherre@alvh.no-ip.org> > Date: Fri, 9 May 2014 18:32:23 -0400 > Subject: [PATCH 03/27] deparse/core: event triggers support GRANT/REVOKE Hm. The docs don't mention whether these only work for database local objects or also shared objects. Generally event trigger are only triggered for the former... But the code doesn't seem to make a difference? > From 5bb35d2e51fe6c6e219fe5cf7ddbab5423e6be1b Mon Sep 17 00:00:00 2001 > From: Alvaro Herrera <alvherre@alvh.no-ip.org> > Date: Thu, 25 Sep 2014 15:44:44 -0300 > Subject: [PATCH 04/27] deparse/core: event triggers support COMMENT > > --- > doc/src/sgml/event-trigger.sgml | 8 +++++++- > src/backend/commands/comment.c | 5 ++++- > src/backend/commands/event_trigger.c | 1 + > src/backend/tcop/utility.c | 21 +++++++++++++++++---- > src/include/commands/comment.h | 2 +- > 5 files changed, 30 insertions(+), 7 deletions(-) > > diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml > index 49b8384..39ecd94 100644 > --- a/doc/src/sgml/event-trigger.sgml > +++ b/doc/src/sgml/event-trigger.sgml > @@ -37,7 +37,7 @@ > <para> > The <literal>ddl_command_start</> event occurs just before the > execution of a <literal>CREATE</>, <literal>ALTER</>, <literal>DROP</>, > - <literal>GRANT</> or <literal>REVOKE</> > + <literal>COMMENT</>, <literal>GRANT</> or <literal>REVOKE</> > command. No check whether the affected object exists or doesn't exist is > performed before the event trigger fires. 
> As an exception, however, this event does not occur for > @@ -281,6 +281,12 @@ > <entry align="center"><literal>-</literal></entry> > </row> > <row> > + <entry align="left"><literal>COMMENT</literal></entry> > + <entry align="center"><literal>X</literal></entry> > + <entry align="center"><literal>X</literal></entry> > + <entry align="center"><literal>-</literal></entry> > + </row> > + <row> Hm. No drop support because COMMENTs are odd and use odd NULL semantics. Fair enough. > From 098f5acabd774004dc5d9c750d55e7c9afa60238 Mon Sep 17 00:00:00 2001 > From: Alvaro Herrera <alvherre@alvh.no-ip.org> > Date: Wed, 24 Sep 2014 15:53:04 -0300 > Subject: [PATCH 05/27] deparse: infrastructure needed for command deparsing > --- > src/backend/catalog/objectaddress.c | 115 +++++ > src/backend/commands/event_trigger.c | 941 ++++++++++++++++++++++++++++++++++- > src/backend/tcop/Makefile | 2 +- > src/backend/tcop/deparse_utility.c | 877 ++++++++++++++++++++++++++++++++ > src/backend/tcop/utility.c | 2 + > src/backend/utils/adt/format_type.c | 113 ++++- > src/include/catalog/objectaddress.h | 2 + > src/include/catalog/pg_proc.h | 4 + > src/include/commands/event_trigger.h | 3 + > src/include/commands/extension.h | 2 +- > src/include/nodes/parsenodes.h | 2 + > src/include/tcop/deparse_utility.h | 60 +++ > src/include/utils/builtins.h | 5 + > 13 files changed, 2117 insertions(+), 11 deletions(-) > create mode 100644 src/backend/tcop/deparse_utility.c > create mode 100644 src/include/tcop/deparse_utility.h > > diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c > index b69b75b..a2445f1 100644 > --- a/src/backend/catalog/objectaddress.c > +++ b/src/backend/catalog/objectaddress.c > @@ -723,6 +723,121 @@ get_object_address(ObjectType objtype, List *objname, List *objargs, > } > > /* > + * Return the OID of the catalog corresponding to the given object type > + */ > +Oid > +get_objtype_catalog_oid(ObjectType objtype) > +{ > + case OBJECT_ATTRIBUTE: > + catalog_id = TypeRelationId; /* XXX? */ > + break; I'm indeed not clear why thats a meaningful choice? > +/* XXX merge this with ObjectTypeMap? */ hm. I think that's obsolete by now? > @@ -924,6 +929,7 @@ EventTriggerSupportsObjectType(ObjectType obtype) > case OBJECT_ATTRIBUTE: > case OBJECT_CAST: > case OBJECT_COLUMN: > + case OBJECT_COMPOSITE: Is it actually correct that we return false here up to now? I.e. isn't this a bug? > +/* > + * EventTriggerStashCommand > + * Save data about a simple DDL command that was just executed > + */ > +void > +EventTriggerStashCommand(Oid objectId, uint32 objectSubId, ObjectType objtype, > + Node *parsetree) > +{ > + MemoryContext oldcxt; > + StashedCommand *stashed; > + > + oldcxt = MemoryContextSwitchTo(currentEventTriggerState->cxt); > + > + stashed = palloc(sizeof(StashedCommand)); > + > + stashed->type = SCT_Simple; I'm not really sure why that subtype is called 'simple'? 
> +Datum > +pg_event_trigger_get_creation_commands(PG_FUNCTION_ARGS) > +{ > + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; > + TupleDesc tupdesc; > + Tuplestorestate *tupstore; > + MemoryContext per_query_ctx; > + MemoryContext oldcontext; > + ListCell *lc; > + > + /* > + * Protect this function from being called out of context > + */ > + if (!currentEventTriggerState) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("%s can only be called in an event trigger function", > + "pg_event_trigger_get_creation_commands()"))); I wonder wether we shouldn't introduce ERRCODE_WRONG_CONTEXT or such. Seems useful here and in a fair number of other situations. But not really required, just something that annoyed me before. > + foreach(lc, currentEventTriggerState->stash) > + { > + StashedCommand *cmd = lfirst(lc); > + char *command; > + > + /* > + * For IF NOT EXISTS commands that attempt to create an existing > + * object, the returned OID is Invalid; in those cases, return an empty > + * command instead of trying to soldier on. > + * > + * XXX an alternative would be to look up the Oid of the existing > + * object and run the deparse with that. But since the parse tree > + * might be different from the one that created the object in the first > + * place, we might not end up in a consistent state anyway. > + */ > + if (cmd->type == SCT_Simple && > + !OidIsValid(cmd->d.simple.objectId)) > + continue; I'd replace the XXX by 'One might think that'. You've perfectly explained why that's not a viable alternative. > + command = deparse_utility_command(cmd); > + > + /* > + * Some parse trees return NULL when deparse is attempted; we don't > + * emit anything for them. > + */ > + if (command != NULL) > + { > + Datum values[9]; > + bool nulls[9]; > + ObjectAddress addr; > + int i = 0; > + > + MemSet(nulls, 0, sizeof(nulls)); > + > + if (cmd->type == SCT_Simple) > + { > + Oid classId; > + Oid objId; > + uint32 objSubId; > + const char *tag; > + char *identity; > + char *type; > + char *schema = NULL; > + > + if (cmd->type == SCT_Simple) > + { > + classId = get_objtype_catalog_oid(cmd->d.simple.objtype); > + objId = cmd->d.simple.objectId; > + objSubId = cmd->d.simple.objectSubId; > + } > + > + tag = CreateCommandTag(cmd->parsetree); > + addr.classId = classId; > + addr.objectId = objId; > + addr.objectSubId = objSubId; > + > + type = getObjectTypeDescription(&addr); > + identity = getObjectIdentity(&addr); > + > + /* > + * Obtain schema name, if any ("pg_temp" if a temp object) > + */ > + if (is_objectclass_supported(addr.classId)) > + { > + AttrNumber nspAttnum; > + > + nspAttnum = get_object_attnum_namespace(addr.classId); > + if (nspAttnum != InvalidAttrNumber) > + { > + Relation catalog; > + HeapTuple objtup; > + Oid schema_oid; > + bool isnull; > + > + catalog = heap_open(addr.classId, AccessShareLock); > + objtup = get_catalog_object_by_oid(catalog, > + addr.objectId); > + if (!HeapTupleIsValid(objtup)) > + elog(ERROR, "cache lookup failed for object %u/%u", > + addr.classId, addr.objectId); > + schema_oid = heap_getattr(objtup, nspAttnum, > + RelationGetDescr(catalog), &isnull); > + if (isnull) > + elog(ERROR, "invalid null namespace in object %u/%u/%d", > + addr.classId, addr.objectId, addr.objectSubId); > + if (isAnyTempNamespace(schema_oid)) > + schema = pstrdup("pg_temp"); > + else > + schema = get_namespace_name(schema_oid); > + > + heap_close(catalog, AccessShareLock); > + } > + } > + > + /* classid */ > + values[i++] = ObjectIdGetDatum(addr.classId); > + 
/* objid */ > + values[i++] = ObjectIdGetDatum(addr.objectId); > + /* objsubid */ > + values[i++] = Int32GetDatum(addr.objectSubId); > + /* command tag */ > + values[i++] = CStringGetTextDatum(tag); > + /* object_type */ > + values[i++] = CStringGetTextDatum(type); > + /* schema */ > + if (schema == NULL) > + nulls[i++] = true; > + else > + values[i++] = CStringGetTextDatum(schema); > + /* identity */ > + values[i++] = CStringGetTextDatum(identity); > + /* in_extension */ > + values[i++] = BoolGetDatum(cmd->in_extension); > + /* command */ > + values[i++] = CStringGetTextDatum(command); > + } > + > + tuplestore_putvalues(tupstore, tupdesc, values, nulls); So: a) we only support SCT_Simple here. b) we add commands whether they're supported or not? And afaics the only way to discern that is by looking at 'schema'? If we're not skipping unsupported stuff, shouldn't we at least set a column indicating that? > +/* ************************* JSON STUFF FROM HERE ************************* * > + * Code below is used to decode blobs returned by deparse_utility_command * > + * */ > + > +/* > + * Note we only support types that are valid in command representation from > + * deparse_utility_command. > + */ > +typedef enum > +{ > + JsonTypeArray, > + JsonTypeObject, > + JsonTypeString > +} JsonType; There indeed seems to be no existing enum/define for this. Odd. > +typedef enum > +{ > + SpecTypename, > + SpecOperatorname, > + SpecDottedName, > + SpecString, > + SpecStringLiteral, > + SpecIdentifier > +} convSpecifier; Ok, these are all pretty specific to the usecase. > +/* > + * Extract the named json field, which must be of type string, from the given > + * JSON datum, which must be of type object. If the field doesn't exist, > + * NULL is returned. Otherwise the string value is returned. > + */ > +static char * > +expand_get_strval(Datum json, char *field_name) > +{ > + FunctionCallInfoData fcinfo; > + Datum result; > + char *value_str; > + > + InitFunctionCallInfoData(fcinfo, NULL, 2, InvalidOid, NULL, NULL); > + > + fcinfo.arg[0] = json; > + fcinfo.argnull[0] = false; > + fcinfo.arg[1] = CStringGetTextDatum(field_name); > + fcinfo.argnull[1] = false; > + > + result = (*json_object_field_text) (&fcinfo); I've not looked it up, but is it actually legal to call functions without flinfo setup? It seems nicer to revamp this to use FunctionCallInfoInvoke() rather than doing it ourselves. > +/* > + * Given a JSON value, return its type. > + * > + * We return both a JsonType (for easy control flow), and a string name (for > + * error reporting). > + */ > +static JsonType > +jsonval_get_type(Datum jsonval, char **typename) > +{ > + JsonType json_elt_type; > + Datum paramtype_datum; > + char *paramtype; > + > + paramtype_datum = DirectFunctionCall1(json_typeof, jsonval); > + paramtype = TextDatumGetCString(paramtype_datum); So, here you suddenly use a different method of invoking functions? > + if (strcmp(paramtype, "array") == 0) > + json_elt_type = JsonTypeArray; > + else if (strcmp(paramtype, "object") == 0) > + json_elt_type = JsonTypeObject; > + else if (strcmp(paramtype, "string") == 0) > + json_elt_type = JsonTypeString; > + else > + /* XXX improve this; need to specify array index or param name */ > + elog(ERROR, "unexpected JSON element type %s", > + paramtype); What about that XXX? Seems like the current usage allows that to be something for the future? > +/* > + * dequote_jsonval > + * Take a string value extracted from a JSON object, and return a copy of it > + * with the quoting removed. 
> + * > + * Another alternative to this would be to run the extraction routine again, > + * using the "_text" variant which returns the value without quotes; but this > + * is expensive, and moreover it complicates the logic a lot because not all > + * values are extracted in the same way (some are extracted using > + * json_object_field, others using json_array_element). Dequoting the object > + * already at hand is a lot easier. > + */ > +static char * > +dequote_jsonval(char *jsonval) > +{ > + char *result; > + int inputlen = strlen(jsonval); > + int i; > + int j = 0; > + > + result = palloc(strlen(jsonval) + 1); > + > + /* skip the start and end quotes right away */ > + for (i = 1; i < inputlen - 1; i++) > + { > + if (jsonval[i] == '\\') > + { > + i++; > + > + /* This uses same logic as json.c */ > + switch (jsonval[i]) > + { > + case 'b': > + result[j++] = '\b'; > + continue; > + case 'f': > + result[j++] = '\f'; > + continue; > + case 'n': > + result[j++] = '\n'; > + continue; > + case 'r': > + result[j++] = '\r'; > + continue; > + case 't': > + result[j++] = '\t'; > + continue; > + case '"': > + case '\\': > + case '/': > + break; > + default: > + /* XXX: ERROR? */ > + break; > + } > + } > + > + result[j++] = jsonval[i]; > + } > + result[j] = '\0'; > + > + return result; > +} This looks like something that really should live at least partially in json.c? > +/* > + * Expand a json value as a dotted-name. The value must be of type object > + * and must contain elements "schemaname" (optional), "objname" (mandatory), > + * "attrname" (optional). > + * > + * XXX do we need a "catalogname" as well? > + */ I'd replace the XXX by 'One day we might need "catalogname" as well, but no current ..." > +static void > +expand_jsonval_dottedname(StringInfo buf, Datum jsonval) > +{ > + char *schema; > + char *objname; > + char *attrname; > + const char *qschema; > + const char *qname; > + > + schema = expand_get_strval(jsonval, "schemaname"); > + objname = expand_get_strval(jsonval, "objname"); > + if (objname == NULL) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("invalid NULL object name in %%D element"))); > + qname = quote_identifier(objname); > + if (schema == NULL) > + { > + appendStringInfo(buf, "%s", qname); *appendStringInfoString(). > + } > + else > + { > + qschema = quote_identifier(schema); > + appendStringInfo(buf, "%s.%s", > + qschema, qname); > + if (qschema != schema) > + pfree((char *) qschema); > + pfree(schema); > + } > + > + attrname = expand_get_strval(jsonval, "attrname"); > + if (attrname) > + { > + const char *qattr; > + > + qattr = quote_identifier(attrname); > + appendStringInfo(buf, ".%s", qattr); > + if (qattr != attrname) > + pfree((char *) qattr); > + pfree(attrname); > + } Hm. Does specifying schema, object and attrname result ins oemthing valid with this? Might not ever be required, but in that case it should probably error out > +/* > + * expand a json value as a type name. 
> + */ > +static void > +expand_jsonval_typename(StringInfo buf, Datum jsonval) > +{ > + char *schema = NULL; > + char *typename; > + char *typmodstr; > + bool array_isnull; > + bool is_array; > + > + typename = expand_get_strval(jsonval, "typename"); > + if (typename == NULL) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("invalid NULL type name in %%T element"))); > + typmodstr = expand_get_strval(jsonval, "typmod"); /* OK if null */ > + is_array = expand_get_boolval(jsonval, "is_array", &array_isnull); > + schema = expand_get_strval(jsonval, "schemaname"); > + > + /* > + * If schema is NULL, then don't schema qualify, but do quote the type > + * name; if the schema is empty instead, then we don't quote the type name. > + * This is our (admittedly quite ugly) way of dealing with type names that > + * might require special treatment. > + */ Indeed, really quite ugly. What's it needed for? Maybe reference the function that needs it to explain. > + > +/* > + * Expand a json value as an operator name > + */ > +static void > +expand_jsonval_operator(StringInfo buf, Datum jsonval) > +{ > + char *schema = NULL; > + char *operator; > + > + operator = expand_get_strval(jsonval, "objname"); > + if (operator == NULL) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("invalid NULL operator name in %%O element"))); > + schema = expand_get_strval(jsonval, "schemaname"); > + > + /* schema might be NULL or empty */ Why can it be empty here? > + if (schema == NULL || schema[0] == '\0') > + appendStringInfo(buf, "%s", operator); > + else > + appendStringInfo(buf, "%s.%s", > + quote_identifier(schema), > + operator); > +} > +/* > + * Expand a json value as a string. The value must be of type string or of > + * type object, in which case it must contain a "fmt" element which will be > + * recursively expanded; also, if the object contains an element "present" > + * and it is set to false, the expansion is the empty string. > + */ > +static void > +expand_jsonval_string(StringInfo buf, Datum jsonval, JsonType json_elt_type) > +{ > + if (json_elt_type == JsonTypeString) > + { > + char *str; > + char *unquoted; > + > + str = TextDatumGetCString(jsonval); > + unquoted = dequote_jsonval(str); > + appendStringInfo(buf, "%s", unquoted); > + pfree(str); > + pfree(unquoted); > + } > + else if (json_elt_type == JsonTypeObject) > + { > + bool present; > + bool isnull; > + present = expand_get_boolval(jsonval, "present", &isnull); Nitpick, but I find isnull slightly confusing as a name here. 
> + if (isnull || present) > + { > + Datum inner; > + char *str; > + > + inner = DirectFunctionCall1(pg_event_trigger_expand_command, > + jsonval); > + str = TextDatumGetCString(inner); > + > + appendStringInfoString(buf, str); > + pfree(DatumGetPointer(inner)); > + pfree(str); > + } > + } > +} > +/* > + * Expand a json value as a string literal > + */ > +static void > +expand_jsonval_strlit(StringInfo buf, Datum jsonval) > +{ > + char *str; > + char *unquoted; > + StringInfoData dqdelim; > + static const char dqsuffixes[] = "_XYZZYX_"; > + int dqnextchar = 0; > + > + /* obtain the string, and remove the JSON quotes and stuff */ > + str = TextDatumGetCString(jsonval); > + unquoted = dequote_jsonval(str); > + > + /* easy case: if there are no ' and no \, just use a single quote */ > + if (strchr(unquoted, '\'') == NULL && > + strchr(unquoted, '\\') == NULL) > + { > + appendStringInfo(buf, "'%s'", unquoted); > + return; > + } > + > + /* Find a useful dollar-quote delimiter */ > + initStringInfo(&dqdelim); > + appendStringInfoString(&dqdelim, "$"); > + while (strstr(unquoted, dqdelim.data) != NULL) > + { > + appendStringInfoChar(&dqdelim, dqsuffixes[dqnextchar++]); > + dqnextchar %= sizeof(dqsuffixes) - 1; > + } > + /* add trailing $ */ > + appendStringInfoChar(&dqdelim, '$'); > + > + /* And finally produce the quoted literal into the output StringInfo */ > + appendStringInfo(buf, "%s%s%s", dqdelim.data, unquoted, dqdelim.data); > +} Ugh. Do we really need this logic here? Doesn't that already exist somewhere? > +/* > + * Expand one json element according to rules. > + */ What are rules? > +static void > +expand_one_element(StringInfo buf, char *param, > + Datum jsonval, char *valtype, JsonType json_elt_type, > + convSpecifier specifier) > +{ > + /* > + * Validate the parameter type. If dotted-name was specified, then a JSON > + * object element is expected; if an identifier was specified, then a JSON > + * string is expected. If a string was specified, then either a JSON > + * object or a string is expected. > + */ > + if (specifier == SpecDottedName && json_elt_type != JsonTypeObject) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("expected JSON object for %%D element \"%s\", got %s", > + param, valtype))); > + if (specifier == SpecTypename && json_elt_type != JsonTypeObject) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("expected JSON object for %%T element \"%s\", got %s", > + param, valtype))); > + if (specifier == SpecOperatorname && json_elt_type != JsonTypeObject) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("expected JSON object for %%O element \"%s\", got %s", > + param, valtype))); > + if (specifier == SpecIdentifier && json_elt_type != JsonTypeString) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("expected JSON string for %%I element \"%s\", got %s", > + param, valtype))); > + if (specifier == SpecStringLiteral && json_elt_type != JsonTypeString) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("expected JSON string for %%L element \"%s\", got %s", > + param, valtype))); > + if (specifier == SpecString && > + json_elt_type != JsonTypeString && json_elt_type != JsonTypeObject) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("expected JSON string or object for %%s element \"%s\", got %s", > + param, valtype))); Isn't this essentially a duplication of the switch below? 
> + switch (specifier) > + { > + case SpecIdentifier: > + expand_jsonval_identifier(buf, jsonval); > + break; > + > + case SpecDottedName: > + expand_jsonval_dottedname(buf, jsonval); > + break; > + > + case SpecString: > + expand_jsonval_string(buf, jsonval, json_elt_type); > + break; > + > + case SpecStringLiteral: > + expand_jsonval_strlit(buf, jsonval); > + break; > + > + case SpecTypename: > + expand_jsonval_typename(buf, jsonval); > + break; > + > + case SpecOperatorname: > + expand_jsonval_operator(buf, jsonval); > + break; > + } > +} > + > +/* > + * Expand one JSON array element according to rules. > + */ The generic "rules" again... > +#define ADVANCE_PARSE_POINTER(ptr,end_ptr) \ > + do { \ > + if (++(ptr) >= (end_ptr)) \ > + ereport(ERROR, \ > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), \ > + errmsg("unterminated format specifier"))); \ > + } while (0) > + > +/*------ > + * Returns a formatted string from a JSON object. > + * > + * The starting point is the element named "fmt" (which must be a string). > + * This format string may contain zero or more %-escapes, which consist of an > + * element name enclosed in { }, possibly followed by a conversion modifier, > + * followed by a conversion specifier. Possible conversion specifiers are: > + * > + * % expand to a literal %. > + * I expand as a single, non-qualified identifier > + * D expand as a possibly-qualified identifier > + * T expand as a type name > + * O expand as an operator name > + * L expand as a string literal (quote using single quotes) > + * s expand as a simple string (no quoting) > + * > + * The element name may have an optional separator specification preceded > + * by a colon. Its presence indicates that the element is expected to be > + * an array; the specified separator is used to join the array elements. > + * > + * XXX the current implementation works fine, but is likely to be slow: for > + * each element found in the fmt string we parse the JSON object once. It > + * might be better to use jsonapi.h directly so that we build a hash or tree of > + * elements and their values once before parsing the fmt string, and later scan > + * fmt using the tree. > + *------ > + */ > +Datum > +pg_event_trigger_expand_command(PG_FUNCTION_ARGS) > +{ > + text *json = PG_GETARG_TEXT_P(0); > + char *fmt_str; > + int fmt_len; > + const char *cp; > + const char *start_ptr; > + const char *end_ptr; > + StringInfoData str; > + > + fmt_str = expand_get_strval(PointerGetDatum(json), "fmt"); > + if (fmt_str == NULL) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("invalid NULL format string"))); > + fmt_len = strlen(fmt_str); > + > + start_ptr = fmt_str; > + end_ptr = start_ptr + fmt_len; > + initStringInfo(&str); > + > + for (cp = start_ptr; cp < end_ptr; cp++) > + { I haven't checked, but I'm not sure the code here handles multibyte stuff correctly. IIRC it's legal for multibyte characters to appear directly in json? > + /* > + * Obtain the element to be expanded. Note we cannot use > + * DirectFunctionCall here, because the element might not exist. > + */ "and the return value thus could be NULL." > +/* > + * Append a string parameter to a tree. > + * > + * Note: we don't pstrdup the source string. Caller must ensure the > + * source string lives long enough. > + */ > +static void Why this difference to most other functions? 
> +/* Allocate a new object parameter */ > +static ObjElem * > +new_object_object(char *name, ObjTree *value) > +{ > + ObjElem *param; > + > + param = palloc0(sizeof(ObjElem)); > + > + param->name = name ? pstrdup(name) : NULL; > + param->objtype = ObjTypeObject; > + param->obj_value = value; /* XXX not duped */ Aha? Why? Because it's freshly allocated anyhow? In which case the XXX doesn't seem warranted? > +/* > + * Create a JSON blob from our ad-hoc representation. > + * > + * Note this leaks some memory; caller is responsible for later clean up. > + * > + * XXX this implementation will fail if there are more JSON objects in the tree > + * than the maximum number of columns in a heap tuple. To fix we would first call > + * construct_md_array and then json_object. > + */ I have no idea what that XXX is supposed to say to me. Note that there's no construct_md_array in the patchset. > +static char * > +jsonize_objtree(ObjTree *tree) > +{ > + slist_foreach(iter, &tree->params) > + { > + > + nulls[i - 1] = false; > + switch (object->objtype) > + { > + case ObjTypeNull: > + nulls[i - 1] = true; > + break; > + case ObjTypeBool: > + values[i - 1] = BoolGetDatum(object->bool_value); > + break; > + case ObjTypeString: > + values[i - 1] = CStringGetTextDatum(object->str_value); > + break; > + case ObjTypeArray: > + { > + ArrayType *arrayt; > + Datum *arrvals; > + Datum jsonary; > + ListCell *cell; > + int length = list_length(object->array_value); > + int j; > + > + /* > + * Arrays are stored as Lists up to this point, with each > + * element being a ObjElem; we need to construct an > + * ArrayType with them to turn the whole thing into a JSON > + * array. > + */ > + j = 0; > + arrvals = palloc(sizeof(Datum) * length); > + foreach(cell, object->array_value) > + { > + ObjElem *elem = lfirst(cell); > + > + switch (elem->objtype) > + { > + case ObjTypeString: > + arrvals[j] = > + /* > + * XXX need quotes around the value. This > + * needs to be handled differently because > + * it will fail for values of anything but > + * trivial complexity. > + */ > + CStringGetTextDatum(psprintf("\"%s\"", > + elem->str_value)); > + break; Didn't you earlier add a function doing most of this? That XXX looks like something that definitely needs to be addressed before the commit. > From ea8997de828931e954e1a52821ffc834370859d2 Mon Sep 17 00:00:00 2001 > From: Alvaro Herrera <alvherre@alvh.no-ip.org> > Date: Thu, 25 Sep 2014 15:50:37 -0300 > Subject: [PATCH 06/27] deparse: sprinkle EventTriggerStashCommand() calls Seems like a odd separate commit ;). What's the point of not squashing it? XXX: I'm now too tired to continue honestly. Man, that's a fair amount of code... Will try to continue it. Once I've made it through once I hopefully also give some sensible higher level comments. Greetings, Andres Freund
On Fri, Sep 26, 2014 at 6:59 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Actually here's a different split of these patches, which I hope makes > more sense. My intention here is that patches 0001 to 0004 are simple > changes that can be pushed right away; they are not directly related to > the return-creation-command feature. Patches 0005 to 0027 implement > that feature incrementally. You can see in patch 0005 the DDL commands > that are still not implemented in deparse (they are the ones that have > an elog(ERROR) rather than a "command = NULL"). Patch 0006 adds calls > in ProcessUtilitySlow() to each command, so that the object(s) being > touched are added to the event trigger command stash. > > Patches from 0007 to 0027 (excepting patch 0017) implement one or a > small number of commands in deparse. Patch 0017 is necessary > infrastructure in ALTER TABLE to support deparsing that one. > > My intention with the later patches is that they would all be pushed as > a single commit, i.e. the deparse support would be implemented for all > commands in a fell swoop rather than piecemeal -- except possibly patch > 0017 (the ALTER TABLE infrastructure). I split them up only for ease of > review. Of course, before pushing we (I) need to implement deparsing > for all the remaining commands. Thanks for the update. This stuff is still on my TODO list and I was planning to look at it at some extent today btw :) Andres has already sent some comments (20140925235724.GH16581@awork2.anarazel.de) though, I'll try to not overlap with what has already been mentioned. Patch 1: I still like this patch as it gives a clear separation of the built-in functions and the sub-functions of ruleutils.c that are completely independent. Have you considered adding the external declaration of quote_all_identifiers as well? It is true that this impacts extensions (some of my stuff as well), but my point is to bite the bullet and make the separation cleaner between builtins.h and ruleutils.h. Attached is a patch that can be applied on top of patch 1 doing so... Feel free to discard for the potential breakage this would create though. Patch 2: 1) Declarations of renameatt are inconsistent: @tablecmds.c: -renameatt(RenameStmt *stmt) +renameatt(RenameStmt *stmt, int32 *objsubid) @tablecmds.h: -extern Oid renameatt(RenameStmt *stmt); +extern Oid renameatt(RenameStmt *stmt, int *attnum); Looking at this code, for correctness renameatt should use everywhere "int *attnum" and ExecRenameStmt should use "int32 *subobjid". 2) Just a smallish picky formatting remark, I would have done that on a single line: + attnum = + renameatt_internal(relid, 3) in ExecRenameStmt, I think that you should set subobjid for aggregate, collations, fdw, etc. There is an access to ObjectAddress so that's easy to set using address.objectSubId. 4) This comment is misleading, as for an attribute what is returned is actually an attribute number: + * Return value is the OID of the renamed object. The objectSubId, if any, + * is returned in objsubid. */ 5) Perhaps it is worth mentioning at the top of renameatt that the attribute number is returned as well? 
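To make point 1) concrete, this is the kind of consistent pair of declarations I would expect to end up with -- a sketch only; whether the out parameter is int or int32, and what it is called, is of course up to the patch author:

/* src/include/commands/tablecmds.h */
extern Oid renameatt(RenameStmt *stmt, int *attnum);

/* src/include/commands/alter.h */
extern Oid ExecRenameStmt(RenameStmt *stmt, int *subobjid);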
Patch 3: Looks fine, a bit of testing is showing up that this works as expected: =# CREATE ROLE foo; CREATE ROLE =# CREATE TABLE aa (a int); CREATE TABLE =# CREATE OR REPLACE FUNCTION abort_grant() RETURNS event_trigger AS $$ BEGIN RAISE NOTICE 'Event: %', tg_event; RAISE EXCEPTION 'Execution of command % forbidden', tg_tag; END; $$ LANGUAGE plpgsql; CREATE FUNCTION =# CREATE EVENT TRIGGER abort_grant_trigger ON ddl_command_start WHEN TAG IN ('GRANT', 'REVOKE') EXECUTE PROCEDURE abort_grant(); CREATE EVENT TRIGGER =# GRANT SELECT ON aa TO foo; NOTICE: 00000: Event: ddl_command_start LOCATION: exec_stmt_raise, pl_exec.c:3068 ERROR: P0001: Execution of command GRANT forbidden LOCATION: exec_stmt_raise, pl_exec.c:3068 =# DROP EVENT TRIGGER abort_grant_trigger; DROP EVENT TRIGGER =# CREATE EVENT TRIGGER abort_grant_trigger ON ddl_command_end WHEN TAG IN ('GRANT', 'REVOKE') EXECUTE PROCEDURE abort_grant(); CREATE EVENT TRIGGER =# REVOKE SELECT ON aa FROM foo; NOTICE: 00000: Event: ddl_command_end LOCATION: exec_stmt_raise, pl_exec.c:3068 ERROR: P0001: Execution of command REVOKE forbidden LOCATION: exec_stmt_raise, pl_exec.c:3068 Patch 4: Shouldn't int32 be used instead of uint32 in the declaration of CommentObject? And yes, adding support for only ddl_command_start and ddl_command_end is enough. Let's not play with dropped objects in this area... Similarly to the test above: =# comment on table aa is 'test'; NOTICE: 00000: Event: ddl_command_start LOCATION: exec_stmt_raise, pl_exec.c:3068 ERROR: P0001: Execution of command COMMENT forbidden LOCATION: exec_stmt_raise, pl_exec.c:3068 Regards, -- Michael
Attachments
About patch 1, which I just pushed, there were two reviews: Andres Freund wrote: > On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote: > > diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h > > new file mode 100644 > > index 0000000..520b066 > > --- /dev/null > > +++ b/src/include/utils/ruleutils.h > I wondered for a minute whether any of these are likely to cause > problems for code just including builtins.h - but I think there will be > sufficiently few callers for them to make that not much of a concern. Great. Michael Paquier wrote: > Patch 1: I still like this patch as it gives a clear separation of the > built-in functions and the sub-functions of ruleutils.c that are > completely independent. Have you considered adding the external > declaration of quote_all_identifiers as well? It is true that this > impacts extensions (some of my stuff as well), but my point is to bite > the bullet and make the separation cleaner between builtins.h and > ruleutils.h. Attached is a patch that can be applied on top of patch 1 > doing so... Feel free to discard for the potential breakage this would > create though. Yes, I did consider moving the quoting function definitions and the global out of builtins.h. It is much more disruptive, however, because it affects a lot of external code not only our own; and given the looks I got after people had to mess with adding the htup_details.h header to external projects due to the refactoring I did to htup.h in an earlier release, I'm not sure about it. However, if I were to do it, I would instead create a quote.h file and would also add the quote_literal_cstr() prototype to it, perhaps even move the implementations from their current ruleutils.c location to quote.c. (Why is half the stuff in ruleutils.c rather than quote.c is beyond me.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 9, 2014 at 8:29 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Oct 9, 2014 at 6:26 AM, Alvaro Herrera <alvherre@2ndquadrant.com> > wrote: >> However, if I were to do it, I would instead create a quote.h file and >> would also add the quote_literal_cstr() prototype to it, perhaps even >> move the implementations from their current ruleutils.c location to >> quote.c. (Why is half the stuff in ruleutils.c rather than quote.c is >> beyond me.) > > Yes, an extra header file would be cleaner. Now if we are worried about > creating much incompatibilities with existing extension (IMO that's not > something core should worry much about), then it is better not to do it. > Now, if I can vote on it, I think it is worth doing in order to have > builtins.h only contain only Datum foo(PG_FUNCTION_ARGS), and move all the > quote-related things in quote.c. The attached patch implements this idea, creating a new header quote.h in include/utils that groups all the quote-related functions. This makes the code more organized. Regards, -- Michael
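Roughly, such a quote.h could declare something like the following -- the prototypes are simply the ones builtins.h exposes today, listed here only to illustrate the split, not as the actual content of the attached patch:

/* src/include/utils/quote.h -- illustrative sketch only */
#ifndef QUOTE_H
#define QUOTE_H

/* GUC-controlled flag, currently declared in builtins.h */
extern bool quote_all_identifiers;

extern const char *quote_identifier(const char *ident);
extern char *quote_qualified_identifier(const char *qualifier,
										const char *ident);
extern char *quote_literal_cstr(const char *rawstr);

#endif   /* QUOTE_H */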
Attachments
Alvaro Herrera wrote: > Actually here's a different split of these patches, which I hope makes > more sense. My intention here is that patches 0001 to 0004 are simple > changes that can be pushed right away; they are not directly related to > the return-creation-command feature. Patches 0005 to 0027 implement > that feature incrementally. You can see in patch 0005 the DDL commands > that are still not implemented in deparse (they are the ones that have > an elog(ERROR) rather than a "command = NULL"). Patch 0006 adds calls > in ProcessUtilitySlow() to each command, so that the object(s) being > touched are added to the event trigger command stash. > > Patches from 0007 to 0027 (excepting patch 0017) implement one or a > small number of commands in deparse. Patch 0017 is necessary > infrastructure in ALTER TABLE to support deparsing that one. > > My intention with the later patches is that they would all be pushed as > a single commit, i.e. the deparse support would be implemented for all > commands in a fell swoop rather than piecemeal -- except possibly patch > 0017 (the ALTER TABLE infrastructure). I split them up only for ease of > review. Of course, before pushing we (I) need to implement deparsing > for all the remaining commands. Here's a new version of this series. The main change is that I've changed deparse_utility.c to generate JSON, and the code that was in commands/event_trigger.c to decode that JSON, so that it uses the new Jsonb API instead. In addition, I've moved the new code that was in commands/event_trigger.c to utils/adt/ddl_json.c. (The only entry point of the new file is the SQL-callable pg_event_trigger_expand_command() function, and its purpose is to expand a JSON object emitted by the deparse_utility.c code back into a plain text SQL command.) I have also cleaned up the code per comments from Michael Paquier and Andres Freund: * the GRANT support for event triggers now correctly ignores global objects. * COMMENT ON .. IS NULL no longer causes a crash * renameatt() and ExecRenameStmt are consistent in returning the objSubId as an "int" (not int32). This is what is used as objectSubId in ObjectAddress, which is what we're using this value for. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachments
- 0001-deparse-core-have-RENAME-return-attribute-number.patch
- 0002-deparse-core-event-triggers-support-GRANT-REVOKE.patch
- 0003-deparse-core-event-triggers-support-COMMENT.patch
- 0004-deparse-infrastructure-needed-for-command-deparsing.patch
- 0005-deparse-sprinkle-EventTriggerStashCommand-calls.patch
- 0006-deparse-Support-CREATE-TYPE-AS.patch
- 0007-deparse-Support-CREATE-TYPE-AS-ENUM.patch
- 0008-deparse-Support-CREATE-SCHEMA-TABLE-SEQUENCE-INDEX-T.patch
- 0009-deparse-Support-CREATE-TYPE-AS-RANGE.patch
- 0010-deparse-Support-CREATE-EXTENSION.patch
- 0011-deparse-Support-CREATE-RULE.patch
- 0012-deparse-Support-ALTER-TYPE-ADD-VALUE-enums.patch
- 0013-deparse-Support-for-ALTER-OBJECT-RENAME.patch
- 0014-deparse-Support-CREATE-DOMAIN.patch
- 0015-deparse-Support-CREATE-FUNCTION.patch
- 0016-deparse-core-have-ALTER-TABLE-return-OIDs-and-col-of.patch
- 0017-deparse-Support-ALTER-TABLE.patch
- 0018-deparse-Support-CREATE-VIEW.patch
- 0019-deparse-Support-CREATE-OPERATOR-FAMILY.patch
- 0020-deparse-Support-CREATE-CONVERSION.patch
- 0021-deparse-Support-DefineStmt-commands.patch
- 0022-deparse-support-ALTER-THING-OWNER-TO.patch
- 0023-deparse-Support-ALTER-EXTENSION-UPDATE-TO.patch
- 0024-deparse-Support-GRANT-REVOKE.patch
- 0025-deparse-Support-ALTER-FUNCTION.patch
- 0026-deparse-support-COMMENT-ON.patch
On Mon, Oct 13, 2014 at 12:45 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Here's a new version of this series. The main change is that I've > changed deparse_utility.c to generate JSON, and the code that was in > commands/event_trigger.c to decode that JSON, so that it uses the new > Jsonb API instead. In addition, I've moved the new code that was in > commands/event_trigger.c to utils/adt/ddl_json.c. (The only entry point > of the new file is the SQL-callable pg_event_trigger_expand_command() > function, and its purpose is to expand a JSON object emitted by the > deparse_utility.c code back into a plain text SQL command.) > I have also cleaned up the code per comments from Michael Paquier and > Andres Freund: > > * the GRANT support for event triggers now correctly ignores global > objects. > > * COMMENT ON .. IS NULL no longer causes a crash Why would this not fail? Patch 3 in this set is identical to the last one. I tested that and it worked well... > * renameatt() and ExecRenameStmt are consistent in returning the > objSubId as an "int" (not int32). This is what is used as objectSubId > in ObjectAddress, which is what we're using this value for. In patch 1, ExecRenameStmt returns objsubid for an attribute name when rename is done on an attribute. Now could you clarify why we skip this list of objects even if their sub-object ID is available with address.objectSubId? case OBJECT_AGGREGATE: case OBJECT_COLLATION: case OBJECT_CONVERSION: case OBJECT_EVENT_TRIGGER: case OBJECT_FDW: case OBJECT_FOREIGN_SERVER: case OBJECT_FUNCTION: case OBJECT_OPCLASS: case OBJECT_OPFAMILY: case OBJECT_LANGUAGE: case OBJECT_TSCONFIGURATION: case OBJECT_TSDICTIONARY: case OBJECT_TSPARSER: case OBJECT_TSTEMPLATE: Patch 2 fails on make check for the tests privileges and foreign_data by showing up double amount warnings for some object types: *** /Users/ioltas/git/postgres/src/test/regress/expected/privileges.out Fri Oct 10 14:34:10 2014 --- /Users/ioltas/git/postgres/src/test/regress/results/privileges.outThu Oct 16 15:47:42 2014 *************** *** 118,123 **** --- 118,124 ---- ERROR: permission denied for relation atest2 GRANT ALL ON atest1 TO PUBLIC; -- fail WARNING: no privilegeswere granted for "atest1" + WARNING: no privileges were granted for "atest1" -- checks in subquery, both ok SELECT * FROM atest1 WHERE ( b IN ( SELECTcol1 FROM atest2 ) ); a | b EventTriggerSupportsGrantObjectType is fine to remove ACL_OBJECT_DATABASE and ACL_OBJECT_TABLESPACE from the list of supported objects. That's as well in line with the current firing matrix. I think that it would be appropriate to add a comment on top of this function. Patch 3 looks good. Regards, -- Michael
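Regarding the comment suggested for EventTriggerSupportsGrantObjectType, something along these lines is what I had in mind -- note this is only my guess at the function's shape from reading the patch description, not the patch's actual code:

/*
 * Returns true when the object type targeted by GRANT/REVOKE is supported
 * by event triggers; global objects such as databases and tablespaces are
 * excluded, per the event trigger firing matrix.
 */
static bool
EventTriggerSupportsGrantObjectType(GrantObjectType objtype)
{
	switch (objtype)
	{
		case ACL_OBJECT_DATABASE:
		case ACL_OBJECT_TABLESPACE:
			/* no support for global objects */
			return false;
		default:
			return true;
	}
}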
On Thu, Oct 16, 2014 at 4:01 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Mon, Oct 13, 2014 at 12:45 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Here's a new version of this series.
Here is some input on patch 4:
1) Use of OBJECT_ATTRIBUTE:
+ case OBJECT_ATTRIBUTE:
+ catalog_id = TypeRelationId; /* XXX? */
+ break;
I think that the XXX mark can be removed; using TypeRelationId is correct IMO, as OBJECT_ATTRIBUTE is used when managing an attribute of a composite type, i.e. for CREATE/ALTER TYPE.
2) This patch is showing up many warnings, among them:
event_trigger.c:1460:20: note: uninitialized use occurs here
addr.classId = classId;
^~~~~~~
event_trigger.c:1446:21: note: initialize the variable 'objSubId' to silence this warning
uint32 objSubId;
^
= 0
Or:
deparse_utility.c:301:1: warning: unused function 'append_object_object' [-Wunused-function]
append_object_object(ObjTree *tree, char *name, ObjTree *value)
^
In the second case, though, I imagine that those functions in deparse_utility.c are used by later patches... There are a couple of other warnings related to SCT_Simple, but that's expected as long as patches 17 or 24 are not combined with it.
3) What's that actually?
+/* XXX merge this with ObjectTypeMap? */
static event_trigger_support_data event_trigger_support[] = {
We may be able to do something smarter with event_trigger_support[], but as this consists of a mapping of subcommands associated with CREATE/DROP/ALTER, and is essentially a reverse mapping of AlterObjectTypeCommandTag or CreateCommandTag, I am not sure how you could merge that. Some input perhaps?
4)
+/*
+ * EventTriggerStashCommand
+ * Save data about a simple DDL command that was just executed
+ */
Shouldn't this be "single" instead of "simple"?
5) I think that SCT_Simple should be renamed as SCT_Single
+typedef enum StashedCommandType
+{
+ SCT_Simple,
+} StashedCommandType;
This comment holds as well for deparse_simple_command.
6)
+ command = deparse_utility_command(cmd);
+
+ /*
+ * Some parse trees return NULL when deparse is attempted; we don't
+ * emit anything for them.
+ */
+ if (command != NULL)
+ {
Small detail, but you could just use a continue here to make the code a bit more readable after deparsing the command.
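In other words, something like this (sketch only, reusing the names from the quoted hunk; the surrounding loop is the one iterating over the stashed commands):

command = deparse_utility_command(cmd);

/*
 * Some parse trees return NULL when deparse is attempted; we don't
 * emit anything for them.
 */
if (command == NULL)
	continue;

/* ... build and store the result tuple for this command ... */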
7) pg_event_trigger_get_creation_commands is modified as well in patches 17 and 24. You may as well use an enum on cmd->type.
8) Rejoining a comment done earlier by Andres, it would be nice to have ERRCODE_WRONG_CONTEXT (unrelated to this patch). ERRCODE_FEATURE_NOT_SUPPORTED seems rather a different error type...
9) Those declarations are not needed in event_trigger.c:
+#include "utils/json.h"
+#include "utils/jsonb.h"
10) Would you mind explaining what "fmt" means?
+ * Allocate a new object tree to store parameter values -- varargs version.
+ *
+ * The "fmt" argument is used to append as a "fmt" element in the output blob.
11) deparse_utility.c mentions JSON objects here and there, but what is created are JSONB objects. I'd rather be clear here.
12) Already mentioned before, but this reverse-engineering machinery for types would be more welcome as a set of functions in lsyscache (one for the namespace, one for the type name and one for is_array). For typemodstr the need is different, as it uses printTypmod.
+void
+format_type_detailed(Oid type_oid, int32 typemod,
+ Oid *nspid, char **typname, char **typemodstr,
+ bool *is_array)
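Just to illustrate what the out parameters are for, a hypothetical caller could reassemble them along these lines (typid, typmod and buf are assumed from context, and whether typemodstr may be NULL is up to the actual implementation):

Oid			nspid;
char	   *typname;
char	   *typmodstr;
bool		is_array;

format_type_detailed(typid, typmod, &nspid, &typname, &typmodstr, &is_array);

/* rebuild something like schema.name(typmod)[] */
appendStringInfo(buf, "%s.%s%s%s",
				 quote_identifier(get_namespace_name(nspid)),
				 quote_identifier(typname),
				 typmodstr ? typmodstr : "",
				 is_array ? "[]" : "");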
13) This change seems unrelated to this patch...
- int type = 0;
+ JsonbIteratorToken type = WJB_DONE;
JsonbValue v;
int level = 0;
bool redo_switch = false;
@@ -454,7 +454,7 @@ JsonbToCString(StringInfo out, JsonbContainer *in, int estimated_len)
first = false;
break;
default:
- elog(ERROR, "unknown flag of jsonb iterator");
+ elog(ERROR, "unknown jsonb iterator token type");
14) This could already be pushed as a separate commit:
-extern bool creating_extension;
+extern PGDLLIMPORT bool creating_extension;
A couple of comments: this patch introduces a basic infrastructure able to do the following set of operations:
- Obtaining the parse tree, using StashedCommand
- Reorganizing the parse tree into an ObjTree, with booleans, arrays, etc.
- Converting the ObjTree into a JSONB value
I am actually a bit doubtful about why we need this intermediate ObjTree step... What would be wrong with manipulating a JSON(B) tree directly, starting from an empty one and pushing key/value objects into it while processing each command? One point in this direction is that ObjTree has its own logic to handle booleans, null values, etc. This duplicates what jsonb and json can already do when creating and manipulating values, and it adds extra processing like objtree_to_jsonb_element. ddl_json.c (should it be ddl_jsonb.c?) would also make more sense then, as it directly manipulates JSONB containers.
Regards,
--
Michael
Hi Michael, thanks for the review. Michael Paquier wrote: > On Thu, Oct 16, 2014 at 4:01 PM, Michael Paquier <michael.paquier@gmail.com> > wrote: > > > On Mon, Oct 13, 2014 at 12:45 PM, Alvaro Herrera > > <alvherre@2ndquadrant.com> wrote: > > > Here's a new version of this series. > > Here is some input on patch 4: > > 1) Use of OBJECT_ATTRIBUTE: > + case OBJECT_ATTRIBUTE: > + catalog_id = TypeRelationId; /* XXX? */ > + break; > I think that the XXX mark could be removed, using TypeRelationId is correct > IMO as OBJECT ATTRIBUTE is used when managing an attribute object, which is > a custom type used for CREATE/ALTER TYPE. Agreed. > 2) This patch is showing up many warnings, among them: Will check. > 3) What's that actually? > +/* XXX merge this with ObjectTypeMap? */ > static event_trigger_support_data event_trigger_support[] = { > We may be able to do something smarter with event_trigger_support[], but as > this consists in a mapping of subcommands associated with CREATE/DROP/ALTER > and is only a reverse engineering operation of somewhat > AlterObjectTypeCommandTag or CreateCommandTag, I am not sure how you could > merge that. Some input perhaps? ObjectTypeMap is part of the patch that handles DROP for replication; see my other patch in commitfest. I am also not sure how to merge all this stuff; with these patches, we would have three "do event triggers support this object type" functions, so I was thinking in having maybe a text file from which these functions are generated from a perl script or something. But for now I think it's okay to keep things like this. That comment is only there to remind me that some cleanup might be in order. > 4) > +/* > + * EventTriggerStashCommand > + * Save data about a simple DDL command that was just executed > + */ > Shouldn't this be "single" instead of "simple"? In an older version it was "basic". Not wedded to "simple", but I don't think "single" is the right thing. A later patch in the series introduces type Grant, and there's also type AlterTable. The idea behind Simple is to include command types that do not require special handling; but all these commands are single commands. > 6) > + command = deparse_utility_command(cmd); > + > + /* > + * Some parse trees return NULL when deparse is attempted; > we don't > + * emit anything for them. > + */ > + if (command != NULL) > + { > Small detail, but you may here just use a continue to make the code a bit > more readable after deparsing the command. Will check. > 9) Those declarations are not needed in event_trigger.c: > +#include "utils/json.h" > +#include "utils/jsonb.h" Will check. I split ddl_json.c at the last minute and I may have forgotten to remove these. > 10) Would you mind explaining what means "fmt"? > + * Allocate a new object tree to store parameter values -- varargs version. > + * > + * The "fmt" argument is used to append as a "fmt" element in the output > blob. "fmt" is equivalent to sprintf and friends' fmt argument. I guess this commands needs to be expanded a bit. > 11) deparse_utility.c mentions here and there JSON objects, but what is > created are JSONB objects. I'd rather be clear here. Good point. > 12) Already mentioned before, but this reverse engineering machine for > types would be more welcome as a set of functions in lsyscache (one for > namespace, one for type name and one for is_array). For typemodstr the need > is different as it uses printTypmod. 
> +void > +format_type_detailed(Oid type_oid, int32 typemod, > + Oid *nspid, char **typname, char > **typemodstr, > + bool *is_array) I am unsure about this. Other things that require many properties of the same object do a single lookup and return all of them in a single call, rather than repeated calls. > 13) This change seems unrelated to this patch... > - int type = 0; > + JsonbIteratorToken type = WJB_DONE; > JsonbValue v; > int level = 0; > bool redo_switch = false; > @@ -454,7 +454,7 @@ JsonbToCString(StringInfo out, JsonbContainer *in, int > estimated_len) > first = false; > break; > default: > - elog(ERROR, "unknown flag of jsonb > iterator"); > + elog(ERROR, "unknown jsonb iterator token > type"); Yes, sorry. I was trying to figure out how to use the jsonb stuff and I found this error message was quite unclear. In general, jsonb code seems to have random warts ... > A couple of comments: this patch introduces a basic infrastructure able to > do the following set of operations: > - Obtention of parse tree using StashedCommand > - Reorganization of parse tree to become an ObjTree, with boolean, array > - Reorganization of ObjTree to a JsonB value > I am actually a bit doubtful about why we actually need this intermediate > ObjTree step... What would be wrong in manipulating a JSON(B) parsed tree, > that is first empty, and pushing key/value objects to it when processing > each command? Something moving toward in this direction is that ObjTree has > some logic to manipulate booleans, null values, etc. This results in some > duplication with what jsonb and json can actually do when creating when > manipulating strings, as well as the extra processing like > objtree_to_jsonb_element. ddl_json.c (should be ddl_jsonb.c?) would as well > take more its sense as it directly manipulates JSONB containers. Uhm. Obviously we didn't have jsonb when I started this and we do have them now, so I could perhaps see about updating the patch to do things this way; but I'm not totally sold on that idea, as my ObjTree stuff is a lot easier to manage and the jsonb API is pretty ugly. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-10-28 05:30:43 -0300, Alvaro Herrera wrote: > > A couple of comments: this patch introduces a basic infrastructure able to > > do the following set of operations: > > - Obtention of parse tree using StashedCommand > > - Reorganization of parse tree to become an ObjTree, with boolean, array > > - Reorganization of ObjTree to a JsonB value > > I am actually a bit doubtful about why we actually need this intermediate > > ObjTree step... What would be wrong in manipulating a JSON(B) parsed tree, > > that is first empty, and pushing key/value objects to it when processing > > each command? Something moving toward in this direction is that ObjTree has > > some logic to manipulate booleans, null values, etc. This results in some > > duplication with what jsonb and json can actually do when creating when > > manipulating strings, as well as the extra processing like > > objtree_to_jsonb_element. ddl_json.c (should be ddl_jsonb.c?) would as well > > take more its sense as it directly manipulates JSONB containers. > > Uhm. Obviously we didn't have jsonb when I started this and we do have > them now, so I could perhaps see about updating the patch to do things > this way; but I'm not totally sold on that idea, as my ObjTree stuff is > a lot easier to manage and the jsonb API is pretty ugly. I looked at this as well, and I think trying to do so would not result in readable code. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 28, 2014 at 6:00 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> Uhm. Obviously we didn't have jsonb when I started this and we do have >> them now, so I could perhaps see about updating the patch to do things >> this way; but I'm not totally sold on that idea, as my ObjTree stuff is >> a lot easier to manage and the jsonb API is pretty ugly. > > I looked at this as well, and I think trying to do so would not result > in readable code. That doesn't speak very well of jsonb. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Oct 30, 2014 at 2:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 28, 2014 at 6:00 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> Uhm. Obviously we didn't have jsonb when I started this and we do have
>>> them now, so I could perhaps see about updating the patch to do things
>>> this way; but I'm not totally sold on that idea, as my ObjTree stuff is
>>> a lot easier to manage and the jsonb API is pretty ugly.
>>
>> I looked at this as well, and I think trying to do so would not result
>> in readable code.
>
> That doesn't speak very well of jsonb. :-(

Just did the same, and I played a bit with the APIs. I am getting the impression that the jsonb API is currently focused on deparsing and parsing jsonb strings to/from containers, but there is no real interface that allows easy manipulation of the containers where the values are located. So, what I think is missing is really a friendly interface to manipulate JsonbContainers directly, and I think that we are not far from it with something like this set, roughly:
- Initialization of an empty container
- Set of APIs to directly push a value to a container (boolean, array, null, string, numeric or other jsonb object)
- Initialization of JsonbValue objects
With this basic set of APIs, patch 4 could for example use JsonbToCString to convert the JSONB blob back into the string it sends to the client. Note as well that findJsonbValueFromContainer already exists to fetch a value back from a container.
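For instance, a generic string lookup in the spirit of patch 4's find_string_in_jsonbcontainer can already be written on top of it -- a rough sketch, with the function name made up here:

#include "postgres.h"
#include "utils/jsonb.h"

static char *
find_string_in_container(JsonbContainer *container, const char *keyname)
{
	JsonbValue	key;
	JsonbValue *v;

	key.type = jbvString;
	key.val.string.val = (char *) keyname;
	key.val.string.len = strlen(keyname);

	/* look the key up as an object field */
	v = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
	if (v == NULL || v->type != jbvString)
		return NULL;

	return pnstrdup(v->val.string.val, v->val.string.len);
}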
In short, my point is: instead of reinventing the wheel as this patch series does with ObjTree, I think that it would be more fruitful to have a more solid in-memory JSONB infrastructure that allows direct manipulation of JSONB objects. This feature as well as future extensions could benefit from that.
Feel free to comment.
Regards,
--
Michael
On Fri, Oct 31, 2014 at 11:27 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> So, what I think is missing is really a friendly interface to manipulate
> JsonbContainers directly, and I think that we are not far from it with
> something like this set, roughly:
> - Initialization of an empty container
> - Set of APIs to directly push a value to a container (boolean, array, null, string, numeric or other jsonb object)
> - Initialization of JsonbValue objects
Here are more thoughts along those lines, looking at the current state of patch 4, which introduces the infrastructure of the whole feature. This would make in-memory manipulation of jsonb containers possible without relying on a separate ad-hoc set of APIs like the ObjTree layer this patch uses to deparse the DDL parse trees.
1) A set of constructor functions for JsonbValue: null, bool, string, array, and JSONB object for nested values. Note that keys can be used as Jsonb string objects.
2) Lookup functions for values in a JsonbContainer. Patch 4 is introducing that with find_string_in_jsonbcontainer and find_bool_in_jsonbcontainer. We may as well extend it to be able to look for another Jsonb object for nested searches for example.
3) Functions to push JsonbValue within a container, using a key and a value. This is where most of the work would be necessary, for bool, null, string, Jsonb object and numeric.
This infrastructure would allow in-memory manipulation of jsonb containers, which could then easily be converted back to strings or used for value lookups by key string; a rough sketch of the construction side follows.
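To show what I mean by pushing values directly, here is a sketch built only on the existing jsonb.h entry points (pushJsonbValue, JsonbValueToJsonb); the key names are made up and the helper is purely illustrative:

#include "postgres.h"
#include "utils/jsonb.h"

/* build {"name": "t1", "is_array": false} entirely in memory */
static Jsonb *
build_sample_object(void)
{
	JsonbParseState *state = NULL;
	JsonbValue	jb;
	JsonbValue *res;

	pushJsonbValue(&state, WJB_BEGIN_OBJECT, NULL);

	jb.type = jbvString;
	jb.val.string.val = (char *) "name";
	jb.val.string.len = strlen("name");
	pushJsonbValue(&state, WJB_KEY, &jb);

	jb.type = jbvString;
	jb.val.string.val = (char *) "t1";
	jb.val.string.len = strlen("t1");
	pushJsonbValue(&state, WJB_VALUE, &jb);

	jb.type = jbvString;
	jb.val.string.val = (char *) "is_array";
	jb.val.string.len = strlen("is_array");
	pushJsonbValue(&state, WJB_KEY, &jb);

	jb.type = jbvBool;
	jb.val.boolean = false;
	pushJsonbValue(&state, WJB_VALUE, &jb);

	res = pushJsonbValue(&state, WJB_END_OBJECT, NULL);

	return JsonbValueToJsonb(res);
}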
Regards,
--
Michael
Michael Paquier wrote: > Here are more thoughts among those lines looking at the current state of > the patch 4 that introduces the infrastructure of the whole feature. This > would make possible in-memory manipulation of jsonb containers without > relying on a 3rd-part set of APIs like what this patch is doing with > ObjTree to deparse the DDL parse trees. Thanks for the thoughts. I have to say that I have no intention of reworking the jsonb code. If somebody else wants to do the legwork and add that API as you suggest, I'm happy to remove all the ObjTree stuff from this patch. I don't expect this to happen too soon, though, so I would instead consider committing this patch based on ObjTree. Later, when somebody comes around to reworking jsonb, we can rip ObjTree out. > This infrastructure would allow in-memory manipulation of jsonb containers. > Containers that can then be easily be manipulated to be changed back to > strings and for value lookups using key strings. Honestly, I had hoped that the jsonb code would have already included this kind of functionality. I wasn't too happy when I discovered that I needed to keep the ObjTree crap. But I don't want to do that work myself. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 7, 2014 at 10:45 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: > >> Here are more thoughts among those lines looking at the current state of >> the patch 4 that introduces the infrastructure of the whole feature. This >> would make possible in-memory manipulation of jsonb containers without >> relying on a 3rd-part set of APIs like what this patch is doing with >> ObjTree to deparse the DDL parse trees. > > Thanks for the thoughts. I have to say that I have no intention of > reworking the jsonb code. If somebody else wants to do the legwork and > add that API as you suggest, I'm happy to remove all the ObjTree stuff > from this patch. I don't expect this to happen too soon, though, so I > would instead consider committing this patch based on ObjTree. Later, > when somebody comes around to reworking jsonb, we can rip ObjTree out. > >> This infrastructure would allow in-memory manipulation of jsonb containers. >> Containers that can then be easily be manipulated to be changed back to >> strings and for value lookups using key strings. > > Honestly, I had hoped that the jsonb code would have already included > this kind of functionality. I wasn't too happy when I discovered that I > needed to keep the ObjTree crap. But I don't want to do that work > myself. If we're going to have infrastructure for this in core, we really ought to make the effort to make it general instead of not. I still think this whole idea is a mess. It adds what looks to be a LOT of code that, at least as I understand it, we have no compelling way to regression test and which will likely receive very limited testing from users to support a feature that is not in core, may never be, and which I strongly suspect may be too clever by half. Once you've committed it, you're going to move onto other things and leave it to everyone who comes after to try to maintain it. I bet we'll still be running into bugs and half-implemented features five years from now, and maybe in ten. Ramming through special-purpose infrastructure instead of generalizing it is merely icing on the cake, but it's still moving in the wrong direction. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-11-07 21:41:17 -0500, Robert Haas wrote: > On Fri, Nov 7, 2014 at 10:45 AM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > Michael Paquier wrote: > > > >> Here are more thoughts among those lines looking at the current state of > >> the patch 4 that introduces the infrastructure of the whole feature. This > >> would make possible in-memory manipulation of jsonb containers without > >> relying on a 3rd-part set of APIs like what this patch is doing with > >> ObjTree to deparse the DDL parse trees. > > > > Thanks for the thoughts. I have to say that I have no intention of > > reworking the jsonb code. If somebody else wants to do the legwork and > > add that API as you suggest, I'm happy to remove all the ObjTree stuff > > from this patch. I don't expect this to happen too soon, though, so I > > would instead consider committing this patch based on ObjTree. Later, > > when somebody comes around to reworking jsonb, we can rip ObjTree out. > > > >> This infrastructure would allow in-memory manipulation of jsonb containers. > >> Containers that can then be easily be manipulated to be changed back to > >> strings and for value lookups using key strings. > > > > Honestly, I had hoped that the jsonb code would have already included > > this kind of functionality. I wasn't too happy when I discovered that I > > needed to keep the ObjTree crap. But I don't want to do that work > > myself. > > If we're going to have infrastructure for this in core, we really > ought to make the effort to make it general instead of not. > > I still think this whole idea is a mess. It adds what looks to be a > LOT of code that, at least as I understand it, we have no compelling > way to regression test I don't understand why this is particularly difficult to regresssion test. It actually is comparatively simple? > and which will likely receive very limited > testing from users to support a feature that is not in core, Just like half of the features you worked on yourself lately? Why is this an argument? Being able to replicate DDL is a feature wish that has been around pretty much since the inception of trigger based replication solution. It's not some current fancy. And the json stuff only got there because people wanted some way to manipulate the names in the replicated - which this abstraction provides them with. > may never be, and which I strongly suspect may be too clever by half. > Once you've committed it, you're going to move onto other things and > leave it to everyone who comes after to try to maintain it. I bet > we'll still be running into bugs and half-implemented features five > years from now, and maybe in ten. Ramming through special-purpose > infrastructure instead of generalizing it is merely icing on the cake, > but it's still moving in the wrong direction. You're just as much to blame for not writing a general json abstraction layer for EXPLAIN. I'd say that's not much blame, but still, there's really not much difference there. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Nov 8, 2014 at 12:45 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: > >> Here are more thoughts among those lines looking at the current state of >> the patch 4 that introduces the infrastructure of the whole feature. This >> would make possible in-memory manipulation of jsonb containers without >> relying on a 3rd-part set of APIs like what this patch is doing with >> ObjTree to deparse the DDL parse trees. > > Thanks for the thoughts. I have to say that I have no intention of > reworking the jsonb code. If somebody else wants to do the legwork and > add that API as you suggest, I'm happy to remove all the ObjTree stuff > from this patch. I don't expect this to happen too soon, though, so I > would instead consider committing this patch based on ObjTree. Later, > when somebody comes around to reworking jsonb, we can rip ObjTree out. The thing freaking me out in this case is when would that really happen? Maybe years from now, and perhaps at that point we would regret to not have put in place the infrastructure that we knew we could have done. >> This infrastructure would allow in-memory manipulation of jsonb containers. >> Containers that can then be easily be manipulated to be changed back to >> strings and for value lookups using key strings. > > Honestly, I had hoped that the jsonb code would have already included > this kind of functionality. I wasn't too happy when I discovered that I > needed to keep the ObjTree crap. But I don't want to do that work > myself. I can't blame you for that. In any case, if this code goes in as-is (I am against it at this point but I am just giving my opinion as a reviewer here), I think that at least the following things could be done with a minimal effort: - Provide the set of constructor functions for JsonbValue - Move the jsonb APIs (find_*_in_jsonbcontainer) doing value lookups using keys from ddl_json.c to jsonb_util.c, rename them to something more consistent and complete the set for the other object types. - Add a TODO item in the wiki, and a TODO item in where ObjTree is defined. Thanks, -- Michael
On Sat, Nov 8, 2014 at 4:37 AM, Andres Freund <andres@2ndquadrant.com> wrote: > I don't understand why this is particularly difficult to regresssion > test. It actually is comparatively simple? If it is, great. I previously wrote this email: http://www.postgresql.org/message-id/CA+TgmoZ=vZriJMxLkqi_V0jg4k4LEAPmwUSC6RWXS5MquXUJNA@mail.gmail.com Alvaro came up with a way of addressing the second point I raised there, which I'm quite pleased about, but AFAIK there's been no progress on the first one. Maybe I missed something? >> and which will likely receive very limited >> testing from users to support a feature that is not in core, > > Just like half of the features you worked on yourself lately? Why is > this an argument? Because the stuff I'm adding doesn't break every time someone adds a new DDL command, something that we do regularly. If it were a question of writing this code once and being done with it, that would be unobjectionable in my view. But it isn't. Practically every change to gram.y is going to require a corresponding change to this stuff. As far as I can see, nobody except me has commented on the burden that places on everyone who may wish to add syntax support for a new construct in the future, which might mean that I'm worrying about something that isn't worth worrying about, but what I think is more likely is that nobody's worrying about it right now because they haven't had to do it yet. Just to illustrate the point, consider the CREATE TABLE name OF type syntax that Peter added a few years ago. That patch (e7b3349a8ad7afaad565c573fbd65fb46af6abbe) had the following impact on gram.y: src/backend/parser/gram.y | 56 ++++++++++++++++++++++++++++++++++-----------1 file changed, 43 insertions(+), 13 deletions(-) Now let's have a look at what impact it has on the deparsing code. Patch 6 has deparse_ColumnDef_Typed from lines 134 to 193. There's 3 or so lines in deparseTableElements that decide whether to call it. Patch 8 has more handling for this case, lines 439 to 443 and 463 to 490. So, if this feature had been committed before TABLE OF, it would have needed about 100 lines of net new code to handle this case - exclusive of refactoring. The actual size of the patch would probably have been modestly larger than that, because some code would need to be reindented when it got iffed out, and quite possibly some rearrangement would have been needed. But even ignoring all that, the deparse footprint of the patch would have been MORE THAN TWICE the parser footprint. I think that's going to be typical; and I think the deparse code is going to be significantly more labor-intensive to write than bison productions are. Do you really think that's not going to be a burden? > Being able to replicate DDL is a feature wish that has been around > pretty much since the inception of trigger based replication > solution. It's not some current fancy. And the json stuff only got there > because people wanted some way to manipulate the names in the replicated > - which this abstraction provides them with. I understand that being able to replicate DDL is an important need, and there may be no better way to do it than this. But that doesn't mean that this code is going to be bug-free or easy to maintain. >> may never be, and which I strongly suspect may be too clever by half. >> Once you've committed it, you're going to move onto other things and >> leave it to everyone who comes after to try to maintain it. 
I bet >> we'll still be running into bugs and half-implemented features five >> years from now, and maybe in ten. Ramming through special-purpose >> infrastructure instead of generalizing it is merely icing on the cake, >> but it's still moving in the wrong direction. > > You're just as much to blame for not writing a general json abstraction > layer for EXPLAIN. I'd say that's not much blame, but still, there's > really not much difference there. Well, this patch series is at least an order of magnitude larger, and it's apparently doing significantly more complex stuff with JSON, because the explain.c patch just writes it out into a StringInfo. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > If it were a question of writing this code once and being done with > it, that would be unobjectionable in my view. But it isn't. > Practically every change to gram.y is going to require a corresponding > change to this stuff. As far as I can see, nobody except me has > commented on the burden that places on everyone who may wish to add > syntax support for a new construct in the future, which might mean > that I'm worrying about something that isn't worth worrying about, but > what I think is more likely is that nobody's worrying about it right > now because they haven't had to do it yet. I haven't been paying much attention to this thread, but I concur with Robert that adding yet another set of overhead requirements to any addition of new SQL is not a good thing. > Just to illustrate the point, consider the CREATE TABLE name OF type > syntax that Peter added a few years ago. That patch > (e7b3349a8ad7afaad565c573fbd65fb46af6abbe) had the following impact on > gram.y: This analysis is kind of cheating, because adding new syntax hasn't been only a matter of touching gram.y for a very long time. You invariably have to touch pg_dump, and you have to touch ruleutils.c unless it's strictly a DDL-command change. But people are used to those, and the value of keeping pg_dump working is clear to everybody. Adding a similar level of burden to support a feature with a narrow use-case seems like a nonstarter from here. ESPECIALLY if we also have to manually add regression test cases because there's no easy way to test it directly. regards, tom lane
On 2014-11-08 11:52:43 -0500, Tom Lane wrote: > Adding a similar > level of burden to support a feature with a narrow use-case seems like > a nonstarter from here. I don't understand this statement. In my experience the lack of a usable replication solution that allows temporary tables and major version differences is one of the most, if not *the* most, frequent criticisms of postgres I hear. How is this a narrow use case? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-11-08 11:52:43 -0500, Tom Lane wrote: >> Adding a similar >> level of burden to support a feature with a narrow use-case seems like >> a nonstarter from here. > I don't understand this statement. In my experience the lack of a usable > replication solution that allows temporary tables and major version > differences is one of the most, if not *the* most, frequent criticisms > of postgres I hear. How is this a narrow use case? [ shrug... ] I don't personally give a damn about logical replication, especially not logical replication implemented in this fashion. It looks large and rickety (ie full of opportunities for bugs) and I would never trust data I cared about to it. Or in short: AFAICS you're not building the next WAL-shipping replication solution, you're building the next Slony, and Slony never has and never will be more than a niche use-case. Putting half of it into core wouldn't fix that, it would just put a lot more maintenance burden on core developers. Core developers are entitled to push back on such proposals. regards, tom lane
On 2014-11-08 12:07:41 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2014-11-08 11:52:43 -0500, Tom Lane wrote: > >> Adding a similar > >> level of burden to support a feature with a narrow use-case seems like > >> a nonstarter from here. > > > I don't understand this statement. In my experience the lack of a usable > > replication solution that allows temporary tables and major version > > differences is one of the most, if not *the* most, frequent criticisms > > of postgres I hear. How is this a narrow use case? > > [ shrug... ] I don't personally give a damn about logical replication, > especially not logical replication implemented in this fashion. "In this fashion" meaning ddl replication via event triggers? If you have an actual suggestion how to do it better I'm all ears. So far nobody has come up with anything. > Or in short: AFAICS you're not building the next WAL-shipping replication > solution, you're building the next Slony, and Slony never has and never > will be more than a niche use-case. A good number of the sites out there use either londiste or slony. Not because they like it, but because there's no other alternative. I'd love to simply say that we can make WAL based replication work across versions, platforms and subsets of relations in PG clusters. Since that seems quite unrealistic people have to go different ways. > Putting half of it into core wouldn't fix that, it would just put a > lot more maintenance burden on core developers. Imo stuff that can't be done sanely outside core needs to be put into core if it's actually desired by many users. And working DDL replication for logical replication solutions surely is. > Core developers are entitled to push back on such proposals. I'm not saying "core developers" (whover that is) aren't allowed to do so. But just because they think something is (too) invasive doesn't make it a niche application. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Nov 8, 2014 at 12:20 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> Putting half of it into core wouldn't fix that, it would just put a >> lot more maintenance burden on core developers. > > Imo stuff that can't be done sanely outside core needs to be put into > core if it's actually desired by many users. And working DDL replication > for logical replication solutions surely is. I don't buy it. This patch series is *all about* transferring the maintenance burden of this feature from the BDR developers to the core project. There's nothing to keep you from exposing the parse trees to C functions that can live in an extension, and you can do all of this deparsing there. Nobody will stop you, and when it breaks (not if) you can fix it in your code. The overhead of deparsing new types of parse nodes can be born by you. The only benefit of pushing it into core is that some other logical replication solution could also take advantage of that, but I know of nobody with any plans to do such a thing. On the flip side, the *cost* of pushing it into core is that it now becomes the PostgreSQL's community problem to update the deparsing code every time the grammar changes. That cost-benefit trade-off does not look favorable to me, especially given the absence of any kind of robust testing framework. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-11-08 12:30:29 -0500, Robert Haas wrote: > On Sat, Nov 8, 2014 at 12:20 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >> Putting half of it into core wouldn't fix that, it would just put a > >> lot more maintenance burden on core developers. > > > > Imo stuff that can't be done sanely outside core needs to be put into > > core if it's actually desired by many users. And working DDL replication > > for logical replication solutions surely is. > > I don't buy it. This patch series is *all about* transferring the > maintenance burden of this feature from the BDR developers to the core > project. What? Not at all. It's *much* less work to do these kind of things out of core. As you probably have experienced more than once. If it were possible to do this entirely in a extension I'm pretty sure nobody would have bothered. > There's nothing to keep you from exposing the parse trees to > C functions that can live in an extension, and you can do all of this > deparsing there. Not really. There's some core functions that need to be touched. Like most of the stuff in patches 1,2,3,5,16 does. We could just integrate those parts, and be done with it. But would that actually be a good thing for the community? Then slony needs to do it and potentially others as well? Then auditing can't use it? Then potential schema tracking solutions can't use it? > Nobody will stop you, and when it breaks (not if) > you can fix it in your code. The overhead of deparsing new types of > parse nodes can be born by you. The only benefit of pushing it into > core is that some other logical replication solution could also take > advantage of that, but I know of nobody with any plans to do such a > thing. There've been people for a long while asking about triggers on catalogs for that purpose. IIRC Jan was one of them. > On the flip side, the *cost* of pushing it into core is that > it now becomes the PostgreSQL's community problem to update the > deparsing code every time the grammar changes. That cost-benefit > trade-off does not look favorable to me, especially given the absence > of any kind of robust testing framework. I agree that there should be testing in this. But I think you're quite overstating the effort of maintaining this. Looking at gram.y between the stamping of 9.3 devel and 9.4 devel for commits that'd require DDL deparsing changes: * ALTER TABLE .. ALTER CONSTRAINT: About 7 lines for the deparse code, out of a ~350 line patch * REFRESH MATERIALIZED VIEW ... CONCURRENTLY: About 1 line for deparse, out of ~700 * WITH CHECK OPTION for views: About 3 lines for deparse, out of ~1300 * REPLICA IDENTITY: About 24 lines for deparse, out of ~950 * ALTER TABLESPACE MOVE: About 20 lines for deparse, out of ~340. Although that patch was essentially scrapped afterwards.And rewritten differently. * CREATE TABLESPACE ... WITH ...: Not supported by event triggers right now as it's a global object. That's really not a whole lot. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-11-08 10:42:15 -0500, Robert Haas wrote: > On Sat, Nov 8, 2014 at 4:37 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > I don't understand why this is particularly difficult to regresssion > > test. It actually is comparatively simple? > > If it is, great. I previously wrote this email: > > http://www.postgresql.org/message-id/CA+TgmoZ=vZriJMxLkqi_V0jg4k4LEAPmwUSC6RWXS5MquXUJNA@mail.gmail.com > > Alvaro came up with a way of addressing the second point I raised > there, which I'm quite pleased about, but AFAIK there's been no > progress on the first one. Maybe I missed something? I unfortunately don't think so. And that sounds like a completely reasonable criticism. > Just to illustrate the point, consider the CREATE TABLE name OF type > syntax that Peter added a few years ago. That patch > (e7b3349a8ad7afaad565c573fbd65fb46af6abbe) had the following impact on > gram.y: > > src/backend/parser/gram.y | 56 ++++++++++++++++++++++++++++++++++----------- > 1 file changed, 43 insertions(+), 13 deletions(-) > Now let's have a look at what impact it has on the deparsing code. > Patch 6 has deparse_ColumnDef_Typed from lines 134 to 193. There's 3 > or so lines in deparseTableElements that decide whether to call it. > Patch 8 has more handling for this case, lines 439 to 443 and 463 to > 490. So, if this feature had been committed before TABLE OF, it would > have needed about 100 lines of net new code to handle this case - > exclusive of refactoring. The actual size of the patch would probably > have been modestly larger than that, because some code would need to > be reindented when it got iffed out, and quite possibly some > rearrangement would have been needed. But even ignoring all that, the > deparse footprint of the patch would have been MORE THAN TWICE the > parser footprint. Well, you disregarded the related costs of adjusting pg_dump et al; as Tom mentioned, that's a significant part. And yes, there are some additions that aren't entirely trivial to add. But the majority of additions are pretty simple. > I think that's going to be typical; and I think the deparse code is > going to be significantly more labor-intensive to write than bison > productions are. Do you really think that's not going to be a burden? I've looked into a fair number of cases and almost all are vastly simpler than this. Most of the DDL changes that have been done lately are things like adding IF NOT EXISTS somewhere; expanding existing syntax for new types of objects (ALTER ... RENAME for foreign servers, wrappers; LABEL for new types). I think given the complexity of newly added features the overhead of adding deparsing code isn't all that high. > > Being able to replicate DDL is a feature wish that has been around > > pretty much since the inception of trigger based replication > > solution. It's not some current fancy. And the json stuff only got there > > because people wanted some way to manipulate the names in the replicated > > - which this abstraction provides them with. > > I understand that being able to replicate DDL is an important need, > and there may be no better way to do it than this. But that doesn't > mean that this code is going to be bug-free or easy to maintain. Agreed. There's definitely no free lunch (here). This has been discussed more than once, and so far I haven't read anything superior that also has a chance of handling ALTER. > >> may never be, and which I strongly suspect may be too clever by half. 
> >> Once you've committed it, you're going to move onto other things and > >> leave it to everyone who comes after to try to maintain it. I bet > >> we'll still be running into bugs and half-implemented features five > >> years from now, and maybe in ten. Ramming through special-purpose > >> infrastructure instead of generalizing it is merely icing on the cake, > >> but it's still moving in the wrong direction. > > > > You're just as much to blame for not writing a general json abstraction > > layer for EXPLAIN. I'd say that's not much blame, but still, there's > > really not much difference there. > > Well, this patch series is at least an order of magnitude larger, > and it's apparently doing significantly more complex stuff with JSON, > because the explain.c patch just writes it out into a StringInfo. And this code builds up the data in memory and then calls into the json code to build the json value. And to parse json it uses the functions underlying the SQL accessors. There's one function it shouldn't need anymore (dequote_jsonval), either by using json_object_field_text instead of json_object_field or by exposing dequote_jsonval's functionality from json.c. I think the json angle here is a red herring. Sure, a nicer API could save some FunctionCallInfoData boilerplate, but that's pretty much it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
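For reference on that last point: json_object_field returns a field as a json value, quotes and all, while json_object_field_text returns the already-dequoted text; these are the stock built-in functions behind the -> and ->> operators, so the illustration below uses nothing patch-specific.

SELECT json_object_field('{"name": "some_table"}'::json, 'name');
-- returns the json value "some_table", quotes kept

SELECT json_object_field_text('{"name": "some_table"}'::json, 'name');
-- returns the text value some_table, already dequoted

Switching the deparse code to the _text variant is what would make a separate dequote_jsonval step unnecessary.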
On Sat, Nov 8, 2014 at 1:05 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> There's nothing to keep you from exposing the parse trees to >> C functions that can live in an extension, and you can do all of this >> deparsing there. > > Not really. There's some core functions that need to be touched. Like > most of the stuff in patches 1,2,3,5,16 does. Patch 1 is fine. We've done similar things in the past (cf. c504513f83a9ee8dce4a719746ca73102cae9f13, 82b1b213cad3a69cf5f3dfaa81687c14366960fc). I'd just commit that. Patch 2 adds support for GRANT and REVOKE to the event trigger mechanism. I wonder if it's a bad idea to make the ddl_command_start/end events fire for DCL. We discussed a lot of these issues when this patch originally went in, and I think it'd be worth revisiting that discussion. Patch 3 is the same kind of idea as patch 2, only for COMMENT. Patch 5 depends on patch 4, which does a bunch of things. I *think* the upshot of patch 5 is that we're not currently firing event triggers in some situations where we should, in which case +1 for fixing that. It would help if there were a real commit message, and/or some more contents, and I think it could be more completely disentangled from patch 4. Patch 16 again contains almost no comments and no description of its specific purpose, but it appears to be similar to patch 1, so probably mostly uncontroversial. > We could just integrate those parts, and be done with it. But would that > actually be a good thing for the community? Then slony needs to do it > and potentially others as well? Then auditing can't use it? Then > potential schema tracking solutions can't use it? Do you think Slony is really going to use this? I guess we can let the Slony guys speak for themselves, but I've been skeptical since day one that this is the best way to do DDL replication, and I still am. There are lots of ways that a replicated DDL statement can fail on the replicas, and what are you going to do then? It's too late to show the user the error message, so you can throw it in a log someplace and hope that somebody notices, but that's it. It makes a lot more sense to me to use some kind of a tool that applies the DDL in a coordinated fashion on all nodes - or even just do it manually, since it might very well be desirable to take the lock on different nodes at widely different times, separated by a switchover. I certainly think there's a use-case for what you're trying to do here, but I don't think it'll be right for everyone. Certainly, if the Slony guys - or some other team building an out-of-core replication solutions says, hey, we really want this in core, that would considerably strengthen the argument for putting it there. But I haven't heard anyone say that yet - unlike logical decoding, were we did have other people expressing clear interest in using it. > There've been people for a long while asking about triggers on catalogs > for that purpose. IIRC Jan was one of them. My impression, based on something Christopher Brown said a few years ago, is that Slony's DDL trigger needs are largely satisfied by the existing event trigger stuff. It would be helpful to get confirmation as to whether that's the case. >> On the flip side, the *cost* of pushing it into core is that >> it now becomes the PostgreSQL's community problem to update the >> deparsing code every time the grammar changes. That cost-benefit >> trade-off does not look favorable to me, especially given the absence >> of any kind of robust testing framework. 
> > I agree that there should be testing in this. > > But I think you're quite overstating the effort of maintaining > this. Looking at gram.y between the stamping of 9.3 devel and 9.4 devel > for commits that'd require DDL deparsing changes: > * ALTER TABLE .. ALTER CONSTRAINT: About 7 lines for the deparse code, > out of a ~350 line patch > * REFRESH MATERIALIZED VIEW ... CONCURRENTLY: About 1 line for deparse, > out of ~700 > * WITH CHECK OPTION for views: About 3 lines for deparse, out of ~1300 > * REPLICA IDENTITY: About 24 lines for deparse, out of ~950 > * ALTER TABLESPACE MOVE: About 20 lines for deparse, out of > ~340. Although that patch was essentially scrapped afterwards. And > rewritten differently. > * CREATE TABLESPACE ... WITH ...: Not supported by event triggers right > now as it's a global object. > > That's really not a whole lot. Those are pretty minor syntax patches, though. I think it's more helpful to look at things like the row-level security stuff, or materialized views per se, or DDL support for collations. In any case, the additional coding burden concerns me less than the additional testing burden - but there's another email further down that's more to that specific point, so I'll stop here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-11-08 17:49:30 -0500, Robert Haas wrote: > Patch 2 adds support for GRANT and REVOKE to the event trigger > mechanism. I wonder if it's a bad idea to make the > ddl_command_start/end events fire for DCL. We discussed a lot of > these issues when this patch originally went in, and I think it'd be > worth revisiting that discussion. Well, it doesn't generally support it for all GRANT statement, but just for ones that are database local. I think that mostly skirts the problems from the last round of discussion. But I only vaguely remember them. > Patch 5 depends on patch 4, which does a bunch of things. I *think* > the upshot of patch 5 is that we're not currently firing event > triggers in some situations where we should, in which case +1 for > fixing that. It would help if there were a real commit message, > and/or some more contents, and I think it could be more completely > disentangled from patch 4. > > We could just integrate those parts, and be done with it. But would that > > actually be a good thing for the community? Then slony needs to do it > > and potentially others as well? Then auditing can't use it? Then > > potential schema tracking solutions can't use it? > > Do you think Slony is really going to use this? There was a fair amount of noise about it at one of the past cluster hackers thingies. > I guess we can let > the Slony guys speak for themselves, but I've been skeptical since day > one that this is the best way to do DDL replication, and I still am. Well, I've yet to hear anything that's realistic otherwise. > There are lots of ways that a replicated DDL statement can fail on the > replicas, and what are you going to do then? It's too late to show > the user the error message, so you can throw it in a log someplace and > hope that somebody notices, but that's it. Sure. And absolutely the same is true for DML. And the lack of DDL integration makes it happen really rather frequently... > It makes a lot more sense > to me to use some kind of a tool that applies the DDL in a coordinated > fashion on all nodes - or even just do it manually, since it might > very well be desirable to take the lock on different nodes at widely > different times, separated by a switchover. I certainly think there's > a use-case for what you're trying to do here, but I don't think it'll > be right for everyone. I agree that it's not right for everyone. > Certainly, if the Slony guys - or some other team building an > out-of-core replication solutions says, hey, we really want this in > core, that would considerably strengthen the argument for putting it > there. But I haven't heard anyone say that yet - unlike logical > decoding, were we did have other people expressing clear interest in > using it. As I said, there was clear interest at at least two of the cluster hackers meetings... > > There've been people for a long while asking about triggers on catalogs > > for that purpose. IIRC Jan was one of them. > > My impression, based on something Christopher Brown said a few years > ago, is that Slony's DDL trigger needs are largely satisfied by the > existing event trigger stuff. It would be helpful to get confirmation > as to whether that's the case. Oh, that's contrary to what I remember, but yes, it'd be interesting to hear about that. > >> On the flip side, the *cost* of pushing it into core is that > >> it now becomes the PostgreSQL's community problem to update the > >> deparsing code every time the grammar changes. 
That cost-benefit > >> trade-off does not look favorable to me, especially given the absence > >> of any kind of robust testing framework. > > > > I agree that there should be testing in this. > > > > But I think you're quite overstating the effort of maintaining > > this. Looking at gram.y between the stamping of 9.3 devel and 9.4 devel > > for commits that'd require DDL deparsing changes: > > * ALTER TABLE .. ALTER CONSTRAINT: About 7 lines for the deparse code, > > out of a ~350 line patch > > * REFRESH MATERIALIZED VIEW ... CONCURRENTLY: About 1 line for deparse, > > out of ~700 > > * WITH CHECK OPTION for views: About 3 lines for deparse, out of ~1300 > > * REPLICA IDENTITY: About 24 lines for deparse, out of ~950 > > * ALTER TABLESPACE MOVE: About 20 lines for deparse, out of > > ~340. Although that patch was essentially scrapped afterwards. And > > rewritten differently. > > * CREATE TABLESPACE ... WITH ...: Not supported by event triggers right > > now as it's a global object. > > > > That's really not a whole lot. > > Those are pretty minor syntax patches, though. Sure, but these are all the ones from the stamping of 9.3devel to 9.4devel. I wanted to look at a release cycle, and that seemed the easiest way to do it. I didn't choose that cycle, because I knew it was "light" on ddl, I chose it because it was the last complete one. > I think it's more helpful to look at things like the row-level > security stuff, or materialized views per se, or DDL support for > collations. I can't imagine that writing the relatively straight forward stuff that deparsing requires is a noticeable part of a patch like RLS. The actual coding in anything bigger really is the minor part - especially the relatively trivial bits. And for prototyping you can just leave it out, just as frequently done today for pg_dump (which usually is noticeably more complex). I can't imagine the MATERIALIZED VIEW stuff to be much different. The CREATE MATERIALIZED VIEW stuff is basically CREATE VIEW with three extra clauses. I really can't see that mattering that much in the course of a 4.5k line patch. The same with collations. The first patch adding collations to table/index columns, just needs to add print the collation name - that's it. The patch for the collation DDL support needs to add CREATE COLLATION (~27 lines, including newlines and function header). DROP is handled generically. Most of ALTER is as well, and SET SCHEMA can't be very hard. Greetings, Andres Freund
On Sat, Nov 8, 2014 at 4:35 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> > I don't understand why this is particularly difficult to regresssion >> > test. It actually is comparatively simple? >> >> If it is, great. I previously wrote this email: >> >> http://www.postgresql.org/message-id/CA+TgmoZ=vZriJMxLkqi_V0jg4k4LEAPmwUSC6RWXS5MquXUJNA@mail.gmail.com >> >> Alvaro came up with a way of addressing the second point I raised >> there, which I'm quite pleased about, but AFAIK there's been no >> progress on the first one. Maybe I missed something? > > I unfortunately don't think so. And that sounds like a completely > reasonable criticism. I'm glad you agree. If you can find a way to address that point, I can live with the rest of it. I don't think it's dumb to be concerned about features that increase the cost of adding more features. But what really concerns me is that that code won't be well-tested, and if there are cases missing or somebody forgets to do it altogether, it's very likely that we won't notice. That seems like a huge problem from where I sit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Nov 8, 2014 at 6:24 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-11-08 17:49:30 -0500, Robert Haas wrote: >> Patch 2 adds support for GRANT and REVOKE to the event trigger >> mechanism. I wonder if it's a bad idea to make the >> ddl_command_start/end events fire for DCL. We discussed a lot of >> these issues when this patch originally went in, and I think it'd be >> worth revisiting that discussion. > > Well, it doesn't generally support it for all GRANT statement, but just > for ones that are database local. I think that mostly skirts the > problems from the last round of discussion. But I only vaguely remember > them. The issue I was alluding to was terminological: it's not clear that GRANT and REVOKE should be called DDL rather than DCL, although we do have precedent in some of the logging settings. The other issue I remember is that if you have a separate event trigger for GRANT/REVOKE, you can expose fields like operation/object type/permission/target. That's much harder to do for an event trigger that's very broad. But anyway, I think it would be worth going back and looking at the previous discussion. > As I said, there was clear interest at at least two of the cluster > hackers meetings... We need to get some better data here. But again, my core concern is that we have no good way to test this code for bugs, including of omission, without which I think we will be very sad. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8 November 2014 17:49, Robert Haas <robertmhaas@gmail.com> wrote:
> > We could just integrate those parts, and be done with it. But would that
> > actually be a good thing for the community? Then slony needs to do it
> > and potentially others as well? Then auditing can't use it? Then
> > potential schema tracking solutions can't use it?
>
> Do you think Slony is really going to use this? I guess we can let
> the Slony guys speak for themselves, but I've been skeptical since day
> one that this is the best way to do DDL replication, and I still am.
> There are lots of ways that a replicated DDL statement can fail on the
> replicas, and what are you going to do then? It's too late to show
> the user the error message, so you can throw it in a log someplace and
> hope that somebody notices, but that's it. It makes a lot more sense
> to me to use some kind of a tool that applies the DDL in a coordinated
> fashion on all nodes - or even just do it manually, since it might
> very well be desirable to take the lock on different nodes at widely
> different times, separated by a switchover. I certainly think there's
> a use-case for what you're trying to do here, but I don't think it'll
> be right for everyone.
>
> Certainly, if the Slony guys - or some other team building an
> out-of-core replication solutions says, hey, we really want this in
> core, that would considerably strengthen the argument for putting it
> there. But I haven't heard anyone say that yet - unlike logical
> decoding, were we did have other people expressing clear interest in
> using it.
> > There've been people for a long while asking about triggers on catalogs
> > for that purpose. IIRC Jan was one of them.
>
> My impression, based on something Christopher Brown said a few years
> ago, is that Slony's DDL trigger needs are largely satisfied by the
> existing event trigger stuff. It would be helpful to get confirmation
> as to whether that's the case.
I'm not sure that a replication system that intends to do partial replication
(e.g. - being selective of what objects are to be replicated) will necessarily
want to use the CREATE event triggers to capture creates.
Several cases pop up with different answers:
a) I certainly don't want to replicate temporary tables
b) I almost certainly don't want to replicate unlogged tables
c) For "more ordinary" tables, I'm not sure I want to extend Slony
to detect them and add them automatically, because there
are annoying sub-cases
c.1) If I'm working on data conversion, I may create not totally
temporary tables that are nonetheless not worthy to replicate.
(I'm working on such right now)
Long and short: it seems likely that I'd frequently NOT want all new tables
added to replication, at least not all of them, all the time.
What would seem valuable, to me, would be to have a CREATE event
trigger that lets me know the OID and/or fully qualified name of the new
object so that perhaps the replication system:
a) Has some kind of rule system to detect if it wants to replicate it,
b) Logs the change so a human might know later that there's new stuff
that probably ought to be replicated
c) Perhaps a human might put replication into a new "suggestive"
mode, a bit akin to Slony's "EXECUTE SCRIPT", but where the human
essentially says, "Here, I'm running DDL against this connection for a
while, and I'd be grateful if Postgres told Slony to capture all the new
tables and sequences and replicated them."
There are kind of two approaches:
a) Just capture the OIDs, and have replication go back later and grab
the table definition once the dust clears on the master
b) We need to capture ALL the DDL, whether CREATE or ALTER, and
forward it, altered to have fully qualified names on everything so that
we don't need to duplicate all the "set search_path" requests and
such.
I suppose there's also a third...
c) Have a capability to put an event trigger function in place that makes
DDL requests fail.
That's more useful than you'd think; if, by default, we make them fail,
and with an error messages such as
"DDL request failed as it was not submitted using slonik DDL TOOL"
then we have protection against uncontrolled application of DDL.
DDL TOOL would switch off the "fail trigger", possibly trying to
capture the DDL, or perhaps just capturing the statements passed
to it so they get passed everywhere. (That heads back to a) and b);
what should get captured...)
I'm not sure that all of that is totally internally coherent, but I hope there
are some ideas worth thinking about.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
On 10/11/14 23:37, Christopher Browne wrote: > On 8 November 2014 17:49, Robert Haas <robertmhaas@gmail.com > > > > My impression, based on something Christopher Brown said a few years > > ago, is that Slony's DDL trigger needs are largely satisfied by the > > existing event trigger stuff. It would be helpful to get confirmation > > as to whether that's the case. > > I'm not sure that a replication system that intends to do partial > replication > (e.g. - being selective of what objects are to be replicated) will > necessarily > want to use the CREATE event triggers to capture creates. > > Several cases pop up with different answers: > a) I certainly don't want to replicate temporary tables > b) I almost certainly don't want to replicate unlogged tables > c) For "more ordinary" tables, I'm not sure I want to extend Slony > to detect them and add them automatically, because there > are annoying sub-cases > > c.1) If I'm working on data conversion, I may create not totally > temporary tables that are nonetheless not worthy to replicate. > (I'm working on such right now) > > Long and short: it seems likely that I'd frequently NOT want all new tables > added to replication, at least not all of them, all the time. I don't see how this is problem with using CREATE event triggers, just put logic in your trigger that handles this, you get the object identity of the object that is being created/altered so you can get any info about it you wish and you can easily filter however you want. > There are kind of two approaches: > > a) Just capture the OIDs, and have replication go back later and grab > the table definition once the dust clears on the master > > b) We need to capture ALL the DDL, whether CREATE or ALTER, and > forward it, altered to have fully qualified names on everything so that > we don't need to duplicate all the "set search_path" requests and > such. > This is basically what this patch gives you (actually both the canonized command and the identity)? > I suppose there's also a third... > > c) Have a capability to put an event trigger function in place that makes > DDL requests fail. > > That's more useful than you'd think; if, by default, we make them fail, > and with an error messages such as > "DDL request failed as it was not submitted using slonik DDL TOOL" > You can do that already, it's even the example in the event trigger documentation. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
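To make that filtering concrete, here is a minimal sketch of such a CREATE event trigger. It assumes the SRF name and result columns (schema, object_type, identity, command) used by the patch, as seen in the bdr example in the next message; repl_queue is a made-up queue table and the skip rules are only examples:

CREATE OR REPLACE FUNCTION filter_creates() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_get_creation_commands() LOOP
        -- skip temporary objects (the patch reports them under pg_temp)
        CONTINUE WHEN r.schema = 'pg_temp';
        -- skip unlogged tables by checking the catalog
        IF r.object_type = 'table' THEN
            CONTINUE WHEN (SELECT relpersistence FROM pg_class
                           WHERE oid = r.identity::regclass) = 'u';
        END IF;
        -- queue whatever survives the rules for the replication system
        -- (r.command may need pg_event_trigger_expand_command() depending
        -- on the patch version)
        INSERT INTO repl_queue (identity, command)
            VALUES (r.identity, r.command);
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER filter_creates ON ddl_command_end
    EXECUTE PROCEDURE filter_creates();

Whether to queue the command, merely log it, or raise an error at that point is entirely up to the replication or auditing solution.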
On 2014-11-10 17:37:40 -0500, Christopher Browne wrote: > On 8 November 2014 17:49, Robert Haas <robertmhaas@gmail.com> wrote: > > > We could just integrate those parts, and be done with it. But would that > > > actually be a good thing for the community? Then slony needs to do it > > > and potentially others as well? Then auditing can't use it? Then > > > potential schema tracking solutions can't use it? > > > > Do you think Slony is really going to use this? I guess we can let > > the Slony guys speak for themselves, but I've been skeptical since day > > one that this is the best way to do DDL replication, and I still am. > > There are lots of ways that a replicated DDL statement can fail on the > > replicas, and what are you going to do then? It's too late to show > > the user the error message, so you can throw it in a log someplace and > > hope that somebody notices, but that's it. It makes a lot more sense > > to me to use some kind of a tool that applies the DDL in a coordinated > > fashion on all nodes - or even just do it manually, since it might > > very well be desirable to take the lock on different nodes at widely > > different times, separated by a switchover. I certainly think there's > > a use-case for what you're trying to do here, but I don't think it'll > > be right for everyone. > > > > Certainly, if the Slony guys - or some other team building an > > out-of-core replication solutions says, hey, we really want this in > > core, that would considerably strengthen the argument for putting it > > there. But I haven't heard anyone say that yet - unlike logical > > decoding, were we did have other people expressing clear interest in > > using it. > > > > There've been people for a long while asking about triggers on catalogs > > > for that purpose. IIRC Jan was one of them. > > > > My impression, based on something Christopher Brown said a few years > > ago, is that Slony's DDL trigger needs are largely satisfied by the > > existing event trigger stuff. It would be helpful to get confirmation > > as to whether that's the case. > > I'm not sure that a replication system that intends to do partial > replication > (e.g. - being selective of what objects are to be replicated) will > necessarily > want to use the CREATE event triggers to capture creates. > > Several cases pop up with different answers: > a) I certainly don't want to replicate temporary tables > b) I almost certainly don't want to replicate unlogged tables Those are quite easy to recognize and skip. > c) For "more ordinary" tables, I'm not sure I want to extend Slony > to detect them and add them automatically, because there > are annoying sub-cases > > c.1) If I'm working on data conversion, I may create not totally > temporary tables that are nonetheless not worthy to replicate. > (I'm working on such right now) Sure. you might not want to do it automatically all the time - but I think it's a very useful default mode. Once you can replicate CREATEs per se, it's easy to add logic (in a couple lines of plpgsql or whatever) to only do so in a certain schema or similar. But the main reason all this is interesting isn't so much CREATE itself. But that it can be (and Alvaro has mostly done it!) for ALTER as well. And there it imo becomes really interesting. Because you can quite easily check whether the affected relation is being replicated you can just emit the DDL when that's the case. And that makes DDL in a logically replicated setup *much* easier. 
> Long and short: it seems likely that I'd frequently NOT want all new tables > added to replication, at least not all of them, all the time. Agreed. That's quite possible with the design here - you get the creation commands and can decide whether you want to do anything with them. You're not forced to insert them into your replication queue or whatever you're using for that. > What would seem valuable, to me, would be to have a CREATE event > trigger that lets me know the OID and/or fully qualified name of the new > object so that perhaps the replication system: > > a) Has some kind of rule system to detect if it wants to replicate it, Sure. > b) Logs the change so a human might know later that there's new stuff > that probably ought to be replicated Sure. > c) Perhaps a human might put replication into a new "suggestive" > mode, a bit akin to Slony's "EXECUTE SCRIPT", but where the human > essentially says, "Here, I'm running DDL against this connection for a > while, and I'd be grateful if Postgres told Slony to capture all the new > tables and sequences and replicated them." Sure. Some of that already is possible with the current event triggers - and all of it would be possible with the suggested functionality here. An old version of bdr, employing the functionality presented here, had the following (simplified) event trigger:

CREATE OR REPLACE FUNCTION bdr.queue_commands() RETURNS event_trigger
LANGUAGE plpgsql AS $function$
DECLARE
    r RECORD;
BEGIN
    -- don't recursively log ddl commands
    IF pg_replication_identifier_is_replaying() THEN
        RETURN;
    END IF;
    IF current_setting('bdr.skip_ddl_replication')::boolean THEN
        RETURN;
    END IF;
    FOR r IN SELECT * FROM pg_event_trigger_get_creation_commands() LOOP
        /* ignore temporary objects */
        IF r.schema = 'pg_temp' THEN
            CONTINUE;
        END IF;
        /* ignore objects that are part of an extension */
        IF r.in_extension THEN
            CONTINUE;
        END IF;
        INSERT INTO bdr.bdr_queued_commands(
            lsn, queued_at, command_tag, command, executed
        ) VALUES (
            pg_current_xlog_location(), NOW(), r.command_tag,
            pg_catalog.pg_event_trigger_expand_command(r.command), 'false'
        );
        IF r.command_tag = 'CREATE TABLE' and r.object_type = 'table' THEN
            EXECUTE 'CREATE TRIGGER truncate_trigger AFTER TRUNCATE ON ' ||
                r.identity ||
                ' FOR EACH STATEMENT EXECUTE PROCEDURE bdr.queue_truncate()';
        END IF;
    END LOOP;
END;
$function$;

It seems to me that'd pretty much allow all of your wishes above? > There are kind of two approaches: > > a) Just capture the OIDs, and have replication go back later and grab > the table definition once the dust clears on the master That's problematic imo if there are further changes to the table definition - not exactly an infrequent thing. > b) We need to capture ALL the DDL, whether CREATE or ALTER, and > forward it, altered to have fully qualified names on everything so that > we don't need to duplicate all the "set search_path" requests and > such. That's essentially where this patch is going. The submission is only CREATE, but once the design is agreed upon, ALTER is coming. The patch we're currently using for BDR has a good chunk of ALTER support. > I suppose there's also a third... > > c) Have a capability to put an event trigger function in place that makes > DDL requests fail. That's actually already quite possible. 
CREATE OR REPLACE FUNCTION prevent_ddl_outside_slonik() RETURNS event_trigger
LANGUAGE plpgsql AS $function$
BEGIN
    IF current_setting('slony.inside_slonik')::boolean THEN
        RETURN;
    END IF;
    RAISE EXCEPTION 'hey there, use slonik!!!';
END
$function$;

What's missing is that you probably want to look into the object to see whether it's temporary, unlogged, et al. > That's more useful than you'd think; if, by default, we make them fail, > and with an error messages such as > "DDL request failed as it was not submitted using slonik DDL TOOL" > > then we have protection against uncontrolled application of DDL. > > DDL TOOL would switch off the "fail trigger", possibly trying to > capture the DDL, or perhaps just capturing the statements passed > to it so they get passed everywhere. (That heads back to a) and b); > what should get captured...) > > I'm not sure that all of that is totally internally coherent, but I > hope there are some ideas worth thinking about. I think it's actually quite coherent - and at least partially mirrors the thoughts that have gone into this... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11/10/14, 4:58 PM, Andres Freund wrote: > But the main reason all this is interesting isn't so much CREATE > itself. But that it can be (and Alvaro has mostly done it!) for ALTER as > well. And there it imo becomes really interesting. Because you can quite > easily check whether the affected relation is being replicated you can > just emit the DDL when that's the case. And that makes DDL in a > logically replicated setup *much* easier. +1. Adding columns is a PITA, you have to manually ensure you do it on all slaves first. Drop is somewhat worse, because you have to do it on the master first, opposite of the (more usual) case of adding a column. RENAME is a complete disaster. Handing scripts to your replication system to execute isn't a very good alternative either; it assumes that you actually have a script (bad assumption with ORMs), and that you have a reasonable way to get that script to wherever you run your replication system. I will also weigh in that there are a LOT of cases that binary replication doesn't cover. I find it interesting that prior to creating built in replication, the community stance was "We won't build that because there's too many different use cases", but now some folks are saying that everyone should just use streaming rep and be done with it. :P -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On Mon, Nov 10, 2014 at 9:02 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > +1. Adding columns is a PITA, you have to manually ensure you do it on all > slaves first. > > Drop is somewhat worse, because you have to do it on the master first, > opposite of the (more usual) case of adding a column. > > RENAME is a complete disaster. > > Handing scripts to your replication system to execute isn't a very good > alternative either; it assumes that you actually have a script (bad > assumption with ORMs), and that you have a reasonable way to get that script > to wherever you run your replication system. I don't disagree with any of that, but running the command on the master and then propagating it to the slaves where it may succeed or fail - and if it fails, you won't know unless you're watching the logs on those machines, and, oh by the way, replication will also be broken - is not good either. We would never have shipped physical replication solution with that kind of limitation. What has made streaming replication so popular and successful with PostgreSQL users over the last five years is that, while it's a bit of a pain to get set up, once you have it set up, it is rock-solid. If there were a series of legal SQL commands that you could execute without error on a cluster of servers connected by streaming replication such that, when you got done, replication was broken, our users would scream bloody murder, or just stop using PostgreSQL. I think the approach to DDL replication that Alvaro, Andres, et al. are proposing here is absolutely fine - even praiseworthy - as an out-of-core solution that users can adopt if they are willing to accept the associated risks, as many users probably will be. But you wouldn't convince me to run it on any production system for which I was responsible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-11-12 16:36:30 -0500, Robert Haas wrote: > On Mon, Nov 10, 2014 at 9:02 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > > +1. Adding columns is a PITA, you have to manually ensure you do it on all > > slaves first. > > > > Drop is somewhat worse, because you have to do it on the master first, > > opposite of the (more usual) case of adding a column. > > > > RENAME is a complete disaster. > > > > Handing scripts to your replication system to execute isn't a very good > > alternative either; it assumes that you actually have a script (bad > > assumption with ORMs), and that you have a reasonable way to get that script > > to wherever you run your replication system. > > I don't disagree with any of that, but running the command on the > master and then propagating it to the slaves where it may succeed or > fail - and if it fails, you won't know unless you're watching the logs > on those machines, and, oh by the way, replication will also be broken > - is not good either. That's already the situation today with all the logical replication solutions. They *constantly* break in the field. Most commonly because of DDL differences. I don't understand why you think it's likely for logical replication to break due to this? You mean because deparse yielded a invalid statement? In a normal single master setup there really shouldn't be scenarios where that happens? Except bugs - but as you know we had more than in HS/SR as well? Or are you worried about stuff like ALTER TABLE ... USING()? I think that's the replication solution's job to take care of/prevent. For multimaster the situation is more complex, I agree, but I don't think in core stuff needs to solve that for now? We are thinking about extending 2PC to be usable across logical decoding. That's a relatively simple patch. Then it's possible to do the DDL on the primary, ship it to the standby, apply it there, and only afterwards commit the prepared xact if that was successfull. That's quite cool - but somewhat in the remit of the replication solution. > I think the approach to DDL > replication that Alvaro, Andres, et al. are proposing here is > absolutely fine - even praiseworthy - as an out-of-core solution that > users can adopt if they are willing to accept the associated risks, as > many users probably will be. But you wouldn't convince me to run it > on any production system for which I was responsible. The solution here doesn't force you to do that, does it? It's something that can be used by more than replication solution? I just don't see the alternative you're proposing? I've so far not even seen a credible *sketch* of an alternative design that also can handle ALTER. The only current alternatives are 1) the user inserts some events into the queue manually. If they depend on any local state you're screwed. If they have syntax errors they're often screwed. 2). The user does all actions on the standby first. Then on the primary. That's hard for ALTER ADD COLUMN and similar, and just about impossible for renaming things. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
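As a rough illustration of the 2PC flow Andres sketches here: the statements below are plain two-phase-commit SQL that exists today; the part that does not exist yet is shipping the prepared-but-uncommitted transaction through logical decoding, so the orchestration, the table, and the transaction name are purely hypothetical:

-- on the origin, run by the replication solution's DDL wrapper
BEGIN;
ALTER TABLE public.accounts ADD COLUMN note text;
PREPARE TRANSACTION 'ddl_replication_1';

-- the wrapper ships the DDL, applies it on the standby(s), and only if
-- that succeeds everywhere does it finish the transaction on the origin:
COMMIT PREPARED 'ddl_replication_1';
-- on failure it can back out instead:
-- ROLLBACK PREPARED 'ddl_replication_1';

That way a DDL statement that cannot be applied on a standby surfaces as an error to the user who issued it, instead of silently breaking replication.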
On Wed, Nov 12, 2014 at 4:58 PM, Andres Freund <andres@2ndquadrant.com> wrote: > That's already the situation today with all the logical replication > solutions. They *constantly* break in the field. Most commonly because > of DDL differences. Right. And that's why it's cool that logical decoding can operate through DDL differences. The apply side might not be able to cope with what pops out, but that's not logical decoding's fault, and different apply-sides can adopt different policies as to how to deal with whatever problems crop up. > I don't understand why you think it's likely for logical replication to > break due to this? You mean because deparse yielded a invalid statement? > In a normal single master setup there really shouldn't be scenarios > where that happens? Except bugs - but as you know we had more than in > HS/SR as well? I don't know exactly what you mean by "a normal single master setup". Surely the point of logical decoding is that the replica might not be identical to the master. And if it isn't, then a command that succeeded on the master might fail on the standby - for example, because an object by that name already exists there, or because a type doesn't exist there. (Even if you replicate a CREATE EXTENSION command, there's no guarantee that the .so exists on the target.) Then what? This is basically the same problem as multi-master replication conflicts, except with DDL. Resolving replication conflicts is not a very easy thing to get right even if you're only concerned about the rows in the tables. It's probably harder if you're worried about the schema, too. > We are thinking about extending 2PC to be usable across logical > decoding. That's a relatively simple patch. Then it's possible to do the > DDL on the primary, ship it to the standby, apply it there, and only > afterwards commit the prepared xact if that was successfull. That's > quite cool - but somewhat in the remit of the replication solution. That would certainly make the user aware of a quite a few kinds of errors that might otherwise go undetected. > The solution here doesn't force you to do that, does it? It's something > that can be used by more than replication solution? In theory, yes. > I just don't see the alternative you're proposing? I've so far not even > seen a credible *sketch* of an alternative design that also can handle > ALTER. The only current alternatives are 1) the user inserts some > events into the queue manually. If they depend on any local state you're > screwed. If they have syntax errors they're often screwed. 2). The user > does all actions on the standby first. Then on the primary. That's hard > for ALTER ADD COLUMN and similar, and just about impossible for renaming > things. It's a really hard problem. I don't think it's possible to make statement-based replication no-fail. Physical replication is basically no-fail because it just says, hey, go write these bytes into this page, and we can pretty much always do that. But statement-based logical replication means basically executing arbitrary chunks of code all over the backend, and there is just no way to guarantee that code won't throw an error. So the best we can do is to hope that those errors will get reported back to the user, which is going to require some kind of distributed transaction. 
Your idea to just run the replicated DDL statements on the standby before committing on the master is one approach to that problem, and probably the simplest one, but not the only one - one can imagine something that resembles true clustering, for example. By the way, the fact that you're planning to do log-based replication of DML and trigger-based replication of DDL scares the crap out of me. I'm not sure how that's going to work at all if the two are interleaved in the same transaction. Also, relying on triggers for replication is generally not awesome, because it increases administrative complexity. Event triggers are probably better in that regard than ordinary triggers, because they're database-wide, but I don't think they solve the problem completely. But the thing that scares me even more is that the DDL replication is not only trigger-based, but statement-based. Why don't we do logical replication by recording all of the SQL statements that run on the master and re-executing them on the standby? Well, because we all know that there will be plenty of important cases where that doesn't yield the same results on both servers. There's no intrinsic reason why that shouldn't also be a problem for DDL replication, and indeed it is. This patch set is trying to patch around that by finding a way to emit a revised DDL statement that is guaranteed to do exactly the same thing on both machines, and it's probably possible to do that in most cases, but it's probably not possible to do that in all cases. To be clear, none of this is a reason to reject this patch set; I've explained my reasons for being unhappy with it elsewhere, and I think they are valid, but these are not them. These are just ruminations on the difficulty of doing truly robust DDL replication. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-11-13 07:17:32 -0500, Robert Haas wrote: > On Wed, Nov 12, 2014 at 4:58 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > That's already the situation today with all the logical replication > > solutions. They *constantly* break in the field. Most commonly because > > of DDL differences. > > Right. And that's why it's cool that logical decoding can operate > through DDL differences. The apply side might not be able to cope > with what pops out, but that's not logical decoding's fault, and > different apply-sides can adopt different policies as to how to deal > with whatever problems crop up. I think pretty much all of the solutions just say "oops, you're on your own". And I can't blame them for that. Once there's a schema difference and it causes problems there's really not much that can be done. > > I don't understand why you think it's likely for logical replication to > > break due to this? You mean because deparse yielded a invalid statement? > > In a normal single master setup there really shouldn't be scenarios > > where that happens? Except bugs - but as you know we had more than in > > HS/SR as well? > > I don't know exactly what you mean by "a normal single master setup". > Surely the point of logical decoding is that the replica might not be > identical to the master. I actually think that that's not primarily the point if you talk about individual objects. The majority of objects will be exactly the same on all nodes. If you actually want to have differing objects on the nodes you'll have to opt out/in (depending on your solution) of ddl replication for those objects. > And if it isn't, then a command that > succeeded on the master might fail on the standby - for example, > because an object by that name already exists there, or because a type > doesn't exist there. (Even if you replicate a CREATE EXTENSION > command, there's no guarantee that the .so exists on the target.) Then > what? Sure. There are reasons logical replication isn't always a win. But I don't see why that's a reason not to make it as robust as possible. Btw, the .so problem exists for wal shipping as well. > This is basically the same problem as multi-master replication > conflicts, except with DDL. Resolving replication conflicts is not a > very easy thing to get right even if you're only concerned about the > rows in the tables. It's probably harder if you're worried about the > schema, too. I don't think it's a sane thing to do multimaster with differing schemas for individual relations, except maybe additional nonunique indexes. > > We are thinking about extending 2PC to be usable across logical > > decoding. That's a relatively simple patch. Then it's possible to do the > > DDL on the primary, ship it to the standby, apply it there, and only > > afterwards commit the prepared xact if that was successfull. That's > > quite cool - but somewhat in the remit of the replication solution. > > That would certainly make the user aware of a quite a few kinds of > errors that might otherwise go undetected. It is (or rather would be) a generally quite cool feature imo ;) > > The solution here doesn't force you to do that, does it? It's something > > that can be used by more than replication solution? > > In theory, yes. What's the practical point here? > > I just don't see the alternative you're proposing? I've so far not even > > seen a credible *sketch* of an alternative design that also can handle > > ALTER. The only current alternatives are 1) the user inserts some > > events into the queue manually. 
If they depend on any local state you're > > screwed. If they have syntax errors they're often screwed. 2). The user > > does all actions on the standby first. Then on the primary. That's hard > > for ALTER ADD COLUMN and similar, and just about impossible for renaming > > things. > > It's a really hard problem. > > I don't think it's possible to make statement-based replication > no-fail. I think generally logical replication has more failure cases than physical ones. Which you seem to agree with. > Physical replication is basically no-fail because it just > says, hey, go write these bytes into this page, and we can pretty much > always do that. But statement-based logical replication means > basically executing arbitrary chunks of code all over the backend, and > there is just no way to guarantee that code won't throw an error. So > the best we can do is to hope that those errors will get reported back > to the user, which is going to require some kind of distributed > transaction. Your idea to just run the replicated DDL statements on > the standby before committing on the master is one approach to that > problem, and probably the simplest one, but not the only one - one can > imagine something that resembles true clustering, for example. I think that's generally not what people need for primary/standby cases (of subsets of tables). In practice there aren't many failures like that besides schema differences. And there are just many use cases that aren't doable with physical replication, so we can't simply advise people to use that. > By the way, the fact that you're planning to do log-based replication > of DML and trigger-based replication of DDL scares the crap out of me. > I'm not sure how that's going to work at all if the two are > interleaved in the same transaction. Maybe that's based on a misunderstanding. All the event trigger does is insert an event into a (local) queue. That's then shipped to the other side using the same replication mechanisms as used for rows. Then, when it receives changes in that ddl queue, the standby performs those actions before continuing with the replay. That makes the interleaving on the standby pretty much the same as on the primary. > Also, relying on triggers for > replication is generally not awesome, because it increases > administrative complexity. Event triggers are probably better in that > regard than ordinary triggers, because they're database-wide, but I > don't think they solve the problem completely. Maybe. It's not that hard to make the DDL solution record dependencies against the event triggers to prohibit the user from dropping them or such if that's your worry. We can easily (additionally) define "implicit" event triggers that are declared using a hook. Then some replication extension can just force them to be applied. > But the thing that > scares me even more is that the DDL replication is not only > trigger-based, but statement-based. Why don't we do logical > replication by recording all of the SQL statements that run on the > master and re-executing them on the standby? Well, because we all > know that there will be plenty of important cases where that doesn't > yield the same results on both servers. There's no intrinsic reason > why that shouldn't also be a problem for DDL replication, and indeed > it is. 
This patch set is trying to patch around that by finding a way > to emit a revised DDL statement that is guaranteed to do exactly the > same thing on both machines, and it's probably possible to do that in > most cases, but it's probably not possible to do that in all cases. Yes, it's not trivial. And I think there are some commands where you might not want to try, but either scream ERROR or just rereplicate the whole relation or such. I very strongly feel that we (as postgres devs) have a *much* higher chance of recognizing these cases than either some random users (that write slonik scripts or similar) or some replication solution authors that aren't closely involved with -hackers. > These are just ruminations on the difficulty of doing truly robust DDL > replication. Right, it's far from easy. But imo that's an argument for providing the tools to do it as robustly as we can in core. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
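To make the queue mechanism described above a bit more concrete, here is a rough sketch of what the capture side could look like, assuming the pg_event_trigger_get_normalized_commands() SRF proposed in this thread with its identity and command columns; the ddl_queue table and its shape are invented for illustration, and a real solution would also record origin, transaction and ordering information:

CREATE TABLE ddl_queue (
    id       bigserial PRIMARY KEY,
    identity text,
    command  text
);

CREATE OR REPLACE FUNCTION queue_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- The queue table is replicated like any other table, so the captured
    -- commands travel to the standby interleaved with the row changes of the
    -- same transaction; the apply side executes them before continuing replay.
    INSERT INTO ddl_queue (identity, command)
        SELECT identity, command
          FROM pg_event_trigger_get_normalized_commands();
END;
$$;

CREATE EVENT TRIGGER queue_ddl ON ddl_command_end
    EXECUTE PROCEDURE queue_ddl();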
On Thu, Nov 13, 2014 at 7:45 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> Right. And that's why it's cool that logical decoding can operate >> through DDL differences. The apply side might not be able to cope >> with what pops out, but that's not logical decoding's fault, and >> different apply-sides can adopt different policies as to how to deal >> with whatever problems crop up. > > I think pretty much all of the solutions just say "oops, you're on your > own". And I can't blame them for that. Once there's a schema difference > and it causes problem there's really not much that can be done. Agreed. >> I don't know exactly what you mean by "a normal single master setup". >> Surely the point of logical decoding is that the replica might not be >> identical to the master. > > I actually think that that's not primary the point if you talk about > individual objects. The majority of objects will be exactly the same on > all nodes. If you actually want to have differening objects on the nodes > you'll have to opt out/in (depending on your solution) of ddl > replication for those objects. I'm not saying it's the primary point; I'm just saying that it can happen. >> And if it isn't, then a command that >> succeeded on the master might fail on the standby - for example, >> because an object by that name already exists there, or because a type >> doesn't exist there. (Even if you replicate a CREATE EXTENSION >> command, there's no guarantee that the .so exists on the target.) Then >> what? > > Sure. There's reasons logical replication isn't always a win. But I > don't see why that's a reason not to make it as robust as possible. I agree. > Btw, the .so problem exists for wal shipping as as well. Not in the same way. You might not be able to access the data on the standby if the .so defining the datatype isn't present, but the replication itself doesn't care. >> This is basically the same problem as multi-master replication >> conflicts, except with DDL. Resolving replication conflicts is not a >> very easy thing to get right even if you're only concerned about the >> rows in the tables. It's probably harder if you're worried about the >> schema, too. > > I don't think it's a sane thing to do multimaster with differing schemas > for individual relations, except maybe additional nonunique indexes. But that's exactly what you ARE doing, isn't it? I mean, if you replicate in only one direction, nothing keeps someone from modifying things on the replica independently of BDR, and if you replicate in both directions, well that's multi-master. >> > The solution here doesn't force you to do that, does it? It's something >> > that can be used by more than replication solution? >> >> In theory, yes. > > What's the practical point here? I am quite doubtful about whether there will ever be a second, working implementation, so I see all of this code - and the maintenance effort associated with it - as something that will really only benefit BDR. I understand that you don't see it that way, and I'm not saying that you are offering anything in bad faith, but it looks to me like even with all of this very substantial amount of infrastructure, you're still going to need a big pile of additional code inside BDR to actually make it work, and I don't hear anyone else offering to develop something similar. > I think generally logical replication has more failure cases than > physical ones. Which you seem to agree with. Yes, I agree with that. 
>> Physical replication is basically no-fail because it just >> says, hey, go write these bytes into this page, and we can pretty much >> always do that. But statement-based logical replication means >> basically executing arbitrary chunks of code all over the backend, and >> there is just no way to guarantee that code won't throw an error. So >> the best we can do is to hope that those errors will get reported back >> to the user, which is going to require some kind of distributed >> transaction. Your idea to just run the replicated DDL statements on >> the standby before committing on the master is one approach to that >> problem, and probably the simplest one, but not the only one - one can >> imagine something that resembles true clustering, for example. > > I think that's generally not what people need for primary/standby > cases (of subsets of tables). In practice there aren't many failures > like that besides schema differences. And there's just many usecases > that aren't doable with physical replication, so we can't advise people > doing that. I can't really follow this, especially the first sentence. >> By the way, the fact that you're planning to do log-based replication >> of DML and trigger-based replication of DDL scares the crap out of me. >> I'm not sure how that's going to work at all if the two are >> interleaved in the same transaction. > > Maybe that's based on a misunderstanding. All the event trigger does is > insert a event into a (local) queue. That's then shipped to the other > side using the same replication mechanisms as used for rows. Then, when > receiving changes in that ddl queue the standby performs those actions > before continuing with the replay. > That makes the interleaving on the standby to be pretty much the same as > on the primary. OK. But that's also not something that I can really imagine us ever adopting in core. If we were going to do DDL replication in core, I have to believe we'd find a way to put all of the information in the WAL stream, not use triggers. > Yes, it's not trivial. And I think there's some commands where you might > not want to try but either scream ERROR or just rereplicate the whole > relation or such. > > I very strongly feel that we (as postgres devs) have a *much* higher > chance of recognizing these cases than either some random users (that > write slonik scripts or similar) or some replication solution authors > that aren't closely involved with -hackers. It's clearly the job of the replication solution authors to figure these details out. I'm not going to get into the business of passing judgement on the relative competence of different groups of replication developers. There's no question that 2ndQuadrant is doing great, exciting work, but we should not tilt the playing field in a way that crowds other people out, even new players that haven't yet written their first line of code. > Right, it's far from easy. But imo that's an argument for providing the > tools to do it as robust as we can in core. I agree with that on general principle, but I do not have confidence in this patch set as the best foundation for DDL replication. You're talking about layering a lot of other complexity on top of the deparsing stuff in this patch to produce a usable solution, and even then I am not really convinced that it's going to be the final solution to this problem. Maybe it will, but I think there are enough chinks in the armor to be really, really cautious about being certain this is the right road to go down. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-11-14 12:38:52 -0500, Robert Haas wrote: > >> This is basically the same problem as multi-master replication > >> conflicts, except with DDL. Resolving replication conflicts is not a > >> very easy thing to get right even if you're only concerned about the > >> rows in the tables. It's probably harder if you're worried about the > >> schema, too. > > > > I don't think it's a sane thing to do multimaster with differing schemas > > for individual relations, except maybe additional nonunique indexes. > > But that's exactly what you ARE doing, isn't it? I mean, if you > replicate in only one direction, nothing keeps someone from modifying > things on the replica independently of BDR, and if you replicate in > both directions, well that's multi-master. Meh. By that definition any logical replication solution does multi master replication. Either you tell your users that that's not allowed, or you just prevent it by technical means. Absolutely the same is true for table contents. FWIW, in BDR we *do* prevent schemas from being modified independently on different nodes (unless you set the 'running with scissors' guc). > I am quite doubtful about whether there will ever be a second, working > implementation, so I see all of this code - and the maintenance effort > associated with it - as something that will really only benefit BDR. > I understand that you don't see it that way, and I'm not saying that > you are offering anything in bad faith, but it looks to me like even > with all of this very substantial amount of infrastructure, you're > still going to need a big pile of additional code inside BDR to > actually make it work, and I don't hear anyone else offering to > develop something similar. I don't know what to say about this. I don't see any other realistic way to perform DDL replication in logical rep, and nearly all my conversations with users have indicated that they want that. I think it's a good idea to structure independent features in a way that other solutions can reuse them. But I sure as hell can't force them to use it - especially as there's unfortunately not too much development going on in the existing logical replication solutions for postgres. Btw, I really think there's quite legitimate use cases for this besides logical replication solutions - e.g. schema tracking is quite a sensible use case. > >> By the way, the fact that you're planning to do log-based replication > >> of DML and trigger-based replication of DDL scares the crap out of me. > >> I'm not sure how that's going to work at all if the two are > >> interleaved in the same transaction. > > > > Maybe that's based on a misunderstanding. All the event trigger does is > > insert a event into a (local) queue. That's then shipped to the other > > side using the same replication mechanisms as used for rows. Then, when > > receiving changes in that ddl queue the standby performs those actions > > before continuing with the replay. > > That makes the interleaving on the standby to be pretty much the same as > > on the primary. > > OK. But that's also not something that I can really imagine us ever > adopting in core. Well, that bit really depends on what the user (e.g. a replication solution, or a schema tracking feature) needs. There's certain things that you can quite easily do as part of core (e.g. insert something into the WAL stream), that you just can't as external code. I don't think there's any external replication solution that won't have some form of internal queue to manipulate its internal state. 
For an external solution such a queue currently pretty much has to be some table, but for an eventual in core solution it could be done differently. > If we were going to do DDL replication in core, I have to believe we'd > find a way to put all of the information in the WAL stream, not use > triggers. I agree that we might want to represent the transport to standbys differently for something in core. That there's many different ways the deparsing output could be used imo is a good reason why that part of the mechanism isn't part of this submission. I don't really understand the arguments against triggers in general though. We're already using them quite extensively - I don't see why DDL replication has to meet some completely different bar than say foreign key checks or deferred uniqueness checks. We easily can add 'implicit' event triggers, that aren't defined inside the catalog if we feel like it. I'm just not sure we really would need/want to. > > Yes, it's not trivial. And I think there's some commands where you might > > not want to try but either scream ERROR or just rereplicate the whole > > relation or such. > > > > I very strongly feel that we (as postgres devs) have a *much* higher > > chance of recognizing these cases than either some random users (that > > write slonik scripts or similar) or some replication solution authors > > that aren't closely involved with -hackers. > > It's clearly the job of the replication solution authors to figure > these details out. I'm not going to get into the business of passing > judgement on the relative competence of different groups of > replication developers. It's not a question of relative competence. It's a question of the focus of that competence. There's just not many reasons a replication solution developer needs to have detailed knowledge about postgres' internal parsetrees for DDL statements. It's not like extensive documentation about their meaning exists. > There's no question that 2ndQuadrant is doing > great, exciting work, but we should not tilt the playing field in a > way that crowds other people out, even new players that haven't yet > written their first line of code. Huh? I'm not seing how I'm doing that in any way. > > Right, it's far from easy. But imo that's an argument for providing the > > tools to do it as robust as we can in core. > > I agree with that on general principle, but I do not have confidence > in this patch set as the best foundation for DDL replication. Can you please hint at some other workable design? I don't think there really is anything else. How should DDL be replicated in a logical replication solution but via DDL? It's not like just replicating the catalog rows or anything like that is something even remotely sane. As stated before, everything the user has to do manually is *much* more error prone than what's presented here. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
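As an illustration of the guard rail mentioned above, where nodes reject locally issued DDL unless the apply code is the one running it, here is one possible sketch; the myrepl.applying_remote_ddl setting is a made-up custom GUC that a hypothetical apply worker would set for its session, and the two-argument current_setting(..., missing_ok) form only exists in later releases:

CREATE OR REPLACE FUNCTION forbid_local_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Reject DDL unless the (hypothetical) apply worker has flagged this
    -- session as replaying commands received from the origin node.
    IF current_setting('myrepl.applying_remote_ddl', true) IS DISTINCT FROM 'on' THEN
        RAISE EXCEPTION 'execute DDL on the origin node; it will be replicated';
    END IF;
END;
$$;

CREATE EVENT TRIGGER forbid_local_ddl ON ddl_command_start
    EXECUTE PROCEDURE forbid_local_ddl();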
On Fri, Nov 14, 2014 at 1:18 PM, Andres Freund <andres@2ndquadrant.com> wrote: > I think it's a good idea to structure independent features in a way that > other solutions can reuse them. But I sure as hell can't force them to > use it - especially as there's unfortunately not too much development > going on in the existing logical replication solutions for postgres. > > Btw, I really think there's quite legitimate use cases for this besides > logical replication solutions - e.g. schema tracking is quite a sensible > use case. Well, as I already said, despite my doubts about the general utility of this feature, I'm willing to see us take it IF we have a testing framework that will reliably catch bugs, including bugs of omission. Without that, I'm very confident it's going to be a maintenance nightmare, and I believe you admitted yourself that that concern was reasonable. > Can you please hint at some other workable design? I don't think there > really is anything else. I think this really depends on what you mean by "anything else". Any DDL replication solution is necessarily going to involve the following steps: 1. Define some set of primitives such that any imaginable DDL operation can be expressed as a series of those primitives. 2. Find a way to capture those events as they happen. 3. Serialize them into some wire format and transport that format to the replica. 4. Apply them, possibly coordinating in some way with the master so that the user's original request fails if the apply fails. There are meaningful choices at every step. You're proposing that the primitives should be "anything that can be expressed as a complete SQL command against a single object" (I think - what are you going to emit for an ALL IN SCHEMA op - that thing itself, or a similar operation against each object in the schema?); that the capture mechanism should be an event trigger that inserts into a queue table; that the serialization format should be a JSON language designed to allow reassembly of the corresponding SQL statement; and that the apply coordination mechanism should be 2PC. But none of that is written in stone. As far as deciding what primitives to use, I think the principal decision to be made here is as to the appropriate level of granularity. For example, CREATE TABLE foo (a int DEFAULT 1, b int, CHECK (b > 42)) could be emitted as a single event saying that a table was created. But it could also be emitted as create-table (foo), add-column (foo, a, int), add-column (foo, b, int), add-column-default (a, 1), add-check-constraint (foo, b > 42). The appeal of a more fine-grained set of primitives is that there might be fewer of them, and that each of them might be simpler; one of the things that makes physical replication so reliable is that its primitives are very simple and thus easy to verify. The possible event-capture mechanisms seem to be to have either (a) event trigger or (b) a hook function in some yet-to-be-defined place or (c) core code which will either (i) write each event to a table, (ii) write each event directly into WAL, or perhaps (iii) write it someplace else (file? in-memory queue? network socket?). There are lots of possible serialization formats. 
Coordinating with the master could involve 2PC, as you propose; or trying to somehow validate that a given series of events is a valid state transformation based on the starting state on the standby before doing the operation on the master; or the use of a distributed transaction coordinated by something like PG-XC's global transaction manager; or you can skip it and hope for the best. In addition to the decisions above, you can try to prevent failures by restricting certain changes from happening, or you can let users change what they like and hope for the best. Different solutions can have different mechanisms for controlling which objects are under replication and which changes are not; or even allowing some individual DDL statements to opt out of replication while forcing others to participate. Administratively, solutions can be built to facilitate easy replication of an entire database to another node, or more specific applications like sharding, where creating a table on a master node creates child tables on a bunch of slave nodes, but they're not all identical, because we're partitioning the data so that only some of it will be on each node - thus the constraints and so on will be different. BDR has one set of answers to all of these questions, and despite my worries about a few points here and there, they are not stupid answers. But they are not the ONLY answers either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
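One way to picture the granularity trade-off sketched above for CREATE TABLE foo (a int DEFAULT 1, b int, CHECK (b > 42)) is to render the finer-grained primitives as plain SQL rather than as any particular wire format; the constraint name below is merely illustrative:

-- Coarse-grained: a single event carrying the whole compound statement.
CREATE TABLE foo (a int DEFAULT 1, b int, CHECK (b > 42));

-- Fine-grained: the same end state expressed as a series of small,
-- individually verifiable primitives (an alternative to the statement
-- above, not something to run in addition to it).
CREATE TABLE foo ();
ALTER TABLE foo ADD COLUMN a integer;
ALTER TABLE foo ADD COLUMN b integer;
ALTER TABLE foo ALTER COLUMN a SET DEFAULT 1;
ALTER TABLE foo ADD CONSTRAINT foo_b_check CHECK (b > 42);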
On Sat, Nov 8, 2014 at 05:56:00PM +0100, Andres Freund wrote: > On 2014-11-08 11:52:43 -0500, Tom Lane wrote: > > Adding a similar > > level of burden to support a feature with a narrow use-case seems like > > a nonstarter from here. > > I don't understand this statement. In my experience the lack of a usable > replication solution that allows temporary tables and major version > differences is one of the most, if not *the* most, frequent criticisms > of postgres I hear. How is this a narrow use case? How would replicating DDL handle cases where the master and slave servers have different major versions and the DDL is only supported by the Postgres version on the master server? If it would fail, does this limit the idea that logical replication allows major version-different replication? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Bruce Momjian wrote: > On Sat, Nov 8, 2014 at 05:56:00PM +0100, Andres Freund wrote: > > On 2014-11-08 11:52:43 -0500, Tom Lane wrote: > > > Adding a similar level of burden to support a feature with a > > > narrow use-case seems like a nonstarter from here. > > > > I don't understand this statement. In my experience the lack of a > > usable replication solution that allows temporary tables and major > > version differences is one of the most, if not *the* most, frequent > > criticisms of postgres I hear. How is this a narrow use case? > > How would replicating DDL handle cases where the master and slave > servers have different major versions and the DDL is only supported by > the Postgres version on the master server? Normally you would replicate between an older master and a newer replica, so this shouldn't be an issue. I find it unlikely that we would de-support some syntax that works in an older version: it would break pg_dump, for one thing. In other words I view cross-version replication as a mechanism to upgrade, not something that you would use permanently. Once you finish upgrading, promote the newer server and ditch the old master. > If it would fail, does this limit the idea that logical replication > allows major version-different replication? Not in my view, at least. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > Bruce Momjian wrote: > > How would replicating DDL handle cases where the master and slave > > servers have different major versions and the DDL is only supported by > > the Postgres version on the master server? > > Normally you would replicate between an older master and a newer > replica, so this shouldn't be an issue. I find it unlikely that we > would de-support some syntax that works in an older version: it would > break pg_dump, for one thing. While I tend to agree with you that it's not something we're likely to do, how would it break pg_dump? We have plenty of version-specific logic in pg_dump and we could certainly generate a different syntax in a newer version than we did in an older version, if the newer server was expecting something different. We've always held that you should use the version of pg_dump which matches the server you are moving *to*, after all. > In other words I view cross-version replication as a mechanism to > upgrade, not something that you would use permanently. Once you > finish upgrading, promote the newer server and ditch the old master. I agree with this also. > > If it would fail, does this limit the idea that logical replication > > allows major version-different replication? > > Not in my view, at least. I'm all for having logical replication be a way to do major version different replication (particularly for the purposes of upgrades), but it shouldn't mean we can never de-support a given syntax. As one example, we've discussed a few times removing certain table-level privileges on the basis that they practically mean you own the table. Perhaps that'll never actually happen, but if it does, logical replication would need to deal with it. Thanks, Stephen
On Wed, Nov 26, 2014 at 09:01:13PM -0500, Stephen Frost wrote: > * Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > > Bruce Momjian wrote: > > > How would replicating DDL handle cases where the master and slave > > > servers have different major versions and the DDL is only supported by > > > the Postgres version on the master server? > > > > Normally you would replicate between an older master and a newer > > replica, so this shouldn't be an issue. I find it unlikely that we > > would de-support some syntax that works in an older version: it would > > break pg_dump, for one thing. I like the idea of older master/new replica, but what about bidirectional replication? Would the create/alter/drop WAL logical structure remain consistent across major versions, or would the code have to read at least one older version? Would we limit it to one? > While I tend to agree with you that it's not something we're likely to > do, how would it break pg_dump? We have plenty of version-specific > logic in pg_dump and we could certainly generate a different syntax in > a newer version than we did in an older version, if the newer server was > expecting something different. We've always held that you should use > the version of pg_dump which matches the server you are moving *to*, after > all. pg_upgrade avoids having to deal with major version changes by leveraging pg_dump. Is it possible to have the new replica use the new pg_dump to connect to the old master to get a CREATE command that it can execute? Would that avoid having to ship CREATE syntax? What it wouldn't help with is ALTER and DROP though. (We do have ALTER but I think only for inheritance cases.) > > In other words I view cross-version replication as a mechanism to > > upgrade, not something that you would use permanently. Once you > > finish upgrading, promote the newer server and ditch the old master. > > I agree with this also. > > > If it would fail, does this limit the idea that logical replication > > > allows major version-different replication? > > > > Not in my view, at least. > > I'm all for having logical replication be a way to do major version > different replication (particularly for the purposes of upgrades), but > it shouldn't mean we can never de-support a given syntax. > > As one example, we've discussed a few times removing certain table-level > privileges on the basis that they practically mean you own the table. > Perhaps that'll never actually happen, but if it does, logical > replication would need to deal with it. Should we just tell the user not to modify the database schema during this period? Should we have a server mode which disables DDL? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
> Normally you would replicate between an older master and a newer > replica, so this shouldn't be an issue. I find it unlikely that we > would de-support some syntax that works in an older version: it would > break pg_dump, for one thing. The most common way in which we break forward-compatibility is by reserving additional keywords. Logical replication can deal with this by quoting all identifiers, so it's not a big issue. It's not the only possible issue; it is certainly conceivable that we could change something in the server and then try to change pg_dump to compensate. I think we wouldn't do that with anything very big, but suppose spgist were supplanted by a new and better access method tsigps. Well, we might figure that there are few enough people accustomed to the current syntax that we can get away with hacking the pg_dump output, and yes, anyone with an existing dump from an older system will have problems, but there won't be many of them, and they can always adjust the dump by hand. If we ever decided to do such a thing, whatever syntax transformation we did in pg_dump would have to be mimicked by any logical replication solution. I think it's legitimate to say that this could be a problem, but I think it probably won't be a problem very often, and I think when it is a problem it probably won't be a very big problem, because we already have good reasons not to break the ability to restore older dumps on newer servers more often than absolutely necessary. One of the somewhat disappointing things about this is that we're talking about adding a lot of new deparsing code to the server that probably, in some sense, duplicates pg_dump. Perhaps in an ideal world there would be a way to avoid that, but in the world we really live in, there probably isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
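To make the quoting point concrete: quote_ident() only quotes what the running server currently reserves, so a deparser that wants replayed DDL to survive future keyword additions would presumably double-quote every identifier unconditionally. A quick illustration, with the orders table purely as an example:

-- 'user' is a reserved keyword today, 'mycol' is not:
SELECT quote_ident('user') AS reserved, quote_ident('mycol') AS not_reserved;
--   returns: "user" | mycol

-- Emitted DDL that always double-quotes stays replayable even if one of the
-- identifiers becomes a keyword in some later release:
CREATE TABLE "orders" ("id" integer);
ALTER TABLE "orders" ADD COLUMN "mycol" integer;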
On Thu, Nov 27, 2014 at 10:16 AM, Bruce Momjian <bruce@momjian.us> wrote: > On Sat, Nov 8, 2014 at 05:56:00PM +0100, Andres Freund wrote: >> On 2014-11-08 11:52:43 -0500, Tom Lane wrote: >> > Adding a similar >> > level of burden to support a feature with a narrow use-case seems like >> > a nonstarter from here. >> >> I don't understand this statement. In my experience the lack of a usable >> replication solution that allows temporary tables and major version >> differences is one of the most, if not *the* most, frequent criticisms >> of postgres I hear. How is this a narrow use case? > > How would replicating DDL handle cases where the master and slave > servers have different major versions and the DDL is only supported by > the Postgres version on the master server? If it would fail, does this > limit the idea that logical replication allows major version-different > replication? Marking this patch as "Returned with feedback". Even setting aside the more fundamental argument about whether such a replication solution belongs in core (I am skeptical as well, btw), on the code base alone I don't think that this patch is acceptable as-is without jsonb gaining more infrastructure for in-memory manipulation of containers (aka removing the ObjTree stuff). Regards, -- Michael