Обсуждение: The output sql generated by pg_dump for a create function refers to a modified table name
Hi, The output sql generated by pg_dump for the below function refers to a modified table name: create table t1 (c1 int); create table t2 (c1 int); CREATE OR REPLACE FUNCTION test_fun(c1 int) RETURNS void LANGUAGE SQL BEGIN ATOMIC WITH delete_t1 AS ( DELETE FROM t1 WHERE c1 = $1 ) INSERT INTO t1 (c1) SELECT $1 FROM t2; END; The below sql output created by pg_dump refers to t1_1 which should have been t1: CREATE FUNCTION public.test_fun(c1 integer) RETURNS void LANGUAGE sql BEGIN ATOMIC WITH delete_t1 AS ( DELETE FROM public.t1 WHERE (t1_1.c1 = test_fun.c1) ) INSERT INTO public.t1 (c1) SELECT test_fun.c1 FROM public.t2; END; pg_get_function_sqlbody also returns similar result: select proname, pg_get_function_sqlbody(oid) from pg_proc where proname = 'test_fun'; proname | pg_get_function_sqlbody ----------+------------------------------------------- test_fun | BEGIN ATOMIC + | WITH delete_t1 AS ( + | DELETE FROM t1 + | WHERE (t1_1.c1 = test_fun.c1) + | ) + | INSERT INTO t1 (c1) SELECT test_fun.c1+ | FROM t2; + | END (1 row) I felt the problem here is with set_rtable_names function which changes the relation name t1 to t1_1 while parsing the statement: /* * If the selected name isn't unique, append digits to make it so, and * make a new hash entry for it once we've got a unique name. For a * very long input name, we might have to truncate to stay within * NAMEDATALEN. */ During the query generation we will set the table names before generating each statement, in our case the table t1 would have been added already to the hash table during the first insert statement generation. Next time it will try to set the relation names again for the next statement, i.e delete statement, if the entry with same name already exists, it will change the name to t1_1 by appending a digit to keep the has entry unique. Regards, Vignesh
Re: The output sql generated by pg_dump for a create function refers to a modified table name
От
"Jonathan S. Katz"
Дата:
On 2/17/23 5:22 AM, vignesh C wrote: > Hi, > > The output sql generated by pg_dump for the below function refers to a > modified table name: > create table t1 (c1 int); > create table t2 (c1 int); > > CREATE OR REPLACE FUNCTION test_fun(c1 int) > RETURNS void > LANGUAGE SQL > BEGIN ATOMIC > WITH delete_t1 AS ( > DELETE FROM t1 WHERE c1 = $1 > ) > INSERT INTO t1 (c1) SELECT $1 FROM t2; > END; > > The below sql output created by pg_dump refers to t1_1 which should > have been t1: > CREATE FUNCTION public.test_fun(c1 integer) RETURNS void > LANGUAGE sql > BEGIN ATOMIC > WITH delete_t1 AS ( > DELETE FROM public.t1 > WHERE (t1_1.c1 = test_fun.c1) > ) > INSERT INTO public.t1 (c1) SELECT test_fun.c1 > FROM public.t2; > END; > > pg_get_function_sqlbody also returns similar result: > select proname, pg_get_function_sqlbody(oid) from pg_proc where > proname = 'test_fun'; > proname | pg_get_function_sqlbody > ----------+------------------------------------------- > test_fun | BEGIN ATOMIC + > | WITH delete_t1 AS ( + > | DELETE FROM t1 + > | WHERE (t1_1.c1 = test_fun.c1) + > | ) + > | INSERT INTO t1 (c1) SELECT test_fun.c1+ > | FROM t2; + > | END > (1 row) Thanks for reproducing and demonstrating that this was more generally applicable. For context, this was initially discovered when testing the DDL replication patch[1] under that context. > I felt the problem here is with set_rtable_names function which > changes the relation name t1 to t1_1 while parsing the statement: > /* > * If the selected name isn't unique, append digits to make it so, and > * make a new hash entry for it once we've got a unique name. For a > * very long input name, we might have to truncate to stay within > * NAMEDATALEN. > */ > > During the query generation we will set the table names before > generating each statement, in our case the table t1 would have been > added already to the hash table during the first insert statement > generation. Next time it will try to set the relation names again for > the next statement, i.e delete statement, if the entry with same name > already exists, it will change the name to t1_1 by appending a digit > to keep the has entry unique. Good catch. Do you have thoughts on how we can adjust the naming logic to handle cases like this? Jonathan [1] https://www.postgresql.org/message-id/e947fa21-24b2-f922-375a-d4f763ef3e4b%40postgresql.org
Вложения
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > Good catch. Do you have thoughts on how we can adjust the naming logic > to handle cases like this? I think it's perfectly fine that ruleutils decided to use different aliases for the two different occurrences of "t1": the statement is quite confusing as written. The problem probably is that get_delete_query_def() has no idea that it's supposed to print the adjusted alias just after "DELETE FROM tab". UPDATE likely has same issue ... maybe INSERT too? regards, tom lane
Re: The output sql generated by pg_dump for a create function refers to a modified table name
От
"Jonathan S. Katz"
Дата:
On 2/17/23 10:09 AM, Tom Lane wrote: > "Jonathan S. Katz" <jkatz@postgresql.org> writes: >> Good catch. Do you have thoughts on how we can adjust the naming logic >> to handle cases like this? > > I think it's perfectly fine that ruleutils decided to use different > aliases for the two different occurrences of "t1": the statement is > quite confusing as written. Agreed on that -- while it's harder to set up, I do prefer the original example[1] to demonstrate this, as it shows the issue given it does not have those multiple occurrences, at least not within the same context, i.e.: CREATE OR REPLACE FUNCTION public.calendar_manage(room_id int, calendar_date date) RETURNS void LANGUAGE SQL BEGIN ATOMIC WITH delete_calendar AS ( DELETE FROM calendar WHERE room_id = $1 AND calendar_date = $2 ) INSERT INTO calendar (room_id, status, calendar_date, calendar_range) SELECT $1, c.status, $2, c.calendar_range FROM calendar_generate_calendar($1, tstzrange($2, $2 + 1)) c; END; the table prefixes on the attributes within the DELETE statement were ultimately mangled: WITH delete_calendar AS ( DELETE FROM public.calendar WHERE ((calendar_1.room_id OPERATOR(pg_catalog.=) calendar_manage.room_id) AND (calendar_1.calendar_date OPERATOR(pg_catalog.=) calendar_manage.calendar_date)) ) INSERT INTO public.calendar (room_id, status, calendar_date, calendar_range) > The problem probably is that > get_delete_query_def() has no idea that it's supposed to print the > adjusted alias just after "DELETE FROM tab". UPDATE likely has same > issue ... maybe INSERT too? Maybe? I modified the function above to do an INSERT/UPDATE instead of a DELETE but I did not get any errors. However, if the logic is similar there could be an issue there. Thanks, Jonathan [1] https://www.postgresql.org/message-id/e947fa21-24b2-f922-375a-d4f763ef3e4b%40postgresql.org
Вложения
Re: The output sql generated by pg_dump for a create function refers to a modified table name
От
"Jonathan S. Katz"
Дата:
On 2/17/23 11:19 AM, Jonathan S. Katz wrote: > On 2/17/23 10:09 AM, Tom Lane wrote: > Agreed on that -- while it's harder to set up, I do prefer the original > example[1] to demonstrate this, as it shows the issue given it does not > have those multiple occurrences, at least not within the same context, > i.e.: > > CREATE OR REPLACE FUNCTION public.calendar_manage(room_id int, > calendar_date date) > RETURNS void > LANGUAGE SQL > BEGIN ATOMIC > WITH delete_calendar AS ( > DELETE FROM calendar > WHERE > room_id = $1 AND > calendar_date = $2 > ) > INSERT INTO calendar (room_id, status, calendar_date, calendar_range) > SELECT $1, c.status, $2, c.calendar_range > FROM calendar_generate_calendar($1, tstzrange($2, $2 + 1)) c; > END; > >> The problem probably is that >> get_delete_query_def() has no idea that it's supposed to print the >> adjusted alias just after "DELETE FROM tab". UPDATE likely has same >> issue ... maybe INSERT too? > > Maybe? I modified the function above to do an INSERT/UPDATE instead of a > DELETE but I did not get any errors. However, if the logic is similar > there could be an issue there. I spoke too soon -- I was looking at the wrong logs. I did reproduce it with UPDATE, but not INSERT. The example I used for UPDATE: CREATE OR REPLACE FUNCTION public.calendar_manage(room_id int, calendar_date date) RETURNS void LANGUAGE SQL BEGIN ATOMIC WITH update_calendar AS ( UPDATE calendar SET room_id = $1 WHERE room_id = $1 AND calendar_date = $2 ) INSERT INTO calendar (room_id, status, calendar_date, calendar_range) SELECT $1, c.status, $2, c.calendar_range FROM calendar_generate_calendar($1, tstzrange($2, $2 + 1)) c; END; which produced: WITH update_calendar AS ( UPDATE public.calendar SET room_id = calendar_manage.room_id WHERE ( (calendar_1.room_id OPERATOR(pg_catalog.=) calendar_manage.room_id) AND (calendar_1.calendar_date OPERATOR(pg_catalog.=) calendar_manage.calendar_date)) ) INSERT INTO public.calendar (room_id, status, calendar_date, calendar_range) SELECT calendar_manage.room_id, c.status, calendar_manage.calendar_date, c.calendar_range FROM public.calendar_generate_calendar(calendar_manage.room_id, pg_catalog.tstzrange((calendar_manage.calendar_date)::timestamp with time zone, ((calendar_manage.calendar_date OPERATOR(pg_catalog.+) 1))::timestamp with time zone)) c(status, calendar_range); Thanks, Jonathan
Вложения
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > I spoke too soon -- I was looking at the wrong logs. I did reproduce it > with UPDATE, but not INSERT. It can be reproduced with INSERT too, on the same principle as the others: put the DML command inside a WITH, and give it an alias conflicting with the outer query. Being a lazy sort, I tried to collapse all three cases into a single test case, and observed something I hadn't thought of: we disambiguate aliases in a WITH query with respect to the outer query, but not with respect to other WITH queries. This makes the example (see attached) a bit more confusing than I would have hoped. However, the same sort of thing happens within other kinds of nested subqueries, so I think it's probably all right as-is. In any case, changing this aspect would require a significantly bigger patch with more risk of unwanted side-effects. To fix it, I pulled out the print-an-alias logic within get_from_clause_item and called that new function for INSERT/UPDATE/DELETE. This is a bit of overkill perhaps, because only the RTE_RELATION case can be needed by these other callers, but it seemed like a sane refactorization. I've not tested, but I imagine this will need patched all the way back. The rule case should be reachable in all supported versions. regards, tom lane diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 9ac42efdbc..6dc117dea8 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -475,6 +475,8 @@ static void get_from_clause(Query *query, const char *prefix, deparse_context *context); static void get_from_clause_item(Node *jtnode, Query *query, deparse_context *context); +static void get_rte_alias(RangeTblEntry *rte, int varno, bool use_as, + deparse_context *context); static void get_column_alias_list(deparse_columns *colinfo, deparse_context *context); static void get_from_clause_coldeflist(RangeTblFunction *rtfunc, @@ -6648,12 +6650,14 @@ get_insert_query_def(Query *query, deparse_context *context, context->indentLevel += PRETTYINDENT_STD; appendStringInfoChar(buf, ' '); } - appendStringInfo(buf, "INSERT INTO %s ", + appendStringInfo(buf, "INSERT INTO %s", generate_relation_name(rte->relid, NIL)); - /* INSERT requires AS keyword for target alias */ - if (rte->alias != NULL) - appendStringInfo(buf, "AS %s ", - quote_identifier(rte->alias->aliasname)); + + /* Print the relation alias, if needed; INSERT requires explicit AS */ + get_rte_alias(rte, query->resultRelation, true, context); + + /* always want a space here */ + appendStringInfoChar(buf, ' '); /* * Add the insert-column-names list. Any indirection decoration needed on @@ -6835,9 +6839,10 @@ get_update_query_def(Query *query, deparse_context *context, appendStringInfo(buf, "UPDATE %s%s", only_marker(rte), generate_relation_name(rte->relid, NIL)); - if (rte->alias != NULL) - appendStringInfo(buf, " %s", - quote_identifier(rte->alias->aliasname)); + + /* Print the relation alias, if needed */ + get_rte_alias(rte, query->resultRelation, false, context); + appendStringInfoString(buf, " SET "); /* Deparse targetlist */ @@ -7043,9 +7048,9 @@ get_delete_query_def(Query *query, deparse_context *context, appendStringInfo(buf, "DELETE FROM %s%s", only_marker(rte), generate_relation_name(rte->relid, NIL)); - if (rte->alias != NULL) - appendStringInfo(buf, " %s", - quote_identifier(rte->alias->aliasname)); + + /* Print the relation alias, if needed */ + get_rte_alias(rte, query->resultRelation, false, context); /* Add the USING clause if given */ get_from_clause(query, " USING ", context); @@ -10887,10 +10892,8 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context) { int varno = ((RangeTblRef *) jtnode)->rtindex; RangeTblEntry *rte = rt_fetch(varno, query->rtable); - char *refname = get_rtable_name(varno, context); deparse_columns *colinfo = deparse_columns_fetch(varno, dpns); RangeTblFunction *rtfunc1 = NULL; - bool printalias; if (rte->lateral) appendStringInfoString(buf, "LATERAL "); @@ -11027,59 +11030,7 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context) } /* Print the relation alias, if needed */ - printalias = false; - if (rte->alias != NULL) - { - /* Always print alias if user provided one */ - printalias = true; - } - else if (colinfo->printaliases) - { - /* Always print alias if we need to print column aliases */ - printalias = true; - } - else if (rte->rtekind == RTE_RELATION) - { - /* - * No need to print alias if it's same as relation name (this - * would normally be the case, but not if set_rtable_names had to - * resolve a conflict). - */ - if (strcmp(refname, get_relation_name(rte->relid)) != 0) - printalias = true; - } - else if (rte->rtekind == RTE_FUNCTION) - { - /* - * For a function RTE, always print alias. This covers possible - * renaming of the function and/or instability of the - * FigureColname rules for things that aren't simple functions. - * Note we'd need to force it anyway for the columndef list case. - */ - printalias = true; - } - else if (rte->rtekind == RTE_SUBQUERY || - rte->rtekind == RTE_VALUES) - { - /* - * For a subquery, always print alias. This makes the output SQL - * spec-compliant, even though we allow the alias to be omitted on - * input. - */ - printalias = true; - } - else if (rte->rtekind == RTE_CTE) - { - /* - * No need to print alias if it's same as CTE name (this would - * normally be the case, but not if set_rtable_names had to - * resolve a conflict). - */ - if (strcmp(refname, rte->ctename) != 0) - printalias = true; - } - if (printalias) - appendStringInfo(buf, " %s", quote_identifier(refname)); + get_rte_alias(rte, varno, false, context); /* Print the column definitions or aliases, if needed */ if (rtfunc1 && rtfunc1->funccolnames != NIL) @@ -11217,6 +11168,77 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context) (int) nodeTag(jtnode)); } +/* + * get_rte_alias - print the relation's alias, if needed + * + * If printed, the alias is preceded by a space, or by " AS " if use_as is true. + */ +static void +get_rte_alias(RangeTblEntry *rte, int varno, bool use_as, + deparse_context *context) +{ + deparse_namespace *dpns = (deparse_namespace *) linitial(context->namespaces); + char *refname = get_rtable_name(varno, context); + deparse_columns *colinfo = deparse_columns_fetch(varno, dpns); + bool printalias = false; + + if (rte->alias != NULL) + { + /* Always print alias if user provided one */ + printalias = true; + } + else if (colinfo->printaliases) + { + /* Always print alias if we need to print column aliases */ + printalias = true; + } + else if (rte->rtekind == RTE_RELATION) + { + /* + * No need to print alias if it's same as relation name (this would + * normally be the case, but not if set_rtable_names had to resolve a + * conflict). + */ + if (strcmp(refname, get_relation_name(rte->relid)) != 0) + printalias = true; + } + else if (rte->rtekind == RTE_FUNCTION) + { + /* + * For a function RTE, always print alias. This covers possible + * renaming of the function and/or instability of the FigureColname + * rules for things that aren't simple functions. Note we'd need to + * force it anyway for the columndef list case. + */ + printalias = true; + } + else if (rte->rtekind == RTE_SUBQUERY || + rte->rtekind == RTE_VALUES) + { + /* + * For a subquery, always print alias. This makes the output + * SQL-spec-compliant, even though we allow such aliases to be omitted + * on input. + */ + printalias = true; + } + else if (rte->rtekind == RTE_CTE) + { + /* + * No need to print alias if it's same as CTE name (this would + * normally be the case, but not if set_rtable_names had to resolve a + * conflict). + */ + if (strcmp(refname, rte->ctename) != 0) + printalias = true; + } + + if (printalias) + appendStringInfo(context->buf, "%s%s", + use_as ? " AS " : " ", + quote_identifier(refname)); +} + /* * get_column_alias_list - print column alias list for an RTE * diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 174b725fff..a3a5a62329 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -2998,28 +2998,21 @@ select * from rules_log; (16 rows) create rule r4 as on delete to rules_src do notify rules_src_deletion; -\d+ rules_src - Table "public.rules_src" - Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ---------+---------+-----------+----------+---------+---------+--------------+------------- - f1 | integer | | | | plain | | - f2 | integer | | | 0 | plain | | -Rules: - r1 AS - ON UPDATE TO rules_src DO INSERT INTO rules_log (f1, f2, tag, id) VALUES (old.f1,old.f2,'old'::text,DEFAULT), (new.f1,new.f2,'new'::text,DEFAULT) - r2 AS - ON UPDATE TO rules_src DO VALUES (old.f1,old.f2,'old'::text), (new.f1,new.f2,'new'::text) - r3 AS - ON INSERT TO rules_src DO INSERT INTO rules_log (f1, f2, tag, id) VALUES (NULL::integer,NULL::integer,'-'::text,DEFAULT),(new.f1,new.f2,'new'::text,DEFAULT) - r4 AS - ON DELETE TO rules_src DO - NOTIFY rules_src_deletion - -- -- Ensure an aliased target relation for insert is correctly deparsed. -- create rule r5 as on insert to rules_src do instead insert into rules_log AS trgt SELECT NEW.* RETURNING trgt.f1, trgt.f2; create rule r6 as on update to rules_src do instead UPDATE rules_log AS trgt SET tag = 'updated' WHERE trgt.f1 = new.f1; +-- +-- Check deparse disambiguation of INSERT/UPDATE/DELETE targets. +-- +create rule r7 as on delete to rules_src do instead + with wins as (insert into int4_tbl as trgt values (0) returning *), + wupd as (update int4_tbl trgt set f1 = f1+1 returning *), + wdel as (delete from int4_tbl trgt where f1 = 0 returning *) + insert into rules_log AS trgt select old.* from wins, wupd, wdel + returning trgt.f1, trgt.f2; +-- check display of all rules added above \d+ rules_src Table "public.rules_src" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description @@ -3044,6 +3037,26 @@ Rules: r6 AS ON UPDATE TO rules_src DO INSTEAD UPDATE rules_log trgt SET tag = 'updated'::text WHERE trgt.f1 = new.f1 + r7 AS + ON DELETE TO rules_src DO INSTEAD WITH wins AS ( + INSERT INTO int4_tbl AS trgt_1 (f1) + VALUES (0) + RETURNING trgt_1.f1 + ), wupd AS ( + UPDATE int4_tbl trgt_1 SET f1 = trgt_1.f1 + 1 + RETURNING trgt_1.f1 + ), wdel AS ( + DELETE FROM int4_tbl trgt_1 + WHERE trgt_1.f1 = 0 + RETURNING trgt_1.f1 + ) + INSERT INTO rules_log AS trgt (f1, f2) SELECT old.f1, + old.f2 + FROM wins, + wupd, + wdel + RETURNING trgt.f1, + trgt.f2 -- -- Also check multiassignment deparsing. diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql index 1f858129b8..ac1a4ce554 100644 --- a/src/test/regress/sql/rules.sql +++ b/src/test/regress/sql/rules.sql @@ -1015,13 +1015,24 @@ insert into rules_src values(22,23), (33,default); select * from rules_src; select * from rules_log; create rule r4 as on delete to rules_src do notify rules_src_deletion; -\d+ rules_src -- -- Ensure an aliased target relation for insert is correctly deparsed. -- create rule r5 as on insert to rules_src do instead insert into rules_log AS trgt SELECT NEW.* RETURNING trgt.f1, trgt.f2; create rule r6 as on update to rules_src do instead UPDATE rules_log AS trgt SET tag = 'updated' WHERE trgt.f1 = new.f1; + +-- +-- Check deparse disambiguation of INSERT/UPDATE/DELETE targets. +-- +create rule r7 as on delete to rules_src do instead + with wins as (insert into int4_tbl as trgt values (0) returning *), + wupd as (update int4_tbl trgt set f1 = f1+1 returning *), + wdel as (delete from int4_tbl trgt where f1 = 0 returning *) + insert into rules_log AS trgt select old.* from wins, wupd, wdel + returning trgt.f1, trgt.f2; + +-- check display of all rules added above \d+ rules_src --
Re: The output sql generated by pg_dump for a create function refers to a modified table name
От
"Jonathan S. Katz"
Дата:
On 2/17/23 1:18 PM, Tom Lane wrote: > It can be reproduced with INSERT too, on the same principle as the others: > put the DML command inside a WITH, and give it an alias conflicting with > the outer query. Ah, I see based on your example below. I did not alias the INSERT statement in the way (and I don't know how common of a pattern it is to o that). > Being a lazy sort, I tried to collapse all three cases into a single > test case, and observed something I hadn't thought of: we disambiguate > aliases in a WITH query with respect to the outer query, but not with > respect to other WITH queries. This makes the example (see attached) > a bit more confusing than I would have hoped. However, the same sort > of thing happens within other kinds of nested subqueries, so I think > it's probably all right as-is. In any case, changing this aspect > would require a significantly bigger patch with more risk of unwanted > side-effects. I think I agree. Most people should not be looking at the disambiguated statements unless they are troubleshooting an issue (such as $SUBJECT). The main goal is to disambiguate correctly. > To fix it, I pulled out the print-an-alias logic within > get_from_clause_item and called that new function for > INSERT/UPDATE/DELETE. This is a bit of overkill perhaps, because > only the RTE_RELATION case can be needed by these other callers, but > it seemed like a sane refactorization. > > I've not tested, but I imagine this will need patched all the way back. > The rule case should be reachable in all supported versions. I tested this against HEAD (+v69 of the DDL replication patch). My cases are now all passing. The code looks good to me -- I don't know if moving that logic is overkill, but it makes the solution relatively clean. I didn't test in any back branches yet, but given this can generate an invalid function body, it does likely need to be backpatched. Thanks, Jonathan
Вложения
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > On 2/17/23 1:18 PM, Tom Lane wrote: >> It can be reproduced with INSERT too, on the same principle as the others: >> put the DML command inside a WITH, and give it an alias conflicting with >> the outer query. > Ah, I see based on your example below. I did not alias the INSERT > statement in the way (and I don't know how common of a pattern it is to > o that). I suppose you can also make examples where the true name of the DML target table conflicts with an outer-query name, implying that we need to give it an alias even though the user wrote none. > I tested this against HEAD (+v69 of the DDL replication patch). My cases > are now all passing. > The code looks good to me -- I don't know if moving that logic is > overkill, but it makes the solution relatively clean. Cool, thanks for testing and code-reading. I'll go see about back-patching. > I didn't test in any back branches yet, but given this can generate an > invalid function body, it does likely need to be backpatched. Presumably it can also cause dump/restore failures for rules that do this sort of thing, though admittedly those wouldn't be common. regards, tom lane