Обсуждение: misleading error message in ProcessUtilitySlow T_CreateStatsStmt
hi. while reviewing other work, some error messages in src/backend/tcop/utility.c seem not accurate. static void ProcessUtilitySlow(ParseState *pstate, PlannedStmt *pstmt, const char *queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest, QueryCompletion *qc) case T_CreateStatsStmt: { Oid relid; CreateStatsStmt *stmt = (CreateStatsStmt *) parsetree; RangeVar *rel = (RangeVar *) linitial(stmt->relations); if (!IsA(rel, RangeVar)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("only a single relation is allowed in CREATE STATISTICS"))); relid = RangeVarGetRelid(rel, ShareUpdateExclusiveLock, false); /* Run parse analysis ... */ stmt = transformStatsStmt(relid, stmt, queryString); address = CreateStatistics(stmt); } for example: create or replace function tftest(int) returns table(a int, b int) as $$ begin return query select $1, $1+i from generate_series(1,5) g(i); end; $$ language plpgsql immutable strict; CREATE STATISTICS alt_stat2 ON a, b FROM tftest(1); ERROR: only a single relation is allowed in CREATE STATISTICS this error message seem misleading? also this error check seems duplicated in CreateStatistics?
On Thu, 21 Aug 2025 at 17:00, jian he <jian.universality@gmail.com> wrote: > > hi. Hi! > RangeVar *rel = (RangeVar *) linitial(stmt->relations); > if (!IsA(rel, RangeVar)) These two lines are weird. Looks like linitial(stmt->relations) should be assigned to variable with type Node* first? > > for example: > > create or replace function tftest(int) returns table(a int, b int) as $$ > begin > return query select $1, $1+i from generate_series(1,5) g(i); > end; > $$ language plpgsql immutable strict; > > CREATE STATISTICS alt_stat2 ON a, b FROM tftest(1); > ERROR: only a single relation is allowed in CREATE STATISTICS > > this error message seem misleading? I wouldn’t say this is misleading, but " a single relation" is indeed not precise enough. IMO we need a more precise term to distinguish regular relation and table func. > also this error check seems duplicated in CreateStatistics? > Indeed. -- Best regards, Kirill Reshke
Kirill Reshke <reshkekirill@gmail.com> writes: > On Thu, 21 Aug 2025 at 17:00, jian he <jian.universality@gmail.com> wrote: >> RangeVar *rel = (RangeVar *) linitial(stmt->relations); >> if (!IsA(rel, RangeVar)) > These two lines are weird. Looks like linitial(stmt->relations) > should be assigned to variable with type Node* first? We take that sort of shortcut in many places. If there's not any need for the code to deal with more than one node type, an extra variable that's used just for the IsA test seems like a lot of notational overhead. regards, tom lane
On 2025-Aug-21, Kirill Reshke wrote: > I wouldn’t say this is misleading, but " a single relation" is indeed > not precise enough. IMO we need a more precise term to distinguish > regular relation and table func. I'm not sure. See the definition of relation in the glossary: https://www.postgresql.org/docs/18/glossary.html#GLOSSARY-RELATION The generic term for all objects in a database that have a name and a list of attributes defined in a specific order. Tables, sequences, views, foreign tables, materialized views, composite types, and indexes are all relations. More generically, a relation is a set of tuples; for example, the result of a query is also a relation. In PostgreSQL, Class is an archaic synonym for relation. (I wonder why this says "generically" rather than "generally". Is that word choice a mistake?) Maybe in the "For example" clause we can also mention table functions. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On Fri, 22 Aug 2025 at 14:46, Álvaro Herrera <alvherre@kurilemu.de> wrote: > > On 2025-Aug-21, Kirill Reshke wrote: > > > I wouldn’t say this is misleading, but " a single relation" is indeed > > not precise enough. IMO we need a more precise term to distinguish > > regular relation and table func. > > I'm not sure. See the definition of relation in the glossary: > https://www.postgresql.org/docs/18/glossary.html#GLOSSARY-RELATION > > The generic term for all objects in a database that have a name and a > list of attributes defined in a specific order. Tables, sequences, > views, foreign tables, materialized views, composite types, and > indexes are all relations. > > More generically, a relation is a set of tuples; for example, the > result of a query is also a relation. > > In PostgreSQL, Class is an archaic synonym for relation. > > (I wonder why this says "generically" rather than "generally". Is that > word choice a mistake?) Maybe in the "For example" clause we can also > mention table functions. > > -- > Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ I am sorry: I am not following. CREATE STATISTIC will work only for single HEAP (or other AM) relations. So, for "simple regular tables" as one can say (not tablefunc). You say: relation is a term for both HEAP relation, tablefunc relation and much more. I say: "a single relation" in the error message is not precise enough. Where do we disagree? Anyway, I would say correct error message here should be: ``` db=# CREATE STATISTICS alt_stat2 ON a, b FROM tftest(1); ERROR: cannot define statistics for relation "alt_stat2" DETAIL: This operation is not supported for query result. ``` -- Best regards, Kirill Reshke
hi. will + if (!IsA(rln, RangeVar)) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("CREATE STATISTICS only supports regular tables, materialized views, foreign tables, and partitioned tables")); solve the problem?
On Fri, Aug 22, 2025 at 6:46 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > I'm not sure. See the definition of relation in the glossary: > https://www.postgresql.org/docs/18/glossary.html#GLOSSARY-RELATION > > The generic term for all objects in a database that have a name and a > list of attributes defined in a specific order. Tables, sequences, > views, foreign tables, materialized views, composite types, and > indexes are all relations. > > More generically, a relation is a set of tuples; for example, the > result of a query is also a relation. > > In PostgreSQL, Class is an archaic synonym for relation. > > (I wonder why this says "generically" rather than "generally". Is that > word choice a mistake?) "More generally" feels more natural to me than "more generically" in this sentence, but I'm not a native English speaker, so I could be wrong. Thanks Richard
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes: > I'm not sure. See the definition of relation in the glossary: > https://www.postgresql.org/docs/18/glossary.html#GLOSSARY-RELATION > The generic term for all objects in a database that have a name and a > list of attributes defined in a specific order. Tables, sequences, > views, foreign tables, materialized views, composite types, and > indexes are all relations. > More generically, a relation is a set of tuples; for example, the > result of a query is also a relation. > In PostgreSQL, Class is an archaic synonym for relation. > (I wonder why this says "generically" rather than "generally". Is that > word choice a mistake?) Maybe in the "For example" clause we can also > mention table functions. Yeah, I think s/generically/generally/ would be an improvement. I'm not certain, but I think that our use of "relation" to mean "an object with a pg_class entry" is a Postgres-ism. I do know that the meaning of "a set of tuples" is widely used, as that's where the term "relational database" comes from. Maybe whoever wrote this was trying to get at that point? But this text is hardly clear about that. regards, tom lane
On 2025-Aug-22, Kirill Reshke wrote: > I am sorry: I am not following. CREATE STATISTIC will work only for > single HEAP (or other AM) relations. So, for "simple regular tables" > as one can say (not tablefunc). Ah yeah, you're right, I think we should say "a single table", or maybe "a single table or materialized view" if the latter case is supported (I don't remember offhand if it is). The error message is mostly concerned with rejecting the case of multiple relations being given in the FROM clause, because we said at the time that we needed to make the grammar flexible enough to support that case, but make it clear that it wasn't implemented yet. (It's a pity that we haven't implemented that thus far ... I suppose it must be a difficult problem.) > DETAIL: This operation is not supported for query result. I would say "This operation is only supported for tables [and matviews]." We don't need to list all the things that aren't supported, I think. But your example here is wrong: > db=# CREATE STATISTICS alt_stat2 ON a, b FROM tftest(1); > ERROR: cannot define statistics for relation "alt_stat2" > DETAIL: This operation is not supported for query result. It's not alt_stat2 that's the problem (that's the extstats object being created), but tftest(1). I don't know if we need to specify that the problem is in the FROM clause here -- seems obvious enough to me -- but maybe someone opines differently? ERROR: cannot define statistics for relation "tftest" DETAIL: The FROM clause may only contain a single table. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Si quieres ser creativo, aprende el arte de perder el tiempo"
On 2025-Aug-22, Tom Lane wrote: > I'm not certain, but I think that our use of "relation" to mean > "an object with a pg_class entry" is a Postgres-ism. I do know > that the meaning of "a set of tuples" is widely used, as that's > where the term "relational database" comes from. Maybe whoever > wrote this was trying to get at that point? But this text is > hardly clear about that. Yeah, AFAIR I wrote it (or at least heavily edited the original to get to that), and I was trying to convey exactly those two ideas. If you want to propose improvements, they're very welcome. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes: > Yeah, AFAIR I wrote it (or at least heavily edited the original to get > to that), and I was trying to convey exactly those two ideas. If you > want to propose improvements, they're very welcome. Hmmm ... maybe something like Mathematically, a "relation" is a set of tuples; this is the sense meant in the term "relational database". In Postgres, "relation" is commonly used to mean a database object that has a name and a list of attributes defined in a specific order. Tables, sequences, views, foreign tables, materialized views, composite types, and indexes are all relations. A relation in this sense is a container or descriptor for a set of tuples. "Class" is an alternative but archaic term. The system catalog pg_class holds an entry for each Postgres relation. regards, tom lane
On 2025-Aug-22, Tom Lane wrote: > Hmmm ... maybe something like > > Mathematically, a "relation" is a set of tuples; this is the sense > meant in the term "relational database". > > In Postgres, "relation" is commonly used to mean a database object > that has a name and a list of attributes defined in a specific > order. Tables, sequences, views, foreign tables, materialized > views, composite types, and indexes are all relations. A relation > in this sense is a container or descriptor for a set of tuples. > > "Class" is an alternative but archaic term. The system catalog > pg_class holds an entry for each Postgres relation. Thanks, pushed like that. I changed "a database object" to "an SQL object", because that's a term we have a definition for. (I also wrote "PostgreSQL" where you had "Postgres". I think it might be okay now to change the product name in various places here, but it seems better to do it consistently across the whole page.) -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "¿Qué importan los años? Lo que realmente importa es comprobar que a fin de cuentas la mejor edad de la vida es estar vivo" (Mafalda)
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes: > Thanks, pushed like that. I changed "a database object" to "an SQL > object", because that's a term we have a definition for. > (I also wrote "PostgreSQL" where you had "Postgres". I think it might > be okay now to change the product name in various places here, but it > seems better to do it consistently across the whole page.) LGTM, thanks. (My wording was only meant as a draft ...) regards, tom lane
On 2025-Aug-21, jian he wrote: > case T_CreateStatsStmt: > { > Oid relid; > CreateStatsStmt *stmt = (CreateStatsStmt *) parsetree; > RangeVar *rel = (RangeVar *) linitial(stmt->relations); > > if (!IsA(rel, RangeVar)) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("only a single relation is > allowed in CREATE STATISTICS"))); > relid = RangeVarGetRelid(rel, > ShareUpdateExclusiveLock, false); > /* Run parse analysis ... */ > stmt = transformStatsStmt(relid, stmt, queryString); > address = CreateStatistics(stmt); > } Yeah, I think there's room to argue that this ereport() is just wrongly copy-and-pasted from other nearby errors, and that it should have completely different wording. BTW another way to cause this is CREATE STATISTICS alt_stat2 ON a, b FROM xmltable('foo' passing 'bar' columns a text); I propose to use this report: ERROR: cannot create statistics on specified relation DETAIL: CREATE STATISTICS only supports tables, materialized views, foreign tables, and partitioned tables. (Not sold on saying just "tables" vs. "regular tables" or "plain tables"; thoughts?) While looking at this whole thing, I noticed that this code is somewhat bogus in a different way: we're resolving the relation name twice, both here in ProcessUtilitySlow() (to pass to transformStatsStmt), and later again inside CreateStatistics(). It's really strange that we decided not to pass the relation Oids as a list argument to CreateStatistics. I suggest to change that as the first attached patch. Now, such a change would be appropriate only for branch master; it seems too invasive for stable ones. For them I propose we only change the error message, as in the other attachment. We should add a couple of test cases for this particular error message also. It's bad that these changes don't break any tests. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Once again, thank you and all of the developers for your hard work on PostgreSQL. This is by far the most pleasant management experience of any database I've worked on." (Dan Harris) http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php
Вложения
=?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes: > I propose to use this report: > ERROR: cannot create statistics on specified relation > DETAIL: CREATE STATISTICS only supports tables, materialized views, foreign tables, and partitioned tables. WFM, although I think you could shorten it to "tables, materialized views, and foreign tables". We generally expect that partitioned tables are included when saying "tables", no? I'm not dead set on that either way, though. > While looking at this whole thing, I noticed that this code is somewhat > bogus in a different way: we're resolving the relation name twice, both > here in ProcessUtilitySlow() (to pass to transformStatsStmt), and later > again inside CreateStatistics(). It's really strange that we decided > not to pass the relation Oids as a list argument to CreateStatistics. I > suggest to change that as the first attached patch. I think this is a good idea, but I'm not sure that the locking considerations are straight in this version. Once we translate a relation name to OID, we'd better hold lock on the rel continuously to the end of usage of that OID. It looks like you do, but then couldn't the + rel = relation_open(relid, ShareUpdateExclusiveLock); in CreateStatistics use NoLock to signify that we expect to have the lock already? Alternatively, maybe the right fix is to move the parse-analysis work into CreateStatistics altogether. I think there is not a very good argument for ProcessUtilitySlow doing all that stuff. It's supposed to mainly just be a dispatching switch(), IMO. regards, tom lane
On Fri, Aug 29, 2025 at 5:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > =?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@kurilemu.de> writes: > > I propose to use this report: > > > ERROR: cannot create statistics on specified relation > > DETAIL: CREATE STATISTICS only supports tables, materialized views, foreign tables, and partitioned tables. > > WFM, although I think you could shorten it to "tables, materialized > views, and foreign tables". We generally expect that partitioned > tables are included when saying "tables", no? I'm not dead set on > that either way, though. > https://www.postgresql.org/docs/current/sql-copy.html use "COPY TO can be used only with plain tables, not views, and does not copy rows from child tables or child partitions" > > Alternatively, maybe the right fix is to move the parse-analysis > work into CreateStatistics altogether. I think there is not a > very good argument for ProcessUtilitySlow doing all that stuff. > It's supposed to mainly just be a dispatching switch(), IMO. > seems doable. transformStatsStmt, CreateStatistics both used only twice, refactoring arguments should be fine. please check the attached POC, regress tests also added. main idea is first check CreateStatsStmt->relations, then call transformStatsStmt, transformStatsStmt only to transform CreateStatsStmt->exprs. also the above complaint about the relation lock issue will be resolved.
Вложения
On 2025-Aug-29, jian he wrote: > On Fri, Aug 29, 2025 at 5:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > WFM, although I think you could shorten it to "tables, materialized > > views, and foreign tables". We generally expect that partitioned > > tables are included when saying "tables", no? I'm not dead set on > > that either way, though. > > https://www.postgresql.org/docs/current/sql-copy.html > use "COPY TO can be used only with plain tables, not views, and does > not copy rows from child tables or child partitions" I'm inclined to think that we should only mention partitioned tables specifically when they for some reason deviate from what we do for regular tables, i.e., what Tom is saying. I don't think we've had an explicit, consistent rule for that thus far, so there may be places where we fail to follow it. Anyway, I have pushed the error message change. > > Alternatively, maybe the right fix is to move the parse-analysis > > work into CreateStatistics altogether. I think there is not a > > very good argument for ProcessUtilitySlow doing all that stuff. > > It's supposed to mainly just be a dispatching switch(), IMO. > > seems doable. > transformStatsStmt, CreateStatistics both used only twice, refactoring > arguments should be fine. > please check the attached POC, regress tests also added. Yeah, I like how this turned out. I found out this was introduced in commit a4d75c86bf15. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Fri, Aug 29, 2025 at 8:48 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > > > > Alternatively, maybe the right fix is to move the parse-analysis > > > work into CreateStatistics altogether. I think there is not a > > > very good argument for ProcessUtilitySlow doing all that stuff. > > > It's supposed to mainly just be a dispatching switch(), IMO. > > > > seems doable. > > transformStatsStmt, CreateStatistics both used only twice, refactoring > > arguments should be fine. > > please check the attached POC, regress tests also added. > > Yeah, I like how this turned out. I found out this was introduced in > commit a4d75c86bf15. Previously, CreateStatistics and ProcessUtilitySlow opened the same relation twice with a ShareUpdateExclusiveLock. This refactor ensures the relation is opened with ShareUpdateExclusiveLock only once and also reduces redundant error checks. The logic is now more intuitive: we first error checking CreateStatsStmt->relations and then call transformStatsStmt to parse analysis CreateStatsStmt->exprs. please check the attached refactor CreateStatsStmt.
Вложения
On 29.08.25 14:48, Álvaro Herrera wrote: > On 2025-Aug-29, jian he wrote: > >> On Fri, Aug 29, 2025 at 5:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> WFM, although I think you could shorten it to "tables, materialized >>> views, and foreign tables". We generally expect that partitioned >>> tables are included when saying "tables", no? I'm not dead set on >>> that either way, though. >> >> https://www.postgresql.org/docs/current/sql-copy.html >> use "COPY TO can be used only with plain tables, not views, and does >> not copy rows from child tables or child partitions" > > I'm inclined to think that we should only mention partitioned tables > specifically when they for some reason deviate from what we do for > regular tables, i.e., what Tom is saying. I don't think we've had an > explicit, consistent rule for that thus far, so there may be places > where we fail to follow it. > > Anyway, I have pushed the error message change. I think this message is still wrong. The check doesn't even look at the relation kind, which is what the message is implying. (If the message were about relkinds, then it should use errdetail_relkind_not_supported().) It checks that the from list entry is a table name instead of some other thing like VALUES or JOIN. So it should be something like CREATE STATISTICS only supports plain table names in the FROM clause The same could also be accomplished by changing the grammar from FROM from_list to FROM qualified_name_list
On 05.09.25 14:30, Peter Eisentraut wrote: > On 29.08.25 14:48, Álvaro Herrera wrote: >> On 2025-Aug-29, jian he wrote: >> >>> On Fri, Aug 29, 2025 at 5:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >>>> WFM, although I think you could shorten it to "tables, materialized >>>> views, and foreign tables". We generally expect that partitioned >>>> tables are included when saying "tables", no? I'm not dead set on >>>> that either way, though. >>> >>> https://www.postgresql.org/docs/current/sql-copy.html >>> use "COPY TO can be used only with plain tables, not views, and does >>> not copy rows from child tables or child partitions" >> >> I'm inclined to think that we should only mention partitioned tables >> specifically when they for some reason deviate from what we do for >> regular tables, i.e., what Tom is saying. I don't think we've had an >> explicit, consistent rule for that thus far, so there may be places >> where we fail to follow it. >> >> Anyway, I have pushed the error message change. > > I think this message is still wrong. The check doesn't even look at the > relation kind, which is what the message is implying. (If the message > were about relkinds, then it should use > errdetail_relkind_not_supported().) It checks that the from list entry > is a table name instead of some other thing like VALUES or JOIN. So it > should be something like > > CREATE STATISTICS only supports plain table names in the FROM clause I propose the attached patch to fix this. I think this restores the original meaning better.
Вложения
Peter Eisentraut <peter@eisentraut.org> writes: > I propose the attached patch to fix this. I think this restores the > original meaning better. I'm okay with this wording change, but I would stay with ERRCODE_FEATURE_NOT_SUPPORTED rather than calling this a "syntax error". It's not a syntax error IMV, but a potential feature that we have deliberately left syntax space for, even though we don't yet have ideas about a workable implementation. regards, tom lane
On 12.09.25 15:49, Tom Lane wrote: > Peter Eisentraut <peter@eisentraut.org> writes: >> I propose the attached patch to fix this. I think this restores the >> original meaning better. > > I'm okay with this wording change, but I would stay with > ERRCODE_FEATURE_NOT_SUPPORTED rather than calling this > a "syntax error". It's not a syntax error IMV, but a > potential feature that we have deliberately left syntax > space for, even though we don't yet have ideas about > a workable implementation. Ok, done that way.
hi. moving all code in ProcessUtilitySlow (case T_CreateStatsStmt:) to CreateStatistic seems more intuitive to me. so I rebased v3.