Обсуждение: magical eref alias names
Hi, When a query references a normal relation, the RangeTblEntry we construct sets rte->eref->aliasname to either the user-specified alias name, if there is one, or otherwise to the table name. But if there's neither a user-specified alias name nor a table name, then we make up a name. I have two related questions about this. First, why do we do it at all? Second, why do we assign the names as we do? Let me start with the second question. After some surveying of the source code, here's a (possibly-incomplete) list of eref names that we fabricate: old, new, *SELECT*, ANY_subquery, *MERGE*, *RESULT*, *SELECT*, excluded, unnamed_subquery, unnamed_join, *GROUP*, *TLOCRN*, *TROCRN*, *SELECT* %d where %d is an integer, *VALUES*, xmltable, json_table. Of these, old, new, and excluded seem to make total sense -- in certain contexts, the user can actually use those names in SQL expressions to refer to stuff. But the rest, to my knowledge, can't be referenced within the query, so I suppose they're just for display purposes. But that seems like an odd choice, because these artificial names are (1) not unique, which means that if they are referenced anywhere such references are ambiguous; (2) not consistent in terms of capitalization and punctuation, which is questionable for something whose primary purpose is to inform the user; and (3) sometimes incomprehensible -- I can make nothing of *TLOCRN* or *TROCRN*, even after looking at the relevant source code, and I wouldn't know what the distinction is between *SELECT* and *SELECT* %d without looking at the source code. And that brings me to the first question, which is why we're even making up these names at all (with the exceptions of new, old, and excluded, which serve a clear purpose). If we needed them to identify things, that would make sense, but since they're neither unique nor comprehensible, they're not really any good for that. And it seems downright confusing at best, and a source of bugs at worst, to mix together user-supplied names and system-generated names in such a way that the one can't be easily distinguished from the other. We even have regression tests verifying that if the user explicitly enters unnamed_join or unnamed_subquery as an alias name, it doesn't break anything due to confusion with identical alias names that might be generated internally, which seems like good evidence for the proposition that I'm not the first person to worry about this being bug-prone. Granted, there are some cases where these names make their way into EXPLAIN output. For example, this query from the regression tests: select * from int4_tbl o where (f1, f1) in (select f1, generate_series(1,50) / 10 g from int4_tbl i group by f1); produces EXPLAIN output that includes this: -> Subquery Scan on "ANY_subquery" Output: "ANY_subquery".f1, "ANY_subquery".g Filter: ("ANY_subquery".f1 = "ANY_subquery".g) However, that seems to be a minority position. Many of the system-generated eref names do not appear in EXPLAIN output, at least not in our regression tests. Furthermore, EXPLAIN itself does post-processing of the relations that appear in the output to set the final names that are displaced (see set_rtable_names()), so if we didn't insert names at parse time, we'd still have an opportunity to make up something for display purposes. You could argue that the results would be worse, but the current results aren't especially amazing so I'm not sure I believe that. Perhaps with some elbow grease they would even be better. So overall I'm just confused here. Is this just a bunch of old cruft that has never been cleaned up or standardized, or does it serve some valuable purpose that is not obvious to me? -- Robert Haas EDB: http://www.enterprisedb.com
On 06.11.24 20:06, Robert Haas wrote: > I can make nothing of*TLOCRN* or*TROCRN*, even > after looking at the relevant source code, These are from the SQL standard text. So they are more guidance to the implementer than anything else. I think something had to be put there, because erefs are required. I'm also interested in the discussion you raise about that. (I think they are meant to be something like "table {left|right} operand cycle reference name".)
On Thu, Nov 7, 2024 at 2:05 AM Peter Eisentraut <peter@eisentraut.org> wrote: > On 06.11.24 20:06, Robert Haas wrote: > > I can make nothing of*TLOCRN* or*TROCRN*, even > > after looking at the relevant source code, > > These are from the SQL standard text. So they are more guidance to the > implementer than anything else. I think something had to be put there, > because erefs are required. I'm also interested in the discussion you > raise about that. > > (I think they are meant to be something like "table {left|right} operand > cycle reference name".) Interesting. I guess we could maybe document that a bit better, but it's not the main thing to focus on. I tried replacing all of those fake RTE entry names with NULL and fixing whatever broke in the regression tests. I attach the result. It seems like this is basically a problem that applies to subqueries rather than any other form of RTE, and mostly subqueries that are named *SELECT* or *SELECT* %d. The main place where the names are user-visible in EXPLAIN output. With this patch, the name displayed by EXPLAIN changed to unnamed_subquery, but that's what we already use for subqueries in some cases, so I don't see a problem with it. There's one case where *SELECT* 2 actually showed up in an error message. I chose to add a second variant of the message rather than displaying the name as unnamed_subquery, but either could be done: -DETAIL: There is a column named "q2" in table "*SELECT* 2", but it cannot be referenced from this part of the query. +DETAIL: There is a column named "q2", but it cannot be referenced from this part of the query. It's quite possible that this patch isn't completely correct and it's also possible that I'm missing some deeper problem with this idea that just isn't exercised by the regression tests. But overall I find these results fairly encouraging -- it just wasn't very hard to make the regression tests pass. I did a bit of historical checking at the comment that eref->aliasname is required to be present was added in commit 3a94e789f5c9537d804210be3cb26f7fb08e3b9e, Tom Lane, 2000. I can't immediately tell what the reasoning was. Copying Tom in case he has thoughts. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > It's quite possible that this patch isn't completely correct and it's > also possible that I'm missing some deeper problem with this idea that > just isn't exercised by the regression tests. But overall I find these > results fairly encouraging -- it just wasn't very hard to make the > regression tests pass. My guess is that there are more places that assume eref->aliasname isn't NULL than you've caught here; and some of them probably are in code we don't control. And you didn't even update the comments for struct Alias. Is there some strong reason to insist on making that core-dump-risking change, rather than simply assigning the now-one-size-fits-all alias when creating Alias nodes? > I did a bit of historical checking at the comment that eref->aliasname > is required to be present was added in commit > 3a94e789f5c9537d804210be3cb26f7fb08e3b9e, Tom Lane, 2000. I can't > immediately tell what the reasoning was. Copying Tom in case he has > thoughts. Well, that was 24 years ago ... but the commit message seems to only be about the requirement that user-written sub-SELECT-in-FROM have an alias. Which is a requirement we've since dropped. When it comes to system-supplied aliases, I'd argue that the tradeoff is about the same: forcing somebody or something that has a clue what the relation is to define an alias, versus falling back to some generic alias. If we're okay with generic aliases for user-written sub-SELECT then it's not much of a step to generic aliases for system-defined relations. I won't argue hard about that --- certainly things like "*VALUE*" aren't very pretty. But I'd really rather not switch to "eref->aliasname can be NULL": that's introducing a coding gotcha for zero benefit that I can detect. regards, tom lane
On Thu, Nov 7, 2024 at 4:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Is there some strong reason to insist on making that core-dump-risking > change, rather than simply assigning the now-one-size-fits-all alias > when creating Alias nodes? What I'm unhappy about is not being able to tell the difference between a name that was invented by or at least meaningful to the user and one that isn't. Filling in unnamed_subquery everywhere doesn't accomplish that because the user could in theory supply that name; even if that were no issue, I do not want to have code like: if (strcmp(aliasname, "unnamed_subquery") == 0 || (strncmp(aliasname, "unnamed_subquery_", strlen("unnamed_subquery_") && something complicated with strtol to see if the rest of the name is an integer)) ... looks system generated ... else ... looks user generated ... I would be more sympathetic to the idea of system-generated aliases if they were generated in a way that made it likely that they would be meaningful to the user. In fact, if they were generated in such a way that they would be unique, that would actually be fantastic and I would definitely not be arguing for removing them. But I think what we have right now is a mess. -- Robert Haas EDB: http://www.enterprisedb.com
BTW, one aspect of this proposal that needs to be discussed is that it can break existing SQL. An example discussed nearby[1] is regression=# select * from (values (1,7), (3,4) order by "*VALUES*".column2); column1 | column2 ---------+--------- 3 | 4 1 | 7 (2 rows) We concluded in the other thread that we didn't want to risk breaking such code. I concede that this example isn't too compelling on its own, but I wonder if there's other cases that are more likely to be in common use. (If we do decide it's okay to break this, my opinion about what to do in the other thread would change.) regards, tom lane [1] https://www.postgresql.org/message-id/251197.1730222362%40sss.pgh.pa.us
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Nov 7, 2024 at 4:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Is there some strong reason to insist on making that core-dump-risking >> change, rather than simply assigning the now-one-size-fits-all alias >> when creating Alias nodes? > What I'm unhappy about is not being able to tell the difference > between a name that was invented by or at least meaningful to the user > and one that isn't. You can already tell that, by looking to see whether RTE->alias->aliasname exists. eref is meant to be the resolved name-to-use not the user's original input. > I would be more sympathetic to the idea of system-generated aliases if > they were generated in a way that made it likely that they would be > meaningful to the user. In fact, if they were generated in such a way > that they would be unique, that would actually be fantastic and I > would definitely not be arguing for removing them. The trick there is to keep them predictable, because as I mentioned in my previous response, there may be people depending on knowing what name will be assigned. We're working with a ton of history here, and I'm not really convinced that change will be change for the better. regards, tom lane
On Thu, Nov 7, 2024 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > What I'm unhappy about is not being able to tell the difference > > between a name that was invented by or at least meaningful to the user > > and one that isn't. > > You can already tell that, by looking to see whether > RTE->alias->aliasname exists. eref is meant to be the resolved > name-to-use not the user's original input. Hmm, OK, that's useful. But I guess I'm still puzzled about the theory here. A name like *VALUES* doesn't seem like it was created with the idea of referring to it from some other part of your query. I do take your point that it works and somebody's probably relying on it, but I don't think you'd pick a name that requires quoting if you were trying to make it easy to use that name in the query. You might possibly also try to generate names that are easy for users to guess, and distinct. Since none of that was done, it seems like it was just intended for display. But EXPLAIN already has its own logic to decide on what names it will display in the output, which may be different from anything that was entered by the user or invented by the parser (set_rtable_names()). So the whole thing just seems like a really strange system. I find it hard to avoid the conclusion that it's just a historical accident -- somebody did something 20 or 30 years ago to make it all work, before we had all the infrastructure that we do today, and then none of those decisions ever got revisited due either to inertia or backward compatibility concerns. Do you see it differently? > The trick there is to keep them predictable, because as I mentioned in > my previous response, there may be people depending on knowing what > name will be assigned. We're working with a ton of history here, > and I'm not really convinced that change will be change for the > better. Yeah, I don't really want to be the one to break somebody's query that explicitly references "*VALUES*" or whatever. At least not without a better reason than I currently have. If this were just a display artifact I think finding some way to clean it up would be pretty worthwhile, but I would need a better reason to break working SQL. -- Robert Haas EDB: http://www.enterprisedb.com
On 11/8/24 20:33, Robert Haas wrote: > On Thu, Nov 7, 2024 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The trick there is to keep them predictable, because as I mentioned in >> my previous response, there may be people depending on knowing what >> name will be assigned. We're working with a ton of history here, >> and I'm not really convinced that change will be change for the >> better. > > Yeah, I don't really want to be the one to break somebody's query that > explicitly references "*VALUES*" or whatever. At least not without a > better reason than I currently have. If this were just a display > artifact I think finding some way to clean it up would be pretty > worthwhile, but I would need a better reason to break working SQL. Thanks for this topic! Having run into this years ago, I was confused by eref and alias fields. I frequently use eref during debugging. Also, knowing the naming convention makes it much easier to resolve issues with only an explanation when the user can't provide any other information. I wonder if other people who work with EXPLAIN a lot already have some sort of habit here. I agree that the naming convention can float, but please let it be stable and predictable. Also, I'm not sure how other extension developers operate, but in a handful of mine, I use the fact that eref always contains a name - the relational model requires a name for each (even intermediate) table and column, doesn't it? Also, do not forget that these names can be used in pg_hint_plan hints - one more reason to make it stable. -- regards, Andrei Lepikhov
[ this thread seems to have stalled out, but we need to resolve it ] Robert Haas <robertmhaas@gmail.com> writes: >>> What I'm unhappy about is not being able to tell the difference >>> between a name that was invented by or at least meaningful to the user >>> and one that isn't. >> You can already tell that, by looking to see whether >> RTE->alias->aliasname exists. eref is meant to be the resolved >> name-to-use not the user's original input. > Hmm, OK, that's useful. But I guess I'm still puzzled about the theory > here. A name like *VALUES* doesn't seem like it was created with the > idea of referring to it from some other part of your query. I do take > your point that it works and somebody's probably relying on it, but I > don't think you'd pick a name that requires quoting if you were trying > to make it easy to use that name in the query. As you say, some of this is lost in the mists of time; but I think the idea was specifically that these names should *not* be easy to type, because we don't want them to conflict with any relation alias that the user is likely to pick. I'm fairly sure that the SQL spec actually has verbiage to the effect of "the implementation shall select a name that does not conflict with any other name in the query". > You might possibly also > try to generate names that are easy for users to guess, and distinct. Yeah, per spec they should be distinct; but somebody didn't bother with that early on, and now we've ended up in a situation where ruleutils.c does it instead. I'm not sure that that's terribly evil. In particular, in a situation where we're trying to show a plan for a query with inlined views, EXPLAIN would probably have to have code to unique-ify the names anyway --- there's no way we're going to make these nonce names globally unique, so the view(s) might contain names that conflict with each other or the outer query. >> ... We're working with a ton of history here, >> and I'm not really convinced that change will be change for the >> better. > Yeah, I don't really want to be the one to break somebody's query that > explicitly references "*VALUES*" or whatever. At least not without a > better reason than I currently have. If this were just a display > artifact I think finding some way to clean it up would be pretty > worthwhile, but I would need a better reason to break working SQL. So it seems like we're coming to the conclusion that we don't want to change things here? The reason I want to push for a conclusion is that the discussion about "*VALUES*" over at [1] is on hold pending some decision here. The v3 patch in that thread is set up in such a way that it improves EXPLAIN output without breaking any existing SQL, and that's what I'd use if we refrain from changing things here. But it's a tad inconsistent, so if we did decide it was okay to break some edge cases, I'd reconsider. The reason that patch can (mostly) assign some other names to "*VALUES*" without breaking things is that we treat a VALUES clause as an implicit sub-select: SELECT * FROM (VALUES (1,2),...) v; is parsed as though it were more or less SELECT * FROM (SELECT * FROM VALUES (1,2),... AS "*VALUES*") AS v; and the "*VALUES*" alias is not referenceable except within that implicit sub-SELECT. The only place anybody notices it is in EXPLAIN, which has to print the base relation alias because "v" is usually gone due to flattening. So we can change the base relation alias to "v" without breaking anything in the original query (except in some very weird edge cases), and then EXPLAIN will talk about "v" not "*VALUES*". Perhaps a similar idea could be applied to the other cases where we invent aliases, but it's less obvious how. For instance in SELECT ... UNION SELECT ... there's no obvious place where we could get names for the two sub-SELECTs. regards, tom lane [1] https://www.postgresql.org/message-id/flat/CAExHW5sh28_gwQP4%3DX4i4kMsAYaoSi3tsNse3LaTihV%3DeWuTcA%40mail.gmail.com
On Sun, Dec 22, 2024 at 12:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Hmm, OK, that's useful. But I guess I'm still puzzled about the theory > > here. A name like *VALUES* doesn't seem like it was created with the > > idea of referring to it from some other part of your query. I do take > > your point that it works and somebody's probably relying on it, but I > > don't think you'd pick a name that requires quoting if you were trying > > to make it easy to use that name in the query. > > As you say, some of this is lost in the mists of time; but I think the > idea was specifically that these names should *not* be easy to type, > because we don't want them to conflict with any relation alias that > the user is likely to pick. I'm fairly sure that the SQL spec > actually has verbiage to the effect of "the implementation shall > select a name that does not conflict with any other name in the query". OK, but picking names that the user probably won't use is neither necessary nor sufficient to guarantee uniqueness. > > You might possibly also > > try to generate names that are easy for users to guess, and distinct. > > Yeah, per spec they should be distinct; but somebody didn't bother > with that early on, and now we've ended up in a situation where > ruleutils.c does it instead. I'm not sure that that's terribly evil. > In particular, in a situation where we're trying to show a plan for > a query with inlined views, EXPLAIN would probably have to have code > to unique-ify the names anyway --- there's no way we're going to make > these nonce names globally unique, so the view(s) might contain names > that conflict with each other or the outer query. When you say "there's no way we're going to make these nonce names globally unique," is that because you think it would be too costly from a performance perspective (which was my concern) or that you think it's flat-out impossible for some reason (which doesn't seem to me to be true)? > > Yeah, I don't really want to be the one to break somebody's query that > > explicitly references "*VALUES*" or whatever. At least not without a > > better reason than I currently have. If this were just a display > > artifact I think finding some way to clean it up would be pretty > > worthwhile, but I would need a better reason to break working SQL. > > So it seems like we're coming to the conclusion that we don't want > to change things here? > > The reason I want to push for a conclusion is that the discussion > about "*VALUES*" over at [1] is on hold pending some decision here. > The v3 patch in that thread is set up in such a way that it improves > EXPLAIN output without breaking any existing SQL, and that's what > I'd use if we refrain from changing things here. But it's a tad > inconsistent, so if we did decide it was okay to break some edge > cases, I'd reconsider. I think that if we can solve multiple problems all at once by breaking some edge cases, that's worth considering. There probably aren't that many people who have queries that reference the "*VALUES*" alias, so if we wanted to change that to "values" I don't see that as completely out of the question. Somebody will probably be inconvenienced but if it solves other problems maybe it's worth it. But I don't think that change by itself really helps anything here. > Perhaps a similar idea could be applied to the other cases where > we invent aliases, but it's less obvious how. For instance in > > SELECT ... UNION SELECT ... > > there's no obvious place where we could get names for the > two sub-SELECTs. If we had global uniqueness, or even per-subquery-level uniqueness, this would sort itself out somewhat nicely, I think. We would just end up with select_1 and select_2 or union_1 and union_2 or something like that, and I think that would be strictly better than the status quo where we do sometimes generate *SELECT* %d, but also sometimes just *SELECT* and other times unnamed_subquery, and also only ever *VALUES* and not *VALUES* %d. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Dec 22, 2024 at 12:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In particular, in a situation where we're trying to show a plan for >> a query with inlined views, EXPLAIN would probably have to have code >> to unique-ify the names anyway --- there's no way we're going to make >> these nonce names globally unique, so the view(s) might contain names >> that conflict with each other or the outer query. > When you say "there's no way we're going to make these nonce names > globally unique," is that because you think it would be too costly > from a performance perspective (which was my concern) or that you > think it's flat-out impossible for some reason (which doesn't seem to > me to be true)? Global uniqueness across the database (not single queries) would be needed to prevent cases where different views use the same generated names. The only way I can see to do that without nasty performance costs is to use something like an OID counter. Which would mean that rather than nice names like "union_1", "union_2", etc, you'd soon be looking at "union_5846926". I don't think anyone would find that to be an improvement on what we're doing now. > If we had global uniqueness, or even per-subquery-level uniqueness, > this would sort itself out somewhat nicely, I think. We would just end > up with select_1 and select_2 or union_1 and union_2 or something like > that, and I think that would be strictly better than the status quo > where we do sometimes generate *SELECT* %d, but also sometimes just > *SELECT* and other times unnamed_subquery, and also only ever *VALUES* > and not *VALUES* %d. I'll concede that it'd be nicer. But I'm not convinced it'd be enough nicer to justify the costs of changing. We've been doing it this way for a long time, and AFAIR you're the first to complain about it. regards, tom lane
On Thu, Jan 2, 2025 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Global uniqueness across the database (not single queries) would be > needed to prevent cases where different views use the same generated > names. The only way I can see to do that without nasty performance > costs is to use something like an OID counter. Which would mean > that rather than nice names like "union_1", "union_2", etc, you'd > soon be looking at "union_5846926". I don't think anyone would > find that to be an improvement on what we're doing now. Oh, I agree, but I don't see why anyone would care whether rel names are unique across different queries. When I mentioned global uniqueness, I meant unique within a query, like what set_rtable_names() does after the fact. > > If we had global uniqueness, or even per-subquery-level uniqueness, > > this would sort itself out somewhat nicely, I think. We would just end > > up with select_1 and select_2 or union_1 and union_2 or something like > > that, and I think that would be strictly better than the status quo > > where we do sometimes generate *SELECT* %d, but also sometimes just > > *SELECT* and other times unnamed_subquery, and also only ever *VALUES* > > and not *VALUES* %d. > > I'll concede that it'd be nicer. But I'm not convinced it'd be enough > nicer to justify the costs of changing. We've been doing it this way > for a long time, and AFAIR you're the first to complain about it. I'm not sure it's enough nicer to justify the cost of changing, either. I do think that "you're the first to complain about it" is not a convincing argument, here or in general. Every time somebody reports a new problem, they are the first to complain about it, but that does make the problem any less real. The reason that I got interested in this problem was because of the thread about allowing extensions to control planner behavior. I wrote some words about it over there, but it's been a while so those thoughts might not be entirely up to date. What I've found is that it's a lot easier to insert a hook or three to allow an extension to control the planner behavior than it is to get those hooks to do anything useful, and that's precisely because it is hard to find a useful way to identify a particular part of the query plan. If the query planner were a person sitting next to you at your computer, you could point at the screen with your finger and say "hey, do you see this part of the EXPLAIN plan right here? could you try making this a sequential scan rather than an index scan?" and the planner could say "sure, let me re-plan it that way and I'll show you how it turns out". Since the planner is incorporeal and cannot see your finger, you want to identify "this part of the EXPLAIN plan right here" in some other way, like with a name, but the names known at parsing time and planning time are not unique and need not match what shows up in the EXPLAIN output. Concretely, if the user creates a partitioned table called *VALUES* with several partitions and then creates another such table, also called *VALUES*, in a different schema, and then joins the two of them together; and then does UNION ALL with a subquery that actually uses VALUES (), and then somewhere includes a subquery that also queries one of the tables actually called *VALUES*, you cannot meaningfully use the label *VALUES* to identify one particular scan; and you can't use anything that appears in the EXPLAIN output for that purpose either because the next planning cycle won't have those labels available *during planning* when it needs to honor whatever plan-shape requests were made. Now, I'm firmly convinced that this is a real problem and worth solving, but let me be clear that I don't think the solution is anywhere on this thread, nor do I think that it is simple. My original proposal of getting rid of system-generated fake names isn't necessary, because you very helpfully pointed out that I can look at whether RTE->alias->aliasname exists to figure that out. Also, enforcing global uniqueness across the query at parse time wouldn't actually help, because when we apply rewriter rules we can introduce new relations into the query, and then when we expand inheritance hierarchies we can introduce even more new relations into the query; and what the user will care about if they want to modify "that portion of the plan right there" is what ultimately ends up in the plan, not what was in the query at parse time. So as far as I am concerned, this thread has served the useful purpose of making me smarter and I can't really see a way for it to do anything more than that; but if it's given you a clever idea for something that we could change in the code, I'm certainly happy to hear about that. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > Oh, I agree, but I don't see why anyone would care whether rel names > are unique across different queries. When I mentioned global > uniqueness, I meant unique within a query, like what > set_rtable_names() does after the fact. Okay, but then we still have the problem of how to ensure that in a query that has inline'd some views. I think solving the sort of I-want-to-reference-this problem you describe would require that we unique-ify the aliases in the rewriter, just after it finishes incorporating any views. We could do that, but it seems like a lot of cycles to expend on something that would be pointless in the typical case where nobody ever looks at the aliases later. > Now, I'm firmly convinced that this is a real problem and worth > solving, but let me be clear that I don't think the solution is > anywhere on this thread, nor do I think that it is simple. Agreed. > My original > proposal of getting rid of system-generated fake names isn't > necessary, because you very helpfully pointed out that I can look at > whether RTE->alias->aliasname exists to figure that out. Actually, I noticed that we are failing to honor that in the places where we inject "*SELECT*" and "*SELECT* %d" names, because that code puts those names into RTE->alias not only RTE->eref. I experimented with the attached patch to not do that anymore, which is sort of a subset of what you did but just focused on not lying about what's generated versus user-written. We could alternatively keep the current generated names by extending addRangeTableEntryForSubquery's API so that alias and generated eref are passed separately. (I didn't look to see if anyplace else is messing up this distinction similarly.) regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index bf322198a2..3d8a433c19 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -4961,13 +4961,13 @@ SELECT ft1.c1 FROM ft1 JOIN ft2 on ft1.c1 = ft2.c1 WHERE -- =================================================================== EXPLAIN (verbose, costs off) INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20; - QUERYPLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + QUERY PLAN +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Insert on public.ft2 Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8) Batch Size: 1 - -> Subquery Scan on "*SELECT*" - Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1", NULL::integer, "*SELECT*"."?column?_2", NULL::timestampwith time zone, NULL::timestamp without time zone, NULL::character varying, 'ft2 '::character(10), NULL::user_enum + -> Subquery Scan on unnamed_subquery + Output: unnamed_subquery."?column?", unnamed_subquery."?column?_1", NULL::integer, unnamed_subquery."?column?_2",NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying,'ft2 '::character(10), NULL::user_enum -> Foreign Scan on public.ft2 ft2_1 Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3) Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1" LIMIT 20::bigint diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c index 3a2d51c5ad..c58bbd5b78 100644 --- a/src/backend/executor/functions.c +++ b/src/backend/executor/functions.c @@ -1962,7 +1962,7 @@ tlist_coercion_finished: rte = makeNode(RangeTblEntry); rte->rtekind = RTE_SUBQUERY; rte->subquery = parse; - rte->eref = rte->alias = makeAlias("*SELECT*", colnames); + rte->eref = makeAlias("unnamed_subquery", colnames); rte->lateral = false; rte->inh = false; rte->inFromCl = true; diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 3864a675d2..359d3c390e 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -823,7 +823,7 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt) */ nsitem = addRangeTableEntryForSubquery(pstate, selectQuery, - makeAlias("*SELECT*", NIL), + NULL, false, false); addNSItemToQuery(pstate, nsitem, true, false, false); @@ -2147,7 +2147,6 @@ transformSetOperationTree(ParseState *pstate, SelectStmt *stmt, { /* Process leaf SELECT */ Query *selectQuery; - char selectName[32]; ParseNamespaceItem *nsitem; RangeTblRef *rtr; ListCell *tl; @@ -2203,11 +2202,9 @@ transformSetOperationTree(ParseState *pstate, SelectStmt *stmt, /* * Make the leaf query be a subquery in the top-level rangetable. */ - snprintf(selectName, sizeof(selectName), "*SELECT* %d", - list_length(pstate->p_rtable) + 1); nsitem = addRangeTableEntryForSubquery(pstate, selectQuery, - makeAlias(selectName, NIL), + NULL, false, false); diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out index 397a8b35d6..7f9c441f90 100644 --- a/src/test/regress/expected/rangefuncs.out +++ b/src/test/regress/expected/rangefuncs.out @@ -2130,10 +2130,10 @@ select testrngfunc(); explain (verbose, costs off) select * from testrngfunc(); - QUERY PLAN ----------------------------------------------------------- - Subquery Scan on "*SELECT*" - Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1" + QUERY PLAN +---------------------------------------------------------------------- + Subquery Scan on unnamed_subquery + Output: unnamed_subquery."?column?", unnamed_subquery."?column?_1" -> Unique Output: (1), (2) -> Sort diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out index caa8fe70a0..cde8154220 100644 --- a/src/test/regress/expected/union.out +++ b/src/test/regress/expected/union.out @@ -942,7 +942,7 @@ SELECT q1 FROM int8_tbl EXCEPT SELECT q2 FROM int8_tbl ORDER BY q2 LIMIT 1; ERROR: column "q2" does not exist LINE 1: ... int8_tbl EXCEPT SELECT q2 FROM int8_tbl ORDER BY q2 LIMIT 1... ^ -DETAIL: There is a column named "q2" in table "*SELECT* 2", but it cannot be referenced from this part of the query. +DETAIL: There is a column named "q2" in table "unnamed_subquery", but it cannot be referenced from this part of the query. -- But this should work: SELECT q1 FROM int8_tbl EXCEPT (((SELECT q2 FROM int8_tbl ORDER BY q2 LIMIT 1))) ORDER BY 1; q1 @@ -1338,14 +1338,14 @@ where q2 = q2; ---------------------------------------------------------- Unique -> Merge Append - Sort Key: "*SELECT* 1".q1 - -> Subquery Scan on "*SELECT* 1" + Sort Key: unnamed_subquery.q1 + -> Subquery Scan on unnamed_subquery -> Unique -> Sort Sort Key: i81.q1, i81.q2 -> Seq Scan on int8_tbl i81 Filter: (q2 IS NOT NULL) - -> Subquery Scan on "*SELECT* 2" + -> Subquery Scan on unnamed_subquery_1 -> Unique -> Sort Sort Key: i82.q1, i82.q2 @@ -1374,14 +1374,14 @@ where -q1 = q2; -------------------------------------------------------- Unique -> Merge Append - Sort Key: "*SELECT* 1".q1 - -> Subquery Scan on "*SELECT* 1" + Sort Key: unnamed_subquery.q1 + -> Subquery Scan on unnamed_subquery -> Unique -> Sort Sort Key: i81.q1, i81.q2 -> Seq Scan on int8_tbl i81 Filter: ((- q1) = q2) - -> Subquery Scan on "*SELECT* 2" + -> Subquery Scan on unnamed_subquery_1 -> Unique -> Sort Sort Key: i82.q1, i82.q2
On Thu, Jan 2, 2025 at 5:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Okay, but then we still have the problem of how to ensure that in > a query that has inline'd some views. I think solving the sort of > I-want-to-reference-this problem you describe would require > that we unique-ify the aliases in the rewriter, just after it > finishes incorporating any views. We could do that, but it seems > like a lot of cycles to expend on something that would be pointless > in the typical case where nobody ever looks at the aliases later. Right, plus if you care about function-inlining or inheritance-expansion, those happen even later, at planning time. > > My original > > proposal of getting rid of system-generated fake names isn't > > necessary, because you very helpfully pointed out that I can look at > > whether RTE->alias->aliasname exists to figure that out. > > Actually, I noticed that we are failing to honor that in the places > where we inject "*SELECT*" and "*SELECT* %d" names, because that > code puts those names into RTE->alias not only RTE->eref. > I experimented with the attached patch to not do that anymore, > which is sort of a subset of what you did but just focused on > not lying about what's generated versus user-written. We could > alternatively keep the current generated names by extending > addRangeTableEntryForSubquery's API so that alias and generated eref > are passed separately. (I didn't look to see if anyplace else > is messing up this distinction similarly.) Hmm, I definitely like not lying about what is generated vs. what is user-written. I don't have a strong opinion right now on the best way of accomplishing that. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Jan 3, 2025 at 8:44 AM Robert Haas <robertmhaas@gmail.com> wrote: > > Actually, I noticed that we are failing to honor that in the places > > where we inject "*SELECT*" and "*SELECT* %d" names, because that > > code puts those names into RTE->alias not only RTE->eref. > > I experimented with the attached patch to not do that anymore, > > which is sort of a subset of what you did but just focused on > > not lying about what's generated versus user-written. We could > > alternatively keep the current generated names by extending > > addRangeTableEntryForSubquery's API so that alias and generated eref > > are passed separately. (I didn't look to see if anyplace else > > is messing up this distinction similarly.) > > Hmm, I definitely like not lying about what is generated vs. what is > user-written. I don't have a strong opinion right now on the best way > of accomplishing that. I rediscovered, or re-encountered, this problem today, which motivated me to have a closer look at your (Tom's) patch. My feeling is that it's the right approach. I agree that we could try to keep the current generated names by extending addRangeTableEntryForSubquery, but I'm tentatively inclined to think that we shouldn't. Perhaps that's partly a stylistic preference on my part: I think unnamed_subquery_N looks nicer than "*SELECT * N", but there's also something to be said for keeping the code simple. I think it would be reasonable to instead extend addRangeTableEntryForSubquery if we find that the naming change breaks something, but if it doesn't then I like what you've done better. There's also an argument from consistency: even without the patch, unnamed_subquery leaks out in some contexts, and I think it's nicer to use unnamed_subquery everywhere than to use that in some places and *SELECT* in others. I then went looking for other places that have similar issues. I pretty quickly discovered that convert_ANY_sublink_to_join is another caller of addRangeTableEntryForSubquery that is fabricating an alias when it could just pass NULL; in that case, the fabricated name is ANY_subquery. Also, it seems like the recursive CTE stuff can just set the alias to NULL and the eref as it's currently doing, instead of setting both alias and eref as the code does currently. So, PFA 0001, a rebased version of Tom's patch; 0002, addressing ANY_subquery; and 0003, addressing *TLOCRN* and *TROCRN*. If we decide to adopt all of these, we may want to do some squashing before commit, but we have a little time to figure that out since I think this is v19 material anyway. Apart from those cases, and at least AFAICS, everything that's using a wholly fabricated name seems to be either (1) a case where we intend for the name to be referenceable (old, new, excluded) or (2) a value that is assigned only to eref and not to alias (e.g. *GROUP*). However, I did come across one other mildly interesting case. expand_single_inheritance_child has this: /* * We just duplicate the parent's table alias name for each child. If the * plan gets printed, ruleutils.c has to sort out unique table aliases to * use, which it can handle. */ childrte->alias = childrte->eref = makeAlias(parentrte->eref->aliasname, child_colnames); What I find curious about this is that we're assigning the parent's eref to both the child's eref and the child's alias. Maybe there's something I don't understand here, or maybe it just doesn't matter, but why wouldn't we assign eref to eref and alias to alias? Or even alias to alias and generate a new eref? -- Robert Haas EDB: http://www.enterprisedb.com
Вложения
[ returning to this thread now that v19 is open for business ] Robert Haas <robertmhaas@gmail.com> writes: > I rediscovered, or re-encountered, this problem today, which motivated > me to have a closer look at your (Tom's) patch. My feeling is that > it's the right approach. I agree that we could try to keep the current > generated names by extending addRangeTableEntryForSubquery, but I'm > tentatively inclined to think that we shouldn't. That's fine by me. I'm personally content with all of the changes shown in your patchset. > However, I did come across one other mildly interesting case. > expand_single_inheritance_child has this: > ... > What I find curious about this is that we're assigning the parent's > eref to both the child's eref and the child's alias. Maybe there's > something I don't understand here, or maybe it just doesn't matter, > but why wouldn't we assign eref to eref and alias to alias? Or even > alias to alias and generate a new eref? The issue is explained in the previous comment block a few lines up: * Construct an alias clause for the child, which we can also use as eref. * This is important so that EXPLAIN will print the right column aliases * for child-table columns. (Since ruleutils.c doesn't have any easy way * to reassociate parent and child columns, we must get the child column * aliases right to start with. Note that setting childrte->alias forces * ruleutils.c to use these column names, which it otherwise would not.) I think the case this is worried about is that if you have a parent table t(a,b,c), and the query writes say "t as t1(x)", then t's column "a" will be labeled "x" by EXPLAIN and we want the child's column "a" to similarly be labeled "x". So unless we want to start rejiggering ruleutils' already-overly- complex behavior, we have to put something in childrte->alias, even if the parent had no alias. So that's a violation of the principle you were hoping to establish, but as long as it only applies to partitions and inheritance children, I'm not sure it's worth moving heaven and earth to make it different. We could certainly do something a little different than what the code is doing, such as "if the parent does have a user-written alias, use parentrte->alias->aliasname not parentrte->eref->aliasname for the childrte->alias->aliasname". I'm not sure it's worth bothering with that, though. I noticed that the patchset was failing in cfbot because we've since grown another regression test case whose output is affected by 0002. So here's a v3 that incorporates that additional change. I did a little bit of wordsmithing on the commit messages too, but the code changes are identical to v2. regards, tom lane From ab46f16a15464e9ba3fae5008ef533e0a6a28303 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Wed, 23 Jul 2025 17:04:45 -0400 Subject: [PATCH v3 1/3] Don't generate fake "*SELECT*" or "*SELECT* %d" subquery aliases. rte->alias should point only to a user-written alias, but in these cases that principle was violated. Fixing this causes some regression test output changes: wherever rte->alias previously had a value and is now NULL, rte->eref is now set to a generated name rather than to rte->alias; and the scheme used to generate eref names differs from what we were doing for aliases. The upshot is that instead of "*SELECT*" or "*SELECT* %d", EXPLAIN will now emit "unnamed_subquery" or "unnamed_subquery_%d". But that's a reasonable descriptor, and we were already producing that in yet other cases, so this seems not too objectionable. Author: Tom Lane <tgl@sss.pgh.pa.us> Co-authored-by: Robert Haas <rhaas@postgresql.org> Discussion: https://postgr.es/m/CA+TgmoYSYmDA2GvanzPMci084n+mVucv0bJ0HPbs6uhmMN6HMg@mail.gmail.com --- contrib/postgres_fdw/expected/postgres_fdw.out | 8 ++++---- src/backend/executor/functions.c | 2 +- src/backend/parser/analyze.c | 7 ++----- src/test/regress/expected/partition_prune.out | 4 ++-- src/test/regress/expected/rangefuncs.out | 8 ++++---- src/test/regress/expected/union.out | 14 +++++++------- 6 files changed, 20 insertions(+), 23 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 4b6e49a5d95..c492ba38c58 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -5098,13 +5098,13 @@ SELECT ft1.c1 FROM ft1 JOIN ft2 on ft1.c1 = ft2.c1 WHERE -- =================================================================== EXPLAIN (verbose, costs off) INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20; - QUERYPLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + QUERY PLAN +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Insert on public.ft2 Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8) Batch Size: 1 - -> Subquery Scan on "*SELECT*" - Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1", NULL::integer, "*SELECT*"."?column?_2", NULL::timestampwith time zone, NULL::timestamp without time zone, NULL::character varying(10), 'ft2 '::character(10),NULL::user_enum + -> Subquery Scan on unnamed_subquery + Output: unnamed_subquery."?column?", unnamed_subquery."?column?_1", NULL::integer, unnamed_subquery."?column?_2",NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying(10),'ft2 '::character(10), NULL::user_enum -> Foreign Scan on public.ft2 ft2_1 Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3) Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1" LIMIT 20::bigint diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c index 359aafea681..0547fda2101 100644 --- a/src/backend/executor/functions.c +++ b/src/backend/executor/functions.c @@ -2454,7 +2454,7 @@ tlist_coercion_finished: rte = makeNode(RangeTblEntry); rte->rtekind = RTE_SUBQUERY; rte->subquery = parse; - rte->eref = rte->alias = makeAlias("*SELECT*", colnames); + rte->eref = makeAlias("unnamed_subquery", colnames); rte->lateral = false; rte->inh = false; rte->inFromCl = true; diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 34f7c17f576..b9763ea1714 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -777,7 +777,7 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt) */ nsitem = addRangeTableEntryForSubquery(pstate, selectQuery, - makeAlias("*SELECT*", NIL), + NULL, false, false); addNSItemToQuery(pstate, nsitem, true, false, false); @@ -2100,7 +2100,6 @@ transformSetOperationTree(ParseState *pstate, SelectStmt *stmt, { /* Process leaf SELECT */ Query *selectQuery; - char selectName[32]; ParseNamespaceItem *nsitem; RangeTblRef *rtr; ListCell *tl; @@ -2156,11 +2155,9 @@ transformSetOperationTree(ParseState *pstate, SelectStmt *stmt, /* * Make the leaf query be a subquery in the top-level rangetable. */ - snprintf(selectName, sizeof(selectName), "*SELECT* %d", - list_length(pstate->p_rtable) + 1); nsitem = addRangeTableEntryForSubquery(pstate, selectQuery, - makeAlias(selectName, NIL), + NULL, false, false); diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out index d1966cd7d82..68ecd951809 100644 --- a/src/test/regress/expected/partition_prune.out +++ b/src/test/regress/expected/partition_prune.out @@ -4763,7 +4763,7 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o QUERY PLAN ---------------------------------------------------------------------------------------------- Append - -> Subquery Scan on "*SELECT* 1_1" + -> Subquery Scan on unnamed_subquery_2 -> WindowAgg Window: w1 AS (PARTITION BY part_abc.a ORDER BY part_abc.a) -> Append @@ -4780,7 +4780,7 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o -> Index Scan using part_abc_3_2_a_idx on part_abc_3_2 part_abc_4 Index Cond: (a >= (stable_one() + 1)) Filter: (d <= stable_one()) - -> Subquery Scan on "*SELECT* 2" + -> Subquery Scan on unnamed_subquery_1 -> WindowAgg Window: w1 AS (PARTITION BY part_abc_5.a ORDER BY part_abc_5.a) -> Append diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out index c21be83aa4a..30241e22da2 100644 --- a/src/test/regress/expected/rangefuncs.out +++ b/src/test/regress/expected/rangefuncs.out @@ -2130,10 +2130,10 @@ select testrngfunc(); explain (verbose, costs off) select * from testrngfunc(); - QUERY PLAN ----------------------------------------------------------- - Subquery Scan on "*SELECT*" - Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1" + QUERY PLAN +---------------------------------------------------------------------- + Subquery Scan on unnamed_subquery + Output: unnamed_subquery."?column?", unnamed_subquery."?column?_1" -> Unique Output: (1), (2) -> Sort diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out index 96962817ed4..d3ea433db15 100644 --- a/src/test/regress/expected/union.out +++ b/src/test/regress/expected/union.out @@ -942,7 +942,7 @@ SELECT q1 FROM int8_tbl EXCEPT SELECT q2 FROM int8_tbl ORDER BY q2 LIMIT 1; ERROR: column "q2" does not exist LINE 1: ... int8_tbl EXCEPT SELECT q2 FROM int8_tbl ORDER BY q2 LIMIT 1... ^ -DETAIL: There is a column named "q2" in table "*SELECT* 2", but it cannot be referenced from this part of the query. +DETAIL: There is a column named "q2" in table "unnamed_subquery", but it cannot be referenced from this part of the query. -- But this should work: SELECT q1 FROM int8_tbl EXCEPT (((SELECT q2 FROM int8_tbl ORDER BY q2 LIMIT 1))) ORDER BY 1; q1 @@ -1338,14 +1338,14 @@ where q2 = q2; ---------------------------------------------------------- Unique -> Merge Append - Sort Key: "*SELECT* 1".q1 - -> Subquery Scan on "*SELECT* 1" + Sort Key: unnamed_subquery.q1 + -> Subquery Scan on unnamed_subquery -> Unique -> Sort Sort Key: i81.q1, i81.q2 -> Seq Scan on int8_tbl i81 Filter: (q2 IS NOT NULL) - -> Subquery Scan on "*SELECT* 2" + -> Subquery Scan on unnamed_subquery_1 -> Unique -> Sort Sort Key: i82.q1, i82.q2 @@ -1374,14 +1374,14 @@ where -q1 = q2; -------------------------------------------------------- Unique -> Merge Append - Sort Key: "*SELECT* 1".q1 - -> Subquery Scan on "*SELECT* 1" + Sort Key: unnamed_subquery.q1 + -> Subquery Scan on unnamed_subquery -> Unique -> Sort Sort Key: i81.q1, i81.q2 -> Seq Scan on int8_tbl i81 Filter: ((- q1) = q2) - -> Subquery Scan on "*SELECT* 2" + -> Subquery Scan on unnamed_subquery_1 -> Unique -> Sort Sort Key: i82.q1, i82.q2 -- 2.43.5 From 4bf665212789c0856dd4b166ca9e0359bca3994e Mon Sep 17 00:00:00 2001 From: Robert Haas <rhaas@postgresql.org> Date: Wed, 23 Jul 2025 17:12:34 -0400 Subject: [PATCH v3 2/3] Don't generate fake "ANY_subquery" aliases, either. This is just like the previous commit, but for a different invented alias name. Author: Robert Haas <rhaas@postgresql.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA+TgmoYSYmDA2GvanzPMci084n+mVucv0bJ0HPbs6uhmMN6HMg@mail.gmail.com --- src/backend/optimizer/plan/subselect.c | 2 +- src/test/regress/expected/memoize.out | 8 ++++---- src/test/regress/expected/subselect.out | 14 +++++++------- 3 files changed, 12 insertions(+), 12 deletions(-) diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index d71ed958e31..fae18548e07 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -1397,7 +1397,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, */ nsitem = addRangeTableEntryForSubquery(pstate, subselect, - makeAlias("ANY_subquery", NIL), + NULL, use_lateral, false); rte = nsitem->p_rte; diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out index 150dc1b44cf..fbcaf113266 100644 --- a/src/test/regress/expected/memoize.out +++ b/src/test/regress/expected/memoize.out @@ -545,15 +545,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM tab_anti t1 WHERE t1.a IN (SELECT a FROM tab_anti t2 WHERE t2.b IN (SELECT t1.b FROM tab_anti t3 WHERE t2.a > 1 OFFSET 0)); - QUERY PLAN -------------------------------------------------- + QUERY PLAN +--------------------------------------------------- Nested Loop Semi Join -> Seq Scan on tab_anti t1 -> Nested Loop Semi Join Join Filter: (t1.a = t2.a) -> Seq Scan on tab_anti t2 - -> Subquery Scan on "ANY_subquery" - Filter: (t2.b = "ANY_subquery".b) + -> Subquery Scan on unnamed_subquery + Filter: (t2.b = unnamed_subquery.b) -> Result One-Time Filter: (t2.a > 1) -> Seq Scan on tab_anti t3 diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 18fed63e738..54df8fcbba9 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -1467,14 +1467,14 @@ select * from int4_tbl o where (f1, f1) in ------------------------------------------------------------------- Nested Loop Semi Join Output: o.f1 - Join Filter: (o.f1 = "ANY_subquery".f1) + Join Filter: (o.f1 = unnamed_subquery.f1) -> Seq Scan on public.int4_tbl o Output: o.f1 -> Materialize - Output: "ANY_subquery".f1, "ANY_subquery".g - -> Subquery Scan on "ANY_subquery" - Output: "ANY_subquery".f1, "ANY_subquery".g - Filter: ("ANY_subquery".f1 = "ANY_subquery".g) + Output: unnamed_subquery.f1, unnamed_subquery.g + -> Subquery Scan on unnamed_subquery + Output: unnamed_subquery.f1, unnamed_subquery.g + Filter: (unnamed_subquery.f1 = unnamed_subquery.g) -> Result Output: i.f1, ((generate_series(1, 50)) / 10) -> ProjectSet @@ -2642,8 +2642,8 @@ ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd); -> Memoize Cache Key: b.hundred, b.odd Cache Mode: binary - -> Subquery Scan on "ANY_subquery" - Filter: (b.hundred = "ANY_subquery".min) + -> Subquery Scan on unnamed_subquery + Filter: (b.hundred = unnamed_subquery.min) -> Result InitPlan 1 -> Limit -- 2.43.5 From e92da9d1520e8feb71aa6fd42515c4e34b862535 Mon Sep 17 00:00:00 2001 From: Robert Haas <rhaas@postgresql.org> Date: Wed, 23 Jul 2025 17:16:20 -0400 Subject: [PATCH v3 3/3] Don't generate fake "*TLOCRN*" or "*TROCRN*" aliases, either. This fix actually doesn't change any regression test outputs. Author: Robert Haas <rhaas@postgresql.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA+TgmoYSYmDA2GvanzPMci084n+mVucv0bJ0HPbs6uhmMN6HMg@mail.gmail.com --- src/backend/rewrite/rewriteSearchCycle.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/backend/rewrite/rewriteSearchCycle.c b/src/backend/rewrite/rewriteSearchCycle.c index 19b89dee0d0..3b1a9dfc576 100644 --- a/src/backend/rewrite/rewriteSearchCycle.c +++ b/src/backend/rewrite/rewriteSearchCycle.c @@ -282,8 +282,8 @@ rewriteSearchAndCycle(CommonTableExpr *cte) newrte = makeNode(RangeTblEntry); newrte->rtekind = RTE_SUBQUERY; - newrte->alias = makeAlias("*TLOCRN*", cte->ctecolnames); - newrte->eref = newrte->alias; + newrte->alias = NULL; + newrte->eref = makeAlias("*TLOCRN*", cte->ctecolnames); newsubquery = copyObject(rte1->subquery); IncrementVarSublevelsUp((Node *) newsubquery, 1, 1); newrte->subquery = newsubquery; @@ -379,8 +379,8 @@ rewriteSearchAndCycle(CommonTableExpr *cte) ewcl = lappend(ewcl, makeString(cte->cycle_clause->cycle_mark_column)); ewcl = lappend(ewcl, makeString(cte->cycle_clause->cycle_path_column)); } - newrte->alias = makeAlias("*TROCRN*", ewcl); - newrte->eref = newrte->alias; + newrte->alias = NULL; + newrte->eref = makeAlias("*TROCRN*", ewcl); /* * Find the reference to the recursive CTE in the right UNION subquery's -- 2.43.5
On Wed, Jul 23, 2025 at 5:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > [ returning to this thread now that v19 is open for business ] Thanks for your attention to this, and sorry for the slow response. > I think the case this is worried about is that if you have a parent > table t(a,b,c), and the query writes say "t as t1(x)", then t's column > "a" will be labeled "x" by EXPLAIN and we want the child's column "a" > to similarly be labeled "x". OK, but I don't quite see how that would go wrong. My proposal was to assign eref to eref and alias to alias. If the parent has no alias, then the child would also have no alias, but the eref would match. If the parent does have an alias, then the child would end up with an alias matching the parent, which would presumably take precedence over the eref that would also match the parent. For what we're doing now to be necessary, there must be something in ruleutils.c that either needs the eref and alias of a child to match, or needs the eref of a child to match the alias of the parent, unless I'm missing something. There might well be such a thing, I'm just not sure what it is. > We could certainly do something a little different than what the code > is doing, such as "if the parent does have a user-written alias, use > parentrte->alias->aliasname not parentrte->eref->aliasname for the > childrte->alias->aliasname". I'm not sure it's worth bothering with > that, though. I don't have a clear opinion on this. I don't understand well enough what's being done here. I don't think not doing this is going to cause my development plans any immediate problems, but some of this stuff is quite fiddly and hard to understand. > I noticed that the patchset was failing in cfbot because we've since > grown another regression test case whose output is affected by 0002. > So here's a v3 that incorporates that additional change. I did a > little bit of wordsmithing on the commit messages too, but the code > changes are identical to v2. Thanks. I committed these today, after editing the commit message for 0003 a bit more. -- Robert Haas EDB: http://www.enterprisedb.com