Обсуждение: Re: plan shape work
On 5/19/25 20:01, Robert Haas wrote: > Hi, > > A couple of people at pgconf.dev seemed to want to know more about my > ongoing plan shape work, so here are the patches I have currently. > This is a long way from something that actually looks like a usable > feature, but these are bits of infrastructure that I think will be > necessary to get to a usable feature. As a recap, my overall goal here > is to make it so that you can examine a finished plan, figure out what > decisions the planner made, and then somehow get the planner to make > those same decisions over again in a future planning cycle. Since > doing this for all types of planner decisions seems too difficult for > an initial goal, I'm focusing on scans and joins for now. A further > goal is that I want it to be possible for extensions to use this > infrastructure to implement a variety of different policies that they > might feel to be beneficial, so I'm looking to minimize the amount of > stuff that has to be done in core PostgreSQL or can only be used by > core PostgreSQL. > > ... Thanks for the overview. I don't have any immediate feedback, but it sounds like it might be related to the "making planner decisions clear" session from the unconference ... The basic premise of that session was about how to give users better info about the planner decisions - why paths were selected/rejected, etc. A simple example would be "why was the index not used", and the possible answers include "dominated by cost by another path" or "does not match the index keys" etc. I wonder if this work might be useful for something like that. regards -- Tomas Vondra
+1, this seems like it could be very useful. A somewhat related issue is being able to tie plan nodes back to the query text: it can be hard to understand the planner's decisions if it's not even clear what part of the query it's making decisions about. I'm sure this is not an easy problem in general, but I wonder if you think that could be improved in the course of this work, or if you have other thoughts about it. Thanks, Maciek
On Tue, May 20, 2025 at 2:45 PM Tomas Vondra <tomas@vondra.me> wrote: > Thanks for the overview. I don't have any immediate feedback, but it > sounds like it might be related to the "making planner decisions clear" > session from the unconference ... > > The basic premise of that session was about how to give users better > info about the planner decisions - why paths were selected/rejected, > etc. A simple example would be "why was the index not used", and the > possible answers include "dominated by cost by another path" or "does > not match the index keys" etc. > > I wonder if this work might be useful for something like that. I've been wondering that, too. There's definitely some indirect ways in which that might be the case. For example, I think this work would lend itself to saying "hey, try planning this query, but for that table over there, use an index scan on this table." Then, it either still doesn't -- meaning the index isn't usable for some reason -- or it does and you can see the resulting plan with presumably higher cost and maybe infer why it didn't happen. That's better than today, where we have only very crude tools that let us do things like disable an entire scan type for the entire query, and I think it would make it a lot easier and less frustrating for a knowledgeable user to figure out why things are happening. But even though I think that would be better than today, I'm not sure it rises to the level of actually being good, because I think it still requires a fairly knowledgeable operator to figure things out, and you probably have to experiment a bunch to understand the situation instead of, say, being able to just look at the EXPLAIN plan and see the answer. I think being able to look at the EXPLAIN plan and see the answer, without needing a bunch of poking around, would be the ideal scenario here. But in some sense this is the same problem as understanding how an AI neural network is reasoning. The answer to "why did the planner pick plan X" is always "X was the cheapest possible plan". Ideas like "we chose a merge join because both tables are large enough that neither would fit into a hash table conveniently" are human explanations of why the math had the effect that it did; they are not how the planner actually reasons. So it's not just a matter of exposing the actual reasoning process to the user, because the computer is not reasoning in a way that a human would. It would have to be a matter of exposing some kind of other information that would allow the human being to comprehend easily what led the machine's algorithm to a certain conclusion; and it is not obvious how to get there. I have a sense - possibly an incorrect one - that the core of the problem here is that the planner considers lots of very similar alternatives. A hypothetical feature that showed the second-cheapest plan would be all but useless, because the second-cheapest plan would just be a very minor variation of the cheapest plan in almost all cases. One idea that crossed my mind was to display information in EXPLAIN about what would have happened if we'd done something really different. For instance, suppose that at a certain level of the plan tree we actually chose a merge join, but we also show the estimated cost of the cheapest hash join (if any) and the cheapest nested loop (if any) that we considered at that level. The user might be able to draw useful conclusions based on whether those numbers were altogether absent (i.e. that join type was not viable at all) or whether the cost was a little higher or a lot higher than that of the path actually chosen. For scans, you could list which indexes were believed to be usable and perhaps what the cost would have been for the cheapest one not actually selected; and what the cost of a sequential scan would have been if you hadn't picked one. I'm not sure how useful this would be, so the whole idea might actually suck, or maybe it's sort of the right idea but needs a bunch of refinement to really be useful. I don't have a better idea right now, though. If there are any notes that were taken during that unconference session, please point me in the right direction; I was in another session at that time but would read any available notes with interest. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, May 20, 2025 at 3:09 PM Maciek Sakrejda <maciek@pganalyze.com> wrote: > +1, this seems like it could be very useful. A somewhat related issue > is being able to tie plan nodes back to the query text: it can be hard > to understand the planner's decisions if it's not even clear what part > of the query it's making decisions about. I'm sure this is not an easy > problem in general, but I wonder if you think that could be improved > in the course of this work, or if you have other thoughts about it. Thanks. I don't really have any ideas about the problem you mention, perhaps partly because I haven't experienced it too much. I mean, I have sometimes been confused about which parts of the query go with which parts of the EXPLAIN, but I think in my experience so far that is mostly because either (1) both the query and the EXPLAIN output are super long and maybe also super-wide and therefore it's hard to correlate things by eye or (2) somebody wrote a query where they use the same table and/or table alias over and over again in different parts of the query and so it's hard to tell which reference goes with which. Neither of those problems seems all that exciting to me from a dev perspective: if you're calling everything a or x or orders or something, maybe don't do that, and if your query is 1500 characters long, I guess you need to budget some time to align that with the query plan. I don't really know how much we can do here. But maybe there are cases that I haven't seen where something better is possible, or perhaps you have some good idea that I haven't considered. (If I'm honest, I do have an idea that I think might very significantly improve the readability of EXPLAIN output. I think it would make it much less wide in normal cases without making it much longer. This has been percolating in my brain for a few years now and I have the vague intention of proposing it at some point, but not until I'm good and ready to be flamed to a well-done crisp, because I'm quite sure there will be more than one opinion on the merits.) -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, May 21, 2025 at 7:29 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, May 20, 2025 at 3:09 PM Maciek Sakrejda <maciek@pganalyze.com> wrote: > > +1, this seems like it could be very useful. A somewhat related issue > > is being able to tie plan nodes back to the query text: it can be hard > > to understand the planner's decisions if it's not even clear what part > > of the query it's making decisions about. I'm sure this is not an easy > > problem in general, but I wonder if you think that could be improved > > in the course of this work, or if you have other thoughts about it. > > Thanks. I don't really have any ideas about the problem you mention, > perhaps partly because I haven't experienced it too much. That may be due to your extensive experience with Postgres and EXPLAIN plans. > I mean, I > have sometimes been confused about which parts of the query go with > which parts of the EXPLAIN, but I think in my experience so far that > is mostly because either (1) both the query and the EXPLAIN output are > super long and maybe also super-wide and therefore it's hard to > correlate things by eye or (2) somebody wrote a query where they use > the same table and/or table alias over and over again in different > parts of the query and so it's hard to tell which reference goes with > which. Neither of those problems seems all that exciting to me from a > dev perspective: if you're calling everything a or x or orders or > something, maybe don't do that, and if your query is 1500 characters > long, I guess you need to budget some time to align that with the > query plan. Fair enough, although the people trying to make sense of EXPLAIN plans are sometimes not the same ones who are writing the queries. And sometimes the queries are not written by people at all but by ORMs (or—heaven help us—vibe coded). "Don't do X" is a reasonable response to "It hurts when I do X," but it doesn't really solve the user's problem. That said, it's hard to argue with "We don't have any good ideas on how to improve this right now, and it's not a total dumpster fire, so we'll focus on other work." > I don't really know how much we can do here. But maybe > there are cases that I haven't seen where something better is > possible, or perhaps you have some good idea that I haven't > considered. No great ideas here. I thought initially that a good solution would be to have structured EXPLAIN output include something like "Query Text Start Index" and "Query Text End Index" fields for each node, but I realized that this doesn't really work for multiple joins (and probably other cases). Maybe "Query Text Indices", as a list of pairs? But from the little I know about the planner, that seems like any sort of tracking back to the source would be hard to implement. And it only really solves the problem for external EXPLAIN viewers, and only ones that put in the work to support this. I'm not sure if the problem can be meaningfully addressed for text format, but maybe that's another reason not to spend time on it in core. > (If I'm honest, I do have an idea that I think might very > significantly improve the readability of EXPLAIN output. I think it > would make it much less wide in normal cases without making it much > longer. This has been percolating in my brain for a few years now and > I have the vague intention of proposing it at some point, but not > until I'm good and ready to be flamed to a well-done crisp, because > I'm quite sure there will be more than one opinion on the merits.) I'm intrigued, and happy to stand by with an extinguisher. The road to great ideas is paved with bad ideas. Thanks, Maciek
On Wed, May 21, 2025 at 12:03 PM Maciek Sakrejda <m.sakrejda@gmail.com> wrote: > That may be due to your extensive experience with Postgres and EXPLAIN plans. Yes, that is very possible. All things being equal, it helps to have done something a lot of times. > Fair enough, although the people trying to make sense of EXPLAIN plans > are sometimes not the same ones who are writing the queries. And > sometimes the queries are not written by people at all but by ORMs > (or—heaven help us—vibe coded). "Don't do X" is a reasonable response > to "It hurts when I do X," but it doesn't really solve the user's > problem. That said, it's hard to argue with "We don't have any good > ideas on how to improve this right now, and it's not a total dumpster > fire, so we'll focus on other work." +1 to all of that. > No great ideas here. I thought initially that a good solution would be > to have structured EXPLAIN output include something like "Query Text > Start Index" and "Query Text End Index" fields for each node, but I > realized that this doesn't really work for multiple joins (and > probably other cases). Maybe "Query Text Indices", as a list of pairs? > But from the little I know about the planner, that seems like any sort > of tracking back to the source would be hard to implement. And it only > really solves the problem for external EXPLAIN viewers, and only ones > that put in the work to support this. I'm not sure if the problem can > be meaningfully addressed for text format, but maybe that's another > reason not to spend time on it in core. I'm not gonna say you couldn't make something like that work, but it sounds like a lot of effort for a hypothetical piece of external visualization software that might or might not produce satisfying results. My advice to anyone wanting to pursue this idea would be: make a totally fake POC first. Get a sample query with at least a moderately complex plan, get the EXPLAIN output, manually generate whatever data you think PostgreSQL ought to be able to spit out, and do a mock-up of an external viewer. When you're happy with the results, show it to some other people and see if they also like it. We can have the discussion about whether to include anything in core and what it should be after that. I definitely would not rule out the possibility that something like this could turn out to be really cool -- maybe hovering over stuff and having the corresponding part of the plan get highlighted will turn out to be awesome. But I think it might also turn out that there are things where it's not quite clear what you can or should usefully highlight, like target-list items, or for example a case where the query says that a.x = b.x and b.x = c.x but in the actual plan we use evaluate a.x = c.x, an expression not appearing anywhere in the query text. The legwork of sorting some of that kind of stuff out should really happen before making a feature proposal. > I'm intrigued, and happy to stand by with an extinguisher. The road to > great ideas is paved with bad ideas. Thanks. That proposal is a task for another day, but I appreciate the sentiment. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, May 21, 2025 at 7:29 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, May 20, 2025 at 2:45 PM Tomas Vondra <tomas@vondra.me> wrote: > I have a sense - possibly an incorrect one - that the core of the > problem here is that the planner considers lots of very similar > alternatives. A hypothetical feature that showed the second-cheapest > plan would be all but useless, because the second-cheapest plan would > just be a very minor variation of the cheapest plan in almost all > cases. One idea that crossed my mind was to display information in > EXPLAIN about what would have happened if we'd done something really > different. For instance, suppose that at a certain level of the plan > tree we actually chose a merge join, but we also show the estimated > cost of the cheapest hash join (if any) and the cheapest nested loop > (if any) that we considered at that level. The user might be able to > draw useful conclusions based on whether those numbers were altogether > absent (i.e. that join type was not viable at all) or whether the cost > was a little higher or a lot higher than that of the path actually > chosen. For scans, you could list which indexes were believed to be > usable and perhaps what the cost would have been for the cheapest one > not actually selected; and what the cost of a sequential scan would > have been if you hadn't picked one. > > I'm not sure how useful this would be, so the whole idea might > actually suck, or maybe it's sort of the right idea but needs a bunch > of refinement to really be useful. I don't have a better idea right > now, though. Having detailed information on the costs of alternative join methods/scan method, even when a different method is chosen, would be valuable information. For example, if a merge join is selected for tables t1 and t2 in a subquery, showing the estimated costs for both a hash join and a nested loop join would provide a more complete picture of the planner's decision-making process. And I believe, this information would be particularly useful if the cost of a non-selected plan, such as a nested loop join, is very close to the cost of the chosen merge join. In such cases, a database administrator or query optimizer could use this insight to manually override the planner's choice and opt for the nested loop join for specific tables in a subquery. This level of detail would empower users to fine-tune query performance and explore alternative execution strategies. IIUC, one of the goal of this work is where operator can say I want to use this scan method while scanning a particular table in a particular subquery, that means if the planner can give the information about non selected paths as well then it would be really helpful in making this process more smooth otherwise without much information on what path got rejected its very hard to provide hints. -- Regards, Dilip Kumar Google