Обсуждение: RFC: extensible planner state
I've been working on planner extensibility for some time now, and am still not quite ready to make a full-fledged proposal, but see partially-fledged proposals here and here: http://postgr.es/m/CA+TgmoZY+baV-T-5ifDn6P=L=aV-VkVBrPmi0TQkcEq-5Finww@mail.gmail.com http://postgr.es/m/CA+TgmoZxQO8svE_vtNCkEubnCYrnrCEnhftdbkdZ496Nfhg=wQ@mail.gmail.com While trying to build out a real, working example based on those patches, I ran into the problem that it's rather difficult for multiple planner hooks to coordinate with each other. For example, you might want to do some calculation once per query, or once per RelOptnfo, and that's somewhat difficult to arrange right now. I tried having my planner hook push an item onto a state stack before calling standard_planner() and pop it afterward, and then any hooks called during planning can look at the top of the state stack. But that doesn't quite work because plan_cluster_use_sort() and plan_create_index_workers() can provide a backdoor into the planner code, allowing get_relation_info() to be called not in reference to the most recent call to planner(). My first instinct was to invent QSRC_DUMMY and have those functions use that, which as far as I can see is an adequate solution to that immediate problem, since get_relation_info() can now identify those cases cleanly. But that still requires the extension to do a lot of bookkeeping just for the privilege of storing some per-query private state, and it seems to me that you might well want to store some private state per-RelOptInfo or possibly per-PlannerInfo, which seems to require an even-more-unreasonable amount of effort. An extension might be able to spin up a hash table keyed by pointer address or maybe some identifying properties of a RelOptInfo, but I think it's going to be slow, fragile, and ugly. So what I'd like to propose instead is something along the lines of the private-ExplainState-data system: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com https://git.postgresql.org/pg/commitdiff/c65bc2e1d14a2d4daed7c1921ac518f2c5ac3d17 The attached (untested) patch shows how this could work, allowing extensible state in each PlannerGlobal, PlannerInfo, and RelOptInfo, which seem like the logical places to me. I have use cases for the first and the third at present, so the second could be omitted on suspicion of being unuseful, but I bet it isn't. As compared with c65bc2e1d14a2d4daed7c1921ac518f2c5ac3d17, I reduced the initial allocation size to 4 from 16 and made the getter functions static inline, out of the feeling that you're not likely to have more than one ExplainState and the speed of EXPLAIN doesn't matter much, but you might store and access private per-RelOptInfo state a lot of times in one query planner invocation. I'm not altogether convinced this is the right design. It seems slightly unwieldy, and having to allocate an extra array, even if a small one, for every RelOptInfo that has private state seems like it could add a noticeable amount of overhead. On the other hand, I strongly suspect that assuming that there's only ever one planner extension in operation is short-sighted. The fact that we have none right now seems to me to be evidence of the absence of infrastructure rather than the absence of demand. If that is correct then I don't quite see how to do better than this. But I'm interested in hearing what other people think. If people like this design, I will propose it here or on another thread for commit, after suitable testing and polishing. If people do not like this design, then I would like to know what alternative they would prefer. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
Вложения
Robert Haas <robertmhaas@gmail.com> writes: > But that still requires the extension to do a lot of bookkeeping just > for the privilege of storing some per-query private state, and it > seems to me that you might well want to store some private state > per-RelOptInfo or possibly per-PlannerInfo, which seems to require an > even-more-unreasonable amount of effort. An extension might be able to > spin up a hash table keyed by pointer address or maybe some > identifying properties of a RelOptInfo, but I think it's going to be > slow, fragile, and ugly. So what I'd like to propose instead is > something along the lines of the private-ExplainState-data system: > http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com > https://git.postgresql.org/pg/commitdiff/c65bc2e1d14a2d4daed7c1921ac518f2c5ac3d17 This seems generally reasonable to me. I agree that it's slightly annoying to bloat every RelOptInfo with two more fields, but I don't see a better alternative to that. In any case, sizeof(RelOptInfo) is 448 right now on my dev machine; making it 464 isn't going to change anything. I wonder if we couldn't get rid of PlannerInfo.join_search_private in favor of expecting join search hooks to use this mechanism (thus, GEQO would become an in-core consumer of the mechanism). Another idea is to get rid of RelOptInfo.fdw_private, although that has a little more excuse to live in that it's reasonably clear who gets to use it, namely the FDW supporting the relation. (Too bad there's no comment explaining that.) Nitpicks: * The initial allocations of the arrays need to take more care than this about which context the arrays go into, ie it had better be planner_cxt for PlannerInfo or PlannerGlobal, and the same context the RelOptInfo is in for RelOptInfo. Otherwise you risk a mess under GEQO. * Surely, if extension_state etc is read_write_ignore, then extension_state_allocated etc had better be as well? I don't understand the rationale for preserving one without the other. regards, tom lane
On Tue, Aug 19, 2025 at 1:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > This seems generally reasonable to me. Cool. > I wonder if we couldn't get rid of PlannerInfo.join_search_private > in favor of expecting join search hooks to use this mechanism > (thus, GEQO would become an in-core consumer of the mechanism). Let me try that. > * The initial allocations of the arrays need to take > more care than this about which context the arrays go into, > ie it had better be planner_cxt for PlannerInfo or PlannerGlobal, > and the same context the RelOptInfo is in for RelOptInfo. > Otherwise you risk a mess under GEQO. It's easy to do this for PlannerInfo, but PlannerGlobal has no planner_cxt member. GetMemoryChunkContext() could be used but I'm not sure we want to spread reliance on that to more places. What's your thought? > * Surely, if extension_state etc is read_write_ignore, then > extension_state_allocated etc had better be as well? I don't > understand the rationale for preserving one without the other. I figured we can't print a void** but we can print an integer and the user might want to see it. Wrong idea? -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Aug 19, 2025 at 1:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> * The initial allocations of the arrays need to take >> more care than this about which context the arrays go into, >> ie it had better be planner_cxt for PlannerInfo or PlannerGlobal, >> and the same context the RelOptInfo is in for RelOptInfo. >> Otherwise you risk a mess under GEQO. > It's easy to do this for PlannerInfo, but PlannerGlobal has no > planner_cxt member. GetMemoryChunkContext() could be used but I'm not > sure we want to spread reliance on that to more places. What's your > thought? You'll presumably have to use GetMemoryChunkContext() for RelOptInfo, so I don't see much downside from using it in one or even both of the other cases too. >> * Surely, if extension_state etc is read_write_ignore, then >> extension_state_allocated etc had better be as well? I don't >> understand the rationale for preserving one without the other. > I figured we can't print a void** but we can print an integer and the > user might want to see it. Wrong idea? Hm. We don't have read support for these structs, so maybe it's fine. It looks weird though. regards, tom lane
On 19/8/2025 18:47, Robert Haas wrote: > polishing. If people do not like this design, then I would like to > know what alternative they would prefer.Thanks for these efforts! Generally, such an interface seems good for the extension's purposes. It is OK in this specific context because all these structures are created during the planning process. Going further into the plan, which is more stable and reusable, you would need to think about read/write rules. I utilise PlannerGlobal extensibility to notify my extension when a transformation or optimisation occurs, enabling it to initiate replanning from the top level with alternative settings if necessary. RelOptInfo extensibility serves multiple purposes, but its most notable feature is the inclusion of a node signature that enables the identification of a specific RelOptInfo instance during re-optimisation of all or only a part of the query. I haven't used PlannerInfo extensibility yet, but I think it makes sense - if an extension performs a complicated planning job that spans multiple planning stages, it makes sense to store intermediate data in this 'cache'. The weak points of this approach are: 1. Needs a new core routine for each node to be extended. 2. Doesn't propose copy/read/write node machinery. 3. Allocates more memory than needed. I frequently see installations with 5-10 modules installed. If the 9th extension employs the RelOptInfo extensibility, it would be unfortunate to see another eight elements allocated unnecessarily. What if we ever consider extending the Path node? I have been using a slightly different approach [1] for years, which involves adding a List at the end of each structure. Any extension, by convention, may add elements as an ExtensibleNode. Such an approach saves memory, resolves read/write/copy node issues and allows an extension to correctly identify its data in parallel workers and across backends (see [2] for the reasoning). This approach appears more general (though less restrictive) and can be applied to extend any node in the same way, which offers a clear benefit, because tracking query planning decisions often requires extensibility in Query, RelOptInfo, and PlannedStmt as well. Although I prefer the ExtensibleNode / extension list approach, I will be OK with your method as well, especially if you add extensibility to the PlannedStmt node too. [1] https://commitfest.postgresql.org/patch/5965/ [2] https://www.postgresql.org/message-id/aKQIeXKMifXqV58R@jrouhaud -- regards, Andrei Lepikhov
On Tue, Aug 19, 2025 at 2:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > You'll presumably have to use GetMemoryChunkContext() for RelOptInfo, > so I don't see much downside from using it in one or even both of the > other cases too. Pointer dereference must be faster than a function call. > Hm. We don't have read support for these structs, so maybe it's fine. > It looks weird though. Left this one as-is for now. Here's v2. 0001 is what you saw before with an attempt to fix the memory context handling. 0002 removes join_search_private. All I've tested is that the tests pass. -- Robert Haas EDB: http://www.enterprisedb.com
Вложения
On Wed, Aug 20, 2025 at 3:13 PM Robert Haas <robertmhaas@gmail.com> wrote: > Here's v2. 0001 is what you saw before with an attempt to fix the > memory context handling. 0002 removes join_search_private. All I've > tested is that the tests pass. Here's v3 with a few more patches. I'm now fairly confident I have the basic approach correct here, but let's see what others think. 0001 is the core "private state" patch for PlannerGlobal, PlannerInfo, and RelOptInfo. It is unchanged since v2, and contains only the fix for memory context handling since v1. However, I've now tested it, and I think it's OK to commit, barring further review comments. 0002 removes join_search_private, as before. Whether it makes sense to go ahead with this is debatable. Needs review, and needs an opinion on whether this should be considered a PoC only (and discarded) or something that should go forward to commit. 0003 adds two new planner hooks. In experimenting with 0001, I discovered that it was a little hard to use. PlannerGlobal has to do with what happens in a whole planning cycle, but the only hook we have that's in approximately the right place is planner_hook, and it can't see the PlannerGlobal object. So, I added these hooks. The first fires after PlannerGlobal is fully initialized and before we start using it, and the second fires just before we throw PlannerGlobal away. I considered some other approaches, specifically: (1) making subquery_planner a hook, (2) making grouping_planner a hook, and (3) doing as the patch does but with the call before rather than after assembling the PlannedStmt. Those proved inferior; the hook at the very end of planner() just before we discard the PlannerGlobal object appears quite valuable to me. Needs review. 0004 adds an extension_state member to PlannedStmt. Unlike the stuff added by 0001, whatever goes into a PlannedStmt has to be a node tree. The proposed usage convention is noted in the comment. There was previous discussion of this kind of thing in http://postgr.es/m/CA+TgmobrkCquFovDMZKRZ9cYQHnrS9sPE98aK0g2A=N1HFk3yQ@mail.gmail.com and the messages leading up to it and I now believe this is exactly the right way to enable what we were talking about over there: give a plugin a chance to propagate whatever it likes from the PlannerGlobal (including extension state) into the PlannedStmt, and then you can use EXPLAIN hooks to print that stuff out -- or use it from anywhere that has access to the PlannedStmt. Again, needs review. 0005 is a demo, not for commit, just to show how these pieces fit together. It uses the hooks from 0001 to count the number of times set_join_pathlist_hook is called and the number of those that are for distinct joinrels. Then it uses planner_shutdown_hook to propagate that into the PlannedStmt, and makes EXPLAIN (DEBUG) print those values out. I think there are far more interesting bits of information that could be preserved and propagated using this infrastructure, though some of them probably also require other changes to make it all work. But this is a simple example to show that the concept is valid even without anything else. For another example of how these patches could be used, see http://postgr.es/m/CA+TgmoZ=6jJi9TGyZCm33vads46HFkyz6Aju_saLT6GFS-iFug@mail.gmail.com and in particular 0001 and 0002. This patch set's planner_setup_hook call would go write after those patches compute default_ssa_mask and default_jsa_mask, allowing the hook to override those values. That's not necessarily the very most interesting thing in the whole world, because the real power of those patches is about manipulating ssa_mask at the per-rel level and jsa_mask at the per-call-to-add_paths_to_joinrel level; setting them for an entire query isn't much better than we ca already do now by frobbing GUCs. But it is a little better, because it allows automatically adjusting the masks on a per-planner-invocation basis without regard to the prevailing GUC values, so you could e.g. decide that whenever the query ID has value X, we automatically set jsa_mask or ssa_mask to value Y. Perhaps more interestingly, I think that planner_setup_hook will prove to be the right place to set up a data structure at the PlannerGlobal level that can be accessed by calls to get_relation_info_hook and others to decide how to these masks should be configured for each RelOptInfo. I guess my point here is that I know this patch set (and the others I've posted) seem a little thin in isolation, but the value starts to compound when you think about them together. That's not to say that I've got everything figured out here, only that I'd request that nobody be too quick to dismiss any of these changes because they don't do enough. The planner is extremely low on extension-author-friendly infrastructure, and no single patch can or should try to solve that problem completely. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
Вложения
- v3-0001-Allow-private-state-in-certain-planner-data-struc.patch
- v3-0002-Remove-PlannerInfo-s-join_search_private-method.patch
- v3-0005-not-for-commit-count-distinct-joinrels-and-joinre.patch
- v3-0003-Add-planner_setup_hook-and-planner_shutdown_hook.patch
- v3-0004-Add-extension_state-member-to-PlannedStmt.patch
On 25/8/2025 21:46, Robert Haas wrote: > On Wed, Aug 20, 2025 at 3:13 PM Robert Haas <robertmhaas@gmail.com> wrote: >> Here's v2. 0001 is what you saw before with an attempt to fix the >> memory context handling. 0002 removes join_search_private. All I've >> tested is that the tests pass. > > Here's v3 with a few more patches. I'm now fairly confident I have the > basic approach correct here, but let's see what others think. > > 0001 is the core "private state" patch for PlannerGlobal, PlannerInfo, > and RelOptInfo. It is unchanged since v2, and contains only the fix > for memory context handling since v1. However, I've now tested it, and > I think it's OK to commit, barring further review comments. Reading this patch, I didn't find reasoning for the two decisions: 1. Why is it necessary to maintain the GetExplainExtensionId and GetPlannerExtensionId routines? It seems that using a single extension_id (related to the order of the library inside the file_scanner) is more transparent and more straightforward if necessary. 2. Why does the extensibility approach in 0001 differ from that in 0004? I can imagine it is all about limiting extensions, but anyway, a module has access to PlannerInfo, PlannerGlobal, etc. So, this machinery looks a little redundant, doesn't it? > 0003 adds two new planner hooks. In experimenting with 0001, I > discovered that it was a little hard to use. PlannerGlobal has to do > with what happens in a whole planning cycle, but the only hook we have > that's in approximately the right place is planner_hook, and it can't > see the PlannerGlobal object. So, I added these hooks. The first fires > after PlannerGlobal is fully initialized and before we start using it, > and the second fires just before we throw PlannerGlobal away. I > considered some other approaches, specifically: (1) making > subquery_planner a hook, (2) making grouping_planner a hook, and (3) > doing as the patch does but with the call before rather than after > assembling the PlannedStmt. Those proved inferior; the hook at the > very end of planner() just before we discard the PlannerGlobal object > appears quite valuable to me. Needs review.These hooks look contradictory to me. If we store data inside a RelOptInfo, it will be challenging to match this RelOptInfo with specific Plan node(s) in the shutdown hook. That's why I prefer to use create_plan_hook, which may also utilise PlannerGlobal and store the extension's data within the plan. I support the subquery_planner hook idea because each subplan represents a separate planning space, and it can be challenging to distinguish between two similar subplans that exist at the same query level. -- Andrei Lepikhov -- regards, Andrei Lepikhov
On Mon, Aug 25, 2025 at 3:46 PM Robert Haas <robertmhaas@gmail.com> wrote: > 0005 is a demo, not for commit, just to show how these pieces fit > together. It uses the hooks from 0001 to count the number of times > set_join_pathlist_hook is called and the number of those that are for > distinct joinrels. Then it uses planner_shutdown_hook to propagate > that into the PlannedStmt, and makes EXPLAIN (DEBUG) print those > values out. I think there are far more interesting bits of information > that could be preserved and propagated using this infrastructure, > though some of them probably also require other changes to make it all > work. But this is a simple example to show that the concept is valid > even without anything else. While mulling this over, I realized that this only works if you don't mind propagating information into the final plan regardless without knowing whether or not EXPLAIN was actually used. That's pretty sad, because whatever you want to propagate into the final plan has to be a node tree, and you probably had the data in some more digestible form during planning, and so now you have to go do a whole bunch of work to convert it into a format that the node infrastructure can digest and most of the time that's going to be useless. Now, as far as I can see that's not really an argument against anything in the patch set, which I'm still hoping someone will review, but it's probably a good argument that something more is needed. The simplest idea that comes to mind for me is to make pg_plan_query() take an ExplainState * argument and pass it through to planner(). Non-EXPLAIN callers can pass NULL, and planner extensions can get control in via planner_hook() and GetExplainExtensionState() on any provided ExplainState to figure out what they want to do. However, a hole in this plan is the case where we call ExplainExecuteQuery(). In that case, and I believe only that case, pg_plan_query() is not called; instead, we fetch a plan from the plan cache, and that may be an already-existing plan, in which case there's no option to retroactively go back in time and save more or different information. I don't really know what to do about that. We could ignore that case and let extensions that work in this way document this as a caveat, or we could try to force GetCachedPlan() to re-plan if an ExplainState is provided, or maybe there's some other option. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Aug 26, 2025 at 4:58 AM Andrei Lepikhov <lepihov@gmail.com> wrote: > > 1. Why is it necessary to maintain the GetExplainExtensionId and > GetPlannerExtensionId routines? It seems that using a single > extension_id (related to the order of the library inside the > file_scanner) is more transparent and more straightforward if necessary. But this wouldn't work for in-core use cases like GEQO, right? Also, how would it work if there are multiple "extensions" in the same .so file? > 2. Why does the extensibility approach in 0001 differ from that in 0004? > I can imagine it is all about limiting extensions, but anyway, a module > has access to PlannerInfo, PlannerGlobal, etc. So, this machinery looks > a little redundant, doesn't it? What do you mean that the extensibility approach differs? Like that the type of extension_state is different? - Melanie
On Mon, Aug 25, 2025 at 3:47 PM Robert Haas <robertmhaas@gmail.com> wrote: > > 0001 is the core "private state" patch for PlannerGlobal, PlannerInfo, > and RelOptInfo. It is unchanged since v2, and contains only the fix > for memory context handling since v1. However, I've now tested it, and > I think it's OK to commit, barring further review comments. A few nits on 0001 > From 1aa43c063edb325548fa3db30b9991bf0831f6f5 Mon Sep 17 00:00:00 2001 > From: Robert Haas <rhaas@postgresql.org> > Date: Mon, 18 Aug 2025 16:11:10 -0400 > Subject: [PATCH v3 1/5] Allow private state in certain planner data > + * extendplan.c > + * Extend core planner objects with additional private state > + * > + * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group > + * Portions Copyright (c) 1994-5, Regents of the University of California > + * > + * The interfaces defined in this file make it possible for loadable > + * modules to their own private state inside of key planner data You're missing a word above -- like "modules to store their own" > + * uses set_join_pathlist_hook can arrange to compute a key intermediate > + * result once per joinrel rather than on every call. > + * > + * IDENTIFICATION > + * src/backend/commands/extendplan.c This path does not reflect where you put the file > + * > +int > +GetPlannerExtensionId(const char *extension_name) > +{ <--snip--> > + > + /* If there's an array but it's currently full, expand it. */ > + if (PlannerExtensionNamesAssigned >= PlannerExtensionNamesAllocated) > + { > + int i = pg_nextpower2_32(PlannerExtensionNamesAssigned + 1); Storing a uint32 in a signed int that could be 32-bit stuck out to me. > + > + PlannerExtensionNameArray = (const char **) > + repalloc(PlannerExtensionNameArray, i * sizeof(char *)); > + PlannerExtensionNamesAllocated = i; > + } > + > + /* Assign and return new ID. */ > + PlannerExtensionNameArray[PlannerExtensionNamesAssigned] = extension_name; Since you don't copy the extension name, it might be worth mentioning that the caller should provide a literal or at least something that will be around later. > diff --git a/src/include/optimizer/extendplan.h b/src/include/optimizer/extendplan.h > new file mode 100644 > +extern void SetPlannerInfoExtensionState(PlannerInfo *root, int extension_id, > + void *opaque); > +extern void SetRelOptInfoExtensionState(RelOptInfo *root, int extension_id, > + void *opaque); You used a different variable name here than in the implementation for the RelOptInfo parameter. > 0002 removes join_search_private, as before. Whether it makes sense to > go ahead with this is debatable. Needs review, and needs an opinion on > whether this should be considered a PoC only (and discarded) or > something that should go forward to commit. Is there a downside to going forward with it? As for the code itself, I thought assumeReplanning was a bit vague since it seems like whether or not replanning is allowed could come up outside of join order search -- but perhaps that's okay. > For another example of how these patches could be used, see > http://postgr.es/m/CA+TgmoZ=6jJi9TGyZCm33vads46HFkyz6Aju_saLT6GFS-iFug@mail.gmail.com > and in particular 0001 and 0002. This patch set's planner_setup_hook > call would go write after those patches compute default_ssa_mask and > default_jsa_mask, allowing the hook to override those values. So, are you saying that you would rewrite the patches in that set to use the infrastructure in this set -- e.g. remove that set's PlannerGlobal.default_jsa_mask and instead put it in PlannerGlobal.extension_state? Or am I misunderstanding? - Melanie
Hmm. I don't have a copy of Andrei's email in my gmail. I see it in the archives but I have not got it. I don't understand how that happened. I now wonder if there are other emails from Andrei I haven't received. On Fri, Sep 12, 2025 at 4:34 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > On Tue, Aug 26, 2025 at 4:58 AM Andrei Lepikhov <lepihov@gmail.com> wrote: > > 1. Why is it necessary to maintain the GetExplainExtensionId and > > GetPlannerExtensionId routines? It seems that using a single > > extension_id (related to the order of the library inside the > > file_scanner) is more transparent and more straightforward if necessary. > > But this wouldn't work for in-core use cases like GEQO, right? Also, > how would it work if there are multiple "extensions" in the same .so > file? We probably don't want to all extensions on any topic to be allocating extension IDs from the same space, because it's used as a list index and we don't want to have to null-pad lists excessively. Combining the explain and planner cases wouldn't be too much of a stretch, perhaps, but it's also not really costing us anything to have separate IDs for those cases. > > 2. Why does the extensibility approach in 0001 differ from that in 0004? > > I can imagine it is all about limiting extensions, but anyway, a module > > has access to PlannerInfo, PlannerGlobal, etc. So, this machinery looks > > a little redundant, doesn't it? > > What do you mean that the extensibility approach differs? Like that > the type of extension_state is different? I suspect the question here is about why not use the index-by-planner-extension-ID approach for 0004. That could maybe work, but here everything has to be a Node, so I feel like it would be more contorted than the existing cases. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Sep 12, 2025 at 6:34 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > You're missing a word above -- like "modules to store their own" > This path does not reflect where you put the file Thanks. > Storing a uint32 in a signed int that could be 32-bit stuck out to me. "git grep pg_nextpower2_32" finds examples of assigning the result to both "int" and "uint32", and I see no practical risk here. > Since you don't copy the extension name, it might be worth mentioning > that the caller should provide a literal or at least something that > will be around later. Maybe, but there's no obvious reason for any caller to use anything other than a string literal. > You used a different variable name here than in the implementation for > the RelOptInfo parameter. Oops. > > 0002 removes join_search_private, as before. Whether it makes sense to > > go ahead with this is debatable. Needs review, and needs an opinion on > > whether this should be considered a PoC only (and discarded) or > > something that should go forward to commit. > > Is there a downside to going forward with it? I think it's just a stylistic preference, whether people like it this way better or not. > As for the code itself, I thought assumeReplanning was a bit vague > since it seems like whether or not replanning is allowed could come up > outside of join order search -- but perhaps that's okay. Yeah, there is room for bikeshedding that name. > > For another example of how these patches could be used, see > > http://postgr.es/m/CA+TgmoZ=6jJi9TGyZCm33vads46HFkyz6Aju_saLT6GFS-iFug@mail.gmail.com > > and in particular 0001 and 0002. This patch set's planner_setup_hook > > call would go write after those patches compute default_ssa_mask and > > default_jsa_mask, allowing the hook to override those values. > > So, are you saying that you would rewrite the patches in that set to > use the infrastructure in this set -- e.g. remove that set's > PlannerGlobal.default_jsa_mask and instead put it in > PlannerGlobal.extension_state? Or am I misunderstanding? No, that's not what I'm saying. What I'm saying is that with both patches applied, planner_setup_hook() from this patch ends up getting called right after default_jsa_mask is set, so another thing this hook can do is adjust that value. Or, for example, you can write a patch that uses this infrastructure to associate state with each RelOptInfo, and then you can use that state to decide how to set jsa_mask in join_path_setup_hook. In other words, it's easier to make effective use of those patches if you have the infrastructure provided by these patches. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Sep 5, 2025 at 4:19 PM Robert Haas <robertmhaas@gmail.com> wrote: > While mulling this over, I realized that this only works if you don't > mind propagating information into the final plan regardless without > knowing whether or not EXPLAIN was actually used. That's pretty sad, [...] > The simplest idea that comes to mind for me is to make pg_plan_query() > take an ExplainState * argument and pass it through to planner(). Here's a new version that implements this idea and also cleans up a few points that Melanie noted. -- Robert Haas EDB: http://www.enterprisedb.com
Вложения
- v4-0003-Add-ExplainState-argument-to-pg_plan_query-and-pl.patch
- v4-0002-Remove-PlannerInfo-s-join_search_private-method.patch
- v4-0004-Add-planner_setup_hook-and-planner_shutdown_hook.patch
- v4-0001-Allow-private-state-in-certain-planner-data-struc.patch
- v4-0005-Add-extension_state-member-to-PlannedStmt.patch
- v4-0006-not-for-commit-count-distinct-joinrels-and-joinre.patch
On 13/9/2025 02:16, Robert Haas wrote: > On Fri, Sep 12, 2025 at 4:34 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: >> On Tue, Aug 26, 2025 at 4:58 AM Andrei Lepikhov <lepihov@gmail.com> wrote: >>> 1. Why is it necessary to maintain the GetExplainExtensionId and >>> GetPlannerExtensionId routines? It seems that using a single >>> extension_id (related to the order of the library inside the >>> file_scanner) is more transparent and more straightforward if necessary. >> >> But this wouldn't work for in-core use cases like GEQO, right? Also, >> how would it work if there are multiple "extensions" in the same .so >> file? As I see, the core has never utilised extensibility machinery and implemented separate fields/hooks for personal needs (FDW is a suitable example). And I think there are reasons for that. Not the last one, I guess, security issues. I have never seen cases with multiple extensions in the same module. I wonder what the reason is for doing this and why the core should support it?> > We probably don't want to all extensions on any topic to be allocating > extension IDs from the same space, because it's used as a list index > and we don't want to have to null-pad lists excessively. Combining the > explain and planner cases wouldn't be too much of a stretch, perhaps, > but it's also not really costing us anything to have separate IDs for > those cases. Yes, but it costs extension developers to complicate the code. Considering that extensions, implementing planner tricks usually want to show the user (on an EXPLAIN request) how they impacted the query plan, I guess it makes sense to suggest the same ID. But I still vote against extension_id in the planner. The main reason for me to let go such a solution in EXPLAIN was the 'settings' option, because extensions may fight for a 'nice' name. But each extension has a native ID - its personal name, isn't it?> >>> 2. Why does the extensibility approach in 0001 differ from that in 0004? >>> I can imagine it is all about limiting extensions, but anyway, a module >>> has access to PlannerInfo, PlannerGlobal, etc. So, this machinery looks >>> a little redundant, doesn't it? >> >> What do you mean that the extensibility approach differs? Like that >> the type of extension_state is different? PlannedStmt in 0004 has an extension list that should contain DefElem nodes. However, the optimiser nodes use a different approach: the extension developer must operate with an ID allocation. I propose using a unified way for extensibility - a list with DefElem, uniquely identified by module name, looks much more flexible.> > I suspect the question here is about why not use the > index-by-planner-extension-ID approach for 0004. That could maybe > work, but here everything has to be a Node, so I feel like it would be > more contorted than the existing cases. I tried to highlight here that 0004 looks better and more universal than 0001. -- regards, Andrei Lepikhov
On Tue, Sep 16, 2025 at 4:12 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Sep 5, 2025 at 4:19 PM Robert Haas <robertmhaas@gmail.com> wrote: > > While mulling this over, I realized that this only works if you don't > > mind propagating information into the final plan regardless without > > knowing whether or not EXPLAIN was actually used. That's pretty sad, > [...] > > The simplest idea that comes to mind for me is to make pg_plan_query() > > take an ExplainState * argument and pass it through to planner(). > > Here's a new version that implements this idea and also cleans up a > few points that Melanie noted. I think that was rebased over a patch I inadvertently committed to my local master branch. Trying again. -- Robert Haas EDB: http://www.enterprisedb.com
Вложения
- v5-0001-Allow-private-state-in-certain-planner-data-struc.patch
- v5-0005-Add-extension_state-member-to-PlannedStmt.patch
- v5-0002-Remove-PlannerInfo-s-join_search_private-method.patch
- v5-0004-Add-planner_setup_hook-and-planner_shutdown_hook.patch
- v5-0003-Add-ExplainState-argument-to-pg_plan_query-and-pl.patch
- v5-0006-not-for-commit-count-distinct-joinrels-and-joinre.patch
Robert Haas <robertmhaas@gmail.com> writes: > I think that was rebased over a patch I inadvertently committed to my > local master branch. Trying again. I looked through the v5 patchset. 0001: The allocation logic in Set*ExtensionState fails to guarantee that it's made the array(s) big enough to hold the passed ID, which would be problematic in the face of lots of extensions. I think this'd be enough to fix it: - i = pg_nextpower2_32(glob->extension_state_allocated + 1); + i = pg_nextpower2_32(extension_id + 1); But maybe you should also reconsider whether blindly starting the array sizes at 4, rather than say Max(4, extension_id + 1), is good. Now that I look, SetExplainExtensionState also has these issues. Also a couple nitpicks: * alphabetization fail in util/Makefile * utils/palloc.h is already included by postgres.h 0002: LGTM 0003: In the wake of 70407d39b, you should avoid "struct ExplainState" in favor of using duplicate typedefs. Also, although you're not the first sinner, I really think this new argument to planner() should be documented. Maybe the rest too while we're at it. 0004: maybe the planner_shutdown_hook should be called before DestroyPartitionDirectory? It's not entirely clear whether the hook might like to look at that. Also "struct ExplainState" again. 0005: okay regards, tom lane
On Mon, Sep 22, 2025 at 3:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I looked through the v5 patchset. Thanks! Here's a new set of patches. I've added a new 0001 at the beginning to fix the bug you identified in SetExplainExtensionState; this will need to be back-patched to v18 once the release freeze lifts. I've also adjusted the previous patch, now 0002, to fix the equivalent problem with the new Set*ExplainState functions,and I've attempted to fix the other problems you mentioned, with this exception: > Also, although you're not the first sinner, I really think > this new argument to planner() should be documented. Maybe > the rest too while we're at it. I'm a little nervous about this. I fear that the comments are all going to be of the form "to save a file, click the File menu, then click Save," which doesn't actually help anyone. That might be a slight exaggeration, but I feel like it's pretty obvious on visual inspection that Query *parse is what we're planning, char *query_string is the text version of that, cursorOptions is some kind of flag mask, boundParams is the parameter values. It's fair to say that ExplainState *es is a little less obvious than the others, but saying "If this function is being invoked by EXPLAIN, then ExplainState *es is the ExplainState, else it is NULL" doesn't really seem all that helpful to me. I mean, what else would it be? My thought here would be that if you want to write some comments that you consider helpful for the existing arguments, I'll try to write a new comment for this one in the same style (or you can suggest one) and hold my nose if I don't find it helpful, or alternatively, we could proceed with these patches without the comment and you can add whatever comment text you want after-the-fact. If you want the comment changes in this patch set, then I need a suggestion as to what that should look like. -- Robert Haas EDB: http://www.enterprisedb.com
Вложения
- v6-0001-Fix-array-allocation-bugs-in-SetExplainExtensionS.patch
- v6-0002-Allow-private-state-in-certain-planner-data-struc.patch
- v6-0003-Remove-PlannerInfo-s-join_search_private-method.patch
- v6-0004-Add-ExplainState-argument-to-pg_plan_query-and-pl.patch
- v6-0005-Add-planner_setup_hook-and-planner_shutdown_hook.patch
- v6-0006-Add-extension_state-member-to-PlannedStmt.patch
- v6-0007-not-for-commit-count-distinct-joinrels-and-joinre.patch
Robert Haas <robertmhaas@gmail.com> writes: > Here's a new set of patches. I've added a new 0001 at the beginning to > fix the bug you identified in SetExplainExtensionState; this will need > to be back-patched to v18 once the release freeze lifts. I'm good with 0001, and the release freeze is over, so push that whenever you like. I'll try to look at the rest soon. regards, tom lane
On Wed, Sep 24, 2025 at 12:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm good with 0001, and the release freeze is over, so push that > whenever you like. I'll try to look at the rest soon. Cool... except I thought that the release freeze wouldn't lift until we release, which I thought was tomorrow? -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > Cool... except I thought that the release freeze wouldn't lift until > we release, which I thought was tomorrow? Nope, the freeze is over as soon as the git tag for the release is pushed, cf https://wiki.postgresql.org/wiki/Committing_checklist#Release_freezes We'd unfreeze when the stamping commit is made, except that we want some breathing room for a re-wrap if a packager discovers a critical problem. We give them 24 hours to report that. (What happens if a critical problem is discovered shortly later? We'd probably use a new minor release number in that case.) regards, tom lane
On Wed, Sep 24, 2025 at 12:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm good with 0001, and the release freeze is over, so push that > whenever you like. I'll try to look at the rest soon. Done now. Here's a rebase of the rest, plus I tweaked the GEQO patch to try to avoid a compiler warning that cfbot was complaining about. -- Robert Haas EDB: http://www.enterprisedb.com
Вложения
- v7-0004-Add-planner_setup_hook-and-planner_shutdown_hook.patch
- v7-0001-Allow-private-state-in-certain-planner-data-struc.patch
- v7-0002-Remove-PlannerInfo-s-join_search_private-method.patch
- v7-0003-Add-ExplainState-argument-to-pg_plan_query-and-pl.patch
- v7-0005-Add-extension_state-member-to-PlannedStmt.patch
- v7-0006-not-for-commit-count-distinct-joinrels-and-joinre.patch
Robert Haas <robertmhaas@gmail.com> writes: > Done now. Here's a rebase of the rest, plus I tweaked the GEQO patch > to try to avoid a compiler warning that cfbot was complaining about. I'm good with the v7 patch set, except for the complaint I raised previously that we really ought to have more than zero documentation for planner()'s parameters. If you don't care to write such text, attached is a cut at it. regards, tom lane diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 41bd8353430..c249f34eb8e 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -280,6 +280,23 @@ static void create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel * * Query optimizer entry point * + * Inputs: + * parse: an analyzed-and-rewritten query tree for an optimizable statement + * query_string: source text for the query tree (used for error reports) + * cursorOptions: bitmask of CURSOR_OPT_XXX flags, see parsenodes.h + * boundParams: passed-in parameter values, or NULL if none + * es: ExplainState if being called from EXPLAIN, else NULL + * + * The result is a PlannedStmt tree. + * + * PARAM_EXTERN Param nodes within the parse tree can be replaced by Consts + * using values from boundParams, if those values are marked PARAM_FLAG_CONST. + * Parameter values not so marked are still relied on for estimation purposes. + * + * The ExplainState pointer is not currently used by the core planner, but it + * is passed through to some planner hooks so that they can report information + * back to EXPLAIN extension hooks. + * * To support loadable plugins that monitor or modify planner behavior, * we provide a hook variable that lets a plugin get control before and * after the standard planning process. The plugin would normally call
On Sun, Sep 28, 2025 at 11:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > Done now. Here's a rebase of the rest, plus I tweaked the GEQO patch > > to try to avoid a compiler warning that cfbot was complaining about. > > I'm good with the v7 patch set, except for the complaint I raised > previously that we really ought to have more than zero documentation > for planner()'s parameters. If you don't care to write such text, > attached is a cut at it. Oh, nice, thanks! I'm going to be on vacation the rest of this week so I plan to deal with this next week. However, if you feel like committing it before then, please feel free. -- Robert Haas EDB: http://www.enterprisedb.com