Обсуждение: RFC: extensible planner state

Поиск

Список

Период

Сортировка

RFC: extensible planner state

От

Robert Haas

Дата:

19 августа, 19:47:34

I've been working on planner extensibility for some time now, and am
still not quite ready to make a full-fledged proposal, but see
partially-fledged proposals here and here:

http://postgr.es/m/CA+TgmoZY+baV-T-5ifDn6P=L=aV-VkVBrPmi0TQkcEq-5Finww@mail.gmail.com
http://postgr.es/m/CA+TgmoZxQO8svE_vtNCkEubnCYrnrCEnhftdbkdZ496Nfhg=wQ@mail.gmail.com

While trying to build out a real, working example based on those
patches, I ran into the problem that it's rather difficult for
multiple planner hooks to coordinate with each other. For example, you
might want to do some calculation once per query, or once per
RelOptnfo, and that's somewhat difficult to arrange right now. I tried
having my planner hook push an item onto a state stack before calling
standard_planner() and pop it afterward, and then any hooks called
during planning can look at the top of the state stack. But that
doesn't quite work because plan_cluster_use_sort() and
plan_create_index_workers() can provide a backdoor into the planner
code, allowing get_relation_info() to be called not in reference to
the most recent call to planner(). My first instinct was to invent
QSRC_DUMMY and have those functions use that, which as far as I can
see is an adequate solution to that immediate problem, since
get_relation_info() can now identify those cases cleanly.

But that still requires the extension to do a lot of bookkeeping just
for the privilege of storing some per-query private state, and it
seems to me that you might well want to store some private state
per-RelOptInfo or possibly per-PlannerInfo, which seems to require an
even-more-unreasonable amount of effort. An extension might be able to
spin up a hash table keyed by pointer address or maybe some
identifying properties of a RelOptInfo, but I think it's going to be
slow, fragile, and ugly. So what I'd like to propose instead is
something along the lines of the private-ExplainState-data system:

http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com
https://git.postgresql.org/pg/commitdiff/c65bc2e1d14a2d4daed7c1921ac518f2c5ac3d17

The attached (untested) patch shows how this could work, allowing
extensible state in each PlannerGlobal, PlannerInfo, and RelOptInfo,
which seem like the logical places to me. I have use cases for the
first and the third at present, so the second could be omitted on
suspicion of being unuseful, but I bet it isn't. As compared with
c65bc2e1d14a2d4daed7c1921ac518f2c5ac3d17, I reduced the initial
allocation size to 4 from 16 and made the getter functions static
inline, out of the feeling that you're not likely to have more than
one ExplainState and the speed of EXPLAIN doesn't matter much, but you
might store and access private per-RelOptInfo state a lot of times in
one query planner invocation.

I'm not altogether convinced this is the right design. It seems
slightly unwieldy, and having to allocate an extra array, even if a
small one, for every RelOptInfo that has private state seems like it
could add a noticeable amount of overhead. On the other hand, I
strongly suspect that assuming that there's only ever one planner
extension in operation is short-sighted. The fact that we have none
right now seems to me to be evidence of the absence of infrastructure
rather than the absence of demand. If that is correct then I don't
quite see how to do better than this. But I'm interested in hearing
what other people think. If people like this design, I will propose it
here or on another thread for commit, after suitable testing and
polishing. If people do not like this design, then I would like to
know what alternative they would prefer.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

v1-0001-Allow-private-state-in-certain-planner-data-struc.patch

Re: RFC: extensible planner state

От

Tom Lane

Дата:

19 августа, 20:18:02

Robert Haas <robertmhaas@gmail.com> writes:
> But that still requires the extension to do a lot of bookkeeping just
> for the privilege of storing some per-query private state, and it
> seems to me that you might well want to store some private state
> per-RelOptInfo or possibly per-PlannerInfo, which seems to require an
> even-more-unreasonable amount of effort. An extension might be able to
> spin up a hash table keyed by pointer address or maybe some
> identifying properties of a RelOptInfo, but I think it's going to be
> slow, fragile, and ugly. So what I'd like to propose instead is
> something along the lines of the private-ExplainState-data system:
> http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com
> https://git.postgresql.org/pg/commitdiff/c65bc2e1d14a2d4daed7c1921ac518f2c5ac3d17

This seems generally reasonable to me.  I agree that it's slightly
annoying to bloat every RelOptInfo with two more fields, but I don't
see a better alternative to that.  In any case, sizeof(RelOptInfo)
is 448 right now on my dev machine; making it 464 isn't going to
change anything.

I wonder if we couldn't get rid of PlannerInfo.join_search_private
in favor of expecting join search hooks to use this mechanism
(thus, GEQO would become an in-core consumer of the mechanism).

Another idea is to get rid of RelOptInfo.fdw_private, although
that has a little more excuse to live in that it's reasonably
clear who gets to use it, namely the FDW supporting the relation.
(Too bad there's no comment explaining that.)

Nitpicks:

* The initial allocations of the arrays need to take
more care than this about which context the arrays go into,
ie it had better be planner_cxt for PlannerInfo or PlannerGlobal,
and the same context the RelOptInfo is in for RelOptInfo.
Otherwise you risk a mess under GEQO.

* Surely, if extension_state etc is read_write_ignore, then
extension_state_allocated etc had better be as well?  I don't
understand the rationale for preserving one without the other.

            regards, tom lane

Re: RFC: extensible planner state

От

Robert Haas

Дата:

19 августа, 21:00:04

On Tue, Aug 19, 2025 at 1:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> This seems generally reasonable to me.

Cool.

> I wonder if we couldn't get rid of PlannerInfo.join_search_private
> in favor of expecting join search hooks to use this mechanism
> (thus, GEQO would become an in-core consumer of the mechanism).

Let me try that.

> * The initial allocations of the arrays need to take
> more care than this about which context the arrays go into,
> ie it had better be planner_cxt for PlannerInfo or PlannerGlobal,
> and the same context the RelOptInfo is in for RelOptInfo.
> Otherwise you risk a mess under GEQO.

It's easy to do this for PlannerInfo, but PlannerGlobal has no
planner_cxt member. GetMemoryChunkContext() could be used but I'm not
sure we want to spread reliance on that to more places. What's your
thought?

> * Surely, if extension_state etc is read_write_ignore, then
> extension_state_allocated etc had better be as well?  I don't
> understand the rationale for preserving one without the other.

I figured we can't print a void** but we can print an integer and the
user might want to see it. Wrong idea?

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: extensible planner state

От

Tom Lane

Дата:

19 августа, 21:28:03

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Aug 19, 2025 at 1:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> * The initial allocations of the arrays need to take
>> more care than this about which context the arrays go into,
>> ie it had better be planner_cxt for PlannerInfo or PlannerGlobal,
>> and the same context the RelOptInfo is in for RelOptInfo.
>> Otherwise you risk a mess under GEQO.

> It's easy to do this for PlannerInfo, but PlannerGlobal has no
> planner_cxt member. GetMemoryChunkContext() could be used but I'm not
> sure we want to spread reliance on that to more places. What's your
> thought?

You'll presumably have to use GetMemoryChunkContext() for RelOptInfo,
so I don't see much downside from using it in one or even both of the
other cases too.

>> * Surely, if extension_state etc is read_write_ignore, then
>> extension_state_allocated etc had better be as well?  I don't
>> understand the rationale for preserving one without the other.

> I figured we can't print a void** but we can print an integer and the
> user might want to see it. Wrong idea?

Hm.  We don't have read support for these structs, so maybe it's fine.
It looks weird though.

            regards, tom lane

Re: RFC: extensible planner state

От

Andrei Lepikhov

Дата:

20 августа, 10:14:29

On 19/8/2025 18:47, Robert Haas wrote:
> polishing. If people do not like this design, then I would like to
> know what alternative they would prefer.Thanks for these efforts!

Generally, such an interface seems good for the extension's purposes. It 
is OK in this specific context because all these structures are created 
during the planning process. Going further into the plan, which is more 
stable and reusable, you would need to think about read/write rules.

I utilise PlannerGlobal extensibility to notify my extension when a 
transformation or optimisation occurs, enabling it to initiate 
replanning from the top level with alternative settings if necessary. 
RelOptInfo extensibility serves multiple purposes, but its most notable 
feature is the inclusion of a node signature that enables the 
identification of a specific RelOptInfo instance during re-optimisation 
of all or only a part of the query. I haven't used PlannerInfo 
extensibility yet, but I think it makes sense - if an extension performs 
a complicated planning job that spans multiple planning stages, it makes 
sense to store intermediate data in this 'cache'.

The weak points of this approach are:
1. Needs a new core routine for each node to be extended.
2. Doesn't propose copy/read/write node machinery.
3. Allocates more memory than needed. I frequently see installations 
with 5-10 modules installed. If the 9th extension employs the RelOptInfo 
extensibility, it would be unfortunate to see another eight elements 
allocated unnecessarily. What if we ever consider extending the Path node?

I have been using a slightly different approach [1] for years, which 
involves adding a List at the end of each structure. Any extension, by 
convention, may add elements as an ExtensibleNode. Such an approach 
saves memory, resolves read/write/copy node issues and allows an 
extension to correctly identify its data in parallel workers and across 
backends (see [2] for the reasoning). This approach appears more general 
(though less restrictive) and can be applied to extend any node in the 
same way, which offers a clear benefit, because tracking query planning 
decisions often requires extensibility in Query, RelOptInfo, and 
PlannedStmt as well.

Although I prefer the ExtensibleNode / extension list approach, I will 
be OK with your method as well, especially if you add extensibility to 
the PlannedStmt node too.

[1] https://commitfest.postgresql.org/patch/5965/
[2] https://www.postgresql.org/message-id/aKQIeXKMifXqV58R@jrouhaud

-- 
regards, Andrei Lepikhov

Re: RFC: extensible planner state

От

Robert Haas

Дата:

20 августа, 22:13:51

On Tue, Aug 19, 2025 at 2:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You'll presumably have to use GetMemoryChunkContext() for RelOptInfo,
> so I don't see much downside from using it in one or even both of the
> other cases too.

Pointer dereference must be faster than a function call.

> Hm.  We don't have read support for these structs, so maybe it's fine.
> It looks weird though.

Left this one as-is for now.

Here's v2. 0001 is what you saw before with an attempt to fix the
memory context handling. 0002 removes join_search_private. All I've
tested is that the tests pass.

--
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: RFC: extensible planner state

От

Robert Haas

Дата:

25 августа, 22:46:59

On Wed, Aug 20, 2025 at 3:13 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Here's v2. 0001 is what you saw before with an attempt to fix the
> memory context handling. 0002 removes join_search_private. All I've
> tested is that the tests pass.

Here's v3 with a few more patches. I'm now fairly confident I have the
basic approach correct here, but let's see what others think.

0001 is the core "private state" patch for PlannerGlobal, PlannerInfo,
and RelOptInfo. It is unchanged since v2, and contains only the fix
for memory context handling since v1. However, I've now tested it, and
I think it's OK to commit, barring further review comments.

0002 removes join_search_private, as before. Whether it makes sense to
go ahead with this is debatable. Needs review, and needs an opinion on
whether this should be considered a PoC only (and discarded) or
something that should go forward to commit.

0003 adds two new planner hooks. In experimenting with 0001, I
discovered that it was a little hard to use. PlannerGlobal has to do
with what happens in a whole planning cycle, but the only hook we have
that's in approximately the right place is planner_hook, and it can't
see the PlannerGlobal object. So, I added these hooks. The first fires
after PlannerGlobal is fully initialized and before we start using it,
and the second fires just before we throw PlannerGlobal away. I
considered some other approaches, specifically: (1) making
subquery_planner a hook, (2) making grouping_planner a hook, and (3)
doing as the patch does but with the call before rather than after
assembling the PlannedStmt. Those proved inferior; the hook at the
very end of planner() just before we discard the PlannerGlobal object
appears quite valuable to me. Needs review.

0004 adds an extension_state member to PlannedStmt. Unlike the stuff
added by 0001, whatever goes into a PlannedStmt has to be a node tree.
The proposed usage convention is noted in the comment. There was
previous discussion of this kind of thing in
http://postgr.es/m/CA+TgmobrkCquFovDMZKRZ9cYQHnrS9sPE98aK0g2A=N1HFk3yQ@mail.gmail.com
and the messages leading up to it and I now believe this is exactly
the right way to enable what we were talking about over there: give a
plugin a chance to propagate whatever it likes from the PlannerGlobal
(including extension state) into the PlannedStmt, and then you can use
EXPLAIN hooks to print that stuff out -- or use it from anywhere that
has access to the PlannedStmt. Again, needs review.

0005 is a demo, not for commit, just to show how these pieces fit
together. It uses the hooks from 0001 to count the number of times
set_join_pathlist_hook is called and the number of those that are for
distinct joinrels. Then it uses planner_shutdown_hook to propagate
that into the PlannedStmt, and makes EXPLAIN (DEBUG) print those
values out. I think there are far more interesting bits of information
that could be preserved and propagated using this infrastructure,
though some of them probably also require other changes to make it all
work. But this is a simple example to show that the concept is valid
even without anything else.

For another example of how these patches could be used, see
http://postgr.es/m/CA+TgmoZ=6jJi9TGyZCm33vads46HFkyz6Aju_saLT6GFS-iFug@mail.gmail.com
and in particular 0001 and 0002. This patch set's planner_setup_hook
call would go write after those patches compute default_ssa_mask and
default_jsa_mask, allowing the hook to override those values. That's
not necessarily the very most interesting thing in the whole world,
because the real power of those patches is about manipulating ssa_mask
at the per-rel level and jsa_mask at the
per-call-to-add_paths_to_joinrel level; setting them for an entire
query isn't much better than we ca already do now by frobbing GUCs.
But it is a little better, because it allows automatically adjusting
the masks on a per-planner-invocation basis without regard to the
prevailing GUC values, so you could e.g. decide that whenever the
query ID has value X, we automatically set jsa_mask or ssa_mask to
value Y. Perhaps more interestingly, I think that planner_setup_hook
will prove to be the right place to set up a data structure at the
PlannerGlobal level that can be accessed by calls to
get_relation_info_hook and others to decide how to these masks should
be configured for each RelOptInfo.

I guess my point here is that I know this patch set (and the others
I've posted) seem a little thin in isolation, but the value starts to
compound when you think about them together. That's not to say that
I've got everything figured out here, only that I'd request that
nobody be too quick to dismiss any of these changes because they don't
do enough. The planner is extremely low on extension-author-friendly
infrastructure, and no single patch can or should try to solve that
problem completely.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: RFC: extensible planner state

От

Andrei Lepikhov

Дата:

26 августа, 11:58:17

On 25/8/2025 21:46, Robert Haas wrote:
> On Wed, Aug 20, 2025 at 3:13 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> Here's v2. 0001 is what you saw before with an attempt to fix the
>> memory context handling. 0002 removes join_search_private. All I've
>> tested is that the tests pass.
> 
> Here's v3 with a few more patches. I'm now fairly confident I have the
> basic approach correct here, but let's see what others think.
> 
> 0001 is the core "private state" patch for PlannerGlobal, PlannerInfo,
> and RelOptInfo. It is unchanged since v2, and contains only the fix
> for memory context handling since v1. However, I've now tested it, and
> I think it's OK to commit, barring further review comments.
Reading this patch, I didn't find reasoning for the two decisions:
1. Why is it necessary to maintain the GetExplainExtensionId and 
GetPlannerExtensionId routines? It seems that using a single 
extension_id (related to the order of the library inside the 
file_scanner) is more transparent and more straightforward if necessary.
2. Why does the extensibility approach in 0001 differ from that in 0004? 
I can imagine it is all about limiting extensions, but anyway, a module 
has access to PlannerInfo, PlannerGlobal, etc. So, this machinery looks 
a little redundant, doesn't it?
> 0003 adds two new planner hooks. In experimenting with 0001, I
> discovered that it was a little hard to use. PlannerGlobal has to do
> with what happens in a whole planning cycle, but the only hook we have
> that's in approximately the right place is planner_hook, and it can't
> see the PlannerGlobal object. So, I added these hooks. The first fires
> after PlannerGlobal is fully initialized and before we start using it,
> and the second fires just before we throw PlannerGlobal away. I
> considered some other approaches, specifically: (1) making
> subquery_planner a hook, (2) making grouping_planner a hook, and (3)
> doing as the patch does but with the call before rather than after
> assembling the PlannedStmt. Those proved inferior; the hook at the
> very end of planner() just before we discard the PlannerGlobal object
> appears quite valuable to me. Needs review.These hooks look contradictory to me. If we store data inside a 
RelOptInfo, it will be challenging to match this RelOptInfo with 
specific Plan node(s) in the shutdown hook. That's why I prefer to use 
create_plan_hook, which may also utilise PlannerGlobal and store the 
extension's data within the plan.

I support the subquery_planner hook idea because each subplan represents 
a separate planning space, and it can be challenging to distinguish 
between two similar subplans that exist at the same query level.

--
Andrei Lepikhov

-- 
regards, Andrei Lepikhov

Re: RFC: extensible planner state

От

Robert Haas

Дата:

05 сентября, 23:19:12

On Mon, Aug 25, 2025 at 3:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
> 0005 is a demo, not for commit, just to show how these pieces fit
> together. It uses the hooks from 0001 to count the number of times
> set_join_pathlist_hook is called and the number of those that are for
> distinct joinrels. Then it uses planner_shutdown_hook to propagate
> that into the PlannedStmt, and makes EXPLAIN (DEBUG) print those
> values out. I think there are far more interesting bits of information
> that could be preserved and propagated using this infrastructure,
> though some of them probably also require other changes to make it all
> work. But this is a simple example to show that the concept is valid
> even without anything else.

While mulling this over, I realized that this only works if you don't
mind propagating information into the final plan regardless without
knowing whether or not EXPLAIN was actually used. That's pretty sad,
because whatever you want to propagate into the final plan has to be a
node tree, and you probably had the data in some more digestible form
during planning, and so now you have to go do a whole bunch of work to
convert it into a format that the node infrastructure can digest and
most of the time that's going to be useless. Now, as far as I can see
that's not really an argument against anything in the patch set, which
I'm still hoping someone will review, but it's probably a good
argument that something more is needed.

The simplest idea that comes to mind for me is to make pg_plan_query()
take an ExplainState * argument and pass it through to planner().
Non-EXPLAIN callers can pass NULL, and planner extensions can get
control in via planner_hook() and GetExplainExtensionState() on any
provided ExplainState to figure out what they want to do. However, a
hole in this plan is the case where we call ExplainExecuteQuery(). In
that case, and I believe only that case, pg_plan_query() is not
called; instead, we fetch a plan from the plan cache, and that may be
an already-existing plan, in which case there's no option to
retroactively go back in time and save more or different information.
I don't really know what to do about that. We could ignore that case
and let extensions that work in this way document this as a caveat, or
we could try to force GetCachedPlan() to re-plan if an ExplainState is
provided, or maybe there's some other option.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: extensible planner state

От

Melanie Plageman

Дата:

12 сентября, 23:33:57

On Tue, Aug 26, 2025 at 4:58 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> 1. Why is it necessary to maintain the GetExplainExtensionId and
> GetPlannerExtensionId routines? It seems that using a single
> extension_id (related to the order of the library inside the
> file_scanner) is more transparent and more straightforward if necessary.

But this wouldn't work for in-core use cases like GEQO, right? Also,
how would it work if there are multiple "extensions" in the same .so
file?

> 2. Why does the extensibility approach in 0001 differ from that in 0004?
> I can imagine it is all about limiting extensions, but anyway, a module
> has access to PlannerInfo, PlannerGlobal, etc. So, this machinery looks
> a little redundant, doesn't it?

What do you mean that the extensibility approach differs? Like that
the type of extension_state is different?

- Melanie

Re: RFC: extensible planner state

От

Melanie Plageman

Дата:

13 сентября, 01:34:04

On Mon, Aug 25, 2025 at 3:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> 0001 is the core "private state" patch for PlannerGlobal, PlannerInfo,
> and RelOptInfo. It is unchanged since v2, and contains only the fix
> for memory context handling since v1. However, I've now tested it, and
> I think it's OK to commit, barring further review comments.

A few nits on 0001

> From 1aa43c063edb325548fa3db30b9991bf0831f6f5 Mon Sep 17 00:00:00 2001
> From: Robert Haas <rhaas@postgresql.org>
> Date: Mon, 18 Aug 2025 16:11:10 -0400
> Subject: [PATCH v3 1/5] Allow private state in certain planner data
> + * extendplan.c
> + *      Extend core planner objects with additional private state
> + *
> + * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994-5, Regents of the University of California
> + *
> + * The interfaces defined in this file make it possible for loadable
> + * modules to their own private state inside of key planner data

You're missing a word above -- like "modules to store their own"

> + * uses set_join_pathlist_hook can arrange to compute a key intermediate
> + * result once per joinrel rather than on every call.
> + *
> + * IDENTIFICATION
> + *      src/backend/commands/extendplan.c

This path does not reflect where you put the file

> + *
> +int
> +GetPlannerExtensionId(const char *extension_name)
> +{
<--snip-->
> +
> +    /* If there's an array but it's currently full, expand it. */
> +    if (PlannerExtensionNamesAssigned >= PlannerExtensionNamesAllocated)
> +    {
> +        int            i = pg_nextpower2_32(PlannerExtensionNamesAssigned + 1);

Storing a uint32 in a signed int that could be 32-bit stuck out to me.

> +
> +        PlannerExtensionNameArray = (const char **)
> +            repalloc(PlannerExtensionNameArray, i * sizeof(char *));
> +        PlannerExtensionNamesAllocated = i;
> +    }
> +
> +    /* Assign and return new ID. */
> +    PlannerExtensionNameArray[PlannerExtensionNamesAssigned] = extension_name;

Since you don't copy the extension name, it might be worth mentioning
that the caller should provide a literal or at least something that
will be around later.

> diff --git a/src/include/optimizer/extendplan.h b/src/include/optimizer/extendplan.h
> new file mode 100644
> +extern void SetPlannerInfoExtensionState(PlannerInfo *root, int extension_id,
> +                                         void *opaque);
> +extern void SetRelOptInfoExtensionState(RelOptInfo *root, int extension_id,
> +                                        void *opaque);

You used a different variable name here than in the implementation for
the RelOptInfo parameter.

> 0002 removes join_search_private, as before. Whether it makes sense to
> go ahead with this is debatable. Needs review, and needs an opinion on
> whether this should be considered a PoC only (and discarded) or
> something that should go forward to commit.

Is there a downside to going forward with it?

As for the code itself, I thought assumeReplanning was a bit vague
since it seems like whether or not replanning is allowed could come up
outside of join order search -- but perhaps that's okay.

> For another example of how these patches could be used, see
> http://postgr.es/m/CA+TgmoZ=6jJi9TGyZCm33vads46HFkyz6Aju_saLT6GFS-iFug@mail.gmail.com
> and in particular 0001 and 0002. This patch set's planner_setup_hook
> call would go write after those patches compute default_ssa_mask and
> default_jsa_mask, allowing the hook to override those values.

So, are you saying that you would rewrite the patches in that set to
use the infrastructure in this set -- e.g. remove that set's
PlannerGlobal.default_jsa_mask and instead put it in
PlannerGlobal.extension_state? Or am I misunderstanding?

- Melanie

Re: RFC: extensible planner state

От

Robert Haas

Дата:

13 сентября, 03:16:11

Hmm. I don't have a copy of Andrei's email in my gmail. I see it in
the archives but I have not got it. I don't understand how that
happened. I now wonder if there are other emails from Andrei I haven't
received.

On Fri, Sep 12, 2025 at 4:34 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Tue, Aug 26, 2025 at 4:58 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > 1. Why is it necessary to maintain the GetExplainExtensionId and
> > GetPlannerExtensionId routines? It seems that using a single
> > extension_id (related to the order of the library inside the
> > file_scanner) is more transparent and more straightforward if necessary.
>
> But this wouldn't work for in-core use cases like GEQO, right? Also,
> how would it work if there are multiple "extensions" in the same .so
> file?

We probably don't want to all extensions on any topic to be allocating
extension IDs from the same space, because it's used as a list index
and we don't want to have to null-pad lists excessively. Combining the
explain and planner cases wouldn't be too much of a stretch, perhaps,
but it's also not really costing us anything to have separate IDs for
those cases.

> > 2. Why does the extensibility approach in 0001 differ from that in 0004?
> > I can imagine it is all about limiting extensions, but anyway, a module
> > has access to PlannerInfo, PlannerGlobal, etc. So, this machinery looks
> > a little redundant, doesn't it?
>
> What do you mean that the extensibility approach differs? Like that
> the type of extension_state is different?

I suspect the question here is about why not use the
index-by-planner-extension-ID approach for 0004. That could maybe
work, but here everything has to be a Node, so I feel like it would be
more contorted than the existing cases.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: extensible planner state

От

Robert Haas

Дата:

15 сентября, 21:35:55

On Fri, Sep 12, 2025 at 6:34 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> You're missing a word above -- like "modules to store their own"
> This path does not reflect where you put the file

Thanks.

> Storing a uint32 in a signed int that could be 32-bit stuck out to me.

"git grep pg_nextpower2_32" finds examples of assigning the result to
both "int" and "uint32", and I see no practical risk here.

> Since you don't copy the extension name, it might be worth mentioning
> that the caller should provide a literal or at least something that
> will be around later.

Maybe, but there's no obvious reason for any caller to use anything
other than a string literal.

> You used a different variable name here than in the implementation for
> the RelOptInfo parameter.

Oops.

> > 0002 removes join_search_private, as before. Whether it makes sense to
> > go ahead with this is debatable. Needs review, and needs an opinion on
> > whether this should be considered a PoC only (and discarded) or
> > something that should go forward to commit.
>
> Is there a downside to going forward with it?

I think it's just a stylistic preference, whether people like it this
way better or not.

> As for the code itself, I thought assumeReplanning was a bit vague
> since it seems like whether or not replanning is allowed could come up
> outside of join order search -- but perhaps that's okay.

Yeah, there is room for bikeshedding that name.

> > For another example of how these patches could be used, see
> > http://postgr.es/m/CA+TgmoZ=6jJi9TGyZCm33vads46HFkyz6Aju_saLT6GFS-iFug@mail.gmail.com
> > and in particular 0001 and 0002. This patch set's planner_setup_hook
> > call would go write after those patches compute default_ssa_mask and
> > default_jsa_mask, allowing the hook to override those values.
>
> So, are you saying that you would rewrite the patches in that set to
> use the infrastructure in this set -- e.g. remove that set's
> PlannerGlobal.default_jsa_mask and instead put it in
> PlannerGlobal.extension_state? Or am I misunderstanding?

No, that's not what I'm saying. What I'm saying is that with both
patches applied, planner_setup_hook() from this patch ends up getting
called right after default_jsa_mask is set, so another thing this hook
can do is adjust that value. Or, for example, you can write a patch
that uses this infrastructure to associate state with each RelOptInfo,
and then you can use that state to decide how to set jsa_mask in
join_path_setup_hook. In other words, it's easier to make effective
use of those patches if you have the infrastructure provided by these
patches.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: extensible planner state

От

Robert Haas

Дата:

16 сентября, 23:12:39

On Fri, Sep 5, 2025 at 4:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
> While mulling this over, I realized that this only works if you don't
> mind propagating information into the final plan regardless without
> knowing whether or not EXPLAIN was actually used. That's pretty sad,
[...]
> The simplest idea that comes to mind for me is to make pg_plan_query()
> take an ExplainState * argument and pass it through to planner().

Here's a new version that implements this idea and also cleans up a
few points that Melanie noted.

--
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: RFC: extensible planner state

От

Andrei Lepikhov

Дата:

17 сентября, 19:42:09

On 13/9/2025 02:16, Robert Haas wrote:
> On Fri, Sep 12, 2025 at 4:34 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
>> On Tue, Aug 26, 2025 at 4:58 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>>> 1. Why is it necessary to maintain the GetExplainExtensionId and
>>> GetPlannerExtensionId routines? It seems that using a single
>>> extension_id (related to the order of the library inside the
>>> file_scanner) is more transparent and more straightforward if necessary.
>>
>> But this wouldn't work for in-core use cases like GEQO, right? Also,
>> how would it work if there are multiple "extensions" in the same .so
>> file?
As I see, the core has never utilised extensibility machinery and 
implemented separate fields/hooks for personal needs (FDW is a suitable 
example). And I think there are reasons for that. Not the last one, I 
guess, security issues.
I have never seen cases with multiple extensions in the same module. I 
wonder what the reason is for doing this and why the core should support 
it?>
> We probably don't want to all extensions on any topic to be allocating
> extension IDs from the same space, because it's used as a list index
> and we don't want to have to null-pad lists excessively. Combining the
> explain and planner cases wouldn't be too much of a stretch, perhaps,
> but it's also not really costing us anything to have separate IDs for
> those cases.
Yes, but it costs extension developers to complicate the code. 
Considering that extensions, implementing planner tricks usually want to 
show the user (on an EXPLAIN request) how they impacted the query plan, 
I guess it makes sense to suggest the same ID.

But I still vote against extension_id in the planner. The main reason 
for me to let go such a solution in EXPLAIN was the 'settings' option, 
because extensions may fight for a 'nice' name. But each extension has a 
native ID - its personal name, isn't it?>
>>> 2. Why does the extensibility approach in 0001 differ from that in 0004?
>>> I can imagine it is all about limiting extensions, but anyway, a module
>>> has access to PlannerInfo, PlannerGlobal, etc. So, this machinery looks
>>> a little redundant, doesn't it?
>>
>> What do you mean that the extensibility approach differs? Like that
>> the type of extension_state is different?
PlannedStmt in 0004 has an extension list that should contain DefElem 
nodes. However, the optimiser nodes use a different approach: the 
extension developer must operate with an ID allocation. I propose using 
a unified way for extensibility - a list with DefElem, uniquely 
identified by module name, looks much more flexible.>
> I suspect the question here is about why not use the
> index-by-planner-extension-ID approach for 0004. That could maybe
> work, but here everything has to be a Node, so I feel like it would be
> more contorted than the existing cases.
I tried to highlight here that 0004 looks better and more universal than 
0001.

-- 
regards, Andrei Lepikhov

Re: RFC: extensible planner state

От

Robert Haas

Дата:

18 сентября, 19:16:29

On Tue, Sep 16, 2025 at 4:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Sep 5, 2025 at 4:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > While mulling this over, I realized that this only works if you don't
> > mind propagating information into the final plan regardless without
> > knowing whether or not EXPLAIN was actually used. That's pretty sad,
> [...]
> > The simplest idea that comes to mind for me is to make pg_plan_query()
> > take an ExplainState * argument and pass it through to planner().
>
> Here's a new version that implements this idea and also cleans up a
> few points that Melanie noted.

I think that was rebased over a patch I inadvertently committed to my
local master branch. Trying again.

--
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: RFC: extensible planner state

От

Tom Lane

Дата:

22 сентября, 22:51:51

Robert Haas <robertmhaas@gmail.com> writes:
> I think that was rebased over a patch I inadvertently committed to my
> local master branch. Trying again.

I looked through the v5 patchset.

0001: The allocation logic in Set*ExtensionState fails to guarantee
that it's made the array(s) big enough to hold the passed ID, which
would be problematic in the face of lots of extensions.
I think this'd be enough to fix it:
-        i = pg_nextpower2_32(glob->extension_state_allocated + 1);
+        i = pg_nextpower2_32(extension_id + 1);
But maybe you should also reconsider whether blindly starting
the array sizes at 4, rather than say Max(4, extension_id + 1),
is good.

Now that I look, SetExplainExtensionState also has these issues.

Also a couple nitpicks:
* alphabetization fail in util/Makefile
* utils/palloc.h is already included by postgres.h

0002: LGTM

0003: In the wake of 70407d39b, you should avoid "struct
ExplainState" in favor of using duplicate typedefs.
Also, although you're not the first sinner, I really think
this new argument to planner() should be documented.  Maybe
the rest too while we're at it.

0004: maybe the planner_shutdown_hook should be called before
DestroyPartitionDirectory?  It's not entirely clear whether the hook
might like to look at that.  Also "struct ExplainState" again.

0005: okay

            regards, tom lane

Re: RFC: extensible planner state

От

Robert Haas

Дата:

24 сентября, 19:02:58

On Mon, Sep 22, 2025 at 3:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I looked through the v5 patchset.

Thanks!

Here's a new set of patches. I've added a new 0001 at the beginning to
fix the bug you identified in SetExplainExtensionState; this will need
to be back-patched to v18 once the release freeze lifts. I've also
adjusted the previous patch, now 0002, to fix the equivalent problem
with the new Set*ExplainState functions,and I've attempted to fix the
other problems you mentioned, with this exception:

> Also, although you're not the first sinner, I really think
> this new argument to planner() should be documented.  Maybe
> the rest too while we're at it.

I'm a little nervous about this. I fear that the comments are all
going to be of the form "to save a file, click the File menu, then
click Save," which doesn't actually help anyone. That might be a
slight exaggeration, but I feel like it's pretty obvious on visual
inspection that Query *parse is what we're planning, char
*query_string is the text version of that, cursorOptions is some kind
of flag mask, boundParams is the parameter values. It's fair to say
that ExplainState *es is a little less obvious than the others, but
saying "If this function is being invoked by EXPLAIN, then
ExplainState *es is the ExplainState, else it is NULL" doesn't really
seem all that helpful to me. I mean, what else would it be? My thought
here would be that if you want to write some comments that you
consider helpful for the existing arguments, I'll try to write a new
comment for this one in the same style (or you can suggest one) and
hold my nose if I don't find it helpful, or alternatively, we could
proceed with these patches without the comment and you can add
whatever comment text you want after-the-fact. If you want the comment
changes in this patch set, then I need a suggestion as to what that
should look like.

--
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: RFC: extensible planner state

От

Tom Lane

Дата:

24 сентября, 19:18:22

Robert Haas <robertmhaas@gmail.com> writes:
> Here's a new set of patches. I've added a new 0001 at the beginning to
> fix the bug you identified in SetExplainExtensionState; this will need
> to be back-patched to v18 once the release freeze lifts.

I'm good with 0001, and the release freeze is over, so push that
whenever you like.  I'll try to look at the rest soon.

            regards, tom lane

Re: RFC: extensible planner state

От

Robert Haas

Дата:

24 сентября, 19:36:03

On Wed, Sep 24, 2025 at 12:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm good with 0001, and the release freeze is over, so push that
> whenever you like.  I'll try to look at the rest soon.

Cool... except I thought that the release freeze wouldn't lift until
we release, which I thought was tomorrow?

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: extensible planner state

От

Tom Lane

Дата:

24 сентября, 19:49:14

Robert Haas <robertmhaas@gmail.com> writes:
> Cool... except I thought that the release freeze wouldn't lift until
> we release, which I thought was tomorrow?

Nope, the freeze is over as soon as the git tag for the release is
pushed, cf

https://wiki.postgresql.org/wiki/Committing_checklist#Release_freezes

We'd unfreeze when the stamping commit is made, except that we want
some breathing room for a re-wrap if a packager discovers a critical
problem.  We give them 24 hours to report that.

(What happens if a critical problem is discovered shortly later?
We'd probably use a new minor release number in that case.)

            regards, tom lane

Re: RFC: extensible planner state

От

Robert Haas

Дата:

25 сентября, 19:03:50

On Wed, Sep 24, 2025 at 12:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm good with 0001, and the release freeze is over, so push that
> whenever you like.  I'll try to look at the rest soon.

Done now. Here's a rebase of the rest, plus I tweaked the GEQO patch
to try to avoid a compiler warning that cfbot was complaining about.

--
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: RFC: extensible planner state

От

Tom Lane

Дата:

28 сентября, 18:41:41

Robert Haas <robertmhaas@gmail.com> writes:
> Done now. Here's a rebase of the rest, plus I tweaked the GEQO patch
> to try to avoid a compiler warning that cfbot was complaining about.

I'm good with the v7 patch set, except for the complaint I raised
previously that we really ought to have more than zero documentation
for planner()'s parameters.  If you don't care to write such text,
attached is a cut at it.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41bd8353430..c249f34eb8e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -280,6 +280,23 @@ static void create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel
  *
  *       Query optimizer entry point
  *
+ * Inputs:
+ *    parse: an analyzed-and-rewritten query tree for an optimizable statement
+ *    query_string: source text for the query tree (used for error reports)
+ *    cursorOptions: bitmask of CURSOR_OPT_XXX flags, see parsenodes.h
+ *    boundParams: passed-in parameter values, or NULL if none
+ *    es: ExplainState if being called from EXPLAIN, else NULL
+ *
+ * The result is a PlannedStmt tree.
+ *
+ * PARAM_EXTERN Param nodes within the parse tree can be replaced by Consts
+ * using values from boundParams, if those values are marked PARAM_FLAG_CONST.
+ * Parameter values not so marked are still relied on for estimation purposes.
+ *
+ * The ExplainState pointer is not currently used by the core planner, but it
+ * is passed through to some planner hooks so that they can report information
+ * back to EXPLAIN extension hooks.
+ *
  * To support loadable plugins that monitor or modify planner behavior,
  * we provide a hook variable that lets a plugin get control before and
  * after the standard planning process.  The plugin would normally call

Re: RFC: extensible planner state

От

Robert Haas

Дата:

29 сентября, 16:28:55

On Sun, Sep 28, 2025 at 11:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Done now. Here's a rebase of the rest, plus I tweaked the GEQO patch
> > to try to avoid a compiler warning that cfbot was complaining about.
>
> I'm good with the v7 patch set, except for the complaint I raised
> previously that we really ought to have more than zero documentation
> for planner()'s parameters.  If you don't care to write such text,
> attached is a cut at it.

Oh, nice, thanks! I'm going to be on vacation the rest of this week so
I plan to deal with this next week. However, if you feel like
committing it before then, please feel free.

--
Robert Haas
EDB: http://www.enterprisedb.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: RFC: extensible planner state

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения