Обсуждение: Re: [PATCH] Optionally record Plan IDs to track plan changes for a query
On 1/3/25 03:46, Lukas Fittl wrote: > My overall perspective is that (1) is best done in-core to keep overhead > low, whilst (2) could be done outside of core (or merged with a future > pg_stat_statements) and is included here mainly for illustration purposes. Thank you for the patch and your attention to this issue! I am pleased with the export of the jumbling functions and their generalisation. I may not be close to the task monitoring area, but I utilise queryId and other tools to differ plan nodes inside extensions. Initially, like queryId serves as a class identifier for queries, plan_id identifies a class of nodes, not a single node. In the implementation provided here, nodes with the same hash can represent different subtrees. For example, JOIN(A, JOIN(B,C)) and JOIN(JOIN(B,C),A) may have the same ID. Moreover, I wonder if this version of plan_id reacts to the join level change. It appears that only a change of the join clause alters the plan_id hash value, which means you would end up with a single hash for very different plan nodes. Is that acceptable? To address this, we should consider the hashes of the left and right subtrees and the hashes of each subplan (especially in the case of Append). Overall, similar to discussions on queryId, various extensions may want different logic for generating plan_id (more or less unique guarantees, for example). Hence, it would be beneficial to separate this logic and allow extensions to provide different plan_ids. IMO, What we need is a 'List *ext' field in each of the Plan, Path, PlanStmt, and Query structures. Such 'ext' field may contain different stuff that extensions want to push without interference between them - specific plan_id as an example. Additionally, we could bridge the gap between the cloud of paths and the plan by adding a hook at the end of the create_plan_recurse routine. This may facilitate the transfer of information regarding optimiser decisions that could be influenced by an extension into the plan. -- regards, Andrei Lepikhov
Hi Andrei,
On Fri, Jan 24, 2025 at 1:23 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
I may not be close to the task monitoring area, but I utilise queryId
and other tools to differ plan nodes inside extensions. Initially, like
queryId serves as a class identifier for queries, plan_id identifies a
class of nodes, not a single node. In the implementation provided here,
nodes with the same hash can represent different subtrees. For example,
JOIN(A, JOIN(B,C)) and JOIN(JOIN(B,C),A) may have the same ID.
Moreover, I wonder if this version of plan_id reacts to the join level
change. It appears that only a change of the join clause alters the
plan_id hash value, which means you would end up with a single hash for
very different plan nodes. Is that acceptable? To address this, we
should consider the hashes of the left and right subtrees and the hashes
of each subplan (especially in the case of Append).
I looked back at this again just to confirm we're not missing anything:
I don't think any of the posted patch versions (including the just shared v4) have a problem with distinguishing two plans that are very similar but only differ in JOIN order. Since we descend into the inner/outer plans via the setrefs.c treewalk, the placement of JOIN nodes vs other nodes should cause a different plan jumble (and we include both the node tag for the join/scan nodes, as well as the RT index the scans point to in the jumble).
Do you have a reproducer that shows these two generate the same plan ID?
Thanks,
Lukas
Lukas Fittl
On 2/5/25 09:16, Lukas Fittl wrote: > Hi Andrei, > > On Fri, Jan 24, 2025 at 1:23 AM Andrei Lepikhov <lepihov@gmail.com > <mailto:lepihov@gmail.com>> wrote: > > I may not be close to the task monitoring area, but I utilise queryId > and other tools to differ plan nodes inside extensions. Initially, like > queryId serves as a class identifier for queries, plan_id identifies a > class of nodes, not a single node. In the implementation provided here, > nodes with the same hash can represent different subtrees. For example, > JOIN(A, JOIN(B,C)) and JOIN(JOIN(B,C),A) may have the same ID. > > > Moreover, I wonder if this version of plan_id reacts to the join level > change. It appears that only a change of the join clause alters the > plan_id hash value, which means you would end up with a single hash for > very different plan nodes. Is that acceptable? To address this, we > should consider the hashes of the left and right subtrees and the > hashes > of each subplan (especially in the case of Append). > > > I looked back at this again just to confirm we're not missing anything: > > I don't think any of the posted patch versions (including the just > shared v4) have a problem with distinguishing two plans that are very > similar but only differ in JOIN order. Since we descend into the inner/ > outer plans via the setrefs.c treewalk, the placement of JOIN nodes vs > other nodes should cause a different plan jumble (and we include both > the node tag for the join/scan nodes, as well as the RT index the scans > point to in the jumble). Maybe. I haven't dive into that stuff deeply yet. It is not difficult to check. The main point was that different extensions want different plan_ids. For example, planner extensions want to guarantee the distinctness and sort of stability of this field inside a query plan. Does the hash value guarantee that? We have discussed how queryId should be generated more than once. That's why I think the plan_id generation logic should be implemented inside an extension, not in the core. -- regards, Andrei Lepikhov