Обсуждение: [sqlsmith] Failed assertion during partition pruning
Hi, testing master at 3df51ca8 with sqlsmith triggers the following assertion: TRAP: FailedAssertion("!bms_is_empty(present_parts)", File: "partprune.c", Line: 588, PID: 8540) I looked at a dozen backtraces and they all sport a window aggregate but that may still be random chance since sqlsmith really likes generating these a lot... Below is the shortest recipe I found to reproduce it on a fresh regression database: --8<---------------cut here---------------start------------->8--- regression=# insert into trigger_parted values (1); ERROR: control reached end of trigger procedure without RETURN CONTEXT: PL/pgSQL function trigger_parted_trigfunc() regression=# select myaggp05a(a) over (partition by a order by a) from trigger_parted where pg_trigger_depth() <> a limit40; server closed the connection unexpectedly --8<---------------cut here---------------end--------------->8--- Backtrace of this one below. regards, Andreas #2 0x000055bafaa95b81 in ExceptionalCondition (conditionName=conditionName@entry=0x55bafabe7812 "!bms_is_empty(present_parts)",errorType=errorType@entry=0x55bafaae901d "FailedAssertion", fileName=0x7ffee4fbbe40 "b[\251\372\272U", fileName@entry=0x55bafabe765b "partprune.c", lineNumber=lineNumber@entry=588) at assert.c:69 #3 0x000055bafa8f02c0 in make_partitionedrel_pruneinfo (matchedsubplans=<synthetic pointer>, prunequal=<optimized out>,partrelids=<optimized out>, relid_subplan_map=0x55bafba17fd0, parentrel=0x55bafb928050, root=0x55bafb9f4f78) at partprune.c:588 #4 make_partition_pruneinfo (root=root@entry=0x55bafb9f4f78, parentrel=parentrel@entry=0x55bafb928050, subpaths=0x55bafba16c10,partitioned_rels=0x55bafba16d58, prunequal=prunequal@entry=0x55bafba17f78) at partprune.c:274 #5 0x000055bafa8b05cd in create_append_plan (root=0x55bafb9f4f78, best_path=0x55bafba16cc0, flags=6) at createplan.c:1249 #6 0x000055bafa8acd5e in create_windowagg_plan (best_path=0x55bafba174c0, root=0x55bafb9f4f78) at createplan.c:2452 #7 create_plan_recurse (root=0x55bafb9f4f78, best_path=0x55bafba174c0, flags=1) at createplan.c:492 #8 0x000055bafa8ad341 in create_limit_plan (flags=1, best_path=0x55bafba17910, root=0x55bafb9f4f78) at createplan.c:2699 #9 create_plan_recurse (root=0x55bafb9f4f78, best_path=0x55bafba17910, flags=1) at createplan.c:514 #10 0x000055bafa8b00d1 in create_plan (root=root@entry=0x55bafb9f4f78, best_path=<optimized out>) at createplan.c:333 #11 0x000055bafa8bf013 in standard_planner (parse=0x55bafb9287a8, query_string=<optimized out>, cursorOptions=256, boundParams=<optimizedout>) at planner.c:409 #12 0x000055bafa989da8 in pg_plan_query (querytree=0x55bafb9287a8, querytree@entry=0x7ffee4fbc620, query_string=query_string@entry=0x55bafb9082f0 "explain select myaggp05a(a) over (partition by a order by a) from trigger_partedwhere pg_trigger_depth() <> a limit 40;", cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0)at postgres.c:875 #13 0x000055bafa7b5dcf in ExplainOneQuery (query=0x7ffee4fbc620, cursorOptions=256, into=0x0, es=0x55bafb927f80, queryString=0x55bafb9082f0"explain select myaggp05a(a) over (partition by a order by a) from trigger_parted where pg_trigger_depth()<> a limit 40;", params=0x0, queryEnv=0x0) at explain.c:391 #14 0x000055bafa7b6507 in ExplainQuery (pstate=0x55bafb92a1c0, stmt=0x55bafb909930, params=0x0, dest=0x55bafb92a128) at ../../../src/include/nodes/nodes.h:592 #15 0x000055bafa98f87d in standard_ProcessUtility (pstmt=0x55bafb9c6090, queryString=0x55bafb9082f0 "explain select myaggp05a(a)over (partition by a order by a) from trigger_parted where pg_trigger_depth() <> a limit 40;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55bafb92a128, qc=0x7ffee4fbc8c0) at utility.c:829 #16 0x000055bafa98cb36 in PortalRunUtility (portal=0x55bafb96b590, pstmt=0x55bafb9c6090, isTopLevel=<optimized out>, setHoldSnapshot=<optimizedout>, dest=0x55bafb92a128, qc=0x7ffee4fbc8c0) at pquery.c:1159 #17 0x000055bafa98d910 in FillPortalStore (portal=0x55bafb96b590, isTopLevel=<optimized out>) at ../../../src/include/nodes/nodes.h:592 #18 0x000055bafa98e54d in PortalRun (portal=portal@entry=0x55bafb96b590, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,run_once=run_once@entry=true, dest=dest@entry=0x55bafb9c6180, altdest=altdest@entry=0x55bafb9c6180,qc=0x7ffee4fbcac0) at pquery.c:751 #19 0x000055bafa98a29c in exec_simple_query (query_string=0x55bafb9082f0 "explain select myaggp05a(a) over (partition bya order by a) from trigger_parted where pg_trigger_depth() <> a limit 40;") at postgres.c:1239 #20 0x000055bafa98beaa in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffee4fbcff0, dbname=<optimized out>, username=<optimizedout>) at postgres.c:4308 #21 0x000055bafa9054fa in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4488 #22 BackendStartup (port=<optimized out>) at postmaster.c:4210 #23 ServerLoop () at postmaster.c:1727 #24 0x000055bafa906452 in PostmasterMain (argc=<optimized out>, argv=0x55bafb902c70) at postmaster.c:1400 #25 0x000055bafa65e980 in main (argc=3, argv=0x55bafb902c70) at main.c:209
Andreas Seltenreich <seltenreich@gmx.de> writes: > testing master at 3df51ca8 with sqlsmith triggers the following > assertion: > TRAP: FailedAssertion("!bms_is_empty(present_parts)", File: "partprune.c", Line: 588, PID: 8540) > I looked at a dozen backtraces and they all sport a window aggregate but > that may still be random chance since sqlsmith really likes generating > these a lot... Yeah, it doesn't seem to need a window aggregate: regression=# select a from trigger_parted where pg_trigger_depth() <> a order by a limit 40; server closed the connection unexpectedly What it looks like to me is that the code for setting up run-time partition pruning has failed to consider the possibility of nested partitioning: it's expecting that every partitioned table will have at least one direct child that is a leaf. I'm not sure though whether just the Assert is wrong, or there's more fundamental issues here. It's also somewhat interesting that you need the "order by a limit 40" to get a crash. Poking around in the failing backend, I can see that that causes the leaf-partition subplan to be an indexscan not a seqscan, but it's far from clear why that'd make any difference to the partition pruning logic. regards, tom lane
I wrote: > What it looks like to me is that the code for setting up run-time > partition pruning has failed to consider the possibility of nested > partitioning: it's expecting that every partitioned table will have > at least one direct child that is a leaf. I'm not sure though > whether just the Assert is wrong, or there's more fundamental > issues here. After looking into the git history I realized that this assertion is quite new, stemming from David's a929e17e5a8 of 2020-11-02. So there's something not right about that. regards, tom lane
I wrote: >> What it looks like to me is that the code for setting up run-time >> partition pruning has failed to consider the possibility of nested >> partitioning: it's expecting that every partitioned table will have >> at least one direct child that is a leaf. I'm not sure though >> whether just the Assert is wrong, or there's more fundamental >> issues here. > After looking into the git history I realized that this assertion is > quite new, stemming from David's a929e17e5a8 of 2020-11-02. So there's > something not right about that. I took some more time to poke at this today, and I now think that the assertion in make_partitionedrel_pruneinfo is probably OK, and what it's pointing out is a bug upstream in path creation. Specifically, I noted that in select a from trigger_parted where pg_trigger_depth() <> a order by a; we arrive at make_partitionedrel_pruneinfo with partrelids equal to (b 1 2), which seems to be correct. The RTE list is RTE 1: trigger_parted RTE 2: trigger_parted_p1 RTE 3: trigger_parted_p1_1 Like so much else of the partitioning code, AppendPath.partitioned_rels is abysmally underdocumented, but what I think it means is the set of non-leaf partitioned tables that are notionally scanned by the AppendPath. The only table directly mentioned by the AppendPath's subpath is RTE 3, so that all seems fine. However, upon adding a LIMIT: select a from trigger_parted where pg_trigger_depth() <> a order by a limit 40; server closed the connection unexpectedly we arrive at make_partitionedrel_pruneinfo with partrelids equal to just (b 1); trigger_parted_p1 has been left out. The Path in this case has been made by generate_orderedappend_paths, which is what's responsible for computing AppendPath.partitioned_rels that eventually winds up as the argument to make_partitionedrel_pruneinfo. So I think that that code is somehow failing to account for nested partitioning, while the non-ordered-append code is doing it right. But I didn't spot exactly where the discrepancy is. regards, tom lane
Hi,
I wonder if the (failed) assertion should be converted to an if statement:
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index fac921eea5..d646f08a07 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -585,7 +585,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* partitioned tables that we have no sub-paths or
* sub-PartitionedRelPruneInfo for.
*/
- Assert(!bms_is_empty(present_parts));
+ if (bms_is_empty(present_parts)) return NIL;
/* Record the maps and other information. */
pinfo->present_parts = present_parts;
index fac921eea5..d646f08a07 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -585,7 +585,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* partitioned tables that we have no sub-paths or
* sub-PartitionedRelPruneInfo for.
*/
- Assert(!bms_is_empty(present_parts));
+ if (bms_is_empty(present_parts)) return NIL;
/* Record the maps and other information. */
pinfo->present_parts = present_parts;
Cheers
On Fri, Jan 29, 2021 at 12:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
>> What it looks like to me is that the code for setting up run-time
>> partition pruning has failed to consider the possibility of nested
>> partitioning: it's expecting that every partitioned table will have
>> at least one direct child that is a leaf. I'm not sure though
>> whether just the Assert is wrong, or there's more fundamental
>> issues here.
> After looking into the git history I realized that this assertion is
> quite new, stemming from David's a929e17e5a8 of 2020-11-02. So there's
> something not right about that.
I took some more time to poke at this today, and I now think that
the assertion in make_partitionedrel_pruneinfo is probably OK,
and what it's pointing out is a bug upstream in path creation.
Specifically, I noted that in
select a from trigger_parted where pg_trigger_depth() <> a order by a;
we arrive at make_partitionedrel_pruneinfo with partrelids equal
to (b 1 2), which seems to be correct. The RTE list is
RTE 1: trigger_parted
RTE 2: trigger_parted_p1
RTE 3: trigger_parted_p1_1
Like so much else of the partitioning code, AppendPath.partitioned_rels
is abysmally underdocumented, but what I think it means is the set of
non-leaf partitioned tables that are notionally scanned by the
AppendPath. The only table directly mentioned by the AppendPath's
subpath is RTE 3, so that all seems fine.
However, upon adding a LIMIT:
select a from trigger_parted where pg_trigger_depth() <> a order by a limit 40;
server closed the connection unexpectedly
we arrive at make_partitionedrel_pruneinfo with partrelids equal
to just (b 1); trigger_parted_p1 has been left out. The Path
in this case has been made by generate_orderedappend_paths, which
is what's responsible for computing AppendPath.partitioned_rels that
eventually winds up as the argument to make_partitionedrel_pruneinfo.
So I think that that code is somehow failing to account for nested
partitioning, while the non-ordered-append code is doing it right.
But I didn't spot exactly where the discrepancy is.
regards, tom lane
Zhihong Yu <zyu@yugabyte.com> writes: > I wonder if the (failed) assertion should be converted to an if statement: As I said, I'm now thinking it's not the Assert that's faulty. If I'm right about that, it's likely that the mistaken labeling of these paths has other consequences beyond triggering this assertion. (If it has none, I think we'd be better off to remove these Path fields altogether, and re-identify the parent rels here from the RelOptInfo data.) regards, tom lane
I wrote: > As I said, I'm now thinking it's not the Assert that's faulty. > If I'm right about that, it's likely that the mistaken labeling > of these paths has other consequences beyond triggering this > assertion. (If it has none, I think we'd be better off to remove > these Path fields altogether, and re-identify the parent rels > here from the RelOptInfo data.) After more time poking at this, I've concluded that my parenthetical remark is the right solution: [Merge]AppendPath.partitioned_rels ought to be nuked from orbit. It requires a whole lot of squirrely, now-known-to-be-buggy logic in allpaths.c, and practically the only place where we use the result is in make_partition_pruneinfo(). That means we're expending a lot of work *per AppendPath*, which is quite foolish unless there's some reason we can't get the information at create_plan() time instead. Which we can. For simplicity of review I divided the patch into two parts. 0001 revises make_partition_pruneinfo() and children to identify the relevant parent partitions for themselves, which is not too hard to do by chasing up the child-to-parent AppendRelInfo links. Not formerly documented, AFAICT, is that we want to collect just the parent partitions that are between the Append path's own rel (possibly equal to it) and the subpaths' rels. I'd first tried to code this by using the top_parent_relids and part_rels[] links in the RelOptInfos, but that turns out not to work. We might ascend to a top parent that's above the Append's rel (if considering an appendpath for a sub-partition, which happens in partitionwise join). We could also descend to a child at or below some subpath level, since there are also cases where subpaths correspond to unflattened non-leaf partitions. Either of those things result in failures. But once you wrap your head around handling those restrictions, it's quite simple. Then 0002 just nukes all the no-longer-necessary logic to compute the partitioned_rels fields. There are a couple of changes that are not quite trivial. create_[merge_]append_plan were testing best_path->partitioned_rels != NIL to shortcut calling make_partition_pruneinfo at all. I just dropped that, reasoning that it wasn't saving enough to worry about. Also, create_append_path was similarly testing partitioned_rels != NIL to decide if it needed to go to the trouble of using get_baserel_parampathinfo. I changed that to an IS_PARTITIONED_REL() test, which isn't *quite* the same thing but seems close enough. (It changes no regression results, anyway.) This fixes the cases reported by Andreas and Jaime, leaving me more confident that there's nothing wrong with David's Assert. I did not try to add a regression test case, mainly because it's not quite clear to me where the original bug is. (I'm a little suspicious that the blame might lie with the "match_partition_order" cases in generate_orderedappend_paths, which try to bypass accumulate_append_subpath without updating the partitioned_rels data. But I'm not excited about trying to prove that.) I wonder whether we should consider back-patching this. Another thing that seems unclear is whether there is any serious consequence to omitting some intermediate partitions from the set considered by make_partition_pruneinfo. Presumably it could lead to missing some potential run-time-pruning opportunities, but is there any worse issue? If there isn't, this is a bigger change than I want to put in the back braches. regards, tom lane diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c index 42c7c5f554..c08cfdaaaa 100644 --- a/src/backend/partitioning/partprune.c +++ b/src/backend/partitioning/partprune.c @@ -138,10 +138,12 @@ typedef struct PruneStepResult } PruneStepResult; +static List *add_part_rel_list(List *partrellists, List *partrellist); static List *make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, + List *partrellist, + List *prunequal, int *relid_subplan_map, - Relids partrelids, List *prunequal, Bitmapset **matchedsubplans); static void gen_partprune_steps(RelOptInfo *rel, List *clauses, PartClauseTarget target, @@ -213,67 +215,103 @@ static void partkey_datum_from_expr(PartitionPruneContext *context, * * 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list * of scan paths for its child rels. - * - * 'partitioned_rels' is a List containing Lists of relids of partitioned - * tables (a/k/a non-leaf partitions) that are parents of some of the child - * rels. Here we attempt to populate the PartitionPruneInfo by adding a - * 'prune_infos' item for each sublist in the 'partitioned_rels' list. - * However, some of the sets of partitioned relations may not require any - * run-time pruning. In these cases we'll simply not include a 'prune_infos' - * item for that set and instead we'll add all the subplans which belong to - * that set into the PartitionPruneInfo's 'other_subplans' field. Callers - * will likely never want to prune subplans which are mentioned in this field. - * - * 'prunequal' is a list of potential pruning quals. + * 'prunequal' is a list of potential pruning quals (i.e., restriction + * clauses that are applicable to the appendrel). */ PartitionPruneInfo * make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, - List *subpaths, List *partitioned_rels, + List *subpaths, List *unused_partitioned_rels, List *prunequal) { PartitionPruneInfo *pruneinfo; Bitmapset *allmatchedsubplans = NULL; + List *partrellists; + List *prunerelinfos; int *relid_subplan_map; ListCell *lc; - List *prunerelinfos; int i; /* - * Construct a temporary array to map from planner relids to subplan - * indexes. For convenience, we use 1-based indexes here, so that zero - * can represent an un-filled array entry. + * Scan the subpaths to see which ones are scans of partition child + * relations, and make lists of their parent partitioned rels. (We must + * restrict the parent partitioned rels to be parentrel or children of + * parentrel, otherwise we couldn't translate prunequal to match.) Also + * construct a temporary array to map from partition-child-relation relid + * to the index in 'subpaths' of the scan plan for that partition. (Use + * of "subplan" rather than "subpath" is a bit of a misnomer, but we'll + * let it stand.) For convenience, we use 1-based indexes here, so that + * zero can represent an un-filled array entry. */ + partrellists = NIL; relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size); - /* - * relid_subplan_map maps relid of a leaf partition to the index in - * 'subpaths' of the scan plan for that partition. - */ i = 1; foreach(lc, subpaths) { Path *path = (Path *) lfirst(lc); RelOptInfo *pathrel = path->parent; - Assert(IS_SIMPLE_REL(pathrel)); - Assert(pathrel->relid < root->simple_rel_array_size); - /* No duplicates please */ - Assert(relid_subplan_map[pathrel->relid] == 0); + /* We don't consider partitioned joins here */ + if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL) + { + RelOptInfo *prel = pathrel; + List *partrellist = NIL; - relid_subplan_map[pathrel->relid] = i++; + /* + * Traverse up to the pathrel's topmost partitioned parent; but + * stop if we reach parentrel. (Normally, the topmost partitioned + * parent is either parentrel or a UNION ALL appendrel child of + * parentrel. But when handling partitionwise joins of + * multi-level partitioning trees, we can see an append path whose + * parentrel is an intermediate partitioned table.) + */ + do + { + AppendRelInfo *appinfo; + + Assert(prel->relid < root->simple_rel_array_size); + appinfo = root->append_rel_array[prel->relid]; + prel = find_base_rel(root, appinfo->parent_relid); + if (!IS_PARTITIONED_REL(prel)) + break; /* reached a non-partitioned parent */ + /* accept this level as an interesting parent */ + partrellist = lcons(prel, partrellist); + if (prel == parentrel) + break; /* don't traverse above appendrel's level */ + } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL); + + if (partrellist) + { + /* + * Found some relevant parent partitions, which may or may not + * overlap with partition trees we already found. Add new + * information to the partrellists list-of-lists. + */ + partrellists = add_part_rel_list(partrellists, partrellist); + /* Also record the subplan in relid_subplan_map[] */ + /* No duplicates please */ + Assert(relid_subplan_map[pathrel->relid] == 0); + relid_subplan_map[pathrel->relid] = i; + } + } + i++; } - /* We now build a PartitionedRelPruneInfo for each partitioned rel. */ + /* + * We now build a PartitionedRelPruneInfo for each topmost partitioned rel + * (omitting any that turn out not to have useful pruning quals). + */ prunerelinfos = NIL; - foreach(lc, partitioned_rels) + foreach(lc, partrellists) { - Relids partrelids = (Relids) lfirst(lc); + List *partrellist = (List *) lfirst(lc); List *pinfolist; Bitmapset *matchedsubplans = NULL; pinfolist = make_partitionedrel_pruneinfo(root, parentrel, + partrellist, + prunequal, relid_subplan_map, - partrelids, prunequal, &matchedsubplans); /* When pruning is possible, record the matched subplans */ @@ -299,7 +337,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, pruneinfo->prune_infos = prunerelinfos; /* - * Some subplans may not belong to any of the listed partitioned rels. + * Some subplans may not belong to any of the identified partitioned rels. * This can happen for UNION ALL queries which include a non-partitioned * table, or when some of the hierarchies aren't run-time prunable. Build * a bitmapset of the indexes of all such subplans, so that the executor @@ -321,28 +359,77 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, return pruneinfo; } +/* + * add_part_rel_list + * Add new partitioning info to a list of lists of partitioned rels. + * + * Within partrellists, there is one sublist for each topmost parent + * partitioned rel. Within that sublist, after the topmost parent, + * we include each child non-leaf partition it has, omitting duplicates. + * Within sublists, children appear before their grandchildren. + * + * partrellist is a nonempty list of partitioned rels to add to this forest. + * It must list ancestors before children. + * + * Note that the result contains only rels that are parents of some scan-level + * relation appearing in the 'subpaths' that make_partition_pruneinfo() is + * dealing with. Also, "topmost" parents are not allowed to be higher than + * the 'parentrel' associated with the append path. In this way, we avoid + * expending cycles on partitioned rels that can't contribute useful pruning + * information for the problem at hand. (Note that it is possible for + * 'parentrel' to be a child partitioned table, and it is also possible for + * scan-level relations to be child partitioned tables rather than leaf + * partitions. Hence we must construct this relation set with reference to + * the particular append path we're dealing with, rather than looking at the + * full partitioning structure represented in the RelOptInfos.) + */ +static List * +add_part_rel_list(List *partrellists, List *partrellist) +{ + RelOptInfo *targetpart = (RelOptInfo *) linitial(partrellist); + ListCell *lc; + + /* Look for a matching topmost parent */ + foreach(lc, partrellists) + { + List *thislist = (List *) lfirst(lc); + RelOptInfo *thistarget = (RelOptInfo *) linitial(thislist); + + if (thistarget == targetpart) + { + /* Found a match, so add any new children to this sublist */ + thislist = list_concat_unique_ptr(thislist, partrellist); + lfirst(lc) = thislist; /* not really necessary */ + return partrellists; + } + } + /* No match, so begin a new sublist */ + return lappend(partrellists, partrellist); +} + /* * make_partitionedrel_pruneinfo * Build a List of PartitionedRelPruneInfos, one for each partitioned * rel. These can be used in the executor to allow additional partition * pruning to take place. * - * Here we generate partition pruning steps for 'prunequal' and also build a - * data structure which allows mapping of partition indexes into 'subpaths' - * indexes. - * - * If no non-Const expressions are being compared to the partition key in any - * of the 'partitioned_rels', then we return NIL to indicate no run-time - * pruning should be performed. Run-time pruning would be useless since the - * pruning done during planning will have pruned everything that can be. - * - * On non-NIL return, 'matchedsubplans' is set to the subplan indexes which - * were matched to this partition hierarchy. + * parentrel: rel associated with the appendpath being considered + * partrellist: list of a parent partitioned table's RelOptInfo, followed + * by RelOptInfos for relevant nonleaf children + * prunequal: potential pruning quals, represented for parentrel + * relid_subplan_map[]: maps child relation relids to subplan indexes + * matchedsubplans: on success, receives the set of subplan indexes which + * were matched to this partition hierarchy + * + * If we cannot find any useful run-time pruning steps, return NIL. + * However, on success, each element of partrellist will have an element + * in the result list, even if some of them are useless. */ static List * make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, + List *partrellist, + List *prunequal, int *relid_subplan_map, - Relids partrelids, List *prunequal, Bitmapset **matchedsubplans) { RelOptInfo *targetpart = NULL; @@ -351,7 +438,6 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, int *relid_subpart_map; Bitmapset *subplansfound = NULL; ListCell *lc; - int rti; int i; /* @@ -365,10 +451,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size); i = 1; - rti = -1; - while ((rti = bms_next_member(partrelids, rti)) > 0) + foreach(lc, partrellist) { - RelOptInfo *subpart = find_base_rel(root, rti); + RelOptInfo *subpart = (RelOptInfo *) lfirst(lc); PartitionedRelPruneInfo *pinfo; List *partprunequal; List *initial_pruning_steps; @@ -384,8 +469,8 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, * partition. We use 1-based indexes here, so that zero can represent * an un-filled array entry. */ - Assert(rti < root->simple_rel_array_size); - relid_subpart_map[rti] = i++; + Assert(subpart->relid < root->simple_rel_array_size); + relid_subpart_map[subpart->relid] = i++; /* * Translate pruning qual, if necessary, for this partition. @@ -509,7 +594,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, /* Begin constructing the PartitionedRelPruneInfo for this rel */ pinfo = makeNode(PartitionedRelPruneInfo); - pinfo->rtindex = rti; + pinfo->rtindex = subpart->relid; pinfo->initial_pruning_steps = initial_pruning_steps; pinfo->exec_pruning_steps = exec_pruning_steps; pinfo->execparamids = execparamids; diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 8392be6d44..2206c98080 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -1871,7 +1871,6 @@ _outAppendPath(StringInfo str, const AppendPath *node) _outPathInfo(str, (const Path *) node); - WRITE_NODE_FIELD(partitioned_rels); WRITE_NODE_FIELD(subpaths); WRITE_INT_FIELD(first_partial_path); WRITE_FLOAT_FIELD(limit_tuples, "%.0f"); @@ -1884,7 +1883,6 @@ _outMergeAppendPath(StringInfo str, const MergeAppendPath *node) _outPathInfo(str, (const Path *) node); - WRITE_NODE_FIELD(partitioned_rels); WRITE_NODE_FIELD(subpaths); WRITE_FLOAT_FIELD(limit_tuples, "%.0f"); } diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 026a4b0848..cd3fdd259c 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -99,18 +99,13 @@ static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel, List *live_childrels, - List *all_child_pathkeys, - List *partitioned_rels); + List *all_child_pathkeys); static Path *get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer); -static List *accumulate_partitioned_rels(List *partitioned_rels, - List *sub_partitioned_rels, - bool flatten_partitioned_rels); static void accumulate_append_subpath(Path *path, - List **subpaths, List **special_subpaths, - List **partitioned_rels, - bool flatten_partitioned_rels); + List **subpaths, + List **special_subpaths); static Path *get_singleton_append_subpath(Path *path); static void set_dummy_rel_pathlist(RelOptInfo *rel); static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel, @@ -1299,38 +1294,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, List *all_child_pathkeys = NIL; List *all_child_outers = NIL; ListCell *l; - List *partitioned_rels = NIL; - List *partial_partitioned_rels = NIL; - List *pa_partitioned_rels = NIL; double partial_rows = -1; - bool flatten_partitioned_rels; /* If appropriate, consider parallel append */ pa_subpaths_valid = enable_parallel_append && rel->consider_parallel; - /* What we do with the partitioned_rels list is different for UNION ALL */ - flatten_partitioned_rels = (rel->rtekind != RTE_SUBQUERY); - - /* - * For partitioned tables, we accumulate a list of Relids of each - * partitioned table which has at least one of its subpartitions directly - * present as a subpath in this Append. This is used later for run-time - * partition pruning. We must maintain separate lists for each Append - * Path that we create as some paths that we create here can't flatten - * sub-Appends and sub-MergeAppends into the top-level Append. We needn't - * bother doing this for join rels as no run-time pruning is done on - * those. - */ - if (rel->reloptkind != RELOPT_JOINREL && rel->part_scheme != NULL) - { - partitioned_rels = list_make1(bms_make_singleton(rel->relid)); - partial_partitioned_rels = list_make1(bms_make_singleton(rel->relid)); - - /* skip this one if we're not going to make a Parallel Append path */ - if (pa_subpaths_valid) - pa_partitioned_rels = list_make1(bms_make_singleton(rel->relid)); - } - /* * For every non-dummy child, remember the cheapest path. Also, identify * all pathkeys (orderings) and parameterizations (required_outer sets) @@ -1353,8 +1321,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, if (childrel->pathlist != NIL && childrel->cheapest_total_path->param_info == NULL) accumulate_append_subpath(childrel->cheapest_total_path, - &subpaths, NULL, &partitioned_rels, - flatten_partitioned_rels); + &subpaths, NULL); else subpaths_valid = false; @@ -1363,9 +1330,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, { cheapest_partial_path = linitial(childrel->partial_pathlist); accumulate_append_subpath(cheapest_partial_path, - &partial_subpaths, NULL, - &partial_partitioned_rels, - flatten_partitioned_rels); + &partial_subpaths, NULL); } else partial_subpaths_valid = false; @@ -1394,10 +1359,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, Assert(cheapest_partial_path != NULL); accumulate_append_subpath(cheapest_partial_path, &pa_partial_subpaths, - &pa_nonpartial_subpaths, - &pa_partitioned_rels, - flatten_partitioned_rels); - + &pa_nonpartial_subpaths); } else { @@ -1416,9 +1378,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, */ accumulate_append_subpath(nppath, &pa_nonpartial_subpaths, - NULL, - &pa_partitioned_rels, - flatten_partitioned_rels); + NULL); } } @@ -1495,7 +1455,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, if (subpaths_valid) add_path(rel, (Path *) create_append_path(root, rel, subpaths, NIL, NIL, NULL, 0, false, - partitioned_rels, -1)); + -1)); /* * Consider an append of unordered, unparameterized partial paths. Make @@ -1538,7 +1498,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, appendpath = create_append_path(root, rel, NIL, partial_subpaths, NIL, NULL, parallel_workers, enable_parallel_append, - partial_partitioned_rels, -1); + -1); /* * Make sure any subsequent partial paths use the same row count @@ -1587,7 +1547,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, appendpath = create_append_path(root, rel, pa_nonpartial_subpaths, pa_partial_subpaths, NIL, NULL, parallel_workers, true, - pa_partitioned_rels, partial_rows); + partial_rows); add_partial_path(rel, (Path *) appendpath); } @@ -1597,8 +1557,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, */ if (subpaths_valid) generate_orderedappend_paths(root, rel, live_childrels, - all_child_pathkeys, - partitioned_rels); + all_child_pathkeys); /* * Build Append paths for each parameterization seen among the child rels. @@ -1617,10 +1576,6 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, { Relids required_outer = (Relids) lfirst(l); ListCell *lcr; - List *part_rels = NIL; - - if (rel->reloptkind != RELOPT_JOINREL && rel->part_scheme != NULL) - part_rels = list_make1(bms_make_singleton(rel->relid)); /* Select the child paths for an Append with this parameterization */ subpaths = NIL; @@ -1646,15 +1601,14 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, subpaths_valid = false; break; } - accumulate_append_subpath(subpath, &subpaths, NULL, &part_rels, - flatten_partitioned_rels); + accumulate_append_subpath(subpath, &subpaths, NULL); } if (subpaths_valid) add_path(rel, (Path *) create_append_path(root, rel, subpaths, NIL, NIL, required_outer, 0, false, - part_rels, -1)); + -1)); } /* @@ -1681,7 +1635,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, appendpath = create_append_path(root, rel, NIL, list_make1(path), NIL, NULL, path->parallel_workers, true, - partitioned_rels, partial_rows); + partial_rows); add_partial_path(rel, (Path *) appendpath); } } @@ -1717,26 +1671,13 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel, List *live_childrels, - List *all_child_pathkeys, - List *partitioned_rels) + List *all_child_pathkeys) { ListCell *lcp; List *partition_pathkeys = NIL; List *partition_pathkeys_desc = NIL; bool partition_pathkeys_partial = true; bool partition_pathkeys_desc_partial = true; - List *startup_partitioned_rels = NIL; - List *total_partitioned_rels = NIL; - bool flatten_partitioned_rels; - - /* Set up the method for building the partitioned rels lists */ - flatten_partitioned_rels = (rel->rtekind != RTE_SUBQUERY); - - if (rel->reloptkind != RELOPT_JOINREL && rel->part_scheme != NULL) - { - startup_partitioned_rels = list_make1(bms_make_singleton(rel->relid)); - total_partitioned_rels = list_make1(bms_make_singleton(rel->relid)); - } /* * Some partitioned table setups may allow us to use an Append node @@ -1878,13 +1819,9 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel, * child paths for the MergeAppend. */ accumulate_append_subpath(cheapest_startup, - &startup_subpaths, NULL, - &startup_partitioned_rels, - flatten_partitioned_rels); + &startup_subpaths, NULL); accumulate_append_subpath(cheapest_total, - &total_subpaths, NULL, - &total_partitioned_rels, - flatten_partitioned_rels); + &total_subpaths, NULL); } } @@ -1900,7 +1837,6 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel, NULL, 0, false, - startup_partitioned_rels, -1)); if (startup_neq_total) add_path(rel, (Path *) create_append_path(root, @@ -1911,7 +1847,6 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel, NULL, 0, false, - total_partitioned_rels, -1)); } else @@ -1921,15 +1856,13 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel, rel, startup_subpaths, pathkeys, - NULL, - startup_partitioned_rels)); + NULL)); if (startup_neq_total) add_path(rel, (Path *) create_merge_append_path(root, rel, total_subpaths, pathkeys, - NULL, - total_partitioned_rels)); + NULL)); } } } @@ -2008,54 +1941,6 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel, return cheapest; } -/* - * accumulate_partitioned_rels - * Record 'sub_partitioned_rels' in the 'partitioned_rels' list, - * flattening as appropriate. - */ -static List * -accumulate_partitioned_rels(List *partitioned_rels, - List *sub_partitioned_rels, - bool flatten) -{ - if (flatten) - { - /* - * We're only called with flatten == true when the partitioned_rels - * list has at most 1 element. So we can just add the members from - * sub list's first element onto the first element of - * partitioned_rels. Only later in planning when doing UNION ALL - * Append processing will we see flatten == false. partitioned_rels - * may end up with more than 1 element then, but we never expect to be - * called with flatten == true again after that, so we needn't bother - * doing anything here for anything but the initial element. - */ - if (partitioned_rels != NIL && sub_partitioned_rels != NIL) - { - Relids partrels = (Relids) linitial(partitioned_rels); - Relids subpartrels = (Relids) linitial(sub_partitioned_rels); - - /* Ensure the above comment holds true */ - Assert(list_length(partitioned_rels) == 1); - Assert(list_length(sub_partitioned_rels) == 1); - - linitial(partitioned_rels) = bms_add_members(partrels, subpartrels); - } - } - else - { - /* - * Handle UNION ALL to partitioned tables. This always occurs after - * we've done the accumulation for sub-partitioned tables, so there's - * no need to consider how adding multiple elements to the top level - * list affects the flatten == true case above. - */ - partitioned_rels = list_concat(partitioned_rels, sub_partitioned_rels); - } - - return partitioned_rels; -} - /* * accumulate_append_subpath * Add a subpath to the list being built for an Append or MergeAppend. @@ -2076,24 +1961,9 @@ accumulate_partitioned_rels(List *partitioned_rels, * children to subpaths and the rest to special_subpaths. If the latter is * NULL, we don't flatten the path at all (unless it contains only partial * paths). - * - * When pulling up sub-Appends and sub-Merge Appends, we also gather the - * path's list of partitioned tables and store in 'partitioned_rels'. The - * exact behavior here depends on the value of 'flatten_partitioned_rels'. - * - * When 'flatten_partitioned_rels' is true, 'partitioned_rels' will contain at - * most one element which is a Relids of the partitioned relations which there - * are subpaths for. In this case, we just add the RT indexes for the - * partitioned tables for the subpath we're pulling up to the single entry in - * 'partitioned_rels'. When 'flatten_partitioned_rels' is false we - * concatenate the path's partitioned rel list onto the top-level list. This - * done for UNION ALLs which could have a partitioned table in each union - * branch. */ static void -accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths, - List **partitioned_rels, - bool flatten_partitioned_rels) +accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths) { if (IsA(path, AppendPath)) { @@ -2102,9 +1972,6 @@ accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths, if (!apath->path.parallel_aware || apath->first_partial_path == 0) { *subpaths = list_concat(*subpaths, apath->subpaths); - *partitioned_rels = accumulate_partitioned_rels(*partitioned_rels, - apath->partitioned_rels, - flatten_partitioned_rels); return; } else if (special_subpaths != NULL) @@ -2120,9 +1987,6 @@ accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths, apath->first_partial_path); *special_subpaths = list_concat(*special_subpaths, new_special_subpaths); - *partitioned_rels = accumulate_partitioned_rels(*partitioned_rels, - apath->partitioned_rels, - flatten_partitioned_rels); return; } } @@ -2131,9 +1995,6 @@ accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths, MergeAppendPath *mpath = (MergeAppendPath *) path; *subpaths = list_concat(*subpaths, mpath->subpaths); - *partitioned_rels = accumulate_partitioned_rels(*partitioned_rels, - mpath->partitioned_rels, - flatten_partitioned_rels); return; } @@ -2195,7 +2056,7 @@ set_dummy_rel_pathlist(RelOptInfo *rel) /* Set up the dummy path */ add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NIL, NIL, rel->lateral_relids, - 0, false, NIL, -1)); + 0, false, -1)); /* * We set the cheapest-path fields immediately, just in case they were diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index e19d2e8228..0dbe2ac726 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -1279,7 +1279,7 @@ mark_dummy_rel(RelOptInfo *rel) /* Set up the dummy path */ add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NIL, NIL, rel->lateral_relids, - 0, false, NIL, -1)); + 0, false, -1)); /* Set or update cheapest_total_path and related fields */ set_cheapest(rel); diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 25d4750ca6..6c8305c977 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -1227,8 +1227,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags) * pruning during execution. Gather information needed by the executor to * do partition pruning. */ - if (enable_partition_pruning && - best_path->partitioned_rels != NIL) + if (enable_partition_pruning) { List *prunequal; @@ -1249,7 +1248,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags) partpruneinfo = make_partition_pruneinfo(root, rel, best_path->subpaths, - best_path->partitioned_rels, prunequal); } @@ -1393,8 +1391,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path, * pruning during execution. Gather information needed by the executor to * do partition pruning. */ - if (enable_partition_pruning && - best_path->partitioned_rels != NIL) + if (enable_partition_pruning) { List *prunequal; @@ -1414,7 +1411,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path, if (prunequal != NIL) partpruneinfo = make_partition_pruneinfo(root, rel, best_path->subpaths, - best_path->partitioned_rels, prunequal); } diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 4e6497ff32..adf68d8790 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1726,7 +1726,7 @@ inheritance_planner(PlannerInfo *root) /* Make a dummy path, cf set_dummy_rel_pathlist() */ dummy_path = (Path *) create_append_path(NULL, final_rel, NIL, NIL, NIL, NULL, 0, false, - NIL, -1); + -1); /* These lists must be nonempty to make a valid ModifyTable node */ subpaths = list_make1(dummy_path); @@ -4006,7 +4006,6 @@ create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, NULL, 0, false, - NIL, -1); } else diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index 86f794c193..becdcbb872 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -620,7 +620,7 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root, * Append the child results together. */ path = (Path *) create_append_path(root, result_rel, pathlist, NIL, - NIL, NULL, 0, false, NIL, -1); + NIL, NULL, 0, false, -1); /* * For UNION ALL, we just need the Append path. For UNION, need to add @@ -677,7 +677,7 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root, create_append_path(root, result_rel, NIL, partial_pathlist, NIL, NULL, parallel_workers, enable_parallel_append, - NIL, -1); + -1); ppath = (Path *) create_gather_path(root, result_rel, ppath, result_rel->reltarget, NULL, NULL); @@ -787,7 +787,7 @@ generate_nonunion_paths(SetOperationStmt *op, PlannerInfo *root, * Append the child results together. */ path = (Path *) create_append_path(root, result_rel, pathlist, NIL, - NIL, NULL, 0, false, NIL, -1); + NIL, NULL, 0, false, -1); /* Identify the grouping semantics */ groupList = generate_setop_grouplist(op, tlist); diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index d465b9e213..9be0c4a6af 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -1217,7 +1217,7 @@ create_append_path(PlannerInfo *root, List *subpaths, List *partial_subpaths, List *pathkeys, Relids required_outer, int parallel_workers, bool parallel_aware, - List *partitioned_rels, double rows) + double rows) { AppendPath *pathnode = makeNode(AppendPath); ListCell *l; @@ -1230,15 +1230,14 @@ create_append_path(PlannerInfo *root, /* * When generating an Append path for a partitioned table, there may be - * parameters that are useful so we can eliminate certain partitions - * during execution. Here we'll go all the way and fully populate the - * parameter info data as we do for normal base relations. However, we - * need only bother doing this for RELOPT_BASEREL rels, as - * RELOPT_OTHER_MEMBER_REL's Append paths are merged into the base rel's - * Append subpaths. It would do no harm to do this, we just avoid it to - * save wasting effort. + * parameterized quals that are useful for run-time pruning. Hence, + * compute path.param_info the same way as for any other baserel, so that + * such quals will be available for make_partition_pruneinfo(). (This + * would not work right for a non-baserel, ie a scan on a non-leaf child + * partition, and it's not necessary anyway in that case. Must skip it if + * we don't have "root", too.) */ - if (partitioned_rels != NIL && root && rel->reloptkind == RELOPT_BASEREL) + if (root && rel->reloptkind == RELOPT_BASEREL && IS_PARTITIONED_REL(rel)) pathnode->path.param_info = get_baserel_parampathinfo(root, rel, required_outer); @@ -1250,7 +1249,6 @@ create_append_path(PlannerInfo *root, pathnode->path.parallel_safe = rel->consider_parallel; pathnode->path.parallel_workers = parallel_workers; pathnode->path.pathkeys = pathkeys; - pathnode->partitioned_rels = list_copy(partitioned_rels); /* * For parallel append, non-partial paths are sorted by descending total @@ -1378,8 +1376,7 @@ create_merge_append_path(PlannerInfo *root, RelOptInfo *rel, List *subpaths, List *pathkeys, - Relids required_outer, - List *partitioned_rels) + Relids required_outer) { MergeAppendPath *pathnode = makeNode(MergeAppendPath); Cost input_startup_cost; @@ -1395,7 +1392,6 @@ create_merge_append_path(PlannerInfo *root, pathnode->path.parallel_safe = rel->consider_parallel; pathnode->path.parallel_workers = 0; pathnode->path.pathkeys = pathkeys; - pathnode->partitioned_rels = list_copy(partitioned_rels); pathnode->subpaths = subpaths; /* @@ -3848,7 +3844,6 @@ reparameterize_path(PlannerInfo *root, Path *path, apath->path.pathkeys, required_outer, apath->path.parallel_workers, apath->path.parallel_aware, - apath->partitioned_rels, -1); } default: diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c index c08cfdaaaa..1ea2d1db11 100644 --- a/src/backend/partitioning/partprune.c +++ b/src/backend/partitioning/partprune.c @@ -220,7 +220,7 @@ static void partkey_datum_from_expr(PartitionPruneContext *context, */ PartitionPruneInfo * make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, - List *subpaths, List *unused_partitioned_rels, + List *subpaths, List *prunequal) { PartitionPruneInfo *pruneinfo; diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index cde2637798..0ec93e648c 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -1403,9 +1403,6 @@ typedef struct CustomPath typedef struct AppendPath { Path path; - List *partitioned_rels; /* List of Relids containing RT indexes of - * non-leaf tables for each partition - * hierarchy whose paths are in 'subpaths' */ List *subpaths; /* list of component Paths */ /* Index of first partial path in subpaths; list_length(subpaths) if none */ int first_partial_path; @@ -1430,9 +1427,6 @@ extern bool is_dummy_rel(RelOptInfo *rel); typedef struct MergeAppendPath { Path path; - List *partitioned_rels; /* List of Relids containing RT indexes of - * non-leaf tables for each partition - * hierarchy whose paths are in 'subpaths' */ List *subpaths; /* list of component Paths */ double limit_tuples; /* hard limit on output tuples, or -1 */ } MergeAppendPath; diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index 23dec14cbd..8dfc36a4e1 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -67,13 +67,12 @@ extern AppendPath *create_append_path(PlannerInfo *root, RelOptInfo *rel, List *subpaths, List *partial_subpaths, List *pathkeys, Relids required_outer, int parallel_workers, bool parallel_aware, - List *partitioned_rels, double rows); + double rows); extern MergeAppendPath *create_merge_append_path(PlannerInfo *root, RelOptInfo *rel, List *subpaths, List *pathkeys, - Relids required_outer, - List *partitioned_rels); + Relids required_outer); extern GroupResultPath *create_group_result_path(PlannerInfo *root, RelOptInfo *rel, PathTarget *target, diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h index 1f9544c0dc..5f51e73a4d 100644 --- a/src/include/partitioning/partprune.h +++ b/src/include/partitioning/partprune.h @@ -71,7 +71,6 @@ typedef struct PartitionPruneContext extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root, struct RelOptInfo *parentrel, List *subpaths, - List *partitioned_rels, List *prunequal); extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel); extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
On Sun, 31 Jan 2021 at 11:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > This fixes the cases reported by Andreas and Jaime, leaving me > more confident that there's nothing wrong with David's Assert. I agree that there is nothing wrong with the Assert. The commit message of a929e17e5 mentions: > Here we tighten that up so that partitioned_rels only ever contains the RT > index for partitioned tables which actually have subpaths in the given > Append/MergeAppend. We can now Assert that every PartitionedRelPruneInfo > has a non-empty present_parts. That should allow us to catch any weird > corner cases that have been missed. It seems in this case the Assert *did* find a case that I missed. Unfortunately, I just missed the email that reported the problem, until yesterday. > I did not try to add a regression test case, mainly because it's > not quite clear to me where the original bug is. (I'm a little > suspicious that the blame might lie with the "match_partition_order" > cases in generate_orderedappend_paths, which try to bypass > accumulate_append_subpath without updating the partitioned_rels > data. But I'm not excited about trying to prove that.) Yeah, that suspicion is correct. More specifically when get_singleton_append_subpath() finds a single subpath Append / MergeAppend, the single subpath is returned. So what's happening is that we first build the Append paths for the sub-partitioned table which, in the example case, will have a single subpath Append path with partitioned_rels containing just the parent of that partition (i.e trigger_parted_p1). When it comes to doing generate_orderedappend_paths() for the base relation (trigger_parted), we find match_partition_order to be true and allow get_singleton_append_subpath() to pullup the single-subplan Append path. Unfortunately we just completely ignore that Append path's partitioned_rels. Back in generate_orderedappend_paths() the startup_partitioned_rels and total_partitioned_rels variables are used, both of which only mention the trigger_parted table. The mention of trigger_parted_p1 is lost. It could be fixed by modifying get_singleton_append_subpath() to modify the partitioned_rels when it collapses these single-subpath Append/MergeAppend path, but I'm very open to looking at just getting rid of the partitioned_rels field. Prior to a929e17e5 that field was very loosely set and in serval cases could contain RT indexes of partitioned tables that didn't even have any subpaths in the given Append/MergeAppend. a929e17e5 attempted to tighten all that up but looks like I missed the case above. > I wonder whether we should consider back-patching this. Another > thing that seems unclear is whether there is any serious consequence > to omitting some intermediate partitions from the set considered > by make_partition_pruneinfo. Presumably it could lead to missing > some potential run-time-pruning opportunities, but is there any > worse issue? If there isn't, this is a bigger change than I want > to put in the back braches. It shouldn't be backpatched. a929e17e5 only exists in master. Prior to that AppendPath/MergeAppendPath's partitioned_rels field could only contain additional partitioned table RT index. There are no known cases of missing ones prior to a929e17e5, so this bug shouldn't exist in PG13. a929e17e5 introduced run-time partition pruning for sub-Append/MergeAppend paths. The commit message of that explains that there are plenty of legitimate cases where we can't flatten these sub-Append/MergeAppend paths down into the top-level Append/MergeAppend. Originally I had thought we should only bother doing run-time pruning on the top-level Append/MergeAppend because I thought these cases were very uncommon. I've now changed my mind. For a929e17e5, it was not just a matter of removing the lines in [1] to allow run-time pruning on nested Append/MergeAppends. I also needed to clean up the sloppy setting of partitioned_rels. The remainder of the patch attempted that. FWIW, I hacked together a patch which fixes the bug by passing a Bitmapset ** pointer to get_singleton_append_subpath(), which set the bits for the Append / MergeAppend path's partitioned_rels that we get rid of when it only has a single subpath. This stops the Assert failure mentioned here. However, I'd much rather explore getting rid of partitioned_rels completely. I'll now have a look at the patch you're proposing for that. Thanks for investigating this and writing the patch. Apologies for this email missing my attention closer to the time to when it was initially reported. David [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/backend/optimizer/plan/createplan.c;h=40abe6f9f623ed2922ccc4e1991e97e90322f47d;hp=94280a730c4d9abb2143416fca8e74e76dada042;hb=a929e17e5;hpb=dfc797730fc7a07c0e6bd636ad1a564aecab3161
On Mon, Feb 1, 2021 at 8:50 AM David Rowley <dgrowleyml@gmail.com> wrote: > On Sun, 31 Jan 2021 at 11:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > This fixes the cases reported by Andreas and Jaime, leaving me > > more confident that there's nothing wrong with David's Assert. > > It could be fixed by modifying get_singleton_append_subpath() to > modify the partitioned_rels when it collapses these single-subpath > Append/MergeAppend path, but I'm very open to looking at just getting > rid of the partitioned_rels field. Prior to a929e17e5 that field was > very loosely set and in serval cases could contain RT indexes of > partitioned tables that didn't even have any subpaths in the given > Append/MergeAppend. a929e17e5 attempted to tighten all that up but > looks like I missed the case above. > > > I wonder whether we should consider back-patching this. Another > > thing that seems unclear is whether there is any serious consequence > > to omitting some intermediate partitions from the set considered > > by make_partition_pruneinfo. Presumably it could lead to missing > > some potential run-time-pruning opportunities, but is there any > > worse issue? If there isn't, this is a bigger change than I want > > to put in the back braches. > > It shouldn't be backpatched. a929e17e5 only exists in master. Prior to > that AppendPath/MergeAppendPath's partitioned_rels field could only > contain additional partitioned table RT index. There are no known > cases of missing ones prior to a929e17e5, so this bug shouldn't exist > in PG13. > > a929e17e5 introduced run-time partition pruning for > sub-Append/MergeAppend paths. The commit message of that explains > that there are plenty of legitimate cases where we can't flatten these > sub-Append/MergeAppend paths down into the top-level > Append/MergeAppend. Originally I had thought we should only bother > doing run-time pruning on the top-level Append/MergeAppend because I > thought these cases were very uncommon. I've now changed my mind. > > For a929e17e5, it was not just a matter of removing the lines in [1] > to allow run-time pruning on nested Append/MergeAppends. I also needed > to clean up the sloppy setting of partitioned_rels. The remainder of > the patch attempted that. > > FWIW, I hacked together a patch which fixes the bug by passing a > Bitmapset ** pointer to get_singleton_append_subpath(), which set the > bits for the Append / MergeAppend path's partitioned_rels that we get > rid of when it only has a single subpath. This stops the Assert > failure mentioned here. However, I'd much rather explore getting rid > of partitioned_rels completely. I'll now have a look at the patch > you're proposing for that. I've read Tom's patch (0001) and would definitely vote for that over having partitioned_rels in Paths anymore. > Thanks for investigating this and writing the patch. +1. My apologies as well for failing to notice this thread sooner. -- Amit Langote EDB: http://www.enterprisedb.com
On Sun, 31 Jan 2021 at 11:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > For simplicity of review I divided the patch into two parts. > 0001 revises make_partition_pruneinfo() and children to identify > the relevant parent partitions for themselves, which is not too > hard to do by chasing up the child-to-parent AppendRelInfo links. > Not formerly documented, AFAICT, is that we want to collect just > the parent partitions that are between the Append path's own rel > (possibly equal to it) and the subpaths' rels. I'd first tried > to code this by using the top_parent_relids and part_rels[] links > in the RelOptInfos, but that turns out not to work. We might > ascend to a top parent that's above the Append's rel (if considering > an appendpath for a sub-partition, which happens in partitionwise > join). We could also descend to a child at or below some subpath > level, since there are also cases where subpaths correspond to > unflattened non-leaf partitions. Either of those things result > in failures. But once you wrap your head around handling those > restrictions, it's quite simple. I had a look at this one and it all makes sense and the logic for obtaining the lineage of parent partitioned tables seems fine. What I can't understand is why you changed to a List-of-Lists rather than a List-of-Relids. This makes the patch both bigger than it needs to be and the processing quite a bit less efficient. For example, in make_partition_pruneinfo() when you're walking up to the top-level target partitioned table you must lcons() each new list member to ensure the children always come after the parents. These chains are likely to be short so the horrible overheads of the memmove() in lcons() won't cost that much, but there's more to it. The other seemingly needlessly slow part is in the list_concat_unique_ptr() call. This needs to be done for every subpath in the Append/Merge append. It would be good to get rid of that. Given, the list are most likely going to be small, but that's still a quadratic function, so it seems like a good idea to try to avoid using it if there is another way to do it. The memory allocations could also be more efficient for Relids rather than Lists. Since we're working up from the child to the parent in the lineage calculation code in make_partition_pruneinfo(), we'll always allocate the Bitmapset to the correct size right away, rather than possibly having to increase the size if the next RT index were not to fit in the current number of words. Of course, with a List-of-Lists, not every lcons() would require a new allocation, but there's an above zero chance that it might. I've attached a version of your 0001 patch which just maintains using a List-of-Relids. This shrinks the diff down about 3kb. Parent RT indexes are guaranteed to be lower than their children RT indexes, so it's pretty simple to figure out the target RT index by just looking at the lowest set bit. Doing it this way also simplifies things as add_part_rel_list() no longer must insist that the sublists are in parent-to-child order. David
Вложения
David Rowley <dgrowleyml@gmail.com> writes: > What I can't understand is why you changed to a List-of-Lists rather > than a List-of-Relids. Yeah, I spent no effort on micro-optimizing the data structure. I figured that since we were not including leaf partitions, there would never be enough rels involved to worry about. Perhaps that's wrong though. > Parent RT indexes are guaranteed to be lower than their children RT > indexes, I was intentionally avoiding that assumption ;-). Maybe it buys enough to be worth the loss of generality, but ... regards, tom lane
I wrote: > David Rowley <dgrowleyml@gmail.com> writes: >> Parent RT indexes are guaranteed to be lower than their children RT >> indexes, > I was intentionally avoiding that assumption ;-). Maybe it buys enough > to be worth the loss of generality, but ... Oh, it's too late at night. I now remember that the real problem I had with that representation was that it cannot work for joinrels. Currently we only apply this logic to partitioned baserels, but don't you think it might someday be called on to optimize partitionwise joins? regards, tom lane
On Mon, 1 Feb 2021 at 18:57, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > >> Parent RT indexes are guaranteed to be lower than their children RT > >> indexes, > > > I was intentionally avoiding that assumption ;-). Maybe it buys enough > > to be worth the loss of generality, but ... > > Oh, it's too late at night. I now remember that the real problem > I had with that representation was that it cannot work for joinrels. > Currently we only apply this logic to partitioned baserels, but > don't you think it might someday be called on to optimize > partitionwise joins? I've not looked in detail, but I think the code would need a pretty big overhaul before that could happen. For example, ever since we allowed ATTACH PARTITION to work without taking an AEL we now have a PartitionedRelPruneInfo.relid_map field that stores Oids for the executor to look at to see if it can figure out if a partition has been added since the plan was generated. Not sure how that can work with non-base rels as we have no Oid for join rels. Perhaps I'm just not thinking hard enough, but either way, it does seem like it would take a pretty big hit with a hammer to make it work. My current thinking is that being unable to represent join rels in a set of Relids is fairly insignificant compared to what would be required to get the feature to work correctly. David
David Rowley <dgrowleyml@gmail.com> writes: > On Mon, 1 Feb 2021 at 18:57, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Oh, it's too late at night. I now remember that the real problem >> I had with that representation was that it cannot work for joinrels. >> Currently we only apply this logic to partitioned baserels, but >> don't you think it might someday be called on to optimize >> partitionwise joins? > I've not looked in detail, but I think the code would need a pretty > big overhaul before that could happen. Yeah, there's no doubt that a lot of other work would be needed too; I'd just figured that maybe this code could be ready for it. But on looking at it again, I agree that the bitmapset approach is simpler and cheaper than what I did. I realized that a good bit of what I did not like about the pre-existing logic is that it was calling the values "Relids" rather than Bitmapsets. To my mind, the Relids typedef is meant to denote values that identify individual relations (which might be joins) to the planner. The values we're dealing with in this code do *not* correspond to anything that is or could be a RelOptInfo; rather they are sets of distinct relations. It's fine to use a Bitmapset if it's a convenient representation, but we should call it that and not a Relids. I renamed things that way, did some more work on the comments, and pushed it. Thanks for reviewing! regards, tom lane
On Tue, 2 Feb 2021 at 08:58, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I renamed things that way, did some more work on the comments, > and pushed it. Thanks for reviewing! Thanks for working on this and coming up with the idea to nuke partitioned_rels. David