Обсуждение: Killing off removed rels properly

Поиск
Список
Период
Сортировка

Killing off removed rels properly

От
Tom Lane
Дата:
Outer-join removal does this:

    /*
     * Mark the rel as "dead" to show it is no longer part of the join tree.
     * (Removing it from the baserel array altogether seems too risky.)
     */
    rel->reloptkind = RELOPT_DEADREL;

which apparently I thought was a good idea in 2010 (cf b78f6264e),
but looking at it now it just seems like an invitation to fail to
detect bugs.  We've had a couple of recent reports of indxpath.c
failing like this:

Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig(at)entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f0b57bc6859 in __GI_abort () at abort.c:79
#2  0x0000555ec56d3ff3 in ExceptionalCondition (conditionName=0x555ec5887ff0
"outer_rel->rows > 0", fileName=0x555ec5887f2c "indxpath.c",
lineNumber=1909) at assert.c:66
#3  0x0000555ec538a67a in get_loop_count (root=0x555ec5f72680, cur_relid=3,
outer_relids=0x555ec5f93960) at indxpath.c:1909
#4  0x0000555ec5388b5e in build_index_paths (root=0x555ec5f72680,
rel=0x555ec5f8f648, index=0x555ec5f8ca90, clauses=0x7fffeea57480,
useful_predicate=false, scantype=ST_BITMAPSCAN, skip_nonnative_saop=0x0,
skip_lower_saop=0x0) at indxpath.c:957

This is pretty impenetrable at first glance, but what it boils
down to is something accessing a "dead" rel and taking its contents
at face value.  Fortunately the contents triggered an assertion,
but that hardly seems like something to count on to detect bugs.

I think it's time to clean this up by removing the rel from the
planner data structures altogether.  The attached passes check-world,
and if it does trigger any problems I would say that's a clear
sign of bugs elsewhere.

            regards, tom lane

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index de335fdb4d..9132ce235f 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -729,8 +729,7 @@ get_eclass_for_sort_expr(PlannerInfo *root,
                 continue;
             }

-            Assert(rel->reloptkind == RELOPT_BASEREL ||
-                   rel->reloptkind == RELOPT_DEADREL);
+            Assert(rel->reloptkind == RELOPT_BASEREL);

             rel->eclass_indexes = bms_add_member(rel->eclass_indexes,
                                                  ec_index);
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 7fd11df9af..0dfefd71f2 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -331,12 +331,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
     Index        rti;
     ListCell   *l;

-    /*
-     * Mark the rel as "dead" to show it is no longer part of the join tree.
-     * (Removing it from the baserel array altogether seems too risky.)
-     */
-    rel->reloptkind = RELOPT_DEADREL;
-
     /*
      * Remove references to the rel from other baserels' attr_needed arrays.
      */
@@ -487,6 +481,16 @@ remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
      * There may be references to the rel in root->fkey_list, but if so,
      * match_foreign_keys_to_quals() will get rid of them.
      */
+
+    /*
+     * Finally, remove the rel from the baserel array to prevent it from being
+     * referenced again.  (We can't do this earlier because
+     * remove_join_clause_from_rels will touch it.)
+     */
+    root->simple_rel_array[relid] = NULL;
+
+    /* And nuke the RelOptInfo, just in case there's another access path */
+    pfree(rel);
 }

 /*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0d4b1ec4e4..278f7ae80e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -823,8 +823,7 @@ typedef enum RelOptKind
     RELOPT_OTHER_MEMBER_REL,
     RELOPT_OTHER_JOINREL,
     RELOPT_UPPER_REL,
-    RELOPT_OTHER_UPPER_REL,
-    RELOPT_DEADREL
+    RELOPT_OTHER_UPPER_REL
 } RelOptKind;

 /*

Re: Killing off removed rels properly

От
Richard Guo
Дата:

On Sat, Feb 11, 2023 at 4:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think it's time to clean this up by removing the rel from the
planner data structures altogether.  The attached passes check-world,
and if it does trigger any problems I would say that's a clear
sign of bugs elsewhere.

+1. The patch looks good to me.  One minor comment is that we should
also remove the comments about RELOPT_DEADREL in pathnodes.h.

 * Lastly, there is a RelOptKind for "dead" relations, which are base rels
 * that we have proven we don't need to join after all.

Thanks
Richard 

Re: Killing off removed rels properly

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> On Sat, Feb 11, 2023 at 4:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think it's time to clean this up by removing the rel from the
>> planner data structures altogether.  The attached passes check-world,
>> and if it does trigger any problems I would say that's a clear
>> sign of bugs elsewhere.

> +1. The patch looks good to me.

Pushed, thanks for looking at it!

> One minor comment is that we should
> also remove the comments about RELOPT_DEADREL in pathnodes.h.

Yeah, I noticed that shortly after posting the patch :-(.  Searching
the optimizer code for other references to "dead" rels turned up another
place where the comments need fixed, namely match_foreign_keys_to_quals
... which is someplace I should have thought to check before, given the
reference to it in remove_rel_from_query.  Its code is fine as-is though.

            regards, tom lane



Re: Killing off removed rels properly

От
Alexander Lakhin
Дата:
Hello Tom,

13.02.2023 21:39, Tom Lane wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
>> On Sat, Feb 11, 2023 at 4:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think it's time to clean this up by removing the rel from the
>>> planner data structures altogether.  The attached passes check-world,
>>> and if it does trigger any problems I would say that's a clear
>>> sign of bugs elsewhere.
After this change the following query triggers an assert:

CREATE TABLE tt (tid integer PRIMARY KEY) PARTITION BY LIST (tid);
CREATE TABLE ttp PARTITION OF tt DEFAULT;
CREATE TABLE st (sid integer);

MERGE INTO tt USING st ON tt.tid = st.sid WHEN NOT MATCHED THEN INSERT 
VALUES (st.sid);
...
#5  0x0000556fe84647eb in ExceptionalCondition 
(conditionName=0x556fe8619a46 "operation != CMD_MERGE",
     fileName=0x556fe8618b73 "createplan.c", lineNumber=7121) at assert.c:66
#6  0x0000556fe8126502 in make_modifytable (root=0x556fe945be40, 
subplan=0x556fe9474420, operation=CMD_MERGE,
     canSetTag=true, nominalRelation=1, rootRelation=1, 
partColsUpdated=false, resultRelations=0x556fe9475bb0,
     updateColnosLists=0x0, withCheckOptionLists=0x0, 
returningLists=0x0, rowMarks=0x0, onconflict=0x0,
     mergeActionLists=0x556fe9475c00, epqParam=0) at createplan.c:7121
#7  0x0000556fe811d479 in create_modifytable_plan (root=0x556fe945be40, 
best_path=0x556fe9474a40)
     at createplan.c:2820
#8  0x0000556fe811912a in create_plan_recurse (root=0x556fe945be40, 
best_path=0x556fe9474a40, flags=1)
     at createplan.c:530
#9  0x0000556fe8118ca8 in create_plan (root=0x556fe945be40, 
best_path=0x556fe9474a40) at createplan.c:347
#10 0x0000556fe812d4fd in standard_planner (parse=0x556fe937c2d0,
     query_string=0x556fe937b178 "MERGE INTO tt USING st ON tt.tid = 
st.sid WHEN NOT MATCHED THEN INSERT VALUES (st.sid);", 
cursorOptions=2048, boundParams=0x0) at planner.c:418
...

It seems that before e9a20e451 the other branch of the following 
condition in make_modifytable() was executed:
         /*
          * If possible, we want to get the FdwRoutine from our 
RelOptInfo for
          * the table.  But sometimes we don't have a RelOptInfo and 
must get
          * it the hard way.  (In INSERT, the target relation is not 
scanned,
          * so it's not a baserel; and there are also corner cases for
          * updatable views where the target rel isn't a baserel.)
          */
         if (rti < root->simple_rel_array_size &&
             root->simple_rel_array[rti] != NULL)
         {
...

Best regards,
Alexander



Re: Killing off removed rels properly

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> After this change the following query triggers an assert:

> CREATE TABLE tt (tid integer PRIMARY KEY) PARTITION BY LIST (tid);
> CREATE TABLE ttp PARTITION OF tt DEFAULT;
> CREATE TABLE st (sid integer);

> MERGE INTO tt USING st ON tt.tid = st.sid WHEN NOT MATCHED THEN INSERT 
> VALUES (st.sid);

Hmph.  Yeah, I think that's just wrong: the cases of found-a-baserel
and didn't-find-a-baserel should be treating MERGE-rejection identically.
This is probably broken even before e9a20e451.

Thanks for the report!

            regards, tom lane



Re: Killing off removed rels properly

От
Tom Lane
Дата:
Hmm, there's something else going on here.  After getting rid of the
assertion failure, I see that the plan looks like

# explain MERGE INTO tt USING st ON tt.tid = st.sid WHEN NOT MATCHED THEN INSERT
VALUES (st.sid);
                         QUERY PLAN
-------------------------------------------------------------
 Merge on tt  (cost=0.00..35.50 rows=0 width=0)
   ->  Seq Scan on st  (cost=0.00..35.50 rows=2550 width=10)
(2 rows)

which is fairly nonsensical and doesn't match v15's plan:

 Merge on tt  (cost=0.15..544.88 rows=0 width=0)
   Merge on ttp tt_1
   ->  Nested Loop Left Join  (cost=0.15..544.88 rows=32512 width=14)
         ->  Seq Scan on st  (cost=0.00..35.50 rows=2550 width=4)
         ->  Index Scan using ttp_pkey on ttp tt_1  (cost=0.15..0.19 rows=1 widt
h=14)
               Index Cond: (tid = st.sid)

It looks like we're somehow triggering the elide-a-left-join code
when we shouldn't?  That explains why the target table's RelOptInfo
has gone missing and broken make_modifytable's expectations.
That code is still unnecessarily fragile so I intend to rearrange it,
but there's more to do here.

            regards, tom lane



Re: Killing off removed rels properly

От
Tom Lane
Дата:
I wrote:
> Hmm, there's something else going on here.  After getting rid of the
> assertion failure, I see that the plan looks like

> # explain MERGE INTO tt USING st ON tt.tid = st.sid WHEN NOT MATCHED THEN INSERT
> VALUES (st.sid);
>                          QUERY PLAN
> -------------------------------------------------------------
>  Merge on tt  (cost=0.00..35.50 rows=0 width=0)
>    ->  Seq Scan on st  (cost=0.00..35.50 rows=2550 width=10)
> (2 rows)

> which is fairly nonsensical and doesn't match v15's plan:

>  Merge on tt  (cost=0.15..544.88 rows=0 width=0)
>    Merge on ttp tt_1
>    ->  Nested Loop Left Join  (cost=0.15..544.88 rows=32512 width=14)
>          ->  Seq Scan on st  (cost=0.00..35.50 rows=2550 width=4)
>          ->  Index Scan using ttp_pkey on ttp tt_1  (cost=0.15..0.19 rows=1 widt
> h=14)
>                Index Cond: (tid = st.sid)

> It looks like we're somehow triggering the elide-a-left-join code
> when we shouldn't?

A quick bisect session shows that this broke at

3c569049b7b502bb4952483d19ce622ff0af5fd6 is the first bad commit
commit 3c569049b7b502bb4952483d19ce622ff0af5fd6
Author: David Rowley <drowley@postgresql.org>
Date:   Mon Jan 9 17:15:08 2023 +1300

    Allow left join removals and unique joins on partitioned tables

but I suspect that that's merely exposed a pre-existing deficiency
in MERGE planning.  ttp should not have been a candidate for join
removal, because the plan should require fetching (at least) its
ctid.  I suspect that somebody cowboy-coded the MERGE support in
such a way that the required row identity vars don't get added to
relation targetlists, or at least not till too late to stop join
removal.  I've not run it to earth though.

But while I'm looking at this, 3c569049b seems kind of broken on
its own terms.  Why is it populating so little of the IndexOptInfo
for a partitioned index?  I realize that we're not going to directly
plan anything using such an index, but not populating fields like
sortopfamily seems like it's at least leaving stuff on the table,
and at worst making faulty decisions.

            regards, tom lane



Re: Killing off removed rels properly

От
Tom Lane
Дата:
I wrote:
>> It looks like we're somehow triggering the elide-a-left-join code
>> when we shouldn't?

So the reason why we see this with a partitioned target table and not
with a regular target table reduces to this bit in preprocess_targetlist:

    /*
     * For non-inherited UPDATE/DELETE/MERGE, register any junk column(s)
     * needed to allow the executor to identify the rows to be updated or
     * deleted.  In the inheritance case, we do nothing now, leaving this to
     * be dealt with when expand_inherited_rtentry() makes the leaf target
     * relations.  (But there might not be any leaf target relations, in which
     * case we must do this in distribute_row_identity_vars().)
     */
    if ((command_type == CMD_UPDATE || command_type == CMD_DELETE ||
         command_type == CMD_MERGE) &&
        !target_rte->inh)
    {
        /* row-identity logic expects to add stuff to processed_tlist */
        root->processed_tlist = tlist;
        add_row_identity_columns(root, result_relation,
                                 target_rte, target_relation);
        tlist = root->processed_tlist;
    }

This happens before join removal, so that we see use of the row identity
columns of a regular table as a reason not to do join removal.  But
expand_inherited_rtentry will happen after that, too late to stop join
removal.

I think the best fix is just to teach join removal that it mustn't
remove the result relation, as attached (needs a regression test).

I'm a little inclined to back-patch this, even though I think it's
probably unreachable in v15.  It's a cheap enough safety measure,
and at the very least it will save a few cycles deciding that we
can't remove the target table of a MERGE.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 0dfefd71f2..f79bc4430c 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -183,6 +183,14 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
     if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid))
         return false;

+    /*
+     * Never try to eliminate a left join to the query result rel.  Although
+     * the case is syntactically impossible in standard SQL, MERGE will build
+     * a join tree that looks exactly like that.
+     */
+    if (innerrelid == root->parse->resultRelation)
+        return false;
+
     innerrel = find_base_rel(root, innerrelid);

     /*

Re: Killing off removed rels properly

От
Tom Lane
Дата:
I wrote:
> But while I'm looking at this, 3c569049b seems kind of broken on
> its own terms.  Why is it populating so little of the IndexOptInfo
> for a partitioned index?  I realize that we're not going to directly
> plan anything using such an index, but not populating fields like
> sortopfamily seems like it's at least leaving stuff on the table,
> and at worst making faulty decisions.

I fixed the other issues discussed in this thread, but along the
way I grew even more concerned about 3c569049b, because I discovered
that it's changed plans in more ways than what its commit message
suggests.  For example, given the setup

CREATE TABLE pa_target (tid integer PRIMARY KEY) PARTITION BY LIST (tid);
CREATE TABLE pa_targetp PARTITION OF pa_target DEFAULT;
CREATE TABLE pa_source (sid integer);

then I get this as of commit 3c569049b7^:

# explain select * from pa_source s left join pa_target t on s.sid = t.tid;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.15..544.88 rows=32512 width=8)
   ->  Seq Scan on pa_source s  (cost=0.00..35.50 rows=2550 width=4)
   ->  Index Only Scan using pa_targetp_pkey on pa_targetp t  (cost=0.15..0.19 rows=1 width=4)
         Index Cond: (tid = s.sid)
(4 rows)

and this as of 3c569049b7 and later:

# explain select * from pa_source s left join pa_target t on s.sid = t.tid;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Hash Left Join  (cost=67.38..109.58 rows=2550 width=8)
   Hash Cond: (s.sid = t.tid)
   ->  Seq Scan on pa_source s  (cost=0.00..35.50 rows=2550 width=4)
   ->  Hash  (cost=35.50..35.50 rows=2550 width=4)
         ->  Seq Scan on pa_targetp t  (cost=0.00..35.50 rows=2550 width=4)
(5 rows)

Now, I'm not unhappy about that change: it's clearly a win that we now
realize we'll get at most one matching t row for each s row.  What
I'm unhappy about is that this means a partially-populated IndexOptInfo
is being used for rowcount estimation and perhaps other things.
That seems like sheer folly.  Even if it manages to not dump core
from trying to access a missing field, there's a significant risk of
wrong answers, now or in the future.  Why was it done like that?

            regards, tom lane