Обсуждение: Teach predtest about IS [NOT] proofs

Поиск

Список

Период

Сортировка

Teach predtest about IS [NOT] proofs

От

James Coleman

Дата:

11 декабря 2023 г., 19:59:46

Hello,

I recently encountered a case where partial indexes were surprisingly not being used. The issue is that predtest doesn't understand how boolean values and IS <boolean> expressions relate.

For example if I have:

create table foo(i int, bar boolean);

create index on foo(i) where bar is true;

then this query:

select * from foo where i = 1 and bar;

doesn't use the partial index.

Attached is a patch that solves that issue. It also teaches predtest about quite a few more cases involving BooleanTest expressions (e.g., how they relate to NullTest expressions). One thing I could imagine being an objection is that not all of these warrant cycles in planning. If that turns out to be the case there's not a particularly clear line in my mind about where to draw that line.

As noted in a TODO in the patch itself, I think it may be worth refactoring the test_predtest module to run the "x, y" case as well as the "y, x" case with a single call so as to eliminate a lot of repetition in clause/expression test cases. If reviewers agree that's desirable, then I could do that as a precursor.

Regards,

James Coleman

Вложения

v1-0001-Teach-predtest-about-IS-NOT-bool-proofs.patch

Re: Teach predtest about IS [NOT] proofs

От

Tom Lane

Дата:

13 декабря 2023 г., 18:36:14

James Coleman <jtc331@gmail.com> writes:
> Attached is a patch that solves that issue. It also teaches predtest about
> quite a few more cases involving BooleanTest expressions (e.g., how they
> relate to NullTest expressions). One thing I could imagine being an
> objection is that not all of these warrant cycles in planning. If that
> turns out to be the case there's not a particularly clear line in my mind
> about where to draw that line.

I don't have an objection in principle to adding more smarts to
predtest.c.  However, we should be wary of slowing down cases where
no BooleanTests are present to be optimized.  I wonder if it could
help to use a switch on nodeTag rather than a series of if(IsA())
tests.  (I'd be inclined to rewrite the inner if-then-else chains
as switches too, really.  You get some benefit from the compiler
noticing whether you've covered all the enum values.)

I note you've actively broken the function's ability to cope with
NULL input pointers.  Maybe we don't need it to, but I'm not going
to accept a patch that just side-swipes that case without any
justification.

Another way in which the patch needs more effort is that you've
not bothered to update the large comment block atop the function.
Perhaps, rather than hoping people will notice comments that are
potentially offscreen from what they're modifying, we should relocate
those comment paras to be adjacent to the relevant parts of the
function?

I've not gone through the patch in detail to see whether I believe
the proposed proof rules.  It would help to have more comments
justifying them.

> As noted in a TODO in the patch itself, I think it may be worth refactoring
> the test_predtest module to run the "x, y" case as well as the "y, x" case
> with a single call so as to eliminate a lot of repetition in
> clause/expression test cases. If reviewers agree that's desirable, then I
> could do that as a precursor.

I think that's actively undesirable.  It is not typically the case that
a proof rule for A => B also works in the other direction, so this would
encourage wasting cycles in the tests.  I fear it might also cause
confusion about which direction a proof rule is supposed to work in.

            regards, tom lane

Re: Teach predtest about IS [NOT] proofs

От

James Coleman

Дата:

14 декабря 2023 г., 00:35:01

Thanks for taking a look!

On Wed, Dec 13, 2023 at 1:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > Attached is a patch that solves that issue. It also teaches predtest about
> > quite a few more cases involving BooleanTest expressions (e.g., how they
> > relate to NullTest expressions). One thing I could imagine being an
> > objection is that not all of these warrant cycles in planning. If that
> > turns out to be the case there's not a particularly clear line in my mind
> > about where to draw that line.
>
> I don't have an objection in principle to adding more smarts to
> predtest.c.  However, we should be wary of slowing down cases where
> no BooleanTests are present to be optimized.  I wonder if it could
> help to use a switch on nodeTag rather than a series of if(IsA())
> tests.  (I'd be inclined to rewrite the inner if-then-else chains
> as switches too, really.  You get some benefit from the compiler
> noticing whether you've covered all the enum values.)

I think I could take this on; would you prefer it as a patch in this
series? Or as a new patch thread?

> I note you've actively broken the function's ability to cope with
> NULL input pointers.  Maybe we don't need it to, but I'm not going
> to accept a patch that just side-swipes that case without any
> justification.

I should have explained that. I don't think I've broken it:

1. predicate_implied_by_simple_clause() is only ever called by
predicate_implied_by_recurse()
2. predicate_implied_by_recurse() starts with:
    pclass = predicate_classify(predicate, &pred_info);
3. predicate_classify(Node *clause, PredIterInfo info) starts off with:
    Assert(clause != NULL);

I believe this means we are currently guaranteed by the caller to
receive a non-NULL pointer, but I could be missing something.

The same argument (just substituting the equivalent "refute" function
names) applies to predicate_refuted_by_simple_clause().

> Another way in which the patch needs more effort is that you've
> not bothered to update the large comment block atop the function.
> Perhaps, rather than hoping people will notice comments that are
> potentially offscreen from what they're modifying, we should relocate
> those comment paras to be adjacent to the relevant parts of the
> function?

Splitting up that block comment makes sense to me.

> I've not gone through the patch in detail to see whether I believe
> the proposed proof rules.  It would help to have more comments
> justifying them.

Most of them are sufficiently simple -- e.g., X IS TRUE implies X --
that I don't think there's a lot to say in justification. In some
cases I've noted the cases that force only strong or weak implication.

There are a few cases, though, (e.g., "X is unknown weakly implies X
is not true") that, reading over this again, don't immediately strike
me as obvious, so I'll expand on those.

> > As noted in a TODO in the patch itself, I think it may be worth refactoring
> > the test_predtest module to run the "x, y" case as well as the "y, x" case
> > with a single call so as to eliminate a lot of repetition in
> > clause/expression test cases. If reviewers agree that's desirable, then I
> > could do that as a precursor.
>
> I think that's actively undesirable.  It is not typically the case that
> a proof rule for A => B also works in the other direction, so this would
> encourage wasting cycles in the tests.  I fear it might also cause
> confusion about which direction a proof rule is supposed to work in.

That makes sense in the general case.

Boolean expressions seem like a special case in that regard: (subject
to what it looks like) would you be OK with a wrapping function that
does both directions (with output that shows which direction is being
tested) used only for the cases where we do want to check both
directions?

Thanks,
James Coleman

Re: Teach predtest about IS [NOT] proofs

От

Tom Lane

Дата:

14 декабря 2023 г., 21:38:17

James Coleman <jtc331@gmail.com> writes:
> On Wed, Dec 13, 2023 at 1:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't have an objection in principle to adding more smarts to
>> predtest.c.  However, we should be wary of slowing down cases where
>> no BooleanTests are present to be optimized.  I wonder if it could
>> help to use a switch on nodeTag rather than a series of if(IsA())
>> tests.  (I'd be inclined to rewrite the inner if-then-else chains
>> as switches too, really.  You get some benefit from the compiler
>> noticing whether you've covered all the enum values.)

> I think I could take this on; would you prefer it as a patch in this
> series? Or as a new patch thread?

No, keep it in the same thread (and make a CF entry, if you didn't
already).  It might be best to make a series of 2 patches, first
just refactoring what's there per this discussion, and then a
second one to add BooleanTest logic.

>> I note you've actively broken the function's ability to cope with
>> NULL input pointers.  Maybe we don't need it to, but I'm not going
>> to accept a patch that just side-swipes that case without any
>> justification.

> [ all callers have previously used predicate_classify ]

OK, fair enough.  The checks for nulls are probably from ancient
habit, but I agree we could remove 'em here.

>> Perhaps, rather than hoping people will notice comments that are
>> potentially offscreen from what they're modifying, we should relocate
>> those comment paras to be adjacent to the relevant parts of the
>> function?

> Splitting up that block comment makes sense to me.

Done, let's make it so.

>> I've not gone through the patch in detail to see whether I believe
>> the proposed proof rules.  It would help to have more comments
>> justifying them.

> Most of them are sufficiently simple -- e.g., X IS TRUE implies X --
> that I don't think there's a lot to say in justification. In some
> cases I've noted the cases that force only strong or weak implication.

Yeah, it's the strong-vs-weak distinction that makes me cautious here.
One's high-school-algebra instinct for what's obviously true tends to
not think about NULL/UNKNOWN, and you do have to consider that.

>>> As noted in a TODO in the patch itself, I think it may be worth refactoring
>>> the test_predtest module to run the "x, y" case as well as the "y, x" case

>> I think that's actively undesirable.  It is not typically the case that
>> a proof rule for A => B also works in the other direction, so this would
>> encourage wasting cycles in the tests.  I fear it might also cause
>> confusion about which direction a proof rule is supposed to work in.

> That makes sense in the general case.

> Boolean expressions seem like a special case in that regard: (subject
> to what it looks like) would you be OK with a wrapping function that
> does both directions (with output that shows which direction is being
> tested) used only for the cases where we do want to check both
> directions?

Using a wrapper where appropriate would remove the inefficiency
concern, but I still worry whether it will promote confusion about
which direction we're proving things in.  You'll need to be very clear
about the labeling.

            regards, tom lane

Re: Teach predtest about IS [NOT] proofs

От

James Coleman

Дата:

22 декабря 2023 г., 15:00:50

On Thu, Dec 14, 2023 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > On Wed, Dec 13, 2023 at 1:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I don't have an objection in principle to adding more smarts to
> >> predtest.c.  However, we should be wary of slowing down cases where
> >> no BooleanTests are present to be optimized.  I wonder if it could
> >> help to use a switch on nodeTag rather than a series of if(IsA())
> >> tests.  (I'd be inclined to rewrite the inner if-then-else chains
> >> as switches too, really.  You get some benefit from the compiler
> >> noticing whether you've covered all the enum values.)
>
> > I think I could take this on; would you prefer it as a patch in this
> > series? Or as a new patch thread?
>
> No, keep it in the same thread (and make a CF entry, if you didn't
> already).  It might be best to make a series of 2 patches, first
> just refactoring what's there per this discussion, and then a
> second one to add BooleanTest logic.

CF entry is already created; I'll keep it here then.

> >> I note you've actively broken the function's ability to cope with
> >> NULL input pointers.  Maybe we don't need it to, but I'm not going
> >> to accept a patch that just side-swipes that case without any
> >> justification.
>
> > [ all callers have previously used predicate_classify ]
>
> OK, fair enough.  The checks for nulls are probably from ancient
> habit, but I agree we could remove 'em here.
>
> >> Perhaps, rather than hoping people will notice comments that are
> >> potentially offscreen from what they're modifying, we should relocate
> >> those comment paras to be adjacent to the relevant parts of the
> >> function?
>
> > Splitting up that block comment makes sense to me.
>
> Done, let's make it so.
>
> >> I've not gone through the patch in detail to see whether I believe
> >> the proposed proof rules.  It would help to have more comments
> >> justifying them.
>
> > Most of them are sufficiently simple -- e.g., X IS TRUE implies X --
> > that I don't think there's a lot to say in justification. In some
> > cases I've noted the cases that force only strong or weak implication.
>
> Yeah, it's the strong-vs-weak distinction that makes me cautious here.
> One's high-school-algebra instinct for what's obviously true tends to
> not think about NULL/UNKNOWN, and you do have to consider that.
>
> >>> As noted in a TODO in the patch itself, I think it may be worth refactoring
> >>> the test_predtest module to run the "x, y" case as well as the "y, x" case
>
> >> I think that's actively undesirable.  It is not typically the case that
> >> a proof rule for A => B also works in the other direction, so this would
> >> encourage wasting cycles in the tests.  I fear it might also cause
> >> confusion about which direction a proof rule is supposed to work in.
>
> > That makes sense in the general case.
>
> > Boolean expressions seem like a special case in that regard: (subject
> > to what it looks like) would you be OK with a wrapping function that
> > does both directions (with output that shows which direction is being
> > tested) used only for the cases where we do want to check both
> > directions?
>
> Using a wrapper where appropriate would remove the inefficiency
> concern, but I still worry whether it will promote confusion about
> which direction we're proving things in.  You'll need to be very clear
> about the labeling.

I've not yet applied all of your feedback, but I wanted to get an
initial read on your thoughts on how using switch statements ends up
looking. Attached is a single (pure refactor) patch that converts the
various if/else levels that check things like node tag and
boolean/null test type into switch statements. I've retained 'default'
keyword usages for now for simplicity (my intuition is that we
generally prefer to list out all options for compiler safety benefits,
though I'm not 100% sure that's useful in the outer node tag check
since it's unlikely that someone adding a new node would modify
this...).

My big question is: are you comfortable with the indentation explosion
this creates? IMO it's a lot wordier, but it is also more obvious what
the structural goal is. I'm not sure how we want to make the right
trade-off though.

Once there's agreement on this part, I'll add back a second patch
applying my changes on top of the refactor as well as apply other
feedback (e.g., splitting up the block comment).

Regards,
James Coleman

Вложения

v2-0001-WIP-use-switch-statements.patch

Re: Teach predtest about IS [NOT] proofs

От

Tom Lane

Дата:

22 декабря 2023 г., 19:48:29

James Coleman <jtc331@gmail.com> writes:
> I've not yet applied all of your feedback, but I wanted to get an
> initial read on your thoughts on how using switch statements ends up
> looking. Attached is a single (pure refactor) patch that converts the
> various if/else levels that check things like node tag and
> boolean/null test type into switch statements. I've retained 'default'
> keyword usages for now for simplicity (my intuition is that we
> generally prefer to list out all options for compiler safety benefits,
> though I'm not 100% sure that's useful in the outer node tag check
> since it's unlikely that someone adding a new node would modify
> this...).

> My big question is: are you comfortable with the indentation explosion
> this creates? IMO it's a lot wordier, but it is also more obvious what
> the structural goal is. I'm not sure how we want to make the right
> trade-off though.

Yeah, I see what you mean.  Also, I'd wanted to shove most of
the text in the function header in-line and get rid of the short
restatements of those paras.  I carried that through just for
predicate_implied_by_simple_clause, as attached.  The structure is
definitely clearer, but we end up with an awful lot of indentation,
which makes the comments less readable than I'd like.  (I did some
minor rewording to make them flow better.)

On balance I think this is probably better than what we have, but
maybe we'd be best off to avoid doubly nested switches?  I think
there's a good argument for the outer switch on nodeTag, but
maybe we're getting diminishing returns from an inner switch.

            regards, tom lane

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index fe83e45311..a9accbed53 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -1087,38 +1087,12 @@ arrayexpr_cleanup_fn(PredIterInfo info)
 }


-/*----------
+/*
  * predicate_implied_by_simple_clause
  *      Does the predicate implication test for a "simple clause" predicate
  *      and a "simple clause" restriction.
  *
  * We return true if able to prove the implication, false if not.
- *
- * We have several strategies for determining whether one simple clause
- * implies another:
- *
- * A simple and general way is to see if they are equal(); this works for any
- * kind of expression, and for either implication definition.  (Actually,
- * there is an implied assumption that the functions in the expression are
- * immutable --- but this was checked for the predicate by the caller.)
- *
- * Another way that always works is that for boolean x, "x = TRUE" is
- * equivalent to "x", likewise "x = FALSE" is equivalent to "NOT x".
- * These can be worth checking because, while we preferentially simplify
- * boolean comparisons down to "x" and "NOT x", the other form has to be
- * dealt with anyway in the context of index conditions.
- *
- * If the predicate is of the form "foo IS NOT NULL", and we are considering
- * strong implication, we can conclude that the predicate is implied if the
- * clause is strict for "foo", i.e., it must yield false or NULL when "foo"
- * is NULL.  In that case truth of the clause ensures that "foo" isn't NULL.
- * (Again, this is a safe conclusion because "foo" must be immutable.)
- * This doesn't work for weak implication, though.
- *
- * Finally, if both clauses are binary operator expressions, we may be able
- * to prove something using the system's knowledge about operators; those
- * proof rules are encapsulated in operator_predicate_proof().
- *----------
  */
 static bool
 predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
@@ -1127,65 +1101,115 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
     /* Allow interrupting long proof attempts */
     CHECK_FOR_INTERRUPTS();

-    /* First try the equal() test */
+    /*
+     * A simple and general rule is that a clause implies itself, hence we
+     * check if they are equal(); this works for any kind of expression, and
+     * for either implication definition.  (Actually, there is an implied
+     * assumption that the functions in the expression are immutable --- but
+     * this was checked for the predicate by the caller.)
+     */
     if (equal((Node *) predicate, clause))
         return true;

-    /* Next see if clause is boolean equality to a constant */
-    if (is_opclause(clause) &&
-        ((OpExpr *) clause)->opno == BooleanEqualOperator)
+    /* Our remaining strategies are all clause-type-specific */
+    switch (nodeTag(clause))
     {
-        OpExpr       *op = (OpExpr *) clause;
-        Node       *rightop;
-
-        Assert(list_length(op->args) == 2);
-        rightop = lsecond(op->args);
-        /* We might never see a null Const here, but better check anyway */
-        if (rightop && IsA(rightop, Const) &&
-            !((Const *) rightop)->constisnull)
-        {
-            Node       *leftop = linitial(op->args);
-
-            if (DatumGetBool(((Const *) rightop)->constvalue))
+        case T_OpExpr:
             {
-                /* X = true implies X */
-                if (equal(predicate, leftop))
-                    return true;
+                OpExpr       *op = (OpExpr *) clause;
+
+                /*----------
+                 * For boolean x, "x = TRUE" is equivalent to "x", likewise
+                 * "x = FALSE" is equivalent to "NOT x".  These can be worth
+                 * checking because, while we preferentially simplify boolean
+                 * comparisons down to "x" and "NOT x", the other form has to
+                 * be dealt with anyway in the context of index conditions.
+                 *
+                 * We could likewise check whether the predicate is boolean
+                 * equality to a constant; but there are no known use-cases
+                 * for that at the moment, assuming that the predicate has
+                 * been through constant-folding.
+                 *----------
+                 */
+                if (op->opno == BooleanEqualOperator)
+                {
+                    Node       *rightop;
+
+                    Assert(list_length(op->args) == 2);
+                    rightop = lsecond(op->args);
+
+                    /*
+                     * We might never see a null Const here, but better check
+                     * anyway
+                     */
+                    if (rightop && IsA(rightop, Const) &&
+                        !((Const *) rightop)->constisnull)
+                    {
+                        Node       *leftop = linitial(op->args);
+
+                        if (DatumGetBool(((Const *) rightop)->constvalue))
+                        {
+                            /* X = true implies X */
+                            if (equal(predicate, leftop))
+                                return true;
+                        }
+                        else
+                        {
+                            /* X = false implies NOT X */
+                            if (is_notclause(predicate) &&
+                                equal(get_notclausearg(predicate), leftop))
+                                return true;
+                        }
+                    }
+                }
             }
-            else
+            break;
+        default:
+            break;
+    }
+
+    /* ... or predicate-type-specific */
+    switch (nodeTag(predicate))
+    {
+        case T_NullTest:
             {
-                /* X = false implies NOT X */
-                if (is_notclause(predicate) &&
-                    equal(get_notclausearg(predicate), leftop))
-                    return true;
+                NullTest   *ntest = (NullTest *) predicate;
+
+                switch (ntest->nulltesttype)
+                {
+                    case IS_NOT_NULL:
+
+                        /*
+                         * If the predicate is of the form "foo IS NOT NULL",
+                         * and we are considering strong implication, we can
+                         * conclude that the predicate is implied if the
+                         * clause is strict for "foo", i.e., it must yield
+                         * false or NULL when "foo" is NULL.  In that case
+                         * truth of the clause ensures that "foo" isn't NULL.
+                         * (Again, this is a safe conclusion because "foo"
+                         * must be immutable.)  This doesn't work for weak
+                         * implication, though.  Also, "row IS NOT NULL" does
+                         * not act in the simple way we have in mind.
+                         */
+                        if (!weak &&
+                            !ntest->argisrow &&
+                            clause_is_strict_for(clause, (Node *) ntest->arg, true))
+                            return true;
+                        break;
+                    default:
+                        break;
+                }
             }
-        }
+            break;
+        default:
+            break;
     }

     /*
-     * We could likewise check whether the predicate is boolean equality to a
-     * constant; but there are no known use-cases for that at the moment,
-     * assuming that the predicate has been through constant-folding.
+     * If both clauses are binary operator expressions, we may be able to
+     * prove something using the system's knowledge about operators; those
+     * proof rules are encapsulated in operator_predicate_proof().
      */
-
-    /* Next try the IS NOT NULL case */
-    if (!weak &&
-        predicate && IsA(predicate, NullTest))
-    {
-        NullTest   *ntest = (NullTest *) predicate;
-
-        /* row IS NOT NULL does not act in the simple way we have in mind */
-        if (ntest->nulltesttype == IS_NOT_NULL &&
-            !ntest->argisrow)
-        {
-            /* strictness of clause for foo implies foo IS NOT NULL */
-            if (clause_is_strict_for(clause, (Node *) ntest->arg, true))
-                return true;
-        }
-        return false;            /* we can't succeed below... */
-    }
-
-    /* Else try operator-related knowledge */
     return operator_predicate_proof(predicate, clause, false, weak);
 }

Re: Teach predtest about IS [NOT] proofs

От

James Coleman

Дата:

18 января 2024 г., 00:34:36

On Fri, Dec 22, 2023 at 2:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > I've not yet applied all of your feedback, but I wanted to get an
> > initial read on your thoughts on how using switch statements ends up
> > looking. Attached is a single (pure refactor) patch that converts the
> > various if/else levels that check things like node tag and
> > boolean/null test type into switch statements. I've retained 'default'
> > keyword usages for now for simplicity (my intuition is that we
> > generally prefer to list out all options for compiler safety benefits,
> > though I'm not 100% sure that's useful in the outer node tag check
> > since it's unlikely that someone adding a new node would modify
> > this...).
>
> > My big question is: are you comfortable with the indentation explosion
> > this creates? IMO it's a lot wordier, but it is also more obvious what
> > the structural goal is. I'm not sure how we want to make the right
> > trade-off though.
>
> Yeah, I see what you mean.  Also, I'd wanted to shove most of
> the text in the function header in-line and get rid of the short
> restatements of those paras.  I carried that through just for
> predicate_implied_by_simple_clause, as attached.  The structure is
> definitely clearer, but we end up with an awful lot of indentation,
> which makes the comments less readable than I'd like.  (I did some
> minor rewording to make them flow better.)
>
> On balance I think this is probably better than what we have, but
> maybe we'd be best off to avoid doubly nested switches?  I think
> there's a good argument for the outer switch on nodeTag, but
> maybe we're getting diminishing returns from an inner switch.
>
>                         regards, tom lane
>

Apologies for the long delay.

Attached is a new patch series.

0001 does the initial pure refactor. 0003 makes a lot of modifications
to what we can prove about implication and refutation. Finally, 0003
isn't intended to be committed, but attempts to validate more
holistically that none of the changes creates any invalid proofs
beyond the mostly happy-path tests added in 0004.

I ended up not tackling changing how test_predtest tests run for now.
That's plausibly still useful, and I'd be happy to add that if you
generally agree with the direction of the patch and with that
abstraction being useful.

I added some additional verifications to the test_predtest module to
prevent additional obvious flaws.

Regards,
James Coleman

Вложения

Re: Teach predtest about IS [NOT] proofs

От

Tom Lane

Дата:

22 января 2024 г., 17:57:41

James Coleman <jtc331@gmail.com> writes:
> 0001 does the initial pure refactor. 0003 makes a lot of modifications
> to what we can prove about implication and refutation. Finally, 0003
> isn't intended to be committed, but attempts to validate more
> holistically that none of the changes creates any invalid proofs
> beyond the mostly happy-path tests added in 0004.

> I ended up not tackling changing how test_predtest tests run for now.
> That's plausibly still useful, and I'd be happy to add that if you
> generally agree with the direction of the patch and with that
> abstraction being useful.

> I added some additional verifications to the test_predtest module to
> prevent additional obvious flaws.

I looked through 0001 and made some additional cosmetic changes,
mostly to get comments closer to the associated code; I also
ran pgindent on it (see v5-0001 attached).  That seems pretty
committable to me at this point.  I also like your 0002 additions to
test_predtest.c (although why the mixture of ERROR and WARNING?
ISTM they should all be WARNING, so we can press on with the test).

One other thought is that maybe separating out
predicate_implied_not_null_by_clause should be part of 0001?

I'm less excited about the rest of v4-0002.

@@ -740,6 +747,16 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
                                              !weak))
                 return true;

+            /*
+             * Because weak refutation expands the allowed outcomes for B
+             * from "false" to "false or null", we can additionally prove
+             * weak refutation in the case that strong refutation is proven.
+             */
+            if (weak && not_arg &&
+                predicate_implied_by_recurse(predicate, not_arg,
+                                             true))
+                return true;
+
             switch (pclass)
             {
                 case CLASS_AND:

I don't buy this bit at all.  If the prior recursive call fails to
prove weak refutation in a case where strong refutation holds, how is
that not a bug lower down?  Moreover, in order to mask such a bug,
you're doubling the time taken by failed proofs, which is an
unfortunate thing --- we don't like spending a lot of time on
something that fails to improve the plan.

@@ -1138,32 +1155,114 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
                     Assert(list_length(op->args) == 2);
                     rightop = lsecond(op->args);

-                    /*
-                     * We might never see a null Const here, but better check
-                     * anyway
-                     */
-                    if (rightop && IsA(rightop, Const) &&
-                        !((Const *) rightop)->constisnull)
+                    if (rightop && IsA(rightop, Const))
                     {
+                        Const    *constexpr = (Const *) rightop;
                         Node       *leftop = linitial(op->args);

-                        if (DatumGetBool(((Const *) rightop)->constvalue))
-                        {
-                            /* X = true implies X */
-                            if (equal(predicate, leftop))
-                                return true;
-                        }
+                        if (constexpr->constisnull)
+                            return false;
+
+                        if (DatumGetBool(constexpr->constvalue))
+                            return equal(predicate, leftop);
                         else
-                        {
-                            /* X = false implies NOT X */
-                            if (is_notclause(predicate) &&
-                                equal(get_notclausearg(predicate), leftop))
-                                return true;
-                        }
+                            return is_notclause(predicate) &&
+                                equal(get_notclausearg(predicate), leftop);
                     }
                 }
             }
             break;

I don't understand what this bit is doing ... and the fact that
the patch removes all the existing comments and adds none isn't
helping that.  What it seems to mostly be doing is adding early
"return false"s, which I'm not sure is a good thing, because
it seems possible that operator_predicate_proof could apply here.

+                    case IS_UNKNOWN:
+                        /*
+                         * When the clause is in the form "foo IS UNKNOWN" then
+                         * we can prove weak implication of a predicate that
+                         * is strict for "foo" and negated. This doesn't work
+                         * for strong implication since if "foo" is "null" then
+                         * the predicate will evaluate to "null" rather than
+                         * "true".
+                         */

The phrasing of this comment seems randomly inconsistent with others
making similar arguments.

+                    case IS_TRUE:
                         /*
-                         * If the predicate is of the form "foo IS NOT NULL",
-                         * and we are considering strong implication, we can
-                         * conclude that the predicate is implied if the
-                         * clause is strict for "foo", i.e., it must yield
-                         * false or NULL when "foo" is NULL.  In that case
-                         * truth of the clause ensures that "foo" isn't NULL.
-                         * (Again, this is a safe conclusion because "foo"
-                         * must be immutable.)  This doesn't work for weak
-                         * implication, though.  Also, "row IS NOT NULL" does
-                         * not act in the simple way we have in mind.
+                         * X implies X is true
+                         *
+                         * We can only prove strong implication here since
+                         * `null is true` is false rather than null.
                          */

This hardly seems like an improvement on the comment.  (Also, here and
elsewhere, could we avoid using two different types of quotes?)

+                                /* X is unknown weakly implies X is not true */
+                                if (weak && clausebtest->booltesttype == IS_UNKNOWN &&
+                                    equal(clausebtest->arg, predbtest->arg))
+                                    return true;

Maybe I'm confused, but why is it only weak?

+                            /*
+                             * When we know what the predicate is in the form
+                             * "foo IS UNKNOWN" then we can prove strong and
+                             * weak refutation together. This is because the
+                             * limits imposed by weak refutation (allowing
+                             * "false" instead of just "null") is equivalently
+                             * helpful since "foo" being "false" also refutes
+                             * the predicate. Hence we pass weak=false here
+                             * always.
+                             */

This comment doesn't make sense to me either.

+    /* TODO: refactor this into switch statements also? */

Let's drop the TODO comments.

+    /*
+     * We can recurse into "not foo" without any additional processing because
+     * "not (null)" evaluates to null. That doesn't work for allow_false,
+     * however, since "not (false)" is true rather than null.
+     */
+    if (is_notclause(clause) &&
+        clause_is_strict_for((Node *) get_notclausearg(clause), subexpr, false))
+        return true;

Not exactly convinced by this.  The way the comment is written, I'd
expect to not call clause_is_strict_for at all if allow_false.  If
it's okay to call it anyway and pass allow_false = false, you need
to defend that, which this comment isn't doing.

            regards, tom lane

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index c37b416e24..3a352ca37f 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -1087,38 +1087,12 @@ arrayexpr_cleanup_fn(PredIterInfo info)
 }


-/*----------
+/*
  * predicate_implied_by_simple_clause
  *      Does the predicate implication test for a "simple clause" predicate
  *      and a "simple clause" restriction.
  *
  * We return true if able to prove the implication, false if not.
- *
- * We have several strategies for determining whether one simple clause
- * implies another:
- *
- * A simple and general way is to see if they are equal(); this works for any
- * kind of expression, and for either implication definition.  (Actually,
- * there is an implied assumption that the functions in the expression are
- * immutable --- but this was checked for the predicate by the caller.)
- *
- * Another way that always works is that for boolean x, "x = TRUE" is
- * equivalent to "x", likewise "x = FALSE" is equivalent to "NOT x".
- * These can be worth checking because, while we preferentially simplify
- * boolean comparisons down to "x" and "NOT x", the other form has to be
- * dealt with anyway in the context of index conditions.
- *
- * If the predicate is of the form "foo IS NOT NULL", and we are considering
- * strong implication, we can conclude that the predicate is implied if the
- * clause is strict for "foo", i.e., it must yield false or NULL when "foo"
- * is NULL.  In that case truth of the clause ensures that "foo" isn't NULL.
- * (Again, this is a safe conclusion because "foo" must be immutable.)
- * This doesn't work for weak implication, though.
- *
- * Finally, if both clauses are binary operator expressions, we may be able
- * to prove something using the system's knowledge about operators; those
- * proof rules are encapsulated in operator_predicate_proof().
- *----------
  */
 static bool
 predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
@@ -1127,97 +1101,126 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
     /* Allow interrupting long proof attempts */
     CHECK_FOR_INTERRUPTS();

-    /* First try the equal() test */
+    /*
+     * A simple and general rule is that a clause implies itself, hence we
+     * check if they are equal(); this works for any kind of expression, and
+     * for either implication definition.  (Actually, there is an implied
+     * assumption that the functions in the expression are immutable --- but
+     * this was checked for the predicate by the caller.)
+     */
     if (equal((Node *) predicate, clause))
         return true;

-    /* Next see if clause is boolean equality to a constant */
-    if (is_opclause(clause) &&
-        ((OpExpr *) clause)->opno == BooleanEqualOperator)
+    /* Next we have some clause-type-specific strategies */
+    switch (nodeTag(clause))
     {
-        OpExpr       *op = (OpExpr *) clause;
-        Node       *rightop;
-
-        Assert(list_length(op->args) == 2);
-        rightop = lsecond(op->args);
-        /* We might never see a null Const here, but better check anyway */
-        if (rightop && IsA(rightop, Const) &&
-            !((Const *) rightop)->constisnull)
-        {
-            Node       *leftop = linitial(op->args);
-
-            if (DatumGetBool(((Const *) rightop)->constvalue))
+        case T_OpExpr:
             {
-                /* X = true implies X */
-                if (equal(predicate, leftop))
-                    return true;
+                OpExpr       *op = (OpExpr *) clause;
+
+                /*----------
+                 * For boolean x, "x = TRUE" is equivalent to "x", likewise
+                 * "x = FALSE" is equivalent to "NOT x".  These can be worth
+                 * checking because, while we preferentially simplify boolean
+                 * comparisons down to "x" and "NOT x", the other form has to
+                 * be dealt with anyway in the context of index conditions.
+                 *
+                 * We could likewise check whether the predicate is boolean
+                 * equality to a constant; but there are no known use-cases
+                 * for that at the moment, assuming that the predicate has
+                 * been through constant-folding.
+                 *----------
+                 */
+                if (op->opno == BooleanEqualOperator)
+                {
+                    Node       *rightop;
+
+                    Assert(list_length(op->args) == 2);
+                    rightop = lsecond(op->args);
+
+                    /* We might never see null Consts here, but better check */
+                    if (rightop && IsA(rightop, Const) &&
+                        !((Const *) rightop)->constisnull)
+                    {
+                        Node       *leftop = linitial(op->args);
+
+                        if (DatumGetBool(((Const *) rightop)->constvalue))
+                        {
+                            /* X = true implies X */
+                            if (equal(predicate, leftop))
+                                return true;
+                        }
+                        else
+                        {
+                            /* X = false implies NOT X */
+                            if (is_notclause(predicate) &&
+                                equal(get_notclausearg(predicate), leftop))
+                                return true;
+                        }
+                    }
+                }
             }
-            else
+            break;
+        default:
+            break;
+    }
+
+    /* ... and some predicate-type-specific ones */
+    switch (nodeTag(predicate))
+    {
+        case T_NullTest:
             {
-                /* X = false implies NOT X */
-                if (is_notclause(predicate) &&
-                    equal(get_notclausearg(predicate), leftop))
-                    return true;
+                NullTest   *predntest = (NullTest *) predicate;
+
+                switch (predntest->nulltesttype)
+                {
+                    case IS_NOT_NULL:
+
+                        /*
+                         * If the predicate is of the form "foo IS NOT NULL",
+                         * and we are considering strong implication, we can
+                         * conclude that the predicate is implied if the
+                         * clause is strict for "foo", i.e., it must yield
+                         * false or NULL when "foo" is NULL.  In that case
+                         * truth of the clause ensures that "foo" isn't NULL.
+                         * (Again, this is a safe conclusion because "foo"
+                         * must be immutable.)  This doesn't work for weak
+                         * implication, though.  Also, "row IS NOT NULL" does
+                         * not act in the simple way we have in mind.
+                         */
+                        if (!weak &&
+                            !predntest->argisrow &&
+                            clause_is_strict_for(clause,
+                                                 (Node *) predntest->arg,
+                                                 true))
+                            return true;
+                        break;
+                    case IS_NULL:
+                        break;
+                }
             }
-        }
+            break;
+        default:
+            break;
     }

     /*
-     * We could likewise check whether the predicate is boolean equality to a
-     * constant; but there are no known use-cases for that at the moment,
-     * assuming that the predicate has been through constant-folding.
+     * Finally, if both clauses are binary operator expressions, we may be
+     * able to prove something using the system's knowledge about operators;
+     * those proof rules are encapsulated in operator_predicate_proof().
      */
-
-    /* Next try the IS NOT NULL case */
-    if (!weak &&
-        predicate && IsA(predicate, NullTest))
-    {
-        NullTest   *ntest = (NullTest *) predicate;
-
-        /* row IS NOT NULL does not act in the simple way we have in mind */
-        if (ntest->nulltesttype == IS_NOT_NULL &&
-            !ntest->argisrow)
-        {
-            /* strictness of clause for foo implies foo IS NOT NULL */
-            if (clause_is_strict_for(clause, (Node *) ntest->arg, true))
-                return true;
-        }
-        return false;            /* we can't succeed below... */
-    }
-
-    /* Else try operator-related knowledge */
     return operator_predicate_proof(predicate, clause, false, weak);
 }

-/*----------
+/*
  * predicate_refuted_by_simple_clause
  *      Does the predicate refutation test for a "simple clause" predicate
  *      and a "simple clause" restriction.
  *
  * We return true if able to prove the refutation, false if not.
  *
- * Unlike the implication case, checking for equal() clauses isn't helpful.
- * But relation_excluded_by_constraints() checks for self-contradictions in a
- * list of clauses, so that we may get here with predicate and clause being
- * actually pointer-equal, and that is worth eliminating quickly.
- *
- * When the predicate is of the form "foo IS NULL", we can conclude that
- * the predicate is refuted if the clause is strict for "foo" (see notes for
- * implication case), or is "foo IS NOT NULL".  That works for either strong
- * or weak refutation.
- *
- * A clause "foo IS NULL" refutes a predicate "foo IS NOT NULL" in all cases.
- * If we are considering weak refutation, it also refutes a predicate that
- * is strict for "foo", since then the predicate must yield false or NULL
- * (and since "foo" appears in the predicate, it's known immutable).
- *
- * (The main motivation for covering these IS [NOT] NULL cases is to support
- * using IS NULL/IS NOT NULL as partition-defining constraints.)
- *
- * Finally, if both clauses are binary operator expressions, we may be able
- * to prove something using the system's knowledge about operators; those
- * proof rules are encapsulated in operator_predicate_proof().
- *----------
+ * The main motivation for covering IS [NOT] NULL cases to support using
+ * IS NULL/IS NOT NULL as partition-defining constraints.
  */
 static bool
 predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
@@ -1226,61 +1229,150 @@ predicate_refuted_by_simple_clause(Expr *predicate, Node *clause,
     /* Allow interrupting long proof attempts */
     CHECK_FOR_INTERRUPTS();

-    /* A simple clause can't refute itself */
-    /* Worth checking because of relation_excluded_by_constraints() */
+    /*
+     * A simple clause can't refute itself, so unlike the implication case,
+     * checking for equal() clauses isn't helpful.
+     *
+     * But relation_excluded_by_constraints() checks for self-contradictions
+     * in a list of clauses, so that we may get here with predicate and clause
+     * being actually pointer-equal, and that is worth eliminating quickly.
+     */
     if ((Node *) predicate == clause)
         return false;

-    /* Try the predicate-IS-NULL case */
-    if (predicate && IsA(predicate, NullTest) &&
-        ((NullTest *) predicate)->nulltesttype == IS_NULL)
+    /* Next we have some clause-type-specific strategies */
+    switch (nodeTag(clause))
     {
-        Expr       *isnullarg = ((NullTest *) predicate)->arg;
-
-        /* row IS NULL does not act in the simple way we have in mind */
-        if (((NullTest *) predicate)->argisrow)
-            return false;
+        case T_NullTest:
+            {
+                NullTest   *clausentest = (NullTest *) clause;

-        /* strictness of clause for foo refutes foo IS NULL */
-        if (clause_is_strict_for(clause, (Node *) isnullarg, true))
-            return true;
+                /* row IS NULL does not act in the simple way we have in mind */
+                if (clausentest->argisrow)
+                    return false;

-        /* foo IS NOT NULL refutes foo IS NULL */
-        if (clause && IsA(clause, NullTest) &&
-            ((NullTest *) clause)->nulltesttype == IS_NOT_NULL &&
-            !((NullTest *) clause)->argisrow &&
-            equal(((NullTest *) clause)->arg, isnullarg))
-            return true;
+                switch (clausentest->nulltesttype)
+                {
+                    case IS_NULL:
+                        {
+                            switch (nodeTag(predicate))
+                            {
+                                case T_NullTest:
+                                    {
+                                        NullTest   *predntest = (NullTest *) predicate;
+
+                                        /*
+                                         * row IS NULL does not act in the
+                                         * simple way we have in mind
+                                         */
+                                        if (predntest->argisrow)
+                                            return false;
+
+                                        /*
+                                         * foo IS NULL refutes foo IS NOT
+                                         * NULL, at least in the non-row case,
+                                         * for both strong and weak refutation
+                                         */
+                                        if (predntest->nulltesttype == IS_NOT_NULL &&
+                                            equal(predntest->arg, clausentest->arg))
+                                            return true;
+                                    }
+                                default:
+                                    break;
+                            }

-        return false;            /* we can't succeed below... */
+                            /*
+                             * foo IS NULL weakly refutes any predicate that
+                             * is strict for foo, since then the predicate
+                             * must yield false or NULL (and since foo appears
+                             * in the predicate, it's known immutable).
+                             */
+                            if (weak &&
+                                clause_is_strict_for((Node *) predicate,
+                                                     (Node *) clausentest->arg,
+                                                     true))
+                                return true;
+
+                            return false;    /* we can't succeed below... */
+                        }
+                        break;
+                    case IS_NOT_NULL:
+                        break;
+                }
+            }
+        default:
+            break;
     }

-    /* Try the clause-IS-NULL case */
-    if (clause && IsA(clause, NullTest) &&
-        ((NullTest *) clause)->nulltesttype == IS_NULL)
+    /* ... and some predicate-type-specific ones */
+    switch (nodeTag(predicate))
     {
-        Expr       *isnullarg = ((NullTest *) clause)->arg;
+        case T_NullTest:
+            {
+                NullTest   *predntest = (NullTest *) predicate;

-        /* row IS NULL does not act in the simple way we have in mind */
-        if (((NullTest *) clause)->argisrow)
-            return false;
+                /* row IS NULL does not act in the simple way we have in mind */
+                if (predntest->argisrow)
+                    return false;

-        /* foo IS NULL refutes foo IS NOT NULL */
-        if (predicate && IsA(predicate, NullTest) &&
-            ((NullTest *) predicate)->nulltesttype == IS_NOT_NULL &&
-            !((NullTest *) predicate)->argisrow &&
-            equal(((NullTest *) predicate)->arg, isnullarg))
-            return true;
+                switch (predntest->nulltesttype)
+                {
+                    case IS_NULL:
+                        {
+                            switch (nodeTag(clause))
+                            {
+                                case T_NullTest:
+                                    {
+                                        NullTest   *clausentest = (NullTest *) clause;
+
+                                        /*
+                                         * row IS NULL does not act in the
+                                         * simple way we have in mind
+                                         */
+                                        if (clausentest->argisrow)
+                                            return false;
+
+                                        /*
+                                         * foo IS NOT NULL refutes foo IS NULL
+                                         * for both strong and weak refutation
+                                         */
+                                        if (clausentest->nulltesttype == IS_NOT_NULL &&
+                                            equal(clausentest->arg, predntest->arg))
+                                            return true;
+                                    }
+                                    break;
+                                default:
+                                    break;
+                            }

-        /* foo IS NULL weakly refutes any predicate that is strict for foo */
-        if (weak &&
-            clause_is_strict_for((Node *) predicate, (Node *) isnullarg, true))
-            return true;
+                            /*
+                             * When the predicate is of the form "foo IS
+                             * NULL", we can conclude that the predicate is
+                             * refuted if the clause is strict for "foo" (see
+                             * notes for implication case).  That works for
+                             * either strong or weak refutation.
+                             */
+                            if (clause_is_strict_for(clause,
+                                                     (Node *) predntest->arg,
+                                                     true))
+                                return true;
+                        }
+                        break;
+                    case IS_NOT_NULL:
+                        break;
+                }

-        return false;            /* we can't succeed below... */
+                return false;    /* we can't succeed below... */
+            }
+            break;
+        default:
+            break;
     }

-    /* Else try operator-related knowledge */
+    /*
+     * Finally, if both clauses are binary operator expressions, we may be
+     * able to prove something using the system's knowledge about operators.
+     */
     return operator_predicate_proof(predicate, clause, true, weak);
 }

Re: Teach predtest about IS [NOT] proofs

От

James Coleman

Дата:

25 января 2024 г., 01:08:47

Thanks for the feedback.

On Mon, Jan 22, 2024 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > 0001 does the initial pure refactor. 0003 makes a lot of modifications
> > to what we can prove about implication and refutation. Finally, 0003
> > isn't intended to be committed, but attempts to validate more
> > holistically that none of the changes creates any invalid proofs
> > beyond the mostly happy-path tests added in 0004.
>
> > I ended up not tackling changing how test_predtest tests run for now.
> > That's plausibly still useful, and I'd be happy to add that if you
> > generally agree with the direction of the patch and with that
> > abstraction being useful.
>
> > I added some additional verifications to the test_predtest module to
> > prevent additional obvious flaws.
>
> I looked through 0001 and made some additional cosmetic changes,
> mostly to get comments closer to the associated code; I also
> ran pgindent on it (see v5-0001 attached).  That seems pretty
> committable to me at this point.

Great.

> I also like your 0002 additions to
> test_predtest.c (although why the mixture of ERROR and WARNING?
> ISTM they should all be WARNING, so we can press on with the test).

My reasoning is that one is a major error in something larger than
predtest, while the other is clearly "your code change isn't
accurate". The surrounding code seems to be drawing a distinction also
(it uses both ERROR and WARNING), and so I was trying to parallel that
appropriately.

I'm fine with making both WARNING though.

But does that also mean we should make other such cases WARNING as
well? For example, the query not returning two boolean columns doesn't
really seem like a reason to break subsequent tests.

I haven't changed this yet pending these questions.

> One other thought is that maybe separating out
> predicate_implied_not_null_by_clause should be part of 0001?

Would you prefer to commit a refactor along with some functionality
changes? Or one patch with the pure refactor and then a second patch
with the predicate_implied_not_null_by_clause changes?

> I'm less excited about the rest of v4-0002.
>
> @@ -740,6 +747,16 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
>                                               !weak))
>                  return true;
>
> +            /*
> +             * Because weak refutation expands the allowed outcomes for B
> +             * from "false" to "false or null", we can additionally prove
> +             * weak refutation in the case that strong refutation is proven.
> +             */
> +            if (weak && not_arg &&
> +                predicate_implied_by_recurse(predicate, not_arg,
> +                                             true))
> +                return true;
> +
>              switch (pclass)
>              {
>                  case CLASS_AND:
>
> I don't buy this bit at all.  If the prior recursive call fails to
> prove weak refutation in a case where strong refutation holds, how is
> that not a bug lower down?

This is one of the last additions I made while authoring the most
recent version of the patch, and at first I thought it suggested a bug
lower down also.

However the cases proven by these lines ("x is not false" is weakly
refuted by "not x", "x is false", and "x = false") correctly do not
have their not arg ("x") strongly implied by "x is not false" since
both "x is null" and "x is true" would have to imply "x", which
obviously doesn't hold. These aren't cases we're handling directly in
predicate_refuted_by_simple_clause.

This is caused by the asymmetry between implication and refutation
that I noted in my addition to the comments nearer the top of the
file:

+ * A notable difference between implication and refutation proofs is that
+ * strong/weak refutations don't vary the input of A (both must be true) but
+ * vary the allowed outcomes of B (false vs. non-truth), while for implications
+ * we vary both A (truth vs. non-falsity) and B (truth vs. non-falsity).

Put another way in the comments I added in test_predtest.c:

+    /* Because weak refutation proofs are a strict subset of strong refutation
+     * proofs (since for "A => B" "A" is always true) we ought never
have strong
+     * refutation hold when weak refutation does not.
+     *
+     * We can't make the same assertion for implication since moving
from strong
+     * to weak implication expands the allowed values of "A" from
true to either
+     * true or NULL.

We could decide to handle this particular failing case explicitly in
predicate_refuted_by_simple_clause as opposed to inferring it by
whether or not implication by the not-arg holds, but I suspect that
leaves us open to other cases we should be to prove refutation for but
don't.

Alternatively (to avoid unnecessary CPU burn) we could modify
predicate_implied_by_recurse (and functionals called by it) to have a
argument beyond "weak = true/false" Ie.g., an enum that allows for
something like "WEAK", "STRONG", and "EITHER". That's a bigger change,
so I didn't want to do that right away unless there was agreement on
that direction.

I haven't changed this yet pending this discussion.

> Moreover, in order to mask such a bug,
> you're doubling the time taken by failed proofs, which is an
> unfortunate thing --- we don't like spending a lot of time on
> something that fails to improve the plan.

See above.

> @@ -1138,32 +1155,114 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
>                      Assert(list_length(op->args) == 2);
>                      rightop = lsecond(op->args);
>
> -                    /*
> -                     * We might never see a null Const here, but better check
> -                     * anyway
> -                     */
> -                    if (rightop && IsA(rightop, Const) &&
> -                        !((Const *) rightop)->constisnull)
> +                    if (rightop && IsA(rightop, Const))
>                      {
> +                        Const    *constexpr = (Const *) rightop;
>                          Node       *leftop = linitial(op->args);
>
> -                        if (DatumGetBool(((Const *) rightop)->constvalue))
> -                        {
> -                            /* X = true implies X */
> -                            if (equal(predicate, leftop))
> -                                return true;
> -                        }
> +                        if (constexpr->constisnull)
> +                            return false;
> +
> +                        if (DatumGetBool(constexpr->constvalue))
> +                            return equal(predicate, leftop);
>                          else
> -                        {
> -                            /* X = false implies NOT X */
> -                            if (is_notclause(predicate) &&
> -                                equal(get_notclausearg(predicate), leftop))
> -                                return true;
> -                        }
> +                            return is_notclause(predicate) &&
> +                                equal(get_notclausearg(predicate), leftop);
>                      }
>                  }
>              }
>              break;
>
> I don't understand what this bit is doing ... and the fact that
> the patch removes all the existing comments and adds none isn't
> helping that.  What it seems to mostly be doing is adding early
> "return false"s, which I'm not sure is a good thing, because
> it seems possible that operator_predicate_proof could apply here.

I was mostly bringing it in line with the style I have elsewhere in
the patch by pulling out the Const* into a variable to avoid repeated
casting.

That being said, you're right that I didn't catch in the many
revisions along the way that I'd added unnecessary early returns and
lost the comments. Fixed both of those in the next version.

> +                    case IS_UNKNOWN:
> +                        /*
> +                         * When the clause is in the form "foo IS UNKNOWN" then
> +                         * we can prove weak implication of a predicate that
> +                         * is strict for "foo" and negated. This doesn't work
> +                         * for strong implication since if "foo" is "null" then
> +                         * the predicate will evaluate to "null" rather than
> +                         * "true".
> +                         */
>
> The phrasing of this comment seems randomly inconsistent with others
> making similar arguments.

Changed.

> +                    case IS_TRUE:
>                          /*
> -                         * If the predicate is of the form "foo IS NOT NULL",
> -                         * and we are considering strong implication, we can
> -                         * conclude that the predicate is implied if the
> -                         * clause is strict for "foo", i.e., it must yield
> -                         * false or NULL when "foo" is NULL.  In that case
> -                         * truth of the clause ensures that "foo" isn't NULL.
> -                         * (Again, this is a safe conclusion because "foo"
> -                         * must be immutable.)  This doesn't work for weak
> -                         * implication, though.  Also, "row IS NOT NULL" does
> -                         * not act in the simple way we have in mind.
> +                         * X implies X is true
> +                         *
> +                         * We can only prove strong implication here since
> +                         * `null is true` is false rather than null.
>                           */
>
> This hardly seems like an improvement on the comment.  (Also, here and
> elsewhere, could we avoid using two different types of quotes?)

I think the git diff is confusing here. The old comment was about a
predicate "foo IS NOT NULL", but the new comment is about a predicate
"foo IS TRUE".

I did fix the usage of backticks though.

> +                                /* X is unknown weakly implies X is not true */
> +                                if (weak && clausebtest->booltesttype == IS_UNKNOWN &&
> +                                    equal(clausebtest->arg, predbtest->arg))
> +                                    return true;
>
> Maybe I'm confused, but why is it only weak?

You're not confused; this seems like a mistake (same with the IS NOT
FALSE below it).

> +                            /*
> +                             * When we know what the predicate is in the form
> +                             * "foo IS UNKNOWN" then we can prove strong and
> +                             * weak refutation together. This is because the
> +                             * limits imposed by weak refutation (allowing
> +                             * "false" instead of just "null") is equivalently
> +                             * helpful since "foo" being "false" also refutes
> +                             * the predicate. Hence we pass weak=false here
> +                             * always.
> +                             */
>
> This comment doesn't make sense to me either.

I rewrote the comment in the attached revision; let me know if that helps.

> +    /* TODO: refactor this into switch statements also? */
>
> Let's drop the TODO comments.

This one was meant to be a question for you in review: do we want to
make that change? Or are we content to leave it as-is?

Either way, removed.

> +    /*
> +     * We can recurse into "not foo" without any additional processing because
> +     * "not (null)" evaluates to null. That doesn't work for allow_false,
> +     * however, since "not (false)" is true rather than null.
> +     */
> +    if (is_notclause(clause) &&
> +        clause_is_strict_for((Node *) get_notclausearg(clause), subexpr, false))
> +        return true;
>
> Not exactly convinced by this.  The way the comment is written, I'd
> expect to not call clause_is_strict_for at all if allow_false.  If
> it's okay to call it anyway and pass allow_false = false, you need
> to defend that, which this comment isn't doing.

I updated the comment to clarify. The restriction on allow_false
(always passing along false on the recursion case) is already
documented as a requirement in the function comment, but I wanted the
comment here to explain why that was necessary here, since in my
opinion it's not immediately obvious reading the function comment why
such a restriction would necessarily hold true for all recursion
cases.

Regards,
James Coleman

On Mon, Apr 1, 2024 at 8:06 AM James Coleman <jtc331@gmail.com> wrote:
>
> On Mon, Mar 25, 2024 at 5:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > James Coleman <jtc331@gmail.com> writes:
> > > [ v6 patchset ]
> >
> > I went ahead and committed 0001 after one more round of review
> >
> > statements; my bad).  I also added the changes in test_predtest.c from
> > 0002.  I attach a rebased version of 0002, as well as 0003 which isn't
> > changed, mainly to keep the cfbot happy.
> >
> > I'm still not happy with what you did in predicate_refuted_by_recurse:
> > it feels wrong and rather expensively so.  There has to be a better
> > way.  Maybe strong vs. weak isn't quite the right formulation for
> > refutation tests?
>
> Possibly. Earlier I'd mused that:
>
> > Alternatively (to avoid unnecessary CPU burn) we could modify
> > predicate_implied_by_recurse (and functionals called by it) to have a
> > argument beyond "weak = true/false" Ie.g., an enum that allows for
> > something like "WEAK", "STRONG", and "EITHER". That's a bigger change,
> > so I didn't want to do that right away unless there was agreement on
> > that direction.
>
> I'm going to try implementing that and see how I feel about what it
> looks like in practice.

Attached is v8 which does this. Note that I kept the patch 0001 as
before and inserted a new 0002 to show exactly what's changed from the
previously version -- I wouldn't expect that to be committed
separately, of course. With this change we only need to recurse a
single time and can check for both strong and weak refutation when
either will do for proving refutation of the "NOT x" construct.

Regards,
James Coleman

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Teach predtest about IS [NOT] proofs

Вложения

Вложения

Вложения

Вложения

Вложения