Обсуждение: COALESCE implementation question

Поиск
Список
Период
Сортировка

COALESCE implementation question

От
Philip Warner
Дата:
I noticed that the COALESCE function is implemented as a case statement,
with the result that:
  update t1 set f = Coalesce( (select fn from t2 x where x.f1 = t1.f1),
t1.f1)

has the following plan:

Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=10) SubPlan   ->  Seq Scan on t2 x  (cost=0.00..22.50 rows=10
width=4)  ->  Seq Scan on t2 x  (cost=0.00..22.50 rows=1000 width=4)
 


ie. it *seems* to scan t2 twice, because the resulting CASE statement for
the subselect is:
   case when not (select fn from t2 x where x.f1 = t1.f1) is NULL then           (select fn from t2 x where x.f1 =
t1.f1)else           t1.f1   end
 

which does seem to imply two executions of the same select statement.

I realize that the standard says:
   2) COALESCE (V(1), V(2)) is equivalent to the following <case
specification>:
          CASE WHEN V(1) IS NOT NULL THEN V(1) ELSE V(2)
END
   3) "COALESCE (V(1), V(2), . . . , V(n))", for n >= 3, is
equivalent      to the following <case specification>:
          CASE
WHEN V(1) IS NOT NULL THEN V(1)           ELSE COALESCE (V(2), . . . , V(n))           END

I was wondering if there was a reason that we interpret this literally,
rather than implement a function? Or set a flag on the CaseExpr node to
indicate that the 'result == whenClause', or some such.

I am still hunting through the planner/optimizer to try to understand if
this is feasible, and would appreciate any suggestions...


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: COALESCE implementation question

От
Philip Warner
Дата:
P.S. What's worse, I should have mentioned that the plan *with* indexes
seems flawed:

Create Table t1(f1 int);
Create Table t1(f1 int, f2 int);
Create Unique Index t2f1 on t2(f1);
Create Unique Index t2f2 on t2(f2);

explain update t1 set f1 = Coalesce( (select f2 from t2 x where x.f1 =
t1.f1), t1.f1);
NOTICE:  QUERY PLAN:

Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=10) SubPlan   ->  Index Scan using t2f1 on t2 x  (cost=0.00..8.14
rows=10width=4)   ->  Seq Scan on t2 x  (cost=0.00..22.50 rows=1000 width=4)
 



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: COALESCE implementation question

От
Philip Warner
Дата:
At 17:22 5/08/00 +1000, Philip Warner wrote:
>P.S. What's worse, I should have mentioned that the plan *with* indexes
>seems flawed:
>
>Create Table t1(f1 int);
>Create Table t1(f1 int, f2 int);
>Create Unique Index t2f1 on t2(f1);
>Create Unique Index t2f2 on t2(f2);
>
>explain update t1 set f1 = Coalesce( (select f2 from t2 x where x.f1 =
>t1.f1), t1.f1);
>NOTICE:  QUERY PLAN:
>
>Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=10)
>  SubPlan
>    ->  Index Scan using t2f1 on t2 x  (cost=0.00..8.14 rows=10 width=4)
>    ->  Seq Scan on t2 x  (cost=0.00..22.50 rows=1000 width=4)
>

Oddly enough, I now think that the EXPLAIN output is a lie; it seems that
it never does a sequential scan. So I am now even more confused...


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: COALESCE implementation question

От
Tom Lane
Дата:
Philip Warner <pjw@rhyme.com.au> writes:
> I realize that the standard says:

>     2) COALESCE (V(1), V(2)) is equivalent to the following <case
>        specification> :
>            CASE WHEN V(1) IS NOT NULL THEN V(1) ELSE V(2) END

> I was wondering if there was a reason that we interpret this literally,
> rather than implement a function?

Well, the standard is perfectly clear, isn't it?  If V(1) has side
effects then trying to optimize this into just one evaluation of V(1)
will generate non-spec-compliant results.

I'd have to agree that two evaluations are pretty annoying, though,
and I wonder whether the spec authors *really* meant to demand
double evaluation of the "winning" case item.  Can anyone check
whether Oracle and other DBMSes perform double evaluation?

BTW, the "BETWEEN" expression has exactly the same issue.
        regards, tom lane


Re: COALESCE implementation question

От
Tom Lane
Дата:
Philip Warner <pjw@rhyme.com.au> writes:
> explain update t1 set f1 = Coalesce( (select f2 from t2 x where x.f1 =
> t1.f1), t1.f1);
> NOTICE:  QUERY PLAN:

> Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=10)
>   SubPlan
>     -> Index Scan using t2f1 on t2 x  (cost=0.00..8.14 rows=10 width=4)
>     -> Seq Scan on t2 x  (cost=0.00..22.50 rows=1000 width=4)

This is a bug caused by interaction between two planning passes run
on the same Query node.  The parser thinks it's cool to generate a
CASE parsetree with multiple paths to the same sub-select Query node,
but in fact it is not cool because planning destructively alters the
Query node contents.  I'm amazed it didn't crash, to tell the truth.

I have a patch but haven't applied it yet (been offline for most of
two days due to telco idiocy :-().
        regards, tom lane


Re: COALESCE implementation question

От
Philip Warner
Дата:
At 22:36 5/08/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> I realize that the standard says:
>
>>     2) COALESCE (V(1), V(2)) is equivalent to the following <case
>>        specification> :
>>            CASE WHEN V(1) IS NOT NULL THEN V(1) ELSE V(2) END
>
>> I was wondering if there was a reason that we interpret this literally,
>> rather than implement a function?
>
>Well, the standard is perfectly clear, isn't it?  If V(1) has side
>effects then trying to optimize this into just one evaluation of V(1)
>will generate non-spec-compliant results.

At least with the new function manager, if I feel te need I can write a
'CoalesceValues' function (at least for fixed numbers of parameters).


>I'd have to agree that two evaluations are pretty annoying, though,
>and I wonder whether the spec authors *really* meant to demand
>double evaluation of the "winning" case item.  Can anyone check
>whether Oracle and other DBMSes perform double evaluation?

It's very hard to believe that is what they meant, or even if they even
considered the ramifications of their proposed implementation (I'm not
really sure why they chose to describe the implementation and specifically
to implement a 'function' as a case statement). eg. the result of the first
execution *could* mean that the second execution returns NULL - fine for
CASE, lousy for COALESCE. In fact it's pretty easy to write a function that
causes COALESCE(f(), 1) to return NULL...

Sadly, my usual yard stick (Dec/RDB) seems to evaluate twice (at least
that's what it's planner says). And dumping a view with a coalesce
statement produces a CASE statement, so it probably has no choice.

Just seems daft to me.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: COALESCE implementation question

От
Philip Warner
Дата:
At 22:27 5/08/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> explain update t1 set f1 = Coalesce( (select f2 from t2 x where x.f1 =
>> t1.f1), t1.f1);
>> NOTICE:  QUERY PLAN:
>
>> Seq Scan on t1  (cost=0.00..20.00 rows=1000 width=10)
>>   SubPlan
>>     -> Index Scan using t2f1 on t2 x  (cost=0.00..8.14 rows=10 width=4)
>>     -> Seq Scan on t2 x  (cost=0.00..22.50 rows=1000 width=4)
>
>This is a bug caused by interaction between two planning passes run
>on the same Query node.  The parser thinks it's cool to generate a
>CASE parsetree with multiple paths to the same sub-select Query node,
>but in fact it is not cool because planning destructively alters the
>Query node contents.  I'm amazed it didn't crash, to tell the truth.
>
>I have a patch but haven't applied it yet (been offline for most of
>two days due to telco idiocy :-().

Thanks for this; I must admit I was very surprised not to get a response
withing 24 hours! Is there any chance of sending me the patch - I have been
looking at the sources for a while now, and it would be nice to see the
answer...


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: COALESCE implementation question

От
Tom Lane
Дата:
Philip Warner <pjw@rhyme.com.au> writes:
>> Well, the standard is perfectly clear, isn't it?  If V(1) has side
>> effects then trying to optimize this into just one evaluation of V(1)
>> will generate non-spec-compliant results.

> At least with the new function manager, if I feel te need I can write a
> 'CoalesceValues' function (at least for fixed numbers of parameters).

Mmm ... not really.  You could detect nulls all right, but a function-
based version of COALESCE would evaluate *all* its arguments exactly
once, which is certainly wrong.  If you don't stop evaluating with
the one you decide to return, you are neither compliant with the spec
nor safe (later expressions might yield errors if evaluated!)

> Sadly, my usual yard stick (Dec/RDB) seems to evaluate twice (at least
> that's what it's planner says). And dumping a view with a coalesce
> statement produces a CASE statement, so it probably has no choice.

Sounds like they do it the same as we do, ie, expand COALESCE into the
specified CASE equivalent on sight.
        regards, tom lane


Re: COALESCE implementation question

От
Philip Warner
Дата:
At 23:28 5/08/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>>> Well, the standard is perfectly clear, isn't it?  If V(1) has side
>>> effects then trying to optimize this into just one evaluation of V(1)
>>> will generate non-spec-compliant results.
>
>> At least with the new function manager, if I feel te need I can write a
>> 'CoalesceValues' function (at least for fixed numbers of parameters).
>
>Mmm ... not really.  You could detect nulls all right, but a function-
>based version of COALESCE would evaluate *all* its arguments exactly
>once, which is certainly wrong.

Good point. Although in the specific case (two args, one of them constant),
it's not an issue. I guess I'll just have to live with double-evaluation,
and a COALESCE than can return NULL. Grumble grumble...


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: COALESCE implementation question

От
Tom Lane
Дата:
Philip Warner <pjw@rhyme.com.au> writes:
>> This is a bug caused by interaction between two planning passes run
>> on the same Query node.  The parser thinks it's cool to generate a
>> CASE parsetree with multiple paths to the same sub-select Query node,
>> but in fact it is not cool because planning destructively alters the
>> Query node contents.  I'm amazed it didn't crash, to tell the truth.
>> 
>> I have a patch but haven't applied it yet (been offline for most of
>> two days due to telco idiocy :-().

> Thanks for this; I must admit I was very surprised not to get a response
> withing 24 hours! Is there any chance of sending me the patch - I have been
> looking at the sources for a while now, and it would be nice to see the
> answer...

Well, I'm not *proud* of this patch, it's pretty much brute-force.
But it will do until we get around to redesigning querytrees.
See src/backend/optimizer/plan/subselect.c.

I imagine the diff would apply to 7.0.* if you want to do that.
        regards, tom lane