Re: Performance improvement for joins where outer side is unique

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: Performance improvement for joins where outer side is unique
Дата
Msg-id CAKFQuwZKHSR8Hkc+13S-2y_Cg5O105ZvG89Q6ermc_=pzuQq0A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: Performance improvement for joins where outer side is unique  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Saturday, March 12, 2016, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I wondered why, instead of inventing an extra semantics-modifying flag,
>> we couldn't just change the jointype to *be* JOIN_SEMI when we've
>> discovered that the inner side is unique.
>
> BTW, to clarify: I'm not imagining that we'd make this change in the
> query jointree, as for example prepjointree.c might do.  That would appear
> to add join order constraints, which we don't want.  But I'm thinking that
> at the instant where we form a join Path, we could change the Path's
> jointype to be JOIN_SEMI or JOIN_SEMI_OUTER if we're able to prove the
> inner side unique, rather than annotating the Path with a separate flag.
> Then that representation is what propagates forward.

Thanks for looking at this.

Yes that might work, since we'd just be changing the jointype in the
JoinPath, if that path was discarded if favour of, say the commutative
variant, which was not "unique inner", then it shouldn't matter, as
the join type for that path would be the original one.

The thing that might matter is that, this;


explain (costs off) select * from t1 inner join t2 on t1.id=t2.id
         QUERY PLAN
------------------------------
 Hash Join
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on t1
   ->  Hash
         ->  Seq Scan on t2

could become;

          QUERY PLAN
------------------------------
 Hash Semi Join
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on t1
   ->  Hash
         ->  Seq Scan on t2

Wouldn't that cause quite a bit of confusion? People browsing EXPLAIN
output might be a bit confused at the lack of EXISTS/IN clause in a
query which is showing a Semi Join. Now, we could get around that by
adding JOIN_SEMI_INNER I guess, and just displaying that as a normal
inner join, yet it'll behave exactly like JOIN_SEMI!


Don't the semantics of a SEMI JOIN also state that the output columns only come from the outer relation? i.e., the inner relation doesn't contribute either rows or columns to the final result?  Or is that simply an implementation artifact of the fact that the only current way to perform a semi-join explicitly is via exists/in?

David J.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: auto_explain sample rate
Следующее
От: "Daniel Verite"
Дата:
Сообщение: Re: [patch] Proposal for \crosstabview in psql