Re: Performance improvement for joins where outer side is unique

Поиск

Список

Период

Сортировка

От	David G. Johnston
Тема	Re: Performance improvement for joins where outer side is unique
Дата	12 марта 2016 г. 14:44:24
Msg-id	CAKFQuwZKHSR8Hkc+13S-2y_Cg5O105ZvG89Q6ermc_=pzuQq0A@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Performance improvement for joins where outer side is unique (David Rowley <david.rowley@2ndquadrant.com>)
Ответы	Re: Performance improvement for joins where outer side is unique
Список	pgsql-hackers

Дерево обсуждения

On Saturday, March 12, 2016, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I wondered why, instead of inventing an extra semantics-modifying flag,
>> we couldn't just change the jointype to *be* JOIN_SEMI when we've
>> discovered that the inner side is unique.
>
> BTW, to clarify: I'm not imagining that we'd make this change in the
> query jointree, as for example prepjointree.c might do. That would appear
> to add join order constraints, which we don't want. But I'm thinking that
> at the instant where we form a join Path, we could change the Path's
> jointype to be JOIN_SEMI or JOIN_SEMI_OUTER if we're able to prove the
> inner side unique, rather than annotating the Path with a separate flag.
> Then that representation is what propagates forward.

Thanks for looking at this.

Yes that might work, since we'd just be changing the jointype in the
JoinPath, if that path was discarded if favour of, say the commutative
variant, which was not "unique inner", then it shouldn't matter, as
the join type for that path would be the original one.

The thing that might matter is that, this;

explain (costs off) select * from t1 inner join t2 on t1.id=t2.id
QUERY PLAN
------------------------------
Hash Join
Hash Cond: (t1.id = t2.id)
-> Seq Scan on t1
-> Hash
-> Seq Scan on t2

could become;

QUERY PLAN
------------------------------
Hash Semi Join
Hash Cond: (t1.id = t2.id)
-> Seq Scan on t1
-> Hash
-> Seq Scan on t2

Wouldn't that cause quite a bit of confusion? People browsing EXPLAIN
output might be a bit confused at the lack of EXISTS/IN clause in a
query which is showing a Semi Join. Now, we could get around that by
adding JOIN_SEMI_INNER I guess, and just displaying that as a normal
inner join, yet it'll behave exactly like JOIN_SEMI!

Don't the semantics of a SEMI JOIN also state that the output columns only come from the outer relation? i.e., the inner relation doesn't contribute either rows or columns to the final result? Or is that simply an implementation artifact of the fact that the only current way to perform a semi-join explicitly is via exists/in?

David J.

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Performance improvement for joins where outer side is unique