Re: A new strategy for pull-up correlated ANY_SUBLINK

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: A new strategy for pull-up correlated ANY_SUBLINK
Дата
Msg-id CAKU4AWpp8aANvwU1ivjPtF3AakFjTf1jBm_OFmPSdCqizjBQLw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: A new strategy for pull-up correlated ANY_SUBLINK  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: A new strategy for pull-up correlated ANY_SUBLINK  (Alena Rybakina <lena.ribackina@yandex.ru>)
Список pgsql-hackers
Hi Tom:
  
 Sorry for the delayed response!  I think my knowledge has been refreshed
 for this discussion.
 
One thing I'm not at all clear about is whether we need to restrict
the optimization so that it doesn't occur if the subquery contains
outer references falling outside available_rels.  I think that that
case is covered by is_simple_subquery() deciding later to not pull up
the subquery based on LATERAL restrictions, but maybe that misses
something.

I think we need the restriction and that should be enough for this feature
. Given the query Richard provided before:

explain
select * from tenk1 A where exists
(select 1 from tenk2 B
where A.hundred in (select C.hundred FROM tenk2 C
WHERE c.odd = b.odd));

It first can be converted to the below format without any issue.

SELECT * FROM tenk1 A SEMI JOIN tenk2 B
on A.hundred in (select C.hundred FROM tenk2 C
WHERE c.odd = b.odd);

Then without the restriction, since we only pull the varnos from
sublink->testexpr, then it is {A}, so it convert to

SELECT * FROM 
(tenk1 A SEMI JOIN LATERAL (SELECT c.hundred FROM tenk2 C)
ON c.odd = b.odd AND a.hundred = v.hundred) 
SEMI JOIN on tenk2 B ON TRUE;

then the above query is NOT A VALID QUERY since:
1. The above query is *not* same as

SELECT * FROM (tenk1 A SEMI JOIN tenk2 B) on true
SEMI JOIN LATERAL (SELECT c.hundred FROM tenk2 C) v 
ON v.odd = b.odd;

2. The above query requires b.odd when B is not available. So it is
right that an optimizer can't generate a plan for it. The fix would
be to do the restriction before applying this optimization.

I'm not sure pull-up-subquery can play any role here, IIUC, the bad thing
happens before pull-up-subquery.

I also write & analyze more test and found no issue by me

1. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should not be pull-up to rarg of the left join since A.hundred is not
available.

2.  SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = a.odd);
==> should not be pull-up to rarg of the left join since A.odd is not
available.

3. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should be pull-up to rarg of left join.

4. SELECT * FROM tenk1 A INNER JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> pull-up as expected.

5. SELECT * FROM tenk1 A RIGHT JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should not be pull-up into larg of left join since b.odd is not
available.


About the existing test case changes because of this patch, they do
requires on the sublink is planned to a subPlan, so I introduces the below
changes to keep the original intention.

Changes
A in (SELECT A FROM ..)
To
(A, random() > 0) in (SELECT a, random() > 0 FROM ..); 


I'm also wondering whether the similar restriction in
convert_EXISTS_sublink_to_join could be removed similarly.
In this light it was a mistake for convert_EXISTS_sublink_to_join
to manage the pullup itself rather than doing it in the two-step
fashion that convert_ANY_sublink_to_join does it.


Yes,  it is true!  I prefer to believe this deserves a separate patch.  

Any feedback is welcome!

--
Best Regards
Andy Fan
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Daniel Gustafsson
Дата:
Сообщение: Re: Fix code comment in postgres_fdw.c
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: Should vacuum process config file reload more often