Re: BUG #14107: Major query planner bug regarding subqueries and indices

Поиск

Список

Период

Сортировка

От	Mathias Kunter
Тема	Re: BUG #14107: Major query planner bug regarding subqueries and indices
Дата	29 апреля 2016 г. 12:15:44
Msg-id	e2090738-62b9-7697-886a-b2a01a8c9482@gmail.com обсуждение исходный текст
Ответ на	Re: BUG #14107: Major query planner bug regarding subqueries and indices (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-bugs

Дерево обсуждения

> If you know that the
> sub-select isn't going to return very many rows, you could do
>
> SELECT ... FROM a WHERE a.x = ? OR a.y = ANY(ARRAY(SELECT ...));

Isn't the planner already doing something like this, since the following
query is using the index as expected:

SELECT ... FROM a WHERE a.x IN (SELECT ...);

> but this would blow up rather badly with a large sub-select result,
> so I'm not sure I want to try to make the planner transform it that
> way automatically.

Wouldn't it be possible then to use this optimization based on the
estimated result size of the subquery? I think this would almost always
be faster than a sequential scan anyway. I observed that using the
ANY(ARRAY(SELECT...)) syntax on my small test tables (100 K rows)
already improves the query time by a factor of more than 100, and it
will be even more when tables are large. Please consider implementing
this optimization!

> I don't actually see any way to do very much with your second example at
> all:
>
>> SELECT ... FROM a JOIN b ON (...) WHERE a.x = ? OR b.y = ?;

Assuming both joined tables contain a PK (or another unique column),
then it should be possible by replanning the query as:

SELECT ... FROM a JOIN b ON ((join_cond AND a.x = ?) OR (join_cond AND
b.y = ?));

(Let "join_cond" denote the original join condition here.) Now, the JOIN
implementation must be smart enough to handle OR conditions: First,
obtain the rows satisfying

join_cond AND a.x = ?

by using the index as usual. For each matching row, create the tuple
(a.id, b.id) and insert it into a search tree (or hash or whatever).
Then, obtain the rows satisfying

join_cond AND b.y = ?

For each matching row, query the search tree whether it already contains
the tuple (a.id, b.id), and only add the current row to the final result
if it doesn't.

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: BUG #14107: Major query planner bug regarding subqueries and indices