Re: BUG #14107: Major query planner bug regarding subqueries and indices
От | Mathias Kunter |
---|---|
Тема | Re: BUG #14107: Major query planner bug regarding subqueries and indices |
Дата | |
Msg-id | e2090738-62b9-7697-886a-b2a01a8c9482@gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #14107: Major query planner bug regarding subqueries and indices (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-bugs |
> If you know that the > sub-select isn't going to return very many rows, you could do > > SELECT ... FROM a WHERE a.x = ? OR a.y = ANY(ARRAY(SELECT ...)); Isn't the planner already doing something like this, since the following query is using the index as expected: SELECT ... FROM a WHERE a.x IN (SELECT ...); > but this would blow up rather badly with a large sub-select result, > so I'm not sure I want to try to make the planner transform it that > way automatically. Wouldn't it be possible then to use this optimization based on the estimated result size of the subquery? I think this would almost always be faster than a sequential scan anyway. I observed that using the ANY(ARRAY(SELECT...)) syntax on my small test tables (100 K rows) already improves the query time by a factor of more than 100, and it will be even more when tables are large. Please consider implementing this optimization! > I don't actually see any way to do very much with your second example at > all: > >> SELECT ... FROM a JOIN b ON (...) WHERE a.x = ? OR b.y = ?; Assuming both joined tables contain a PK (or another unique column), then it should be possible by replanning the query as: SELECT ... FROM a JOIN b ON ((join_cond AND a.x = ?) OR (join_cond AND b.y = ?)); (Let "join_cond" denote the original join condition here.) Now, the JOIN implementation must be smart enough to handle OR conditions: First, obtain the rows satisfying join_cond AND a.x = ? by using the index as usual. For each matching row, create the tuple (a.id, b.id) and insert it into a search tree (or hash or whatever). Then, obtain the rows satisfying join_cond AND b.y = ? For each matching row, query the search tree whether it already contains the tuple (a.id, b.id), and only add the current row to the final result if it doesn't.
В списке pgsql-bugs по дате отправления: