Re: Planning for improved versions of IN/NOT IN
| От | Joe Conway |
|---|---|
| Тема | Re: Planning for improved versions of IN/NOT IN |
| Дата | |
| Msg-id | 3DE83F57.10801@joeconway.com обсуждение исходный текст |
| Ответ на | Planning for improved versions of IN/NOT IN (Tom Lane <tgl@sss.pgh.pa.us>) |
| Список | pgsql-hackers |
Tom Lane wrote: > I've been thinking about how to improve the performance of queries using > "WHERE x IN (subselect)" and "WHERE x NOT IN (subselect)". > > In the existing implementation, the subquery result is rescanned to look > for a match to x each time the WHERE clause is executed; this essentially > makes it work like a nestloop join of the stupidest variety. (We do > stick a Materialize node atop the subselect if it looks complicated, but > that's not a big help in typical cases.) > > I've thought of three alternative implementations that would perform > better in various scenarios. Each would be relatively simple to > implement; the problem I'm having is figuring out how to get the planner > to choose the best one. The alternatives are basically: > [abbreviated] > 1. Add FROM item > 2. Hash-based > 3. Inner indexscan > 4. the existing implementation > The difficulty is that it's not clear how to choose one of these four > ways, short of planning the *entire* query from scratch all four ways :-(. > This seems pretty grim. Approaches #2 and #3 could be handled as local > transformations of the WHERE clause, but we couldn't choose which to use > very well if we don't yet know how many outer rows the WHERE clause will > be executed for. Approach #1 is really planning a completely different > query --- one with an extra FROM-item --- and there's not going to be > all that much commonality in the computations, unless we restrict where > the added FROM-item can be joined to the others, which'd more or less > defeat the purpose. > > Anyone see a way around this difficulty? How about starting with a rule-based method to make the choice? 1. If uncorrelated: use hash-based approach - ISTM this might address a large percentage of the problem cases -- it couldeven handle the "IN (list-of-scalars)" case. Could it fall back to a tuplesort/binary-search for the too many tohash in memory case? 2. If correlated: use an inner indexscan 3. If you come up with a pattern where none of the approaches produce a correct answer, use the existing implementation You could always get fancier later if needed, but something along these lines would be a great start. Joe
В списке pgsql-hackers по дате отправления: