Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets
От | Tom Molesworth |
---|---|
Тема | Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets |
Дата | |
Msg-id | 4A6F78C5.9070607@audioboundary.com обсуждение исходный текст |
Ответ на | Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets (Ole Tange <ole@tange.dk>) |
Ответы |
Re: BUG #4949: NOT IN is prohibitive slower than the rewrite
for medium to large sets
|
Список | pgsql-bugs |
Ole Tange wrote: > On Tue, Jul 28, 2009 at 3:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > >> "Ole Tange" <postgresql.org@tange.dk> writes: >> >>> (modulo NULLs which seem to always cause problems in NOT INs). >>> >>> Because it can be rewritten, NOT IN should never be much slower than the >>> rewritten solution, as PostgreSQL should simply rewrite NOT IN to the >>> above. >>> >> Let's see, you understand that the rewrite violates the SQL standard >> semantics of NOT IN, but you think we should do it anyway? >> > > Thanks for your kind reply. > > Apparently my bug report was not quite clear. My rewrite example was > simply to show a way that could do a marvelous speedup on medium to > large sets. The correct dealing with NULL I am sure can be handled > just as efficiently. > Just an observation - it seems that you're using NOT IN() and expecting it to do the same as this: SELECT foo FROM a LEFT JOIN b ON a.key = b.key WHERE b.key IS NULL I find it's comparatively rare to actually want the NOT IN() null-handling semantics, especially given your comment about nulls 'causing problems' (although I guess you could get the same behaviour in that query as NOT IN, by adding 'AND a.key IS NOT NULL' to the where clause - I have no idea whether this is something the optimiser could or should be able to do). Although it refers to TSQL, this page might help explain more about NOT IN: http://stackoverflow.com/questions/129077/sql-not-in-constraint-and-null-values Not sure if constructions such as 'not exists (select 1 from b where b.key = a.key)' also have the same performance issues you describe - should be faster due to the key? - but I find the left join approach works well enough up to several million rows in the two tables. An unindexed subquery like 'select key from a' just seems like a guaranteed way to invoking a full (slow) table scan. Tom
В списке pgsql-bugs по дате отправления: