Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets

Поиск

Список

Период

Сортировка

От	Tom Molesworth
Тема	Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets
Дата	28 июля 2009 г. 20:00:23
Msg-id	4A6F78C5.9070607@audioboundary.com обсуждение исходный текст
Ответ на	Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets (Ole Tange <ole@tange.dk>)
Ответы	Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets
Список	pgsql-bugs

Дерево обсуждения

Ole Tange wrote:
> On Tue, Jul 28, 2009 at 3:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>
>> "Ole Tange" <postgresql.org@tange.dk> writes:
>>
>>> (modulo NULLs which seem to always cause problems in NOT INs).
>>>
>>> Because it can be rewritten, NOT IN should never be much slower than the
>>> rewritten solution, as PostgreSQL should simply rewrite NOT IN to the
>>> above.
>>>
>> Let's see, you understand that the rewrite violates the SQL standard
>> semantics of NOT IN, but you think we should do it anyway?
>>
>
> Thanks for your kind reply.
>
> Apparently my bug report was not quite clear. My rewrite example was
> simply to show a way that could do a marvelous speedup on medium to
> large sets. The correct dealing with NULL I am sure can be handled
> just as efficiently.
>

Just an observation - it seems that you're using NOT IN() and expecting
it to do the same as this:

SELECT foo FROM a LEFT JOIN b ON a.key = b.key WHERE b.key IS NULL

I find it's comparatively rare to actually want the NOT IN()
null-handling semantics, especially given your comment about nulls
'causing problems' (although I guess you could get the same behaviour in
that query as NOT IN, by adding 'AND a.key IS NOT NULL' to the where
clause - I have no idea whether this is something the optimiser could or
should be able to do).

Although it refers to TSQL, this page might help explain more about NOT IN:

http://stackoverflow.com/questions/129077/sql-not-in-constraint-and-null-values

Not sure if constructions such as 'not exists (select 1 from b where
b.key = a.key)' also have the same performance issues you describe -
should be faster due to the key? - but I find the left join approach
works well enough up to several million rows in the two tables. An
unindexed subquery like 'select key from a' just seems like a guaranteed
way to invoking a full (slow) table scan.

Tom

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets