Re: NOT IN vs. NOT EXISTS performance
От | Merlin Moncure |
---|---|
Тема | Re: NOT IN vs. NOT EXISTS performance |
Дата | |
Msg-id | CAHyXU0xqvzCt=cRc2+D4dTtWA1C4GmjvK=DPq3_db3FTOoGjcA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: NOT IN vs. NOT EXISTS performance (David Rowley <david.rowley@2ndquadrant.com>) |
Ответы |
Re: NOT IN vs. NOT EXISTS performance
|
Список | pgsql-performance |
On Thu, Nov 8, 2018 at 3:12 PM David Rowley <david.rowley@2ndquadrant.com> wrote: > > On 9 November 2018 at 08:35, Lincoln Swaine-Moore > <lswainemoore@gmail.com> wrote: > > My primary question is: why is this approach only possible (for data too > > large for memory) when using NOT EXISTS, and not when using NOT IN? > > > > I understand that there is a slight difference in the meaning of the two > > expressions, in that NOT IN will produce NULL if there are any NULL values > > in the right hand side (in this case there are none, and the queries should > > return the same COUNT). But if anything, I would expect that to improve > > performance of the NOT IN operation, since a single pass through that data > > should reveal if there are any NULL values, at which point that information > > could be used to short-circuit. So I am a bit baffled. > > The problem is that the planner makes the plan and would have to know > beforehand that no NULLs could exist on either side of the join. Yeah, the core issue is the SQL rules that define NOT IN behaves as: postgres=# select 1 not in (select 2); ?column? ────────── t (1 row) postgres=# select 1 not in (select 2 union all select null); ?column? ────────── (1 row) There's a certain logic to it but it's a death sentence for performance. merlin
В списке pgsql-performance по дате отправления: