Re: IN vs EXIIST
От | Greg Stark |
---|---|
Тема | Re: IN vs EXIIST |
Дата | |
Msg-id | 87ofaqrj9a.fsf@stark.dyndns.tv обсуждение исходный текст |
Ответ на | Re: IN vs EXIIST (Jean-Christian Imbeault <jc@mega-bucks.co.jp>) |
Список | pgsql-general |
Caveat, I'm new to Postgres. However extrapolating from my experience with Oracle I think some of you are going about this optimization work the wrong way around. Don't focus on IN vs EXISTS, focus on the query plans they generate. Which of two identical queries is faster is purely a quirk of optimizer (and it's arguably always a bug if they aren't both the same). It's like debating which of two identical cars will be faster without actually checking who is driving. And don't focus on the estimated costs from explain plan. If they were always right then there wouldn't be any disputes like these. Well not quite. So in the cases below as Jean-Christian showed the NOT IN query leads the optimizer to do two sequential scans and a full sort and unique operation. Whereas the EXISTS syntax query leads the optimizer to use the index on one of the tables. Usually doing a full table scan and an extra sort and unique operation will be slower, unless you were going to be reading a lot of the table anyways. So which of these two plans will be fastest should hinge on how much of the truth table pertains to the attribute being checked. Exactly where the breakeven point is depends on the details of the database engine. For Oracle the traditional rule of thumb is a (surprisingly low) 10%. Someone implied that EXISTS might be more influenced by a VACUUM ANALYZE. That wouldn't be surprising if the optimizer is changing strategies if it has enough data to determine that an index scan will read more than the magic breakeven percentage of the full table and switches to a full table scan. In the query plans below I stripped out the distracting cost estimates: Jean-Christian Imbeault <jc@mega-bucks.co.jp> writes: > psql TMP -c "explain select count(distinct invoice_id) from invoice_li where > received='true' AND shipped='false' AND cancelled='false' > AND > (invoice_id not in ... > > NOTICE: QUERY PLAN: > > Aggregate > -> Seq Scan on invoice_li > SubPlan > -> Materialize > -> Unique > -> Sort > -> Seq Scan on invoice_li > $ psql TMP -c "explain select count(distinct invoice_id) from invoice_li where > received='true' > AND shipped='false' AND cancelled='false' > AND > (NOT EXISTS ... > > NOTICE: QUERY PLAN: > > Aggregate > -> Seq Scan on invoice_li > SubPlan > -> Index Scan using invoice_li_pkey on invoice_li sq_inv_li -- greg
В списке pgsql-general по дате отправления: