Simple join doesn't use index

Поиск
Список
Период
Сортировка
От Alex Vinnik
Тема Simple join doesn't use index
Дата
Msg-id CALd8TVECZDosPDZ7Cwv1gVRy1p2F4Hg+sw+240MsN36KUkt31g@mail.gmail.com
обсуждение исходный текст
Ответы Re: Simple join doesn't use index  (Jeremy Harris <jgh@wizmail.org>)
Re: Simple join doesn't use index  (Stefan Andreatta <s.andreatta@synedra.com>)
Re: Simple join doesn't use index  (Merlin Moncure <mmoncure@gmail.com>)
Re: Simple join doesn't use index  (Jeff Janes <jeff.janes@gmail.com>)
Re: Simple join doesn't use index  (Alex Vinnik <alvinnik.g@gmail.com>)
Список pgsql-performance
Hi everybody,

I have implemented my first app using PG DB and thought for a minute(may be two) that I know something about PG but below problem totally destroyed my confidence :). Please help me to restore it. 

Here is simple join query. It runs just fine on MS SQL 2008 and uses all available indexes using even bigger overall dataset. 

from visits join views on visits.id = views.visit_id
where visits.created_at >= '11/15/2012' and visits.created_at < '11/16/2012' 

Quick performance stat

MS SQL: 1 second, 264K rows
PG: 158 seconds,  264K rows

Explain plan from both DBs

PG QUERY PLAN
Hash Join  (cost=12716.17..1101820.09 rows=248494 width=8)
  Hash Cond: (views.visit_id = visits.id)
  ->  Seq Scan on views  (cost=0.00..819136.56 rows=17434456 width=8)
  ->  Hash  (cost=10549.16..10549.16 rows=132081 width=4)
        ->  Index Scan using visits_created_at_index on visits  (cost=0.00..10549.16 rows=132081 width=4)
              Index Cond: ((created_at >= '2012-11-15 00:00:00'::timestamp without time zone) AND (created_at < '2012-11-16 00:00:00'::timestamp without time zone))

schemaname | tablename |            indexname            | tablespace |                                         indexdef                                         
------------+-----------+---------------------------------+------------+------------------------------------------------------------------------------------------
 public     | views     | views_pkey                      |            | CREATE UNIQUE INDEX views_pkey ON views USING btree (id)
 public     | views     | views_visit_id_index            |            | CREATE INDEX views_visit_id_index ON views USING btree (visit_id)

MS SQL Query plan
'11/16/2012'
  |--Parallelism(Gather Streams)
       |--Nested Loops(Inner Join, OUTER REFERENCES:([visits].[id], [Expr1006]) OPTIMIZED WITH UNORDERED PREFETCH)
            |--Index Seek(OBJECT:([visits].[test]), SEEK:([visits].[created_at] >= '2012-11-15 00:00:00.000' AND [visits].[created_at] < '2012-11-16 00:00:00.000') ORDERED FORWARD)
            |--Index Seek(OBJECT:([views].[views_visit_id_index]), SEEK:([views].[visit_id]=[raw_visits].[id]) ORDERED FORWARD)

It is clear that PG does full table scan "Seq Scan on views  (cost=0.00..819136.56 rows=17434456 width=8)"

Don't understand why PG doesn't use views_visit_id_index in that query but rather scans whole table. One explanation I have found that when resulting dataset constitutes ~15% of total number of rows in the table then seq scan is used. In this case resulting dataset is just 1.5% of total number of rows. So it must be something different. Any reason why it happens and how to fix it?

Postgres 9.2
Ubuntu 12.04.1 LTS
shared_buffers = 4GB the rest of the settings are default ones

Thanks
-Alex

В списке pgsql-performance по дате отправления:

Предыдущее
От: Daniel Westermann
Дата:
Сообщение: Re: FW: performance issue with a 2.5gb joinded table
Следующее
От: Jeremy Harris
Дата:
Сообщение: Re: Simple join doesn't use index