Re: Sequential scan instead of index scan

Поиск
Список
Период
Сортировка
От Ioannis Anagnostopoulos
Тема Re: Sequential scan instead of index scan
Дата
Msg-id 50203F5A.3040805@anatec.com
обсуждение исходный текст
Ответ на Re: Sequential scan instead of index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Sequential scan instead of index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
On 06/08/2012 16:34, Tom Lane wrote:
Ioannis Anagnostopoulos <ioannis@anatec.com> writes:
        I think this is a pretty good plan and quite quick given the       size of the table (88Million rows at present). However in real       life the parameter where I search for msg_id is not an array of       3 ids but of 300.000 or more. It is then that the query forgets       the plan and goes to sequential scan. Is there any way around?
If you've got that many, any(array[....]) is a bad choice.  I'd try
putting the IDs into a VALUES(...) list, or even a temporary table, and
then writing the query as a join.  It is a serious mistake to think that
a seqscan is evil when you're dealing with joining that many rows, btw.
What you should probably be looking for is a hash join plan.
		regards, tom lane
Ok in that scenario we are back to square one. Following your suggestion my resultant query is this (the temporary table is tmp_tbl_messages)
select
            ship_pos_messages.*
        from
            feed_all_y2012m08.ship_pos_messages join tmp_tbl_messages on (ship_pos_messages.msg_id = tmp_tbl_messages.msg_id)
        where
            extract('day' from msg_date_rec) = 1
            AND date_trunc('day', msg_date_rec) = '2012-08-01';

which gives us the following explain analyse:

"Merge Join  (cost=1214220.48..3818359.46 rows=173574357 width=128) (actual time=465036.958..479089.731 rows=341190 loops=1)"
"  Merge Cond: (feed_all_y2012m08.ship_pos_messages.msg_id = tmp_tbl_messages.msg_id)"
"  ->  Sort  (cost=1178961.70..1179223.51 rows=104725 width=128) (actual time=464796.971..476579.208 rows=19512873 loops=1)"
"        Sort Key: feed_all_y2012m08.ship_pos_messages.msg_id"
"        Sort Method:  external merge  Disk: 1254048kB"
"        ->  Append  (cost=0.00..1170229.60 rows=104725 width=128) (actual time=0.033..438682.971 rows=19512883 loops=1)"
"              ->  Seq Scan on ship_pos_messages  (cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0 loops=1)"
"                    Filter: ((date_part('day'::text, msg_date_rec) = 1::double precision) AND (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone))"
"              ->  Seq Scan on ship_a_pos_messages ship_pos_messages  (cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0 loops=1)"
"                    Filter: ((date_part('day'::text, msg_date_rec) = 1::double precision) AND (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone))"
"              ->  Index Scan using idx_ship_b_std_pos_messages_date_trunc on ship_b_std_pos_messages ship_pos_messages  (cost=0.00..58657.09 rows=5269 width=128) (actual time=0.032..799.171 rows=986344 loops=1)"
"                    Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone)"
"                    Filter: (date_part('day'::text, msg_date_rec) = 1::double precision)"
"              ->  Index Scan using idx_ship_b_ext_pos_messages_date_trunc on ship_b_ext_pos_messages ship_pos_messages  (cost=0.00..1694.64 rows=141 width=128) (actual time=0.026..20.661 rows=26979 loops=1)"
"                    Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone)"
"                    Filter: (date_part('day'::text, msg_date_rec) = 1::double precision)"
"              ->  Index Scan using idx_ship_a_pos_messages_wk0_date_trunc on ship_a_pos_messages_wk0 ship_pos_messages  (cost=0.00..1109877.86 rows=99313 width=128) (actual time=0.029..435784.376 rows=18499560 loops=1)"
"                    Index Cond: (date_trunc('day'::text, msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time zone)"
"                    Filter: (date_part('day'::text, msg_date_rec) = 1::double precision)"
"  ->  Sort  (cost=35258.79..36087.50 rows=331486 width=8) (actual time=239.908..307.576 rows=349984 loops=1)"
"        Sort Key: tmp_tbl_messages.msg_id"
"        Sort Method:  quicksort  Memory: 28694kB"
"        ->  Seq Scan on tmp_tbl_messages  (cost=0.00..4863.86 rows=331486 width=8) (actual time=0.047..55.227 rows=349984 loops=1)"
"Total runtime: 479336.869 ms"



Which is a Merge join and not a hash. Any ideas how to make it a hash join?

Kind Regards
Yiannis

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Midge Brown"
Дата:
Сообщение: Re: slow query, different plans
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Sequential scan instead of index scan