Postgresql 9.0.7 weird planner decision (rows in plan close to reality but plan suboptimal)

From: Maxim Boguk
Subject: Postgresql 9.0.7 weird planner decision (rows in plan close to reality but plan suboptimal)
Msg-id: CAK-MWwQjqPhwES=1FYxTRYPTyAJpx5J+mFVtc+7U23SBUBZ3fw@mail.gmail.com
List: pgsql-general
Hi,

Today, on one of the databases under my management, I found a very strange plan for a simple query.
PostgreSQL 9.0.7 on Linux, with:
random_page_cost=4
seq_page_cost=1
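
(These can be confirmed in a session with SHOW; the comments mark the values in effect here:)

db=# SHOW random_page_cost;  -- 4
db=# SHOW seq_page_cost;     -- 1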

The query and plan:

db=# EXPLAIN (ANALYZE, COSTS, BUFFERS)  select obj_id, obj_commented,p2o_id FROM blog_post as obj
JOIN person2obj ON p2o_obj_obj_id = obj_id
JOIN person2obj_counters ON p2oc_id = p2o_id
WHERE obj_status_did = 1
AND obj_commented IS NOT NULL
AND obj_commented > now() - '7days'::interval
AND obj_commented > p2o_notified
AND p2o_notify = 't';
                                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=6546.54..6584.69 rows=3 width=24) (actual time=86.262..1349.266 rows=1770 loops=1)
   Merge Cond: (person2obj_counters.p2oc_id = person2obj.p2o_id)
   Buffers: shared hit=1140491
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..47110.95 rows=1212591 width=8) (actual time=0.008..997.948 rows=1212765 loops=1)
         Buffers: shared hit=1108452
   ->  Sort  (cost=6546.42..6546.98 rows=221 width=24) (actual time=85.877..88.373 rows=7870 loops=1)
         Sort Key: person2obj.p2o_id
         Sort Method:  quicksort  Memory: 807kB
         Buffers: shared hit=32039
         ->  Nested Loop  (cost=0.00..6537.82 rows=221 width=24) (actual time=0.097..80.129 rows=7870 loops=1)
               Buffers: shared hit=32039
               ->  Index Scan using i_blog_post_commented_active on blog_post obj  (cost=0.00..225.73 rows=1726 width=16) (actual time=0.028..17.957 rows=6010 loops=1)
                     Index Cond: ((obj_commented IS NOT NULL) AND (obj_commented > (now() - '7 days'::interval)))
                     Buffers: shared hit=6207
               ->  Index Scan using i_person2obj_obj_notified_subscribed on person2obj  (cost=0.00..3.64 rows=1 width=24) (actual time=0.006..0.009 rows=1 loops=6010)
                     Index Cond: ((person2obj.p2o_obj_obj_id = obj.obj_id) AND (obj.obj_commented > person2obj.p2o_notified))
                     Buffers: shared hit=25832
 Total runtime: 1349.767 ms

I don't understand why the database chooses a merge join involving the 1.2M-row person2obj_counters table.
person2obj_counters has an index on p2oc_id (it is the primary key column).
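
(The index is easy to confirm through the standard pg_indexes view, just as a sanity check:)

db=# SELECT indexname, indexdef
db-# FROM pg_indexes
db-# WHERE tablename = 'person2obj_counters';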

Switching to the fast nested-loop plan can be achieved with random_page_cost = 10:

db=# set random_page_cost to 10;
SET
db=# EXPLAIN (ANALYZE, COSTS, BUFFERS)  select obj_id, obj_commented,p2o_id FROM blog_post as obj
JOIN person2obj ON p2o_obj_obj_id = obj_id
JOIN person2obj_counters ON p2oc_id = p2o_id
WHERE obj_status_did = 1
AND obj_commented IS NOT NULL
AND obj_commented > now() - '7days'::interval
AND obj_commented > p2o_notified
AND p2o_notify = 't';
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..14810.38 rows=3 width=24) (actual time=16.910..115.110 rows=1758 loops=1)
   Buffers: shared hit=57403
   ->  Nested Loop  (cost=0.00..14616.37 rows=221 width=24) (actual time=0.088..82.342 rows=7858 loops=1)
         Buffers: shared hit=32046
         ->  Index Scan using i_blog_post_commented_active on blog_post obj  (cost=0.00..273.70 rows=1725 width=16) (actual time=0.029..16.260 rows=6009 loops=1)
               Index Cond: ((obj_commented IS NOT NULL) AND (obj_commented > (now() - '7 days'::interval)))
               Buffers: shared hit=6222
         ->  Index Scan using i_person2obj_obj_notified_subscribed on person2obj  (cost=0.00..8.30 rows=1 width=24) (actual time=0.007..0.010 rows=1 loops=6009)
               Index Cond: ((person2obj.p2o_obj_obj_id = obj.obj_id) AND (obj.obj_commented > person2obj.p2o_notified))
               Buffers: shared hit=25824
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..0.87 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=7858)
         Index Cond: (person2obj_counters.p2oc_id = person2obj.p2o_id)
         Buffers: shared hit=25357
 Total runtime: 115.465 ms
(14 rows)
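
As a back-of-the-envelope cross-check of the cost arithmetic (my own reading of the plan, not planner internals): the outer nested loop is costed at 14616.37 and is expected to emit 221 rows, each triggering one index probe costed at 0.87, so 14616.37 + 221 * 0.87 = 14808.64, within a couple of cost units of the 14810.38 total above; the remainder is presumably small per-tuple charges.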

Ok... so what I see: the database thinks it will need to join 221 rows from the previous level to the person2obj_counters table.
And somehow Pg manages to calculate that a full index scan over the 1.2M entries of the person2obj_counters table is faster than nested-loop probes over 221 values.

What looks very suspicious is that the full-index-scan-plus-merge-join part adds only about 38 points to the total cost (6584.69 for the merge join minus 6546.98 for its sorted input is roughly 37.7), even though the index scan alone is costed at 47110.95:

 Merge Join  (cost=6546.54..6584.69 rows=3 width=24) (actual time=86.262..1349.266 rows=1770 loops=1)
 ...
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..47110.95 rows=1212591 width=8) (actual time=0.008..997.948 rows=1212765 loops=1)
 ...
   ->  Sort  (cost=6546.42..6546.98 rows=221 width=24) (actual time=85.877..88.373 rows=7870 loops=1)

How can that be?
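
If it helps anyone reproduce the comparison without distorting the cost constants: the merge join can also be ruled out directly for testing with the standard enable_mergejoin planner GUC (session-local; note the planner may then prefer a hash join rather than the nested loop):

db=# SET enable_mergejoin TO off;
db=# EXPLAIN (ANALYZE, COSTS, BUFFERS)  select obj_id, obj_commented,p2o_id FROM blog_post as obj
JOIN person2obj ON p2o_obj_obj_id = obj_id
JOIN person2obj_counters ON p2oc_id = p2o_id
WHERE obj_status_did = 1
AND obj_commented IS NOT NULL
AND obj_commented > now() - '7days'::interval
AND obj_commented > p2o_notified
AND p2o_notify = 't';
db=# RESET enable_mergejoin;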

--
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
"If they can send one man to the moon... why can't they send them all?"

МойКруг: http://mboguk.moikrug.ru/
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."

