Re: 7.0.2 and 6.5.3 performance
От | Tom Lane |
---|---|
Тема | Re: 7.0.2 and 6.5.3 performance |
Дата | |
Msg-id | 9634.969244010@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: 7.0.2 and 6.5.3 performance (lec <englim@pc.jaring.my>) |
Список | pgsql-general |
lec <englim@pc.jaring.my> writes: > explain select itm_section, itm_brand, itm_itemno, sum(stl_qty) > from stl, item where stl_itemno = itm_itemno > and stl_discounttype in ('','S','V') > and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000' > and stl_status='N' > and itm_sectiontype='O' > and itm_section >='101' > and itm_section <='135' > group by itm_section, itm_brand, itm_itemno; > The following is the query/explain under 7.0.2: > -> Nested Loop (cost=0.00..27770.88 rows=1 width=72) > -> Seq Scan on item (cost=0.00..2549.45 rows=1 width=36) > -> Index Scan using stl_datetrx on stl (cost=0.00..25157.34 rows=5127 width=36) > > The following is the query/explain under 6.5.3: > -> Hash Join (cost=25780.13 rows=55386 width=72) > -> Index Scan using stl_datetrx on stl (cost=15488.75 rows=210658 width=36) > -> Hash (cost=3225.66 rows=2879 width=36) > -> Seq Scan on item (cost=3225.66 rows=2879 width=36) What's driving the difference in plan choice is evidently the much smaller number of rows that 7.0 estimates will match the query restrictions. Specifically, it's figuring that only one item row will match and itm_sectiontype='O' and itm_section >='101' and itm_section <='135' whereas 6.5 estimates 2879. With only one row to match, it figures it might as well just do the join with a nested loop and not bother to set up a hashtable. Since you're complaining, I suppose this statistical estimate is way off :-(. What do you actually get from select count(*) from stl where stl_discounttype in ('','S','V') and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000' and stl_status='N'; select count(*) from item where itm_sectiontype='O' and itm_section >='101' and itm_section <='135'; ? I suspect the reason for the difference in estimated row counts is that 7.0 recognizes that these are range queries; instead of treating "stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000'" as two independent clauses, or "itm_section >='101' and itm_section <='135'" as independent clauses, it combines those clauses and tries to estimate the fraction of the data range being requested. This is usually a win but it's possible to lose rather badly if the requested range covers a spike in the data's frequency distribution. We don't yet have adequate statistics to know that a spike may be involved... regards, tom lane
В списке pgsql-general по дате отправления: