Re: [HACKERS] Why is that so slow?
From | Tatsuo Ishii
---|---
Subject | Re: [HACKERS] Why is that so slow?
Date |
Msg-id | 199903061408.XAA00494@ext16.sra.co.jp
In response to | Re: [HACKERS] Why is that so slow? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: [HACKERS] Why is that so slow? (Tom Lane <tgl@sss.pgh.pa.us>)
List | pgsql-hackers
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > postal=> explain select * from postal,prefecture where city ~ '^aaa' and postal.pid = prefecture.pid;
> > NOTICE:  QUERY PLAN:
> >
> > Nested Loop  (cost=98.90 size=1 width=100)
> >   ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)
> >   ->  Index Scan using pidindex on postal  (cost=2.05 size=1 width=74)
> >
> > This is so slooow. Can anybody explain this? Am I missing something?
>
> and later:
> > I had defined a btree index on pid and it has 2000
> > duplicate entries in average! After I removed the index, the query
> > runs unbelievably fast! Now explain shows:
>
> > Nested Loop  (cost=933.82 size=1 width=100)
> >   ->  Index Scan using cityindex on postal  (cost=931.77 size=1 width=74)
> >   ->  Index Scan using prefpidindex on prefecture  (cost=2.05 size=47 width=26)
>
> Hmm.  Removal of the index is just a hack --- the system should have
> been smart enough not to use it.  It looks like the system chose the
> first plan shown above because it thought that selecting postal entries
> matching a particular pid value would on average match only one postal
> tuple (note the "size" fields, which are estimates of the numbers of
> resulting tuples).  But in reality, each scan produced 2000 matching
> entries on average, according to your second message --- and each of
> those entries had to be tested to see if it had the right city name.
> So, very slow.
>
> The question I have is why didn't the system realize that there would be
> lots of matches on pid?  The "dispersion" statistics it keeps ought to
> have given it a clue that this approach wouldn't be very selective.
>
> The second example is fast because the scan over postal looking for city
> name matches finds only one match, so prefecture is scanned only once.

Actually not only one, since I use the ~ operator. Anyway, the number of
matching rows should be reasonably small.

> However the cost estimates there also look bogus --- the system is again
> mis-guessing how many entries will be selected.  It seems to think that
> all 47 prefecture entries will be matched by a scan for a specific pid.
> So, bogus dispersion values again (or bad use of them).
>
> Something is fishy here.  Have you done a "vacuum analyze" since loading
> the data in these tables?

Oh, I never thought about that. After re-creating the index I had removed
earlier and running vacuum analyze, I got:

Hash Join  (cost=951.50 size=19 width=100)
  ->  Index Scan using cityindex on postal  (cost=944.77 size=19 width=74)
  ->  Hash  (cost=0.00 size=0 width=0)
    ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)

This plan looks good (and is actually as fast as the previous one).
However, the cost estimate for prefecture is again 47?
--
Tatsuo Ishii
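[Editor's note: Tom's suggestion above amounts to refreshing the planner's statistics and then re-checking the plan. A minimal sketch of that workflow, using the table and column names quoted in the thread, might look like this in a psql session (not part of the original message):]

```sql
-- Refresh the planner's dispersion statistics for both tables,
-- so the optimizer's "size" estimates reflect the real data
-- (2000 duplicate pid entries on average, per the thread).
VACUUM ANALYZE postal;
VACUUM ANALYZE prefecture;

-- Re-check which plan the optimizer now chooses.
EXPLAIN SELECT *
FROM postal, prefecture
WHERE city ~ '^aaa'
  AND postal.pid = prefecture.pid;
```

With fresh statistics the planner switched from the slow nested loop over the non-selective pid index to the hash join shown above.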