Re: [HACKERS] Why is that so slow?
От | Tom Lane |
---|---|
Тема | Re: [HACKERS] Why is that so slow? |
Дата | |
Msg-id | 3107.920676281@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: [HACKERS] Why is that so slow? (Tatsuo Ishii <t-ishii@sra.co.jp>) |
Ответы |
Re: [HACKERS] Why is that so slow?
|
Список | pgsql-hackers |
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > postal=> explain select * from postal,prefecture where city ~ '^aaa' and postal.pid = prefecture.pid; > NOTICE: QUERY PLAN: > > Nested Loop (cost=98.90 size=1 width=100) > -> Seq Scan on prefecture (cost=2.55 size=47 width=26) > -> Index Scan using pidindex on postal (cost=2.05 size=1 width=74) > > This is so slooow. Can anybody explain this? Am I missing something? and later: > I had defined a btree index on pid and it has 2000 > duplicate entries in average! After I removed the index, the query > runs unbelievably fast! Now explain shows: > Nested Loop (cost=933.82 size=1 width=100) > -> Index Scan using cityindex on postal (cost=931.77 size=1 width=74) > -> Index Scan using prefpidindex on prefecture (cost=2.05 size=47 width=26) Hmm. Removal of the index is just a hack --- the system should have been smart enough not to use it. It looks like the system chose the first plan shown above because it thought that selecting postal entries matching a particular pid value would on average match only one postal tuple (note the "size" fields, which are estimates of the numbers of resulting tuples). But in reality, each scan produced 2000 matching entries on average, according to your second message --- and each of those entries had to be tested to see if it had the right city name. So, very slow. The question I have is why didn't the system realize that there would be lots of matches on pid? The "dispersion" statistics it keeps ought to have given it a clue that this approach wouldn't be very selective. The second example is fast because the scan over postal looking for city name matches finds only one match, so prefecture is scanned only once. However the cost estimates there also look bogus --- the system is again mis-guessing how many entries will be selected. It seems to think that all 47 prefecture entries will be matched by a scan for a specific pid. So, bogus dispersion values again (or bad use of them). Something is fishy here. Have you done a "vacuum analyze" since loading the data in these tables? regards, tom lane
В списке pgsql-hackers по дате отправления: