Re: SourceForge & Postgres
От | mlw |
---|---|
Тема | Re: SourceForge & Postgres |
Дата | |
Msg-id | 3A36C0B3.ADAF69FE@mohawksoft.com обсуждение исходный текст |
Ответ на | Re: SourceForge & Postgres (Bruce Momjian <pgman@candle.pha.pa.us>) |
Список | pgsql-hackers |
Tom Lane wrote: > > mlw <markw@mohawksoft.com> writes: > > btw anyone trying this query should use: "attdispersion" > > Sorry about that --- I just copied-and-pasted the query from some notes > that are obsolete as of 7.1... > > > cdinfo=# explain select * from ztitles where artistid = 100000220 ; > > NOTICE: QUERY PLAN: > > > Seq Scan on ztitles (cost=0.00..4740.75 rows=3163 width=296) > > > And this is with "-o -fs" > > > Index Scan using ztitles_artistid_ndx on ztitles (cost=0.00..5915.01 > > rows=3163 width=296) > > > attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival > > artistid | 0.0477198 | 19274 | 2 | 97 | 0 | 0.149362 | 100050450 | 100000000 | 100055325 > > The reason why the thing is going for a sequential scan is that > astonishingly high stacommonfrac statistic. Does artistid 100050450 > really account for 14.9% of all the rows in your table? (Who is that > anyway? ;-)) If so, a search for artistid 100050450 definitely *should* > use a sequential scan. I tested this statement against the database and you are right, about 14 seconds with the index, 4 without. BTW ID # 100050450 is "Various Artists" This is sort of a point I was trying to make in previous emails. I think this situation, and this sort of ratio is far more likely than the attention it has been given. In about every project I have used postgres I have run into this. It is only recently that I have understood what the problem was and how to get around it (sort of). This one entry is destroying any intelligent performance we could hope to attain. As I said, I always see this sort of behavior in some implementation. > The problem at hand is estimating the frequency > of entries for some other artistid, given that we only have this much > statistical info available. Obviously the stats are insufficient, and > I hope to do something about that in a release or two, but it ain't > gonna happen for 7.1. In the meantime, if you've got huge outliers > like that, you could try reducing the value of NOT_MOST_COMMON_RATIO > in src/backend/utils/adt/selfuncs.c. I did some playing with this value, and I can seem to have it differentiate between 100050450 and anything else. -- http://www.mohawksoft.com
В списке pgsql-hackers по дате отправления: