Re: Weird indices
От | Richard Huxton |
---|---|
Тема | Re: Weird indices |
Дата | |
Msg-id | 004101c09cb6$d573f060$1001a8c0@archonet.com обсуждение исходный текст |
Ответ на | Re: Weird indices (Joseph Shraibman <jks@selectacast.net>) |
Список | pgsql-general |
From: "Tom Lane" <tgl@sss.pgh.pa.us> > Martijn van Oosterhout <kleptog@svana.org> writes: > > We have a table with over 1 million rows and the statistics Postgres gathers > > are not particularly useful. There is not one (non-null) value that occurs > > significantly more often than other values but the distribution looks a lot > > like a 1/x curve I guess. The most common value occurs 5249 times but the > > average is only 95, so Postgres chooses seq scan almost always. We actually > > now set enable_seqscan=off in many areas of our code to speed it up to a > > useful rate. (This table also happens to have an (accedental) clustering on > > this column also). > > > What is the reasoning behind estimating like that? Why not just the average > > or the average + 1 SD? > > Can you think of a reasonable algorithm for VACUUM to obtain the true > average frequency? It has a tough enough time estimating the most > common frequency with any reliability. Given complaints in nearby > threads that VACUUM ANALYZE is too slow, it'd be a good idea if your > algorithm was faster than the current one, too ;-) I'm don't see that there's any way that you're going to get an analyser that _always_ gets it right. Might there not be some way of explicitly telling the analyser the distribution of the data. Like Martijn says above, he thinks the distribution is something like 1/x. In the cases where you really care you probably do know what sort of values are stored. I have to admit my maths isn't good enough to say how sensible an idea this is, but figured I'd put my tuppence-worth in. - Richard Huxton
В списке pgsql-general по дате отправления: