Re: multi-column index
От | Tom Lane |
---|---|
Тема | Re: multi-column index |
Дата | |
Msg-id | 25832.1111083332@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: multi-column index (Manfred Koizar <mkoi-pg@aon.at>) |
Ответы |
Re: multi-column index
|
Список | pgsql-performance |
Manfred Koizar <mkoi-pg@aon.at> writes: > On Wed, 16 Mar 2005 22:19:13 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> calculate the correlation explicitly for each index > May be it's time to revisit an old proposal that has failed to catch > anybody's attention during the 7.4 beta period: > http://archives.postgresql.org/pgsql-hackers/2003-08/msg00937.php > I'm not sure I'd store index correlation in a separate table today. > You've invented something better for functional index statistics, AFAIR. Well, the original motivation for calculating correlations on columns was that historically, you didn't need to re-ANALYZE after creating an index: the stats on the base table were already in place. So the idea was to have the correlations already available whether or not the index existed. This works fine for plain indexes on single columns ;-). We didn't realize (or at least I didn't) how poorly the per-column stats apply to multi-column indexes. I am coming around to the view that we really do need to calculate index-specific correlation numbers, and that probably does need a special table ... or maybe better, add a column to pg_index. The column in pg_statistic is useless and should be removed, because there isn't any need for per-column correlation. Now, as to the actual mechanics of getting the numbers: the above link seems to imply reading the whole index in index order. Which is a hugely expensive proposition for a big index, especially one that's grown rather than been built recently --- the physical and logical orderings of the index will be different. (Hm, maybe we need a stat about the extent of disorder within the index itself?) We need a way to get the number from a small sample of pages. The idea I was toying with was to recalculate the index keys for the sample rows that ANALYZE already acquires, and then compare/sort those. This is moderately expensive CPU-wise though, and it's also not clear what "compare/sort" means for non-btree indexes. If we could get a correlation estimate by probing only a small fraction of the index pages, that would work, but in a disordered index I'm not sure how you figure out what you're looking at. regards, tom lane
В списке pgsql-performance по дате отправления: