Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)
От | Philip Warner |
---|---|
Тема | Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect) |
Дата | |
Msg-id | 3.0.5.32.19990729123705.00b59380@mail.rhyme.com.au обсуждение исходный текст |
Ответ на | Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect) (Bruce Momjian <maillist@candle.pha.pa.us>) |
Ответы |
Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)
|
Список | pgsql-hackers |
At 20:21 28/07/99 -0400, you wrote: > >> I wonder whether it would help to add even more data to pg_statistic. >> For example, suppose we store the fraction of the columns that are NULL, >> plus the most frequently occurring *non null* value, plus the fraction >> of the columns that are that value. This would allow us to be very >> smart about columns in which "no data" is represented by NULL (as a good >> DB designer would do): > >That would be nice. > I know I've mentioned this before, but can't the designer of the query be given some influence over optimizer index choices? We can circle around the problem of understanding the demographics of a table, but without row-by-row analysis, you'll *never* get the complete and accurate view that is needed to cater for all cases. OTOH, a query designer often knows that a particular query will only be run to find 'exceptions' (ie. non-nulls when 95% are nulls), or to find 'small' ranges. IMO, when a DBA is in a position to help the optimizer, they *should* allowed to. PG *already* has something like this in the form of partial indexes: you can view the query that is associated with the index as a 'hint' as to when that index should be used. All I'm asking is for queries, not indexes, to specify when an index is used. This will not in any way replace the optimizer, but it will give users the ability deal with pathological cases. In terms of the statistics collected, it *may* also be worth doing some rudimentary analysis on the data to see it is conforms to any common distribution (or sum of distributions), and if it does, save that information. eg. the optimizer will do pretty well if it *knows* the data is in a normal distribution, with a mean of 972 and a stdev of 70! Of course, you must be sure that it *is* a normal distribution to start with. FWIW, statisticians often seem worried about three values: the mean, median and mode. I don't really know which is which, but they are: o The average of all values o The average of the min and max value o The most common value. Someone who knows a lot more about this stuff than me can probably tell us how these values will affect the trust we place in the index statistics. Someone on this list must be able to give us some insight??? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
В списке pgsql-hackers по дате отправления: