Re: cross column correlation revisted
From | marcin mank |
---|---|
Subject | Re: cross column correlation revisted |
Date | |
Msg-id | AANLkTimWL5LeO4Iioj-i4HkBZgIWtX6F588n7KdFQoem@mail.gmail.com |
In reply to | Re: cross column correlation revisted (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |

On Wed, Jul 14, 2010 at 5:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 2010/7/14 Tom Lane <tgl@sss.pgh.pa.us>:
>> If the combination of columns is actually interesting, there might well
>> be an index in place, or the DBA might be willing to create it.
>
> Indexes aren't free, though, nor even close to it.
>
> Still, I think we should figure out the underlying mechanism first and
> then design the interface afterwards. One idea I had was a way to say
> "compute the MCVs and histogram buckets for this table WHERE
> <predicate>". If you can prove predicate for a particular query, you
> can use the more refined statistics in place of the full-table
> statistics. This is fine for the breast cancer case, but not so
> useful for the zip code/street name case (which seems to be the really
> tough one).
>

One way of dealing with the zipcode problem is estimating
NDST = count(distinct row(zipcode, street)) - i.e. multi-column
ndistinct.

Then the planner doesn't have to assume that the selectivity of an
equality condition involving both zipcode and city is the product of
the respective selectivities. As a first cut it can assume that it will
get count(*) / NDST rows, but there are ways to improve it.

Greetings
Marcin Mańk
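
To make the count(*) / NDST estimate concrete, here is a minimal Python sketch that compares it with the independence assumption on a toy table. Everything in it is an assumption for illustration: the table contents, the variable names, and the simplification that each single-column selectivity is 1 / ndistinct (a real planner would also consult MCVs and histograms). It is not PostgreSQL code.

```python
from collections import Counter

# Hypothetical toy table: 10 zipcodes, each with its own 5 streets,
# and 4 rows per (zipcode, street) pair. Street is fully determined
# by zipcode here, so the two columns are strongly correlated.
rows = [
    (z, f"street_{z}_{s}")
    for z in range(10)
    for s in range(5)
    for _ in range(4)
]

total = len(rows)                      # count(*)
nd_zip = len({z for z, _ in rows})     # ndistinct(zipcode)
nd_street = len({s for _, s in rows})  # ndistinct(street)
ndst = len(set(rows))                  # count(distinct row(zipcode, street))

# Independence assumption: multiply the per-column selectivities.
# (Assumes a uniform 1/ndistinct selectivity for each column.)
est_independent = total * (1 / nd_zip) * (1 / nd_street)

# First-cut multi-column estimate from the message: count(*) / NDST.
est_ndst = total / ndst

# Actual row count for one existing (zipcode, street) combination.
actual = Counter(rows)[(3, "street_3_2")]

print(f"independent: {est_independent:.2f}, ndst: {est_ndst:.2f}, actual: {actual}")
```

On this toy data the independence estimate is ten times too low, while count(*) / NDST matches the real row count, because every existing (zipcode, street) pair occurs equally often. With skewed data the first-cut estimate would only be an average over the existing combinations.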