Re: Cross-column statistics revisited

From	Greg Stark
Subject	Re: Cross-column statistics revisited
Date	16 October 2008, 21:01:21
Msg-id	BBCB1CB3-976C-4572-B1D8-E0F4D0E3345C@enterprisedb.com
In reply to	Re: Cross-column statistics revisited (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
List	pgsql-hackers

This is yet another issue entirely. This is about estimating how much
of the I/O will be random I/O if we do an index-order scan.  Correlation
is a passable tool for this, but we might be able to do better.

But it has nothing to do with the cross-column stats problem.

greg

On 17 Oct 2008, at 01:29 AM, Ron Mayer <rm_pg@cheapcomplexdevices.com>  
wrote:

> Josh Berkus wrote:
>>> Yes, or to phrase that another way: What kinds of queries are being
>>> poorly optimized now and why?
>> Well, we have two different correlation problems.  One is the
>> problem of dependent correlation, such as the 1.0 correlation of
>> ZIP and CITY fields, a common problem.  This could in fact be
>> fixed, I believe, via a linear math calculation based on the
>> sampled level of correlation, assuming we have enough samples.  And
>> it's really only an issue if the correlation is > 0.5.
>
> I'd note that this can be an issue even without 2 columns involved.
>
> I've seen a number of tables where the data is loaded in batches,
> so similar values from a batch tend to be packed into relatively few
> pages.
>
> Think of a database for a retailer that nightly aggregates data from
> each of many stores.  Each incoming batch inserts the store's data
> into tightly packed disk pages where almost all rows on the page are
> for that store.  But those pages are interspersed with pages from
> other stores.
>
> I think I like the ideas Greg Stark had a couple years ago:
> http://archives.postgresql.org/pgsql-hackers/2006-09/msg01040.php
>   "...sort the sampled values by value
>   and count up the average number of distinct blocks per value.... Or
>   perhaps we need a second histogram where the quantities are of
>   distinct pages rather than total records.... We might also need a
>   separate "average number of n-block spans per value"
> since those seem to me to lead more directly to values like "blocks
> that need to be read".
>
>
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
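
The statistic quoted above (sort the sampled values and count the
average number of distinct blocks per value) could be sketched roughly
as follows.  This is only an illustration in Python, not PostgreSQL
code: the function names, the (value, block_number) sample format, and
the reading of "n-block spans" as runs of consecutive block numbers are
all assumptions made for the example.

```python
from collections import defaultdict


def _blocks_by_value(sample):
    """Group sampled block numbers by column value.

    `sample` is assumed to be an iterable of (value, block_number) pairs,
    one pair per sampled row. This format is hypothetical.
    """
    grouped = defaultdict(set)
    for value, block in sample:
        grouped[value].add(block)
    return grouped


def avg_distinct_blocks_per_value(sample):
    """Average number of distinct blocks holding rows of a given value."""
    grouped = _blocks_by_value(sample)
    if not grouped:
        return 0.0
    return sum(len(blocks) for blocks in grouped.values()) / len(grouped)


def avg_block_spans_per_value(sample):
    """Average number of runs of consecutive blocks per value.

    Assumption: a "span" is a maximal run of adjacent block numbers.
    """
    grouped = _blocks_by_value(sample)
    if not grouped:
        return 0.0
    total_spans = 0
    for blocks in grouped.values():
        ordered = sorted(blocks)
        # One span for the first block, plus one for every gap after it.
        gaps = sum(1 for a, b in zip(ordered, ordered[1:]) if b != a + 1)
        total_spans += 1 + gaps
    return total_spans / len(grouped)


# Made-up data for the retailer case: three stores, six pages, four
# sampled rows per page, with each store's pages interleaved among
# the pages of the other stores.
sample = [(block % 3, block) for block in range(6) for _ in range(4)]
```

With this made-up sample each store has 8 sampled rows, but both
functions return 2.0: the rows for one store sit in only two pages,
and those two pages are not adjacent.  That is the kind of figure that
maps more directly onto "blocks that need to be read" than a single
correlation number does.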
