Re: Cross-column statistics revisited
От | Greg Stark |
---|---|
Тема | Re: Cross-column statistics revisited |
Дата | |
Msg-id | BBCB1CB3-976C-4572-B1D8-E0F4D0E3345C@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Cross-column statistics revisited (Ron Mayer <rm_pg@cheapcomplexdevices.com>) |
Список | pgsql-hackers |
This is yet another issue entirely. This is about estimating how much io will be random io if we do an index order scan. Correlation is a passable tool for this but we might be able to do better. But it has nothing to do with the cross-column stats problem. greg On 17 Oct 2008, at 01:29 AM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote: > Josh Berkus wrote: >>> Yes, or to phrase that another way: What kinds of queries are being >>> poorly optimized now and why? >> Well, we have two different correlation problems. One is the >> problem of dependant correlation, such as the 1.0 correlation of >> ZIP and CITY fields as a common problem. This could in fact be >> fixed, I believe, via a linear math calculation based on the >> sampled level of correlation, assuming we have enough samples. And >> it's really only an issue if the correlation is >>> 0.5. > > I'd note that this can be an issue even without 2 columns involved. > > I've seen a number of tables where the data is loaded in batches > so similar-values from a batch tend to be packed into relatively few > pages. > > Thinks a database for a retailer that nightly aggregates data from > each of many stores. Each incoming batch inserts the store's data > into tightly packed disk pages where most all rows on the page are for > that store. But those pages are interspersed with pages from other > stores. > > I think I like the ideas Greg Stark had a couple years ago: > http://archives.postgresql.org/pgsql-hackers/2006-09/msg01040.php > "...sort the sampled values by value > and count up the average number of distinct blocks per value.... Or > perhaps we need a second histogram where the quantities are of > distinct pages rather than total records.... We might also need a > separate "average number of n-block spans per value" > since those seem to me to lead more directly to values like "blocks > that need to be read". > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
В списке pgsql-hackers по дате отправления: