Re: Better estimates of index correlation

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: Better estimates of index correlation
Дата
Msg-id AANLkTikXkDNtwd4vrHMV0zemUP-H+dPH=+OQQWUARSdP@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Better estimates of index correlation  (Josh Berkus <josh@agliodbs.com>)
Ответы Re: Better estimates of index correlation  (Josh Berkus <josh@agliodbs.com>)
Список pgsql-hackers
On Tue, Mar 15, 2011 at 12:27 AM, Josh Berkus <josh@agliodbs.com> wrote:
> Note that if this requires VACUUM rather than ANALYZE, it introduces a
> problem for data warehousing users, who can go years between vacuums of
> their largest tables.

I don't understand, are they going years between vacuums because their
data is static? In which case the index correlation won't change. Or
is it append-only, in which case I suspect the newly appended data is
likely to have the same correlation as the old data. But is there
anything stopping us from doing some sort of ANALYZE-style sample of
the index pages as well?

I think the bigger problems here are that a) correlation isn't
actually a useful statistic for estimating random seeks and b) I'm not
sure what counting back-transitions has to do with correlation.

If we're lucky it may be that counting back-transitions is a more
useful stat than correlation anyways. It does seem to have more to do
with random seeks than correlation. It might need some refinement
though or some other metrics to go along with it to get a real basis
for an estimate of random seeks though. I'm wondering about how far
the back-transitions and forward transitions actually go. If you're
skipping every other block that's twice as much i/o as reading every
block, but if you skip n blocks where n >
random_page_cost/seq_page_cost then that's one random read per block.
If you skip 1 block backwards to a block you've already read then
that's free, but if you skip backwards to a block that isn't recently
referenced that's a random seek.

There are also niggling questions about taking statistics based on
tuples that are already dead or were never committed.

-- 
greg


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: dependency between numbers keywords and parser speed
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: Better estimates of index correlation