Re: O(samplesize) tuple sampling, proof of concept

Поиск

Список

Период

Сортировка

От	Manfred Koizar
Тема	Re: O(samplesize) tuple sampling, proof of concept
Дата	5 апреля 2004 г. 22:25:55
Msg-id	s4j370l6ra56tvodcbg9baaf682q6cu3pn@email.aon.at обсуждение исходный текст
Ответ на	Re: O(samplesize) tuple sampling, proof of concept (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-patches

Дерево обсуждения

On Mon, 05 Apr 2004 15:37:07 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>I wouldn't bother with a GUC variable for the production patch.

Among other things the GUC variable will be thrown out for the final
version.

>> Once a block is selected for inspection, all tuples of this
>> block are accessed to get a good estimation of the live : dead row
>> ratio.
>
>Why are you bothering to compute the live:dead ratio?

That was basically bad wording.  It should have been "... to get a good
estimation of live rows per page."  Counting dead rows turned out to be
trivial, so I just did it and included the number in the debug messages.
Then it happened to be useful for method 2.

>> Because I was afraid that method 1 might be too expensive in terms of
>> CPU cycles, I implemented a small variation that skips tuples without
>> checking them for visibility; this is sampling_method 2.
>
>And?  Does it matter?

There's a clearly visible difference for mid-size relations.  I'm not
sure whether this can be attributed to visibility bit updating or other
noise-contributing factors.

Method 2 gives a row count estimation error between 10 and 17% where
method 1 is less than 1% off.  (My updates generated dead tuples at very
evenly distributed locations by using conditions like WHERE mod(twenty,
7) = 0).

>If that's as bad as it gets I think we are OK.  You should redo the test
>with larger sample size though (try stats target = 100) to see if the
>answer changes.

Will do.

>I find -u diffs close to unreadable for reviewing purposes.  Please
>submit diffs in -c format in future.

De gustibus non est disputandum :-)
Fortunately this patch wouldn't look much different.  There is just a
bunch of "+" lines.

>AFAICS the rows will *always* be sorted already, and so this qsort is an
>utter waste of cycles.  No?

I don't think so.  We get the *blocks* in the correct order.  But tuples
are still sampled by the Vitter method which starts to replace random
tuples after the pool is filled.

BTW, thanks for the paper!

Servus
 Manfred

В списке pgsql-patches по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: O(samplesize) tuple sampling, proof of concept