Re: multivariate statistics v14

Поиск

Список

Период

Сортировка

От	Jeff Janes
Тема	Re: multivariate statistics v14
Дата	22 марта 2016 г. 05:53:19
Msg-id	CAMkU=1zBTiYwDc8K=YmL1dmiZqXHPNbeX=svBGJCNTjaGtsiyQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: multivariate statistics v14 (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы	Re: multivariate statistics v14
Список	pgsql-hackers

Дерево обсуждения

On Sun, Mar 20, 2016 at 4:34 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
>
> D'oh. Thanks for reporting. Attached is v16, hopefully fixing the few
> remaining whitespace issues.

Hi Tomas,

I'm trying out v16 against a common problem, where postgresql thinks
it is likely top stop early during a "order by (index express) limit
1" but it doesn't actually stop early due to cross-column
correlations.  But the multivariate statistics don't seem to help.  Am
I doing this wrong, or just expecting too much?


jjanes=# create table foo as select x, floor(x/(10000000/500))::int as
y  from generate_series(1,10000000) f(x);
jjanes=# create index on foo (x,y);
jjanes=# create index on foo (y,x);
jjanes=# create statistics jjj on foo (x,y) with (dependencies,histogram);
jjanes=# vacuum analyze ;


jjanes=# explain (analyze, timing off)  select x from foo where y
between 478 and 480 order by x limit 1;                                                   QUERY PLAN

-------------------------------------------------------------------------------------------------------------------Limit
(cost=0.43..4.92 rows=1 width=4) (actual rows=1 loops=1)  ->  Index Only Scan using foo_x_y_idx on foo
(cost=0.43..210156.55
rows=46812 width=4) (actual rows=1 loops=1)        Index Cond: ((y >= 478) AND (y <= 480))        Heap Fetches:
0Planningtime: 0.311 msExecution time: 478.917 ms
 

Here is walks up the index on x, until it meets the first row meeting
the qualification on y. It thinks it will get to stop early and be
very fast, but it doesn't.

If I add an dummy addition to the ORDER BY, to force it not to talk
the index, I get a plan which uses the other index and is actually
much faster, but is planned to be several hundred times slower:


jjanes=# explain (analyze, timing off)  select x from foo where y
between 478 and 480 order by x+0 limit 1;                                                       QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------Limit
(cost=1803.77..1803.77 rows=1 width=8) (actual rows=1 loops=1)  ->  Sort  (cost=1803.77..1920.80 rows=46812 width=8)
(actualrows=1 loops=1)        Sort Key: ((x + 0))        Sort Method: top-N heapsort  Memory: 25kB        ->  Index
OnlyScan using foo_y_x_idx on foo
 
(cost=0.43..1569.70 rows=46812 width=8) (actual rows=60000 loops=1)              Index Cond: ((y >= 478) AND (y <=
480))             Heap Fetches: 0Planning time: 0.175 msExecution time: 20.264 ms
 

(I use the "timing off" option, because without it the second plan
spends most of its time calling "gettimeofday")

Cheers,

Jeff

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: multivariate statistics v14