Re: multivariate statistics v14

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: multivariate statistics v14
Дата
Msg-id CAMkU=1zBTiYwDc8K=YmL1dmiZqXHPNbeX=svBGJCNTjaGtsiyQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: multivariate statistics v14  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: multivariate statistics v14  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On Sun, Mar 20, 2016 at 4:34 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
>
> D'oh. Thanks for reporting. Attached is v16, hopefully fixing the few
> remaining whitespace issues.

Hi Tomas,

I'm trying out v16 against a common problem, where postgresql thinks
it is likely top stop early during a "order by (index express) limit
1" but it doesn't actually stop early due to cross-column
correlations.  But the multivariate statistics don't seem to help.  Am
I doing this wrong, or just expecting too much?


jjanes=# create table foo as select x, floor(x/(10000000/500))::int as
y  from generate_series(1,10000000) f(x);
jjanes=# create index on foo (x,y);
jjanes=# create index on foo (y,x);
jjanes=# create statistics jjj on foo (x,y) with (dependencies,histogram);
jjanes=# vacuum analyze ;


jjanes=# explain (analyze, timing off)  select x from foo where y
between 478 and 480 order by x limit 1;                                                   QUERY PLAN

-------------------------------------------------------------------------------------------------------------------Limit
(cost=0.43..4.92 rows=1 width=4) (actual rows=1 loops=1)  ->  Index Only Scan using foo_x_y_idx on foo
(cost=0.43..210156.55
rows=46812 width=4) (actual rows=1 loops=1)        Index Cond: ((y >= 478) AND (y <= 480))        Heap Fetches:
0Planningtime: 0.311 msExecution time: 478.917 ms
 

Here is walks up the index on x, until it meets the first row meeting
the qualification on y. It thinks it will get to stop early and be
very fast, but it doesn't.

If I add an dummy addition to the ORDER BY, to force it not to talk
the index, I get a plan which uses the other index and is actually
much faster, but is planned to be several hundred times slower:


jjanes=# explain (analyze, timing off)  select x from foo where y
between 478 and 480 order by x+0 limit 1;                                                       QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------Limit
(cost=1803.77..1803.77 rows=1 width=8) (actual rows=1 loops=1)  ->  Sort  (cost=1803.77..1920.80 rows=46812 width=8)
(actualrows=1 loops=1)        Sort Key: ((x + 0))        Sort Method: top-N heapsort  Memory: 25kB        ->  Index
OnlyScan using foo_y_x_idx on foo
 
(cost=0.43..1569.70 rows=46812 width=8) (actual rows=60000 loops=1)              Index Cond: ((y >= 478) AND (y <=
480))             Heap Fetches: 0Planning time: 0.175 msExecution time: 20.264 ms
 

(I use the "timing off" option, because without it the second plan
spends most of its time calling "gettimeofday")

Cheers,

Jeff



В списке pgsql-hackers по дате отправления:

Предыдущее
От: James Sewell
Дата:
Сообщение: Re: Parallel Aggregate
Следующее
От: Ashutosh Bapat
Дата:
Сообщение: Re: Odd system-column handling in postgres_fdw join pushdown patch