Re: multivariate statistics v14

Поиск
Список
Период
Сортировка
От Tatsuo Ishii
Тема Re: multivariate statistics v14
Дата
Msg-id 20160323.142004.624106180171245380.t-ishii@sraoss.co.jp
обсуждение исходный текст
Ответ на Re: multivariate statistics v14  (Tatsuo Ishii <ishii@postgresql.org>)
Ответы Re: multivariate statistics v14  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
>> I am now looking into the create statistics doc to see if the example
>> appearing in it is working. I will get back if I find any.

I have the ref doc: CREATE STATISTICS

There are nice examples how the multivariate statistics gives better
row number estimation. So I gave them a try.

"Create table t1 with two functionally dependent columns,i.e. knowledge of a value in the first column is sufficient
fordeterminingthe value in the other column" The example creates table"t1", then populates it using generate_series.
AfterCREATESTATISTICS, ANALYZE and EXPLAIN. I expected the EXPLAIN demonstrateshow result rows estimation is enhanced
byusing the multivariatestatistics.
 

Here is the EXPLAIN output using the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);                                           QUERY PLAN
                                    
 
---------------------------------------------------------------------------------------------------Seq Scan on t1
(cost=0.00..19425.00rows=98 width=8) (actual time=76.876..76.876 rows=0 loops=1)  Filter: ((a = 1) AND (b = 1))  Rows
Removedby Filter: 1000000Planning time: 0.146 msExecution time: 76.896 ms
 
(5 rows)

Here is the EXPLAIN output without the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);                                           QUERY PLAN
                                   
 
--------------------------------------------------------------------------------------------------Seq Scan on t1
(cost=0.00..19425.00rows=1 width=8) (actual time=78.867..78.867 rows=0 loops=1)  Filter: ((a = 1) AND (b = 1))  Rows
Removedby Filter: 1000000Planning time: 0.102 msExecution time: 78.885 ms
 
(5 rows)

It seems the row numbers estimation (98) using the multivariate
statistics is actually *worse* than the one (1) not using the
statistics because the actual row number is 0.

Next example (using table "t2") is much better than the case using t1.

Here is the EXPLAIN output using the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);                                              QUERY PLAN
                                         
 
--------------------------------------------------------------------------------------------------------Seq Scan on t2
(cost=0.00..19425.00rows=9633 width=8) (actual time=0.012..75.350 rows=10000 loops=1)  Filter: ((a = 1) AND (b = 1))
RowsRemoved by Filter: 990000Planning time: 0.107 msExecution time: 75.680 ms
 
(5 rows)

Here is the EXPLAIN output without the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);                                             QUERY PLAN
                                       
 
------------------------------------------------------------------------------------------------------Seq Scan on t2
(cost=0.00..19425.00rows=91 width=8) (actual time=0.008..76.614 rows=10000 loops=1)  Filter: ((a = 1) AND (b = 1))
RowsRemoved by Filter: 990000Planning time: 0.067 msExecution time: 76.935 ms
 
(5 rows)

This time it seems the row numbers estimation (9633) using the
multivariate statistics is much better than the one (91) not using the
statistics because the actual row number is 10000.

The last example (using table "t3") seems no effect by multivariate statistics.

Here is the EXPLAIN output using the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 500) AND (b > 500);                                               QUERY
PLAN                                                
 
-----------------------------------------------------------------------------------------------------------Seq Scan on
t3 (cost=0.00..20407.65 rows=111123 width=16) (actual time=0.154..132.509 rows=6002 loops=1)  Filter: ((a <
'500'::doubleprecision) AND (b > '500'::double precision))  Rows Removed by Filter: 993998Planning time: 0.080
msExecutiontime: 132.735 ms
 
(5 rows)

EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 400) AND (b > 600);                                               QUERY
PLAN                                               
 
----------------------------------------------------------------------------------------------------------Seq Scan on
t3 (cost=0.00..20407.65 rows=111123 width=16) (actual time=110.518..110.518 rows=0 loops=1)  Filter: ((a <
'400'::doubleprecision) AND (b > '600'::double precision))  Rows Removed by Filter: 1000000Planning time: 0.052
msExecutiontime: 110.531 ms
 
(5 rows)

Here is the EXPLAIN output without the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 500) AND (b > 500);                                               QUERY
PLAN                                                
 
-----------------------------------------------------------------------------------------------------------Seq Scan on
t3 (cost=0.00..20407.65 rows=111123 width=16) (actual time=0.149..129.718 rows=5999 loops=1)  Filter: ((a <
'500'::doubleprecision) AND (b > '500'::double precision))  Rows Removed by Filter: 994001Planning time: 0.058
msExecutiontime: 129.893 ms
 
(5 rows)

EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 400) AND (b > 600);                                               QUERY
PLAN                                               
 
----------------------------------------------------------------------------------------------------------Seq Scan on
t3 (cost=0.00..20407.65 rows=111123 width=16) (actual time=108.015..108.015 rows=0 loops=1)  Filter: ((a <
'400'::doubleprecision) AND (b > '600'::double precision))  Rows Removed by Filter: 1000000Planning time: 0.037
msExecutiontime: 108.027 ms
 
(5 rows)

This time it seems the row numbers estimation (111123) using the
multivariate statistics is same as same as the one (111123) not
using the statistics because the actual row number is 5999 or 0.

In summary, the only case which shows the effect of the multivariate
statistics is the "t2" case. So I don't see why other examples are
shown in the manual. Am I missing something?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: Timeline following for logical slots
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: 2PC support for pglogical