Re: [HACKERS] multivariate statistics (v19)
| From | Tomas Vondra |
|---|---|
| Subject | Re: [HACKERS] multivariate statistics (v19) |
| Date | |
| Msg-id | 6f7ff2aa-b2b8-dbde-b39b-a9099f615466@2ndquadrant.com |
| In reply to | Re: [HACKERS] multivariate statistics (v19) (Dilip Kumar <dilipbalaut@gmail.com>) |
| Responses | Re: [HACKERS] multivariate statistics (v19) |
| List | pgsql-hackers |
On 01/04/2017 03:21 PM, Dilip Kumar wrote:
> On Wed, Jan 4, 2017 at 8:05 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Attached is v22 of the patch series, rebased to current master and fixing
>> the reported bug. I haven't made any other changes - the issues reported by
>> Petr are mostly minor, so I've decided to wait a bit more for (hopefully)
>> other reviews.
>
> v22 fixes the problem I reported. In my test, I observed that group
> by estimation is much better with the ndistinct statistics.
>
> Here is one example:
>
> postgres=# explain analyze select p_brand, p_type, p_size from part
> group by p_brand, p_type, p_size;
>                                                       QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=37992.00..38992.00 rows=100000 width=36) (actual
> time=953.359..1011.302 rows=186607 loops=1)
>    Group Key: p_brand, p_type, p_size
>    ->  Seq Scan on part  (cost=0.00..30492.00 rows=1000000 width=36)
> (actual time=0.013..163.672 rows=1000000 loops=1)
>  Planning time: 0.194 ms
>  Execution time: 1020.776 ms
> (5 rows)
>
> postgres=# CREATE STATISTICS s2 WITH (ndistinct) on (p_brand, p_type,
> p_size) from part;
> CREATE STATISTICS
> postgres=# analyze part;
> ANALYZE
> postgres=# explain analyze select p_brand, p_type, p_size from part
> group by p_brand, p_type, p_size;
>                                                       QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=37992.00..39622.46 rows=163046 width=36) (actual
> time=935.162..992.944 rows=186607 loops=1)
>    Group Key: p_brand, p_type, p_size
>    ->  Seq Scan on part  (cost=0.00..30492.00 rows=1000000 width=36)
> (actual time=0.013..156.746 rows=1000000 loops=1)
>  Planning time: 0.308 ms
>  Execution time: 1001.889 ms
>
> In the above example:
> Without MVStat -> estimated: 100000  Actual: 186607
> With MVStat    -> estimated: 163046  Actual: 186607

Thanks. Those plans match my experiments with the TPC-H data set, although I've been playing with the smallest scale (1GB).

It's not very difficult to make the estimation error arbitrarily large, e.g. by using perfectly correlated (identical) columns (see the sketch below).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
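A minimal sketch (not part of the original mail) of the worst case mentioned above: two perfectly correlated columns. The table name `corr_test` and statistics name `s_corr` are made up for illustration, and the `CREATE STATISTICS ... WITH (ndistinct)` syntax is assumed to be the patch-series form used in the quoted example.

```sql
-- Hypothetical worst case: column b is an exact copy of column a,
-- so there are only 1000 distinct (a, b) groups, not 1000 * 1000.
CREATE TABLE corr_test AS
    SELECT i % 1000 AS a, i % 1000 AS b
    FROM generate_series(1, 1000000) s(i);
ANALYZE corr_test;

-- Without multivariate statistics, the planner combines the per-column
-- ndistinct estimates, so the GROUP BY row estimate can be off by
-- orders of magnitude (here up to 1000x).
EXPLAIN SELECT a, b FROM corr_test GROUP BY a, b;

-- With an ndistinct statistic on (a, b), the estimate should come back
-- close to the true 1000 groups (syntax as in the quoted example).
CREATE STATISTICS s_corr WITH (ndistinct) ON (a, b) FROM corr_test;
ANALYZE corr_test;
EXPLAIN SELECT a, b FROM corr_test GROUP BY a, b;
```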