Re: improving GROUP BY estimation
От | Mark Dilger |
---|---|
Тема | Re: improving GROUP BY estimation |
Дата | |
Msg-id | 03A3B58D-6A2D-44FE-9B03-E21948CFCBAA@gmail.com обсуждение исходный текст |
Ответ на | improving GROUP BY estimation (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Ответы |
Re: improving GROUP BY estimation
Re: improving GROUP BY estimation |
Список | pgsql-hackers |
> On Feb 23, 2016, at 5:12 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > <snip> > > So much better. Clearly, there are cases where this will over-estimate the cardinality - for example when the values aresomehow correlated. > I applied your patch, which caused a few regression tests to fail. Attached is a patch that includes the necessary changes to the expected test results. It is not hard to write test cases where your patched version overestimates the number of rows by a very similar factor as the old code underestimates them. My very first test, which was not specifically designed to demonstrate this, happens to be one such example: CREATE TABLE t (a INT, b int); INSERT INTO t SELECT sqrt(gs)::int, gs FROM generate_series(1,10000000) gs; ANALYZE t; EXPLAIN SELECT a FROM t WHERE b < 1000 GROUP BY a; QUERY PLAN --------------------------------------------------------------- HashAggregate (cost=169250.21..169258.71 rows=850 width=4) Group Key: a -> Seq Scan on t (cost=0.00..169247.71 rows=1000 width=4) Filter: (b < 1000) (4 rows) SELECT COUNT(*) FROM (SELECT a FROM t WHERE b < 1000 GROUP BY a) AS ss; count ------- 32 (1 row) So, it estimates 850 rows where only 32 are returned . Without applying your patch, it estimates just 1 row where 32 are returned. That's an overestimate of roughly 26 times, rather than an underestimate of 32 times. As a patch review, I'd say that your patch does what you claim it does, and it applies cleanly, and passes the regression tests with my included modifications. I think there needs to be some discussion on the list about whether the patch is a good idea. Mark Dilger
Вложения
В списке pgsql-hackers по дате отправления: