Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H

From:        Jeff Janes
Msg-id:      CAMkU=1xWa5GvE3tsrcUsWCQaZ_mSjLtKGEAEbee5VbLFa47SCA@mail.gmail.com
In reply to: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses:   Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
List:        pgsql-hackers
On Wed, Jun 17, 2015 at 10:52 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> Hi,
>
> I'm currently running some tests on a 3TB TPC-H data set, and I tripped
> over a pretty bad n_distinct underestimate, causing OOM in HashAgg
> (which somehow illustrates the importance of the memory-bounded hashagg
> patch Jeff Davis is working on).
>
> The problem is Q18, particularly this simple subquery:
>
>     select l_orderkey
>     from lineitem
>     group by l_orderkey
>     having sum(l_quantity) > 313;
>
> which is planned like this:
>
>                                    QUERY PLAN
>     ---------------------------------------------------------------------------------
>      HashAggregate  (cost=598510163.92..598515393.93 rows=418401 width=12)
>        Group Key: l_orderkey
>        Filter: (sum(l_quantity) > '313'::double precision)
>        ->  Seq Scan on lineitem  (cost=0.00..508509923.28 rows=18000048128 width=12)
>     (4 rows)
>
> but sadly, in reality the l_orderkey cardinality looks like this:
>
>     tpch=# select count(distinct l_orderkey) from lineitem;
>         count
>      ------------
>       4500000000
>     (1 row)
>
> That's a helluva difference - not the usual one or two orders of
> magnitude, but a 10000x underestimate.
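
(Side note, not a fix for the estimator itself: on a benchmark data set where the true cardinality is known, the estimate can be pinned with the standard per-column n_distinct override, along these lines:

    ALTER TABLE lineitem ALTER COLUMN l_orderkey SET (n_distinct = 4500000000);
    ANALYZE lineitem;  -- the override takes effect at the next ANALYZE

That at least keeps the planner from choosing HashAgg off a wildly low estimate while the underlying problem is investigated.)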
Is the physical row order in the table correlated with the value of l_orderkey?
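
A quick way to see that without scanning the table, assuming it has been ANALYZEd recently, is the stats view itself:

    -- planner's n_distinct estimate and physical-order correlation
    -- (correlation near 1.0 means rows are laid out in l_orderkey order)
    SELECT n_distinct, correlation
    FROM pg_stats
    WHERE tablename = 'lineitem' AND attname = 'l_orderkey';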
Could you create a copy of the table, ordered at random, and see if it exhibits the same estimation issue?
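
Something along these lines should do it (untested sketch; at 3TB the copy and the random-order sort are obviously expensive, and lineitem_shuffled is just a name I made up):

    -- build a randomly ordered copy and compare the estimates
    CREATE TABLE lineitem_shuffled AS
      SELECT * FROM lineitem ORDER BY random();
    ANALYZE lineitem_shuffled;

    SELECT tablename, n_distinct
    FROM pg_stats
    WHERE attname = 'l_orderkey'
      AND tablename IN ('lineitem', 'lineitem_shuffled');

If the shuffled copy gets a sane n_distinct, that points at the physical clustering of l_orderkey interacting badly with ANALYZE's block sampling.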
Cheers,
Jeff