Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
From:        Robert Haas
Subject:     Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
Date:
Msg-id:      CA+TgmoYtOZyfFp47KBUvL5+Q=RZJcHM+Lk7=rd6cvihfk36c5A@mail.gmail.com
In reply to: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses:   Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
List:        pgsql-hackers
On Wed, Jun 17, 2015 at 1:52 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> I'm currently running some tests on a 3TB TPC-H data set, and I tripped over
> a pretty bad n_distinct underestimate, causing OOM in HashAgg (which somehow
> illustrates the importance of the memory-bounded hashagg patch Jeff Davis is
> working on).

Stupid question, but why not just override it using ALTER TABLE ...
ALTER COLUMN ... SET (n_distinct = ...)?

I think it's been discussed quite often on previous threads that you
need to sample an awful lot of the table to get a good estimate for
n_distinct. We could support that, but it would be expensive, and it
would have to be done again every time the table is auto-analyzed.
The above syntax supports nailing the estimate to either an exact
value or a percentage of the table, and I'm not sure why that isn't
good enough.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
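[For context, a minimal sketch of the override Robert refers to, against
a hypothetical TPC-H column; the table, column, and values here are
illustrative, not taken from the thread. In PostgreSQL, a positive
n_distinct pins an absolute count of distinct values, while a negative
value between -1 and 0 is interpreted as a fraction of the row count:]

    -- Pin the estimate to an exact distinct count (hypothetical value):
    ALTER TABLE lineitem ALTER COLUMN l_orderkey SET (n_distinct = 450000000);

    -- Or pin it to a fraction of the row count: -0.25 means one distinct
    -- value per four rows, so the estimate tracks the table as it grows:
    ALTER TABLE lineitem ALTER COLUMN l_orderkey SET (n_distinct = -0.25);

    -- The override takes effect at the next (auto-)ANALYZE:
    ANALYZE lineitem;

[The negative form is what "a percentage of the table" means above: it
needs no re-sampling as the table grows, because the planner multiplies
it by the current row count estimate.]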