Re: Default setting for enable_hashagg_disk
From | Jeff Davis |
---|---|
Subject | Re: Default setting for enable_hashagg_disk |
Date | |
Msg-id | d5af7930265b8c4bbda5381364d7e21955597538.camel@j-davis.com |
In reply to | Re: Default setting for enable_hashagg_disk (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: Default setting for enable_hashagg_disk |
List | pgsql-hackers |
On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote:
> What worries me a bit is the sharp discontinuities when spilling with
> significantly less work_mem than the "optimal" amount. For example,
> with Tomas' TPC-H query (against my smaller TPC-H dataset), I find
> that setting work_mem to 6MB looks like this:

...

> Planned Partitions: 128  Peak Memory Usage: 6161kB
> Disk Usage: 2478080kB  HashAgg Batches: 128

...

> Planned Partitions: 128  Peak Memory Usage: 5393kB
> Disk Usage: 2482152kB  HashAgg Batches: 11456

...

> My guess that this is because the recursive hash aggregation
> misbehaves in a self-similar fashion once a certain tipping point
> has been reached.

It looks like it might be fairly easy to use HyperLogLog as an
estimator for the recursive step. That should reduce the
overpartitioning, which I believe is the cause of this discontinuity.

It's not clear to me that overpartitioning is a real problem in this
case -- but I think the fact that it's causing confusion is enough
reason to see if we can fix it.

Regards,
    Jeff Davis
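For readers following the thread, here is a rough Python sketch of the idea of using HyperLogLog as an estimator for the recursive step. It is not PostgreSQL's code. The class, the `choose_partitions` policy, and the `groups_per_batch` parameter are all hypothetical names made up for illustration. The assumption is that each spilled partition keeps a small HyperLogLog sketch of the group keys written to it, so that when the partition is processed recursively, the number of sub-partitions can be chosen from the estimated number of distinct groups rather than from the raw tuple count.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count estimator (illustrative only)."""

    def __init__(self, p=10):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, key):
        digest = hashlib.blake2b(repr(key).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")            # 64-bit hash of the key
        idx = h >> (64 - self.p)                     # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def choose_partitions(est_groups, groups_per_batch, max_partitions=1024):
    """Hypothetical policy: the smallest power of two such that each
    sub-partition is expected to hold at most groups_per_batch groups."""
    needed = max(1.0, est_groups / groups_per_batch)
    n = 1
    while n < needed and n < max_partitions:
        n *= 2
    return n


if __name__ == "__main__":
    # A spilled partition with many tuples but comparatively few groups.
    sketch = HyperLogLog()
    for i in range(50000):
        sketch.add(i % 5000)            # 50000 tuples, 5000 distinct keys
    est = sketch.estimate()
    print("estimated groups:", round(est))
    print("sub-partitions:", choose_partitions(est, groups_per_batch=2000))
```

The point of the sketch is only that the sub-partition count follows the estimated group count: a partition holding many tuples but few distinct groups gets few sub-partitions, which is the kind of overpartitioning reduction discussed above.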