Discussion: Hash aggregates blowing out memory
I've been having problems where a HashAggregate is used because of a bad
estimate of the number of distinct values involved. In the following
example the total number of domain IDs is about 2/3 of the number of
rows, yet it's estimated at about 1/15 of the actual value (a quick check
of both numbers is sketched after this message). This will occasionally
cause the generated query to use a HashAggregate, and that runs the
backend out of memory - it will use 700 MB or more before failing.
The following was run -immediately- after a vacuum.
explain analyze
select sum(count) as sumc, class, domain_id
into temp new_clicks
from clicks, countries
where date > (current_date - 20)
  and clicks.country_id = countries.country_id
group by domain_id, class;
 GroupAggregate  (cost=1136261.89..1183383.51 rows=191406 width=12) (actual time=138375.935..163794.452 rows=3258152 loops=1)
   ->  Sort  (cost=1136261.89..1147922.66 rows=4664311 width=12) (actual time=138374.865..147308.343 rows=4514313 loops=1)
         Sort Key: clicks.domain_id, countries."class"
         ->  Hash Join  (cost=4.72..421864.06 rows=4664311 width=12) (actual time=6837.405..66938.259 rows=4514313 loops=1)
               Hash Cond: ("outer".country_id = "inner".country_id)
               ->  Seq Scan on clicks  (cost=0.00..351894.67 rows=4664311 width=12) (actual time=6836.388..46865.490 rows=4514313 loops=1)
                     Filter: (date > (('now'::text)::date - 20))
               ->  Hash  (cost=4.18..4.18 rows=218 width=8) (actual time=0.946..0.946 rows=0 loops=1)
                     ->  Seq Scan on countries  (cost=0.00..4.18 rows=218 width=8) (actual time=0.011..0.516 rows=218 loops=1)
 Total runtime: 175404.738 ms
(10 rows)
--
Mike Harding <mvh@ix.netcom.com>
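A quick way to check both figures - the actual number of distinct domain
IDs versus the planner's guess - is something along these lines (a sketch
using the table and column names from the query above; in pg_stats a
negative n_distinct means a fraction of the row count):

select count(*)                  as total_rows,
       count(distinct domain_id) as actual_distinct
from clicks;

select tablename, attname, n_distinct
from pg_stats
where tablename = 'clicks'
  and attname = 'domain_id';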
Mike Harding <mvh@ix.netcom.com> writes:
> I've been having problems where a HashAggregate is used because of a bad
> estimate of the number of distinct values involved.
If you're desperate, there's always enable_hashagg. Or reduce sort_mem
enough so that even the misestimate looks like it will exceed sort_mem.
In the long run it would be nice if HashAgg could spill to disk. We
were expecting to see a contribution of code along that line last year
(from the CMU/Berkeley database class) but it never showed up. The
performance implications might be a bit grim anyway :-(
regards, tom lane
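A minimal sketch of those two workarounds, using the query from the
original message (sort_mem is the setting name in the release under
discussion; later releases call the same setting work_mem):

-- Workaround 1: turn hash aggregation off for this transaction only
begin;
set local enable_hashagg = off;
select sum(count) as sumc, class, domain_id
into temp new_clicks
from clicks, countries
where date > (current_date - 20)
  and clicks.country_id = countries.country_id
group by domain_id, class;
commit;

-- Workaround 2: shrink sort_mem (in kB) so even the underestimated
-- number of groups looks too big to hash in memory
set sort_mem = 1024;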
Any way to adjust n_distinct to be more accurate?

I don't think a 'disk spill' would be that bad, if you could re-sort the
hash in place. If nothing else, if it could -fail- when it reaches the
lower stratosphere, and re-start, that's faster than getting no result at
all... sort of an auto disable of the hashagg.

On Fri, 2005-02-25 at 16:55 -0500, Tom Lane wrote:
> Mike Harding <mvh@ix.netcom.com> writes:
> > I've been having problems where a HashAggregate is used because of a bad
> > estimate of the number of distinct values involved.
>
> If you're desperate, there's always enable_hashagg. Or reduce sort_mem
> enough so that even the misestimate looks like it will exceed sort_mem.
>
> In the long run it would be nice if HashAgg could spill to disk. We
> were expecting to see a contribution of code along that line last year
> (from the CMU/Berkeley database class) but it never showed up. The
> performance implications might be a bit grim anyway :-(
>
> regards, tom lane
--
Mike Harding <mvh@ix.netcom.com>
Mike Harding <mvh@ix.netcom.com> writes:
> Any way to adjust n_distinct to be more accurate?
You could try increasing the statistics target for the relevant columns.
What does pg_stats show for the "n_distinct" estimates of the columns
you are grouping over, and does that have anything to do with reality?
regards, tom lane
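Concretely, that check and the statistics-target bump look roughly like
this (1000 is just an illustrative target; pick something suited to the
column):

-- what the planner currently believes about the grouping columns
select tablename, attname, n_distinct
from pg_stats
where (tablename = 'clicks'    and attname = 'domain_id')
   or (tablename = 'countries' and attname = 'class');

-- collect a larger sample for the column driving the bad group estimate,
-- then re-analyze so the new target actually takes effect
alter table clicks alter column domain_id set statistics 1000;
analyze clicks;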
Mike Harding <mvh@ix.netcom.com> writes:

> The following was run -immediately- after a vacuum.

You realize "vacuum" doesn't update the statistics, right?
You have to do "analyze" or "vacuum analyze" for that.

--
greg
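That is, something like:

analyze clicks;           -- refreshes the statistics the planner uses
vacuum analyze clicks;    -- reclaims dead rows and refreshes statistics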
Sorry, I should have said 'vacuum analyze verbose'...

On Sat, 2005-02-26 at 00:45 -0500, Greg Stark wrote:
> Mike Harding <mvh@ix.netcom.com> writes:
>
> > The following was run -immediately- after a vacuum.
>
> You realize "vacuum" doesn't update the statistics, right?
> You have to do "analyze" or "vacuum analyze" for that.
>
--
Mike Harding <mvh@ix.netcom.com>