Re: [SQL] How to avoid "Out of memory" using aggregate functions?
От | Tom Lane |
---|---|
Тема | Re: [SQL] How to avoid "Out of memory" using aggregate functions? |
Дата | |
Msg-id | 29444.944177579@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | How to avoid "Out of memory" using aggregate functions? (Mark Dalphin <mdalphin@amgen.com>) |
Список | pgsql-sql |
Mark Dalphin <mdalphin@amgen.com> writes: > ... since all the elements requested > are aggregates, why do I run out of memory? The basic problem here is that Postgres' expression evaluator leaks memory when dealing with expressions that use pass-by-reference data types (which means practically any type other than bool, int2, int4). The memory is reclaimed at the end of the query ... but that doesn't help you if your allowable swap space fills with int8 values before you get to the end :-(. Fixing this is on the TODO list. I have made a start on it in current sources: aggregate functions applied to simple field references do not leak memory anymore. That won't help your query as written: SELECT min(seqlength::int8) ... since cast-to-int8 is a runtime type conversion expression and the result of that expression evaluation will be leaked. But perhaps you can store seqlength as an int8 field to begin with, and skip the conversion. If so, and if you're handy with C, you might want to look at src/backend/executor/nodeAgg.c in a current snapshot and see whether it's practical to back-patch the memory-releasing logic into 6.5 ... or just run the current snapshot if you're feeling adventurous ... Of course the whole problem only comes up because avg() and sum() use accumulators of the same width as the source data type, so they are uncomfortably prone to overflow on large tables. They really should be modified to use more reasonable choices of accumulation data type; probably float8 for all datatypes except int8 and numeric, and numeric for those two would work OK. This wasn't practical in prior releases because a float8 accumulator would leak memory, but I think it would be a good change to make for 7.0. Anyway, to get back to your immediate problem, what you probably want to do while using 6.5 is tweak the query to minimize the problem as much as you can. First off, I see no reason to cast the inputs of count(), min(), or max() to int8; leaving those inputs as the int4 field value should work just as well and not leak memory. You don't have much choice for either avg() or sum(), but perhaps not trying to evaluate both in one query will bring the memory usage down to where you can live with it. If not, the only answer I can see is to do sums() over subsets of the table and save the results to be added and averaged later (hack hack ...) regards, tom lane
В списке pgsql-sql по дате отправления: