How to avoid "Out of memory" using aggregate functions?
От | Mark Dalphin |
---|---|
Тема | How to avoid "Out of memory" using aggregate functions? |
Дата | |
Msg-id | 3846C8D3.DB066E78@amgen.com обсуждение исходный текст |
Ответы |
Re: [SQL] How to avoid "Out of memory" using aggregate functions?
|
Список | pgsql-sql |
Hi, I am working with a simple table similar to the following: CREATE TABLE Seq ( SeqID int8 PRIMARY KEY, FileID int2 NOT NULL, FileOffset int8 NOT NULL, SeqLength int4, FOREIGNKEY (FileID) REFERENCES File (FileID) ); CREATE INDEX Seq_SeqLength ON Seq (SeqLength); The table contains facts about a DNA sequence (eg its length) and how to access that sequence from an ordinary Unix file (FileID and FileOffset) where it is concatonated with all the other DNA sequences. Currently the TABLE Seq contains 4,586,379 (times 2) rows and the overall size of the Unix file is 3,660,117,293 bytes (ie I expect sum(SeqLength) to be in the region of 3.6 Gbytes). I want to use the PostgreSQL aggregate functions to learn more about the sequences. The following query works well and quickly (I need to cast as int8 to avoid overflow of the sum() and perhaps for avg()): SELECT count(seqlength::int8), min(seqlength::int8), max(seqlength::int8), avg(seqlength::int8), sum(seqlength::int8) FROM Seq WHERE FileID=4 AND SeqLength>1000; An EXPLAIN of the above query returns the following: NOTICE: QUERY PLAN: Aggregate (cost=202000.62 rows=918779 width=4) -> Index Scan using seq_seqlength on seq (cost=202000.62 rows=918779 width=4) EXPLAIN If I remove the SeqLength constraint, the query takes forever to execute and then crashes with an out-of-memory (on my SGI, I have 256 Mbyte RAM and equal sized swap space; I run out of swap space during the query which makes a very unpleasent noise). EXPLAIN SELECT count(seqlength::int8), min(seqlength::int8), max(seqlength::int8), avg(seqlength::int8), sum(seqlength::int8) FROM Seq WHERE FileID=4; NOTICE: QUERY PLAN: Aggregate (cost=397916.69 rows=2756337 width=4) -> Seq Scan on seq (cost=397916.69 rows=2756337 width=4) EXPLAIN I do not understand how to interpret this Query Plan, except that I am doing a sequential scan of the DB (hence slow), but, since all the elements requested are aggregates, why do I run out of memory? I am not trying to load the whole table into memory at once, am I? Suggestions for how to make this query run would be welcome; my database is rapidly growing larger! Thanks, Mark -- Mark Dalphin email: mdalphin@amgen.com Mail Stop: 29-2-A phone: +1-805-447-4951 (work) One Amgen Center Drive +1-805-375-0680 (home) Thousand Oaks, CA 91320 fax: +1-805-499-9955 (work)
В списке pgsql-sql по дате отправления: