Re: query produces 1 GB temp file
From: Dirk.Lutzebaeck@t-online.de (Dirk Lutzebaeck)
Subject: Re: query produces 1 GB temp file
Date:
Msg-id: 4205228C.3060601@aeccom.com
In reply to: Re: query produces 1 GB temp file (John A Meinel <john@arbash-meinel.com>)
List: pgsql-performance
Hi John,

thanks very much for your analysis. I'll probably need to reorganize some things.

Regards,

Dirk

John A Meinel wrote:

> Dirk Lutzebaeck wrote:
>
>> Hi,
>>
>> here is a query which produces an over-1 GB temp file in pgsql_tmp.
>> This is on pgsql 7.4.2, RHEL 3.0, on a XEON MP machine with 32GB RAM,
>> 300MB sort_mem and 320MB shared_mem.
>>
>> Below are the query and the results of EXPLAIN and EXPLAIN ANALYZE.
>> All tables have been analyzed before.
>>
>> Can someone please explain why the temp file is so huge? I understand
>> there are a lot of rows.
>>
>> Thanks in advance,
>>
>> Dirk
>
> ...
>
>>   -> Nested Loop (cost=0.00..8346.73 rows=3 width=1361)
>>      (actual time=34.104..18016.005 rows=703677 loops=1)
>
> Well, there is this particular step where the planner thinks there will
> only be 3 rows, but in fact there are 703,677 of them. And the previous
> line:
>
>>   -> Sort (cost=8346.75..8346.76 rows=3 width=1361)
>>      (actual time=75357.448..75499.263 rows=22439 loops=1)
>
> seems to indicate that after sorting you still have 22,439 rows, which
> then get pared down again to 1,000.
>
> I'm assuming that the sort you are trying to do is extremely expensive.
> You are sorting 700k rows, which takes up too much memory (1GB), which
> forces it to create a temporary table and write it out to disk.
>
> I didn't analyze it a lot, but you might get much better performance
> from a subselect rather than the query you wrote.
>
> You are joining 4 tables (bi, en, df AS ft, es); I don't know what size
> each table is. In the end, though, you don't really care about the en
> or es tables (they aren't in your output).
>
> So maybe one of your subselects could be:
>
>   where bi.en = (select en from es
>                  where es.co = bi.co and es.spec = 122293729);
>
> I'm pretty sure the reason you need 1GB of temp space is that at one
> point you have 700k rows. Is it possible to rewrite the query so that
> it does more filtering earlier? Your DISTINCT criteria seem to filter
> it down to 20k rows, so maybe it's possible to do some sort of DISTINCT
> in part of the subselect, before you start joining against the other
> tables.
>
> If you have that much redundancy, you might also need to think about a
> different normalization.
>
> Just some thoughts.
>
> Also, I thought using the "oid" column wasn't really recommended, since
> in *high*-volume databases the oids aren't even guaranteed to be unique
> (I think it is a 32-bit number that rolls over). Also, on a database
> dump and restore they don't stay the same, unless you take a lot of
> extra care that they are included in both the dump and the restore. I
> believe it is better to create your own "id" per table (say SERIAL or
> BIGSERIAL).
>
> John
> =:->
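For reference, the 1 GB figure matches the plan line quoted above: the
nested loop produces 703,677 rows at an estimated width of 1361 bytes.
A quick back-of-the-envelope check (plain SQL arithmetic, runnable
anywhere):

  -- 703,677 rows x 1,361 bytes/row, expressed in megabytes
  SELECT 703677 * 1361 / (1024.0 * 1024.0) AS approx_mb;  -- ~913 MB

which is why even a 300MB sort_mem cannot hold the intermediate result
and the sort spills to pgsql_tmp.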
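The rows=3 vs. rows=703677 misestimate also suggests the planner's
statistics for the filtered column are too coarse. One way to attack
that on 7.4 is to raise the statistics target and re-analyze; this is a
sketch only, assuming (a guess, not from the thread) that bi.en is the
misestimated column, with 200 as an arbitrary example target (the
default is 10):

  -- Give the planner a finer histogram for the column, then refresh
  -- its statistics (bi.en assumed; use whichever column the estimate
  -- is actually wrong about).
  ALTER TABLE bi ALTER COLUMN en SET STATISTICS 200;
  ANALYZE bi;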
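A sketch of the rewrite John describes, combining the subselect with an
early DISTINCT. Only bi, en, co, df/ft, es, es.co, es.spec and the
constant 122293729 come from the thread; the select list and the ft
join column are placeholders, since the original query is not shown in
this excerpt:

  -- Pare the ~700k rows down before joining the remaining tables.
  SELECT ft.*
  FROM (SELECT DISTINCT bi.en
        FROM bi
        WHERE bi.en = (SELECT es.en
                       FROM es
                       WHERE es.co = bi.co
                         AND es.spec = 122293729)) AS b
  JOIN df AS ft ON ft.en = b.en;   -- join only the slimmed-down set

Note that the "=" form assumes the correlated subselect returns at most
one row per bi row; if es can hold several matches, IN (...) is the
safer spelling.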
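And a minimal, self-contained example of John's last point, giving a
table its own surrogate key instead of relying on oid (the table itself
is hypothetical):

  CREATE TABLE example_table (
      -- stable across dump/restore, 64-bit so it won't wrap
      id   BIGSERIAL PRIMARY KEY,
      data TEXT
  );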