Re: slow group by query
От | Ellen Cyran |
---|---|
Тема | Re: slow group by query |
Дата | |
Msg-id | 5.1.1.6.0.20021119124548.01afcb88@wolf.csuohio.edu обсуждение исходный текст |
Ответ на | Re: slow group by query (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-sql |
That's a whole lot faster. The query on 40 msa_codes that took 7 minutes, now only takes 10 seconds. Thanks a lot. At 11:54 AM 11/19/2002 -0500, Tom Lane wrote: >Ellen Cyran <ellen@urban.csuohio.edu> writes: > > Here is the explain analyze: > > > Group (cost=637.18..696.52 rows=593 width=22) (actual > time=982.67..67581.85 rows=435 loops=1) > > -> Sort (cost=637.18..637.18 rows=5934 width=22) (actual > time=833.27..844.78 rows=6571 loops=1) > >Well, we don't have to read any further than that to see that all the >time is being spent in the final Group step --- and since grouping 6500 >rows is surely not taking very long, the true cost must be in the >evaluation of the SELECT's output targetlist (which the estimated costs >ignore, since the planner can't do much to change it). In other words, >what's hurting you are those subselects in the SELECT list. You're >doing two thousand separate subselects (435*5) --- evidently they take >about 30 msec apiece, which isn't that bad by itself, but it adds up. > >What you've basically got here is what Joe Celko calls a "crosstab >by subqueries" (_SQL For Smarties_, section 23.6.3). You might want >to buy his book, which shows several other ways to do crosstab queries, >but the one that seems most directly useful is to GROUP BY and use >aggregate functions to extract the values you want in each crosstab >column: > >SELECT msa_code, sic, own, ind_div, >max(case when year = '1975' then emp else null end) as emp1975, >max(case when year = '1976' then emp else null end) as emp1976, >max(case when year = '1977' then emp else null end) as emp1977, >max(case when year = '1978' then emp else null end) as emp1978, >max(case when year = '1990' then emp else null end) as emp1990 >FROM tbl_bls_msa >WHERE msa_code in ('1680','1640','0040','0120','0080') >GROUP BY msa_code, sic, ind_div, own ; > >If I understand your data schema correctly, only one row in a >msa_code, sic, ind_div, own group will have a particular year >value, so the case/max structure extracts that value, or gives >NULL if there's no such row. (MIN would have worked as well; >with a numeric field you could use SUM too.) > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 4: Don't 'kill -9' the postmaster
В списке pgsql-sql по дате отправления: