How to improve speed of 3 table join &group (HUGE tables)
От | John Major |
---|---|
Тема | How to improve speed of 3 table join &group (HUGE tables) |
Дата | |
Msg-id | 4717914D.1050704@cbio.mskcc.org обсуждение исходный текст |
Ответы |
Re: How to improve speed of 3 table join &group (HUGE tables)
Re: How to improve speed of 3 table join &group (HUGE tables) |
Список | pgsql-performance |
I am trying to join three quite large tables, and the query is unbearably slow(meaning I can't get results in more than a day of processing). I've tried the basic optimizations I understand, and nothing has improved the execute speed.... any help with this would be greatly appreciated The three tables are quite large: sequence_fragment = 4.5 million rows sequence_external_info = 10million rows sequence_alignment = 500 million rows The query I am attempting to run is this: select sf.library_id, fio.clip_type , count(distinct(sa.sequence_id)) from sequence_alignment sa, sequence_fragment sf, fragment_external_info fio where sf.seq_frag_id = fio.sequence_frag_id and sf.sequence_id = sa.sequence_id group by sf.library_id, fio.clip_type NOTES: ~there are indexes on all of the fields being joined (but not on library_id or clip_type ). ~Everything has been re-analyzed post index creation ~I've tried "set enable_seqscan=off" and set (join_table_order or something) = 1 The explain plan is as follows: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=1443436673.93..1480593403.29 rows=54 width=16) -> Sort (cost=1443436673.93..1452725856.10 rows=3715672868 width=16) Sort Key: sf.library_id, fio.clip_type -> Merge Join (cost=263624049.25..319410068.18 rows=3715672868 width=16) Merge Cond: (sf.sequence_id = sa.sequence_id) -> Sort (cost=38102888.77..38128373.54 rows=10193906 width=16) Sort Key: sf.sequence_id -> Hash Join (cost=5305576.14..36080036.76 rows=10193906 width=16) Hash Cond: (fio.sequence_frag_id = sf.seq_frag_id) -> Index Scan using frag_ext_info_seq_frag_id on fragment_external_info fio (cost=0.00..30450510.27 rows=10193906 width=12) -> Hash (cost=5223807.54..5223807.54 rows=4453728 width=12) -> Index Scan using seq_frag_seqid_ind on sequence_fragment sf (cost=0.00..5223807.54 rows=4453728 width=12) -> Sort (cost=225521160.48..226688766.88 rows=467042560 width=4) Sort Key: sa.sequence_id -> Seq Scan on sequence_alignment sa (cost=100000000.00..110379294.60 rows=467042560 width=4) 15 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms] Thanks in advance! John Major
В списке pgsql-performance по дате отправления: