Re: cost and actual time
От | Tom Lane |
---|---|
Тема | Re: cost and actual time |
Дата | |
Msg-id | 9093.1045498916@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: cost and actual time (Chantal Ackermann <chantal.ackermann@biomax.de>) |
Список | pgsql-performance |
Chantal Ackermann <chantal.ackermann@biomax.de> writes: > the gene_id for 'igg' occurres 110637 times in gene_occurrences, it is > the most frequent. I think the problem here is that the planner doesn't know that (and probably can't without some kind of cross-table statistics apparatus). It's generating a plan based on the average frequency of gene_ids, which is a loser for this outlier. Probably the most convenient way to do better is to structure things so that the reduction from gene name to gene_id is done before the planner starts to develop a plan. Instead of joining to gene, consider this: create function get_gene_id (text) returns int as -- adjust types as needed 'select gene_id from gene where gene_name = $1' language sql immutable strict; -- in 7.2, instead say "with (isCachable, isStrict)" EXPLAIN ANALYZE SELECT tmp.disease_name, count(tmp.disease_name) AS cnt FROM (SELECT DISTINCT disease.disease_name, disease_occurrences.sentence_id FROM disease, gene_occurrences, disease_occurrences WHERE gene_occurrences.sentence_id=disease_occurrences.sentence_id AND get_gene_id('igg')=gene_occurrences.gene_id AND disease.disease_id=disease_occurrences.disease_id) AS tmp GROUP BY tmp.disease_name ORDER BY cnt DESC; Now get_gene_id() isn't really immutable (unless you never change the gene table) but you have to lie and pretend that it is, so that the function call will be constant-folded during planner startup. The planner will then see something like gene_occurrences.gene_id = 42 and it will have a much better shot at determining the number of rows this matches. regards, tom lane
В списке pgsql-performance по дате отправления: