optimizing common subqueries
От | Kevin Murphy |
---|---|
Тема | optimizing common subqueries |
Дата | |
Msg-id | 4342D298.2070803@genome.chop.edu обсуждение исходный текст |
Ответы |
Re: optimizing common subqueries
|
Список | pgsql-general |
My understanding is that PG does not eliminate common subqueries within complex queries. I just wanted to confirm this. I also tried wrapping subqueries in STABLE or IMMUTABLE functions returning SETOF, but from the PG 8.0.3 log, I see that the function is invoked redundantly anyway. I know that disk buffering has the effect of optimizing this sort of thing to some extent, but I was wondering if I can do better. Any other possibilities? The code below shows what I am trying to do. I created functions to make it easy for you to see where the common subqueries occur (note that one of the functions invokes the other function, which increases the redundancy.) CREATE FUNCTION pmids_by_mention(text) RETURNS SETOF integer AS $$ SELECT pmid FROM normalized_genes WHERE symbol IN ( SELECT DISTINCT symbol FROM normalized_genes WHERE mention = $1 ); $$ LANGUAGE SQL STRICT IMMUTABLE; CREATE FUNCTION common_pmids(text,text) RETURNS SETOF integer AS $$ SELECT * FROM pmids_by_mention($1) INTERSECT SELECT * FROM pmids_by_mention($2) $$ LANGUAGE SQL STRICT IMMUTABLE; SELECT PMID, COUNT(*) AS total FROM ( SELECT pmid FROM pmids_by_mention('mycn') as pmid WHERE pmid IN ( SELECT * FROM common_pmids('mycn','trka') ) UNION ALL SELECT pmid FROM pmids_by_mention('trka') as pmid WHERE pmid IN ( SELECT * FROM common_pmids('mycn','trka') ) ) AS subq GROUP BY pmid ORDER BY total desc; I doubt anybody cares, but ... I am doing an article lookup by genes which are mentioned in articles (pmid), where each gene may be referred to indirectly via an alias (mention). Each gene symbol has many aliases/mentions. (Unfortunately, it is also possible but rare for an alias/mention to map to more than one gene symbol). The query logic is as follows. For each mention/alias supplied by the user, find all articles connected to the gene (or, rarely, genes), even indirectly. Take the intersection of these sets to find the articles/pmid's containing all the specified genes. Unfortunately, in order to rank the articles by relevance, it is further necessary to come up with a result set containing one row for each qualifying gene mention in the set of matching articles. There can be any number of search terms (up to some limit), so the actual query has to be built dynamically on the fly by the application. Thanks, Kevin Murphy
В списке pgsql-general по дате отправления: