Re: bad plan
От | Julien Cigar |
---|---|
Тема | Re: bad plan |
Дата | |
Msg-id | 4F7ED097.70807@ulb.ac.be обсуждение исходный текст |
Ответ на | Re: bad plan (Ants Aasma <ants@cybertec.at>) |
Список | pgsql-performance |
On 04/05/2012 21:47, Ants Aasma wrote: > On Thu, Apr 5, 2012 at 2:47 PM, Julien Cigar<jcigar@ulb.ac.be> wrote: >> - http://www.pastie.org/3731956 : with default config >> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off > It looks like the join selectivity of (context_to_context_links, > ancestors) is being overestimated by almost two orders of magnitude. > The optimizer thinks that there are 564 rows in the > context_to_context_links table for each taxon_id, while in fact for > this query the number is 9. To confirm that this, you can force the > selectivity estimate to be 200x lower by adding a geo_id = geod_id > where clause to the subquery. adding a geo_id = geo_id to the subquery helped a little bit with a cpu_tuple_cost of 0.1: http://www.pastie.org/3738224 : without: Index Scan using ltlc_taxon_id_idxoncontext_to_context_links (cost=0.00..146.93 rows=341 width=8) (actual time=0.004..0.019 rows=9 loops=736) with geo_id = geo_id: Index Scan using ltlc_taxon_id_idxoncontext_to_context_links (cost=0.00..148.11 rows=2 width=8) (actual time=0.004..0.020 rows=9 loops=736) > If it does help, then the next question would be why is the estimate > so much off. It could be either because the stats for > context_to_context_links.taxon_id are wrong or because > ancestors.taxon_id(subphylum_id = 18830) is a special case. To help > figuring this is out, you could run the following to queries and post > the results: > > SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT > COUNT(*) AS num FROM context_to_context_links GROUP BY taxon_id) AS > dist GROUP BY 1 ORDER BY 1; > > SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT > COUNT(*) AS num FROM context_to_context_links WHERE NOT geo_id IS NULL > and taxon_id= ANY ( select taxon_id from rab.ancestors where > ancestors.subphylum_id = 18830) GROUP BY taxon_id) AS dist GROUP BY > 1 ORDER BY 1; I'm sorry but I get an "ERROR: division by zero" for both of your queries.. > If the second distribution has a significantly different shape then > cross column statistics are necessary to get good plans. As it happens > I'm working on adding this functionality to PostgreSQL and would love > to hear more details about your use-case to understand if it would be > solved by this work. Thank you for your help, Julien > Regards, > Ants Aasma -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced.
Вложения
В списке pgsql-performance по дате отправления: