Re: Bad plan chosen for union all

Поиск

Список

Период

Сортировка

От	Alex Reece
Тема	Re: Bad plan chosen for union all
Дата	29 ноября 2017 г. 06:39:22
Msg-id	CANywC6CpaHnM7dawEu3M_X+GUwCOGUAJQoquZP7-aoMbw2SXQw@mail.gmail.com обсуждение исходный текст
Ответ на	Bad plan chosen for union all (Alex Reece <awreece@gmail.com>)
Ответы	Re: Bad plan chosen for union all
Список	pgsql-performance

Дерево обсуждения

I managed to reduce my test case: the following query does not take advantage of the index on contribution metrics. explain select cim.yield from earnings JOIN contributions on contributions.id = earnings.note_id JOIN ( SELECT contribution_id, max(CASE metrics.name WHEN 'Yield'::text THEN projected ELSE NULL::double precision END) AS yield from contribution_metrics JOIN metrics ON metrics.id = metric group by contribution_id ) cim ON cim.contribution_id = contributions.id WHERE earnings.id = '\x595400456c1f1400116b3843'; I got this: Hash Join (cost=125.02..147.03 rows=1 width=8) (actual time=4.781..4.906 rows=1 loops=1) Hash Cond: (contribution_metrics.contribution_id = contributions.id) -> HashAggregate (cost=116.86..126.64 rows=3261 width=21) (actual time=4.157..4.600 rows=3261 loops=1) Group Key: contribution_metrics.contribution_id -> Hash Join (cost=1.11..108.18 rows=5788 width=33) (actual time=0.021..2.425 rows=5788 loops=1) Hash Cond: (contribution_metrics.metric = metrics.id) -> Seq Scan on contribution_metrics (cost=0.00..85.36 rows=5788 width=34) (actual time=0.006..0.695 rows=5788 loops=1) -> Hash (cost=1.05..1.05 rows=17 width=25) (actual time=0.009..0.009 rows=17 loops=1) -> Seq Scan on metrics (cost=0.00..1.05 rows=17 width=25) (actual time=0.002..0.005 rows=17 loops=1) -> Hash (cost=8.15..8.15 rows=1 width=26) (actual time=0.022..0.022 rows=1 loops=1) -> Nested Loop (cost=0.14..8.15 rows=1 width=26) (actual time=0.019..0.020 rows=1 loops=1) -> Index Scan using earnings_pkey on earnings (cost=0.06..4.06 rows=1 width=13) (actual time=0.009..0.009 rows=1 loops=1) Index Cond: (id = '\x595400456c1f1400116b3843'::bytea) -> Index Only Scan using contributions_pkey on contributions (cost=0.08..4.09 rows=1 width=13) (actual time=0.008..0.009 rows=1 loops=1) Index Cond: (id = earnings.note_id) Planning time: 0.487 ms Execution time: 4.975 ms But I expected it to be equivalent to the plan from this query: select cim.yield from ( select contribution_id, max(CASE metrics.name WHEN 'Yield'::text THEN projected ELSE NULL::double precision END) AS yield from contribution_metrics JOIN metrics ON metrics.id = metric group by contribution_id ) cim where cim.contribution_id = ( select contributions.id from contributions join earnings on earnings.note_id = contributions.id where earnings.id = '\x595400456c1f1400116b3843') Which gives me _this_ plan, that correctly uses the index on contribution_metrics. Subquery Scan on cim (cost=9.32..14.23 rows=2 width=8) (actual time=0.108..0.108 rows=1 loops=1) InitPlan 1 (returns $1) -> Nested Loop (cost=0.14..8.15 rows=1 width=13) (actual time=0.054..0.055 rows=1 loops=1) -> Index Scan using earnings_pkey on earnings (cost=0.06..4.06 rows=1 width=13) (actual time=0.025..0.026 rows=1 loops=1) Index Cond: (id = '\x595400456c1f1400116b3843'::bytea) -> Index Only Scan using contributions_pkey on contributions (cost=0.08..4.09 rows=1 width=13) (actual time=0.026..0.026 rows=1 loops=1) Index Cond: (id = earnings.note_id) -> GroupAggregate (cost=1.17..6.07 rows=2 width=21) (actual time=0.108..0.108 rows=1 loops=1) Group Key: contribution_metrics.contribution_id -> Hash Join (cost=1.17..6.07 rows=2 width=33) (actual time=0.100..0.101 rows=2 loops=1) Hash Cond: (contribution_metrics.metric = metrics.id) -> Index Scan using contribution_metrics_contribution_id_metric_last_update_on_idx1 on contribution_metrics ( cost=0.06..4.95 rows=2 width=34) (actual time Index Cond: (contribution_id = $1) -> Hash (cost=1.05..1.05 rows=17 width=25) (actual time=0.012..0.012 rows=17 loops=1) -> Seq Scan on metrics (cost=0.00..1.05 rows=17 width=25) (actual time=0.004..0.006 rows=17 loops=1) Planning time: 0.396 ms Execution time: 0.165 ms schema here: https://gist.github.com/awreece/aeacbc818277c7c6d99477645e7fcd03 Best, ~Alex On Tue, Nov 28, 2017 at 2:13 AM Alex Reece wrote: > I'm on PostgreSQL 9.6.5 and getting an awkwardly bad plan chosen for my > query. > > I want to do: > > select investments.id, cim.yield > FROM contributions > JOIN investments ON contributions.investment_id = investments.id > JOIN contribution_investment_metrics_view cim ON cim.investment_id = > investments.id > WHERE contributions.id IN ('\x58c9c0d3ee944c48b32f814d', '\x11') > Where contribution_investment_metrics_view is morally > > select investment_id, first(val) from (select * from contribution_metrics > UNION ALL select * from investment_metrics) group by id > > Typically, querying this view is very fast since I have indexes in both > component queries, leading to a very tight plan: > > Sort Key: "*SELECT* 1".metric > -> Subquery Scan on "*SELECT* 1" (cost=14.68..14.68 rows=1 width=26) > (actual time=0.043..0.044 rows=2 loops=1) > -> Sort (cost=14.68..14.68 rows=1 width=42) (actual > time=0.042..0.043 rows=2 loops=1) > Sort Key: cm.metric, cm.last_update_on DESC > Sort Method: quicksort Memory: 25kB > -> Nested Loop (cost=0.14..14.68 rows=1 width=42) (actual > time=0.032..0.034 rows=2 loops=1) > -> Index Scan using contributions_investment_id_idx on > contributions (cost=0.08..4.77 rows=2 width=26) (actual time=0.026..0.027 > rows=1 loops=1) > Index Cond: (investment_id = $1) > -> Index Only Scan using > contribution_metrics_contribution_id_metric_last_update_on_idx on > contribution_metrics cm (cost=0.06..4.95 rows=2 width=34) (actual > time=0.005..0.006 r > Index Cond: (contribution_id = contributions.id) > Heap Fetches: 2 > -> Subquery Scan on "*SELECT* 2" (cost=0.08..5.86 rows=3 width=26) > (actual time=0.008..0.008 rows=3 loops=1) > -> Index Only Scan using > investment_metrics_investment_id_metric_last_updated_on_idx on > investment_metrics im (cost=0.08..5.85 rows=3 width=42) (actual > time=0.008..0.008 rows=3 loops=1) > Index Cond: (investment_id = $1) > Heap Fetches: 3 > > Unfortunately, when I try to query this view in the larger query above, I > get a *much* worse plan for this view, leading to >1000x degradation in > performance: > > -> Append (cost=10329.18..26290.92 rows=482027 width=26) (actual > time=90.157..324.544 rows=482027 loops=1) > -> Subquery Scan on "*SELECT* 1" (cost=10329.18..10349.44 > rows=5788 width=26) (actual time=90.157..91.207 rows=5788 loops=1) > -> Sort (cost=10329.18..10332.08 rows=5788 width=42) (actual > time=90.156..90.567 rows=5788 loops=1) > Sort Key: contributions_1.investment_id, cm.metric, > cm.last_update_on DESC > Sort Method: quicksort Memory: 645kB > -> Hash Join (cost=105.62..10256.84 rows=5788 > width=42) (actual time=1.924..85.913 rows=5788 loops=1) > Hash Cond: (contributions_1.id = > cm.contribution_id) > -> Seq Scan on contributions contributions_1 > (cost=0.00..9694.49 rows=351495 width=26) (actual time=0.003..38.794 > rows=351495 loops=1) > -> Hash (cost=85.36..85.36 rows=5788 width=34) > (actual time=1.907..1.907 rows=5788 loops=1) > Buckets: 8192 Batches: 1 Memory Usage: > 453kB > -> Seq Scan on contribution_metrics cm > (cost=0.00..85.36 rows=5788 width=34) (actual time=0.003..0.936 rows=5788 > loops=1) > -> Subquery Scan on "*SELECT* 2" (cost=0.08..15941.48 rows=476239 > width=26) (actual time=0.017..203.006 rows=476239 loops=1) > -> Index Only Scan using > investment_metrics_investment_id_metric_last_updated_on_idx1 on > investment_metrics im (cost=0.08..14512.76 rows=476239 width=42) (actual > time=0.016..160.410 rows=476239 l > Heap Fetches: 476239 > > I've played around with a number of solutions (including lateral joins) > and the closest I can come is: > > select investment_id > from contribution_investment_metrics > where investment_id = ( > select investments.id > from investments > join contributions on investments.id = contributions.investment_id > where contributions.id = '\x58c9c0d3ee944c48b32f814d' > ) > > This doesn't really work for my purposes, since I want to project columns > from contributions and investments and I want to run this query on "up to a > handful" contributions at once (maybe more than one, never more than 100). > > I'm on PostgreSQL 9.6.5. > Schema and full explain analyzes: > https://gist.github.com/awreece/28c359c6d834717ab299665022b19fd6 > I don't think it's relevant, but since > https://wiki.postgresql.org/wiki/SlowQueryQuestions asks -- I'm running > in Heroku. > > What are my options here? Currently, I'm planning to avoid these bad plans > by using a less straightforward query for the view: > > SELECT > coalesce(contrib.id, cm.contribution_id) AS contribution_id, > coalesce(cm.yield, im.yield) AS yield, > coalesce(cm.term, im.term) AS term > FROM contributions contrib > JOIN investment_metrics_view im ON im.investment_id = > contrib.investment_id > FULL OUTER JOIN contribution_metrics_view cm ON cm.contribution_id = > contrib.id > > Best, > ~Alex Reece >

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Bad plan chosen for union all