BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts
От | PG Bug reporting form |
---|---|
Тема | BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts |
Дата | |
Msg-id | 18234-571586e70bd0658d@postgresql.org обсуждение исходный текст |
Ответы |
Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18234 Logged by: Dmitry Astapov Email address: dastapov@gmail.com PostgreSQL version: 15.5 Operating system: Rocky Linux Description: Summary of the issue: for a (5-row recordset) JOIN (massive partitioned recordset indexed by id) USING (id), the (Nested Loop over 5 values) strategy is completely ignored, and Hash Join or Merge Join is done instead, which does SeqScan over the "massive recordset". Verified with PostgreSQL versions 13.13, 14.10, 15.5 Reproduction in DB Fiddle: https://www.db-fiddle.com/f/sJUUWNgW7pqPWcJwihVoj5/1 (this demonstrates both the bad behaviour and a way to work around it) Plan for the badly-behaving query (pulled out of DB Fiddle): explain select * from tiny join vw_broken on tiny.id = vw_broken.id; QUERY PLAN --------------------------------------------------------------------------------------------- Hash Join (cost=1.11..3798.30 rows=712 width=16) Hash Cond: (huge.id = tiny.id) -> Append (cost=0.00..3683.32 rows=28466 width=8) -> Append (cost=0.00..3222.91 rows=28181 width=8) -> Seq Scan on huge huge_1 (cost=0.00..0.00 rows=1 width=8) Filter: filter_out -> Seq Scan on huge_partition1 huge_2 (cost=0.00..1541.00 rows=13987 width=8) Filter: filter_out -> Seq Scan on huge_partition2 huge_3 (cost=0.00..1541.00 rows=14193 width=8) Filter: filter_out -> Append (cost=0.00..33.42 rows=285 width=8) -> Seq Scan on medium medium_1 (cost=0.00..0.00 rows=1 width=8) Filter: filter_out -> Seq Scan on medium_partition1 medium_2 (cost=0.00..16.00 rows=142 width=8) Filter: filter_out -> Seq Scan on medium_partition2 medium_3 (cost=0.00..16.00 rows=142 width=8) Filter: filter_out -> Hash (cost=1.05..1.05 rows=5 width=8) -> Seq Scan on tiny (cost=0.00..1.05 rows=5 width=8) I think that there are two key components for triggering the bad behaviour: 1)The massive recordset on the right side of the JOIN must come from the UNION ALL of two parts, both of which have a filter, like this view in my reproduction: create view vw_broken as select id from huge where filter_out union all select id from medium where filter_out; 2)Tables being UNION ALL'ed should be inheritance parents (aka old-style partitioned tables). I was unable to reproduce with regular tables, though I did not try very hard. The fix is to move the filter condition out of UNION ALL: create view vw_not_broken as select id,filter_out from ( select id,filter_out from huge union all select id,filter_out from medium ) q where filter_out; What adds insult to injury is that the optimizer seems to be happily pushing the "where filter_out" condition back into the UNION ALL and down the inheritance hierarchy (which is exactly what the "broken" view is doing as well, seemingly): explain select * from tiny join vw_not_broken on tiny.id = vw_not_broken.id; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..151.53 rows=712 width=17) -> Seq Scan on tiny (cost=0.00..1.05 rows=5 width=8) -> Append (cost=0.00..30.04 rows=6 width=9) -> Seq Scan on huge (cost=0.00..0.00 rows=1 width=9) Filter: (filter_out AND (tiny.id = id)) -> Index Scan using huge_partition1_id_idx on huge_partition1 huge_1 (cost=0.29..8.31 rows=1 width=9) Index Cond: (id = tiny.id) Filter: filter_out -> Index Scan using huge_partition2_id_idx on huge_partition2 huge_2 (cost=0.29..8.31 rows=1 width=9) Index Cond: (id = tiny.id) Filter: filter_out -> Seq Scan on medium (cost=0.00..0.00 rows=1 width=9) Filter: (filter_out AND (tiny.id = id)) -> Index Scan using medium_partition1_id_idx on medium_partition1 medium_1 (cost=0.28..6.69 rows=1 width=9) Index Cond: (id = tiny.id) Filter: filter_out -> Index Scan using medium_partition2_id_idx on medium_partition2 medium_2 (cost=0.28..6.69 rows=1 width=9) Index Cond: (id = tiny.id) Filter: filter_out I was unable to get a "good" plan out of the "bad" view by tweaking enable_ settings, so I suspect that the indexed access path is not considered at all, for some reason. I don't think that this is a recently introduced issue, as this email thread from 2015 seems to be talking about the same issue: https://www.postgresql.org/message-id/20150521104103.GB10049@pax.zz.de (sadly, thread petered out without a reproduction / more info from the reporter). I'd love to do more debugging and get to the bottom of this. If you have a hunch or suspicion about the possible origin of the bad behaviour but have no time to check it yourself - can you please point me to a suspect part of the optimizer so I can try to pinpoint it?
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Tom LaneДата:
Сообщение: Re: BUG #18233: coerce_type accepts NULL as pstate and can pass it to coerce_record_to_complex
Следующее
От: Tom LaneДата:
Сообщение: Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts