Обсуждение: BUG #17922: ANY vs IN execution plan difference for a single explicit input value
BUG #17922: ANY vs IN execution plan difference for a single explicit input value
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 17922 Logged by: Kaarel Moppel Email address: kaarel.moppel@gmail.com PostgreSQL version: 15.2 Operating system: Ubuntu 22.04 Description: I guess there can be differences how IN vs ANY input is handled for prepared statements but for plain SQL input I would expect them to produce identical plans. Seems to be not the case though for a below scenario which we stumbled on in production where there's a runtime difference of around 5 orders of magnitude as a different leading index is chosen. A repeatable test case: CREATE UNLOGGED TABLE a ( node_id int8 NOT NULL GENERATED BY DEFAULT AS IDENTITY, project_id int8 NOT NULL ); INSERT INTO a (project_id) SELECT CASE WHEN random() < 0.99 THEN 1 ELSE 2 END FROM generate_series(1, 8e6); CREATE UNIQUE INDEX a_uq ON a (project_id, node_id); VACUUM ANALYZE a; CREATE UNLOGGED TABLE b ( node_id int8 NOT NULL UNIQUE ); INSERT INTO b (node_id) SELECT node_id FROM ( SELECT node_id FROM a WHERE node_id > 7e6 LIMIT 15000) x; VACUUM ANALYZE b; EXPLAIN SELECT a.* FROM a JOIN b USING (node_id) WHERE a.project_id = ANY (ARRAY[1]::int8[]) ORDER BY a.project_id, a.node_id LIMIT 10; EXPLAIN SELECT a.* FROM a JOIN b USING (node_id) WHERE a.project_id IN (1) -- Also OK with a plain "=1" witout IN ORDER BY a.project_id, a.node_id LIMIT 10; QUERY PLAN ─────────────────────────────────────────────────────────────────────────────────────────── Limit (cost=0.72..1789.68 rows=10 width=16) -> Nested Loop (cost=0.72..2655175.45 rows=14842 width=16) -> Index Only Scan using a_uq on a (cost=0.43..260452.19 rows=7915872 width=16) Index Cond: (project_id = ANY ('{1}'::bigint[])) -> Index Only Scan using b_node_id_key on b (cost=0.29..0.30 rows=1 width=8) Index Cond: (node_id = a.node_id) (6 rows) QUERY PLAN ────────────────────────────────────────────────────────────────────────────────────────────── Limit (cost=0.72..38.07 rows=10 width=16) -> Nested Loop (cost=0.72..55440.79 rows=14842 width=16) -> Index Only Scan using b_node_id_key on b (cost=0.29..397.29 rows=15000 width=8) -> Index Only Scan using a_uq on a (cost=0.43..3.67 rows=1 width=16) Index Cond: ((project_id = 1) AND (node_id = b.node_id)) (5 rows)
PG Bug reporting form <noreply@postgresql.org> writes: > I guess there can be differences how IN vs ANY input is handled for prepared > statements but for plain SQL input I would expect them to produce identical > plans. If you can find any such promise in the documentation, I will be glad to remove it ;-) > EXPLAIN > SELECT > a.* > FROM > a > JOIN b USING (node_id) > WHERE > a.project_id = ANY (ARRAY[1]::int8[]) > ORDER BY > a.project_id, > a.node_id > LIMIT 10; > EXPLAIN > SELECT > a.* > FROM > a > JOIN b USING (node_id) > WHERE > a.project_id IN (1) -- Also OK with a plain "=1" witout IN > ORDER BY > a.project_id, > a.node_id > LIMIT 10; The thing about "project_id IN (1)", which is reduced to "project_id = 1" by the parser, is that the planner can see that that renders the "ORDER BY a.project_id" clause a no-op. So it only has to order by node_id, which is what makes this plan valid: > Limit (cost=0.72..38.07 rows=10 width=16) > -> Nested Loop (cost=0.72..55440.79 rows=14842 width=16) > -> Index Only Scan using b_node_id_key on b (cost=0.29..397.29 rows=15000 width=8) > -> Index Only Scan using a_uq on a (cost=0.43..3.67 rows=1 width=16) > Index Cond: ((project_id = 1) AND (node_id = b.node_id)) If you don't recognize that then you're forced into sorting the indexscan output, which is going to look pretty bad for a small-LIMIT situation. We don't make any attempt to make a similar deduction from =ANY clauses, mainly because it's usually not possible to be sure that the array has only one member. I'm not excited about the cost/benefit ratio of adding code to check for that. regards, tom lane