BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values
От | PG Bug reporting form |
---|---|
Тема | BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values |
Дата | |
Msg-id | 16993-1298e8dc9c41ab97@postgresql.org обсуждение исходный текст |
Ответы |
Re: BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values
Re: BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values |
Список | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 16993 Logged by: A Schnabl Email address: a.schnabl@synedra.com PostgreSQL version: 13.2 Operating system: Windows 10 Description: Hello, I can reproduce a query planner problem on a fresh default installation of postgres-13.2 on my Windows 10 machine. I also saw it occur on on postgres-12.6 installations on Windows 10 and CentOS 8. I used the following table setup: CREATE TABLE data_node (id BIGINT PRIMARY KEY); CREATE TABLE data_entry (id BIGINT PRIMARY KEY, node_fk BIGINT NOT NULL, FOREIGN KEY (node_fk) REFERENCES data_node (id)); CREATE INDEX node_ix ON data_entry USING BTREE (node_fk); INSERT INTO data_node (id) SELECT generate_series(1,10); INSERT INTO data_entry (id, node_fk) SELECT s, 2 FROM generate_series(1,10000000) s; VACUUM ANALYZE data_node; VACUUM ANALYZE data_entry; With the following query, I filter all entries of the data_node table (only a single with the data given here) which are referenced by the data_entry table: SELECT * FROM data_node WHERE EXISTS ( SELECT 1 FROM data_entry WHERE node_fk = data_node.id ); The query takes a few seconds to complete, although I expect it to be much faster (as shown below, it can be performed in less than a millisecond) due to the node_ix index. EXPLAIN ANALYZE shows the following query plan: Merge Join (cost=179053.41..179054.24 rows=1 width=8) (actual time=2140.820..2140.822 rows=1 loops=1) Merge Cond: (data_node.id = data_entry.node_fk) -> Index Only Scan using data_node_pkey on data_node (cost=0.14..8.29 rows=10 width=8) (actual time=0.006..0.010 rows=3 loops=1) Heap Fetches: 0 -> Sort (cost=179053.27..179053.28 rows=1 width=8) (actual time=2140.805..2140.806 rows=1 loops=1) Sort Key: data_entry.node_fk Sort Method: quicksort Memory: 25kB -> HashAggregate (cost=179053.25..179053.26 rows=1 width=8) (actual time=2140.796..2140.796 rows=1 loops=1) Group Key: data_entry.node_fk Batches: 1 Memory Usage: 24kB -> Seq Scan on data_entry (cost=0.00..154053.60 rows=9999860 width=8) (actual time=0.036..973.376 rows=10000000 loops=1) Planning Time: 0.192 ms Execution Time: 2140.873 ms Note that a sequential scan scan on data_entry is used here rather than some index (only) scans on node_ix. If I SET enable_seqscan = false; and EXPLAIN ANALYZE the query again, I get the following result: Nested Loop Semi Join (cost=0.57..12.85 rows=1 width=8) (actual time=0.877..0.920 rows=1 loops=1) -> Index Only Scan using data_node_pkey on data_node (cost=0.14..8.29 rows=10 width=8) (actual time=0.006..0.011 rows=10 loops=1) Heap Fetches: 0 -> Index Only Scan using node_ix on data_entry (cost=0.43..178143.98 rows=9999860 width=8) (actual time=0.089..0.089 rows=0 loops=10) Index Cond: (node_fk = data_node.id) Heap Fetches: 0 Planning Time: 0.328 ms Execution Time: 0.959 ms Due to the huge performance gap between the plans, I think that the query planner should use the node_ix index here even without enable_seqscan being set to false.
В списке pgsql-bugs по дате отправления: