BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values

Поиск

Список

Период

Сортировка

От	PG Bug reporting form
Тема	BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values
Дата	5 мая 2021 г. 13:24:05
Msg-id	16993-1298e8dc9c41ab97@postgresql.org обсуждение исходный текст
Ответы	Re: BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values Re: BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values
Список	pgsql-bugs

Дерево обсуждения

The following bug has been logged on the website:

Bug reference:      16993
Logged by:          A Schnabl
Email address:      a.schnabl@synedra.com
PostgreSQL version: 13.2
Operating system:   Windows 10
Description:

Hello,

I can reproduce a query planner problem on a fresh default installation of
postgres-13.2 on my Windows 10 machine. I also saw it occur on on
postgres-12.6 installations on Windows 10 and CentOS 8. I used the following
table setup:

CREATE TABLE data_node (id BIGINT PRIMARY KEY);
CREATE TABLE data_entry (id BIGINT PRIMARY KEY, node_fk BIGINT NOT NULL,
FOREIGN KEY (node_fk) REFERENCES data_node (id));
CREATE INDEX node_ix ON data_entry USING BTREE (node_fk);
INSERT INTO data_node (id) SELECT generate_series(1,10);
INSERT INTO data_entry (id, node_fk) SELECT s, 2 FROM
generate_series(1,10000000) s;
VACUUM ANALYZE data_node;
VACUUM ANALYZE data_entry;

With the following query, I filter all entries of the data_node table (only
a single with the data given here) which are referenced by the data_entry
table:

SELECT * FROM data_node
WHERE EXISTS (
 SELECT 1 FROM data_entry WHERE node_fk = data_node.id
);

The query takes a few seconds to complete, although I expect it to be much
faster (as shown below, it can be performed in less than a millisecond) due
to the node_ix index. EXPLAIN ANALYZE shows the following query plan:

Merge Join  (cost=179053.41..179054.24 rows=1 width=8) (actual
time=2140.820..2140.822 rows=1 loops=1)
  Merge Cond: (data_node.id = data_entry.node_fk)
  ->  Index Only Scan using data_node_pkey on data_node  (cost=0.14..8.29
rows=10 width=8) (actual time=0.006..0.010 rows=3 loops=1)
        Heap Fetches: 0
  ->  Sort  (cost=179053.27..179053.28 rows=1 width=8) (actual
time=2140.805..2140.806 rows=1 loops=1)
        Sort Key: data_entry.node_fk
        Sort Method: quicksort  Memory: 25kB
        ->  HashAggregate  (cost=179053.25..179053.26 rows=1 width=8)
(actual time=2140.796..2140.796 rows=1 loops=1)
              Group Key: data_entry.node_fk
              Batches: 1  Memory Usage: 24kB
              ->  Seq Scan on data_entry  (cost=0.00..154053.60 rows=9999860
width=8) (actual time=0.036..973.376 rows=10000000 loops=1)
Planning Time: 0.192 ms
Execution Time: 2140.873 ms

Note that a sequential scan scan on data_entry is used here rather than some
index (only) scans on node_ix. If I

SET enable_seqscan = false;

and EXPLAIN ANALYZE the query again, I get the following result:

Nested Loop Semi Join  (cost=0.57..12.85 rows=1 width=8) (actual
time=0.877..0.920 rows=1 loops=1)
  ->  Index Only Scan using data_node_pkey on data_node  (cost=0.14..8.29
rows=10 width=8) (actual time=0.006..0.011 rows=10 loops=1)
        Heap Fetches: 0
  ->  Index Only Scan using node_ix on data_entry  (cost=0.43..178143.98
rows=9999860 width=8) (actual time=0.089..0.089 rows=0 loops=10)
        Index Cond: (node_fk = data_node.id)
        Heap Fetches: 0
Planning Time: 0.328 ms
Execution Time: 0.959 ms

Due to the huge performance gap between the plans, I think that the query
planner should use the node_ix index here even without enable_seqscan being
set to false.

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

BUG #16993: Query Planner does not use index for EXISTS-query on a large table with all equal foreign key values