BUG #18423: suboptimal query plan is used when ordering by an indexed field with limit

Поиск
Список
Период
Сортировка
От PG Bug reporting form
Тема BUG #18423: suboptimal query plan is used when ordering by an indexed field with limit
Дата
Msg-id 18423-f063cbf58ca6a294@postgresql.org
обсуждение исходный текст
Ответы Re: BUG #18423: suboptimal query plan is used when ordering by an indexed field with limit  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18423
Logged by:          Jiayin Mao
Email address:      maojiayin@gmail.com
PostgreSQL version: 15.0
Operating system:   all
Description:

We found that when we use ORDER BY on an indexed field with a small limit
value the index on the ordered field is always used, leading to long
execution time when the ordered field does not have good selectivity.
Usually the main purpose for using ORDER BY with LIMIT is to make sure the
search result is stable. It is the conditions in the query (the expressions
between WHERE and ORDER BY) that application developers want query planner
to choose to select targeted rows as a first step. 

For example, our "user" table has an id primary key, an "org_id" column and
a "disabled" column. The table has millions of rows and for each org_id
there is only usually a few hundred rows. We have an index on (org_id,
disabled) and that index can quickly select a few hundred rows from the
million-row table. We want to find the row with the smallest id with a given
org_id, so we use "ORDER BY id LIMIT 1" as order condition. The query plan
uses the btree index on the id field since it is the ordered field, causing
the execution to first see a lot more rows than it needs. In the following
query plan, it has to remove 596003 rows but the org only has a few hundred
rows, which can be seen in the second query plan below. 

```
explain analyze SELECT * FROM "user" WHERE org_id = 123456 AND disabled =
false ORDER BY id LIMIT 1;
                                                              QUERY PLAN
                                                          

---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..366.83 rows=1 width=232) (actual time=357.319..357.320
rows=1 loops=1)
   ->  Index Scan using user_pkey on user  (cost=0.43..306673.84 rows=837
width=232) (actual time=357.319..357.319 rows=1 loops=1)
         Filter: ((NOT disabled) AND (org_id = 123456))
         Rows Removed by Filter: 596003
 Planning Time: 0.885 ms
 Execution Time: 357.373 ms
```

If the index of (org_id, disabled) is used by tricking the query planner
with "ORDER BY id + 0 LIMIT 1", the query time drops from 357ms to 1.5ms.
```
EXPLAIN analyze SELECT * FROM "user" WHERE org_id = 123456 and disabled =
false order by user.id + 0 limit 1;
                                                                     QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3145.61..3145.62 rows=1 width=236) (actual time=1.513..1.514
rows=1 loops=1)
   ->  Sort  (cost=3145.61..3147.71 rows=837 width=236) (actual
time=1.512..1.513 rows=1 loops=1)
         Sort Key: ((id + 0))
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Scan using user_org_disabled_idx on user
(cost=0.43..3141.43 rows=837 width=236) (actual time=0.049..1.407 rows=166
loops=1)
               Index Cond: ((org_id = 123456) AND (disabled = false))
 Planning Time: 0.908 ms
 Execution Time: 1.580 ms
```


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Следующее
От: Noah Misch
Дата:
Сообщение: Re: FSM Corruption (was: Could not read block at end of the relation)