Top-N sorts in EXPLAIN, row count estimates, and parallelism
| From | Andres Freund |
|---|---|
| Subject | Top-N sorts in EXPLAIN, row count estimates, and parallelism |
| Date | |
| Msg-id | 20190523192248.lqapbtgsnyerbbop@alap3.anarazel.de |
| Replies | Re: Top-N sorts in EXPLAIN, row count estimates, and parallelism |
| List | pgsql-hackers |
Hi,
Right now we don't indicate that a top-N sort is going to be used in
plain EXPLAIN, only in EXPLAIN ANALYZE. That's imo suboptimal, because one
quite legitimately might want to know that before actually executing the
query (as it will make a huge difference in its actual resource
intensity).
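For context on why the distinction is so significant: a top-N heapsort only ever keeps N tuples in working memory, while a full sort has to materialize and order the entire input. This is not how PostgreSQL implements it internally, but the asymptotic difference can be illustrated with a small Python sketch:

```python
import heapq
import random

rows = [random.random() for _ in range(1_000_000)]

# Full sort: orders all rows -- O(n log n) time, O(n) extra space.
full = sorted(rows, reverse=True)[:10]

# Top-N heapsort: keeps only a 10-element heap while scanning the input --
# O(n log N) time, O(N) space. This is what a line like
# "Sort Method: top-N heapsort  Memory: 25kB" reflects.
topn = heapq.nlargest(10, rows)

assert full == topn
```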
postgres[28165][1]=# EXPLAIN (VERBOSE) SELECT * FROM hashes ORDER BY count DESC LIMIT 10;
 Limit  (cost=12419057.53..12419058.70 rows=10 width=45)
   Output: hash, count
   ->  Gather Merge  (cost=12419057.53..66041805.65 rows=459591466 width=45)
         Output: hash, count
         Workers Planned: 2
         ->  Sort  (cost=12418057.51..12992546.84 rows=229795733 width=45)
               Output: hash, count
               Sort Key: hashes.count DESC
               ->  Parallel Seq Scan on public.hashes  (cost=0.00..7452254.33 rows=229795733 width=45)
                     Output: hash, count
(10 rows)
postgres[28165][1]=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM hashes ORDER BY count DESC LIMIT 10;
 Limit  (cost=12419057.53..12419058.70 rows=10 width=45) (actual time=115204.278..115205.024 rows=10 loops=1)
   Output: hash, count
   ->  Gather Merge  (cost=12419057.53..66041805.65 rows=459591466 width=45) (actual time=115204.276..115205.020 rows=10 loops=1)
         Output: hash, count
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=12418057.51..12992546.84 rows=229795733 width=45) (actual time=115192.189..115192.189 rows=7 loops=3)
               Output: hash, count
               Sort Key: hashes.count DESC
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 25kB
               Worker 0: actual time=115186.558..115186.559 rows=10 loops=1
               Worker 1: actual time=115186.540..115186.540 rows=10 loops=1
               ->  Parallel Seq Scan on public.hashes  (cost=0.00..7452254.33 rows=229795733 width=45) (actual time=0.080..90442.215 rows=183836589 loops=3)
                     Output: hash, count
                     Worker 0: actual time=0.111..90366.999 rows=183976442 loops=1
                     Worker 1: actual time=0.107..90461.921 rows=184707680 loops=1
 Planning Time: 0.121 ms
 Execution Time: 115205.053 ms
(20 rows)
It's also noticeable that we preposterously assume that the sort actually
will return exactly the number of rows in the table, despite being a
top-N style sort. That seems bad for the costing of the parallel query,
because I think we'll then assume that it costs tups * parallel_tuple_cost?
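To put a rough number on that concern: assuming the default parallel_tuple_cost of 0.1 and, as a simplification of the real cost_gather_merge() logic, a charge per estimated input row, the inflated row estimate alone accounts for most of the Gather Merge's cost span:

```python
# Rough arithmetic on the plan above; parallel_tuple_cost = 0.1 is the
# documented default, the per-row charge is a simplification.
parallel_tuple_cost = 0.1
estimated_rows = 459_591_466   # Gather Merge row estimate from the plan

transfer_cost = estimated_rows * parallel_tuple_cost
print(transfer_cost)           # ~46 million cost units

# Compare with the Gather Merge's own cost span in the plan:
gather_merge_span = 66_041_805.65 - 12_419_057.53
print(gather_merge_span)       # ~53.6 million -- the bulk of it
```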
I'm also unclear as to why the Gather Merge ends up with twice as
many estimated rows as there are in the table.
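The numbers in the plan are at least consistent with the Gather Merge estimate being the per-worker row estimate multiplied by the number of planned workers, without accounting for the leader's share. That is an assumption about the cause, not a reading of the source:

```python
# Hypothesized origin of the Gather Merge row estimate, from the plan above:
per_worker_rows = 229_795_733   # Sort / Parallel Seq Scan row estimate
workers_planned = 2

print(per_worker_rows * workers_planned)  # 459591466 -- matches the plan exactly
```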
Greetings,
Andres Freund