Should Explain show Parallel Hash node’s total rows?

From: Zhang Mingli
Subject: Should Explain show Parallel Hash node’s total rows?
Date:
Msg-id 9CC27C1D-8592-4331-8F3B-D98109A48CAF@gmail.com
List: pgsql-hackers

Hi, all

Should EXPLAIN show the total row count for the Parallel Hash node of a parallel-aware Hash Join?

For example, here is a non-parallel plan; table simple has 20000 rows.

zml=# explain  select count(*) from simple r join simple s using (id);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=1309.00..1309.01 rows=1 width=8)
   ->  Hash Join  (cost=617.00..1259.00 rows=20000 width=0)
         Hash Cond: (r.id = s.id)
         ->  Seq Scan on simple r  (cost=0.00..367.00 rows=20000 width=4)
         ->  Hash  (cost=367.00..367.00 rows=20000 width=4)
               ->  Seq Scan on simple s  (cost=0.00..367.00 rows=20000 width=4)
(6 rows)

And here is a parallel-aware plan for the same query:

zml=# explain  select count(*) from simple r join simple s using (id);
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=691.85..691.86 rows=1 width=8)
   ->  Gather  (cost=691.63..691.84 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=691.63..691.64 rows=1 width=8)
               ->  Parallel Hash Join  (cost=354.50..670.80 rows=8333 width=0)
                     Hash Cond: (r.id = s.id)
                     ->  Parallel Seq Scan on simple r  (cost=0.00..250.33 rows=8333 width=4)
                     ->  Parallel Hash  (cost=250.33..250.33 rows=8333 width=4)
                           ->  Parallel Seq Scan on simple s  (cost=0.00..250.33 rows=8333 width=4)
(9 rows)

In initial_cost_hashjoin(), we undo the parallel division of the inner row estimate when the join is parallel-aware.
That is reasonable, because a shared hash table has to hold all of the data.
We also record that total in the Hash plan node’s rows_total when it is parallel-aware:
```
if (best_path->jpath.path.parallel_aware)
{
    hash_plan->plan.parallel_aware = true;
    hash_plan->rows_total = best_path->inner_rows_total;
}
```
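
To make the numbers above concrete, here is a small standalone sketch of the arithmetic, not PostgreSQL source. It assumes the planner's parallel divisor is computed the way get_parallel_divisor() in costsize.c does it (workers plus a leader contribution of 1.0 - 0.3 * workers when the leader participates); the function names parallel_divisor and per_process_rows are made up for illustration.

```c
#include <stdbool.h>

/*
 * Illustrative re-statement of the planner's parallel divisor
 * (assumption: same formula as get_parallel_divisor() in costsize.c).
 */
static double
parallel_divisor(int workers, bool leader_participation)
{
    double divisor = workers;

    if (leader_participation)
    {
        /* The leader is assumed to help less as the worker count grows. */
        double leader_contribution = 1.0 - (0.3 * workers);

        if (leader_contribution > 0)
            divisor += leader_contribution;
    }
    return divisor;
}

/* Per-process row estimate: total rows divided, rounded to nearest. */
static double
per_process_rows(double total_rows, int workers, bool leader_participation)
{
    double rows = total_rows / parallel_divisor(workers, leader_participation);

    return (double) (long) (rows + 0.5);
}
```

With 2 planned workers and leader participation the divisor is 2.4, so per_process_rows(20000, 2, true) gives the 8333 shown on the Parallel Seq Scan nodes, while the shared hash table still has to hold all 20000 rows.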

But in the plan, the Parallel Hash node shows the same row count as its subplan. I’m wondering whether it would be more reasonable to show rows_total instead of plan_rows for Parallel Hash nodes.

For this example,
  -> Parallel Hash (rows=20000)
    -> Parallel Seq Scan on simple s (rows=8333)
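
The selection rule I have in mind could be sketched roughly as follows. This is only an illustration under stated assumptions, not a patch: MockHashPlan is a hypothetical stand-in for the three Hash plan fields discussed here, and rows_to_display is a made-up helper, not an existing function in explain.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Hash plan fields mentioned in this mail. */
typedef struct MockHashPlan
{
    bool   parallel_aware;  /* true for a Parallel Hash node */
    double plan_rows;       /* per-process row estimate */
    double rows_total;      /* estimate for the whole shared hash table */
} MockHashPlan;

/*
 * Pick the row estimate to print: the shared-table total for a
 * parallel-aware Hash node, the ordinary plan_rows otherwise.
 */
static double
rows_to_display(const MockHashPlan *hash)
{
    if (hash->parallel_aware && hash->rows_total > 0)
        return hash->rows_total;
    return hash->plan_rows;
}
```

For the plan above this would print rows=20000 on the Parallel Hash node and leave every other node, including the non-parallel Hash, unchanged.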



Zhang Mingli
HashData https://www.hashdata.xyz
