Should Explain show Parallel Hash node’s total rows?

From	Zhang Mingli
Subject	Should Explain show Parallel Hash node’s total rows?
Date	October 24, 2023, 14:46:06
Msg-id	9CC27C1D-8592-4331-8F3B-D98109A48CAF@gmail.com
List	pgsql-hackers

Hi, all

Should EXPLAIN show the total row count for the Parallel Hash node of a parallel-aware Hash Join?

Example: a non-parallel plan, where table simple has 20000 rows.

zml=# explain  select count(*) from simple r join simple s using (id);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=1309.00..1309.01 rows=1 width=8)
   ->  Hash Join  (cost=617.00..1259.00 rows=20000 width=0)
         Hash Cond: (r.id = s.id)
         ->  Seq Scan on simple r  (cost=0.00..367.00 rows=20000 width=4)
         ->  Hash  (cost=367.00..367.00 rows=20000 width=4)
               ->  Seq Scan on simple s  (cost=0.00..367.00 rows=20000 width=4)
(6 rows)

And a parallel-aware plan for the same query:

zml=# explain  select count(*) from simple r join simple s using (id);
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=691.85..691.86 rows=1 width=8)
   ->  Gather  (cost=691.63..691.84 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=691.63..691.64 rows=1 width=8)
               ->  Parallel Hash Join  (cost=354.50..670.80 rows=8333 width=0)
                     Hash Cond: (r.id = s.id)
                     ->  Parallel Seq Scan on simple r  (cost=0.00..250.33 rows=8333 width=4)
                     ->  Parallel Hash  (cost=250.33..250.33 rows=8333 width=4)
                           ->  Parallel Seq Scan on simple s  (cost=0.00..250.33 rows=8333 width=4)
(9 rows)

In initial_cost_hashjoin(), we undo the parallel division when the join is parallel-aware.
That is reasonable, because a shared hash table has to hold all of the data.
We also take parallelism into account for the Hash plan’s total rows if it is parallel-aware:
```
if (best_path->jpath.path.parallel_aware)
{
    hash_plan->plan.parallel_aware = true;
    hash_plan->rows_total = best_path->inner_rows_total;
}
```

But the Parallel Hash node in the plan shows the same row count as its subplan. I’m wondering whether it would be more reasonable to show rows_total instead of plan_rows for Parallel Hash nodes.

For this example, that would be:
    ->  Parallel Hash  (rows=20000)
          ->  Parallel Seq Scan on simple s  (rows=8333)
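
For reference, here is a minimal sketch of the arithmetic that relates the two numbers, 20000 and 8333. It is not PostgreSQL source: the function names sketch_parallel_divisor() and sketch_per_worker_rows() are hypothetical, and the formula reflects my understanding of how the planner's parallel divisor behaves when the leader participates (each worker is assumed to take away 30% of the leader's time), so treat it as an assumption rather than the exact implementation.

```c
#include <assert.h>

/*
 * Hypothetical sketch, not PostgreSQL code.
 * Assumption: the leader participates, and its contribution shrinks by
 * 0.3 for every planned worker until it reaches zero.
 */
double
sketch_parallel_divisor(int parallel_workers)
{
    double divisor = parallel_workers;
    double leader_contribution = 1.0 - 0.3 * parallel_workers;

    assert(parallel_workers > 0);

    /* Count the leader only while it still has time left to help. */
    if (leader_contribution > 0)
        divisor += leader_contribution;

    return divisor;
}

/*
 * Per-process row estimate: the total divided by the divisor, rounded to
 * the nearest whole row (valid for non-negative inputs only).
 */
double
sketch_per_worker_rows(double total_rows, int parallel_workers)
{
    double rows = total_rows / sketch_parallel_divisor(parallel_workers);

    return (double) (long) (rows + 0.5);
}
```

With 2 planned workers the divisor is 2.4, so 20000 total rows become roughly 8333 per process, which matches the Parallel Seq Scan estimate above. Undoing the division for the shared hash table goes the other way and gets back to about 20000.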

Zhang Mingli

HashData https://www.hashdata.xyz
