Should we show the total row count on the Parallel Hash node of a parallel-aware Hash Join?
Example: a non-parallel plan, where table simple has 20000 rows.
zml=# explain select count(*) from simple r join simple s using (id);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=1309.00..1309.01 rows=1 width=8)
   ->  Hash Join  (cost=617.00..1259.00 rows=20000 width=0)
         Hash Cond: (r.id = s.id)
         ->  Seq Scan on simple r  (cost=0.00..367.00 rows=20000 width=4)
         ->  Hash  (cost=367.00..367.00 rows=20000 width=4)
               ->  Seq Scan on simple s  (cost=0.00..367.00 rows=20000 width=4)
(6 rows)
And the parallel-aware plan for the same query:
zml=# explain select count(*) from simple r join simple s using (id);
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=691.85..691.86 rows=1 width=8)
   ->  Gather  (cost=691.63..691.84 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=691.63..691.64 rows=1 width=8)
               ->  Parallel Hash Join  (cost=354.50..670.80 rows=8333 width=0)
                     Hash Cond: (r.id = s.id)
                     ->  Parallel Seq Scan on simple r  (cost=0.00..250.33 rows=8333 width=4)
                     ->  Parallel Hash  (cost=250.33..250.33 rows=8333 width=4)
                           ->  Parallel Seq Scan on simple s  (cost=0.00..250.33 rows=8333 width=4)
(9 rows)
In initial_cost_hashjoin(), we undo the parallel division when the join is parallel-aware. That is reasonable, because a shared hash table should hold all the data. We also take parallelism into account for the hash plan's total rows when it is parallel-aware:

```
if (best_path->jpath.path.parallel_aware)
{
    hash_plan->plan.parallel_aware = true;
    hash_plan->rows_total = best_path->inner_rows_total;
}
```
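To make the numbers in the plans above concrete, here is a minimal sketch of the arithmetic, assuming the usual parallel divisor rule (each worker counts as 1, and the leader contributes 1 - 0.3 * workers when that is positive). The function names are hypothetical stand-ins for illustration, not the planner's actual code, and the rounding helper only approximates how the planner rounds row estimates.

```c
/* Hypothetical model of the parallel divisor: workers count as 1 each,
 * and a participating leader adds 1 - 0.3 * workers if that is positive. */
static double model_parallel_divisor(int workers, int leader_participates)
{
    double divisor = (double) workers;

    if (leader_participates)
    {
        double leader_contribution = 1.0 - 0.3 * workers;

        if (leader_contribution > 0.0)
            divisor += leader_contribution;
    }
    return divisor;
}

/* Round a positive row estimate to the nearest whole number. */
static double round_rows(double rows)
{
    return (double) (long) (rows + 0.5);
}

/* Per-process estimate, as shown on the Parallel Seq Scan node. */
static double per_process_rows(double total_rows, double divisor)
{
    return round_rows(total_rows / divisor);
}

/* Undo the division to approximate the rows in the shared hash table. */
static double undo_parallel_division(double per_process, double divisor)
{
    return per_process * divisor;
}
```

With 2 planned workers and a participating leader the divisor is 2.4, so 20000 rows become 8333 per process. Multiplying 8333 back by 2.4 gives roughly 19999.2, which is the approximately 20000 rows the shared hash table is expected to hold.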
But the Parallel Hash node in the plan shows the same row count as its subplan. I'm wondering whether it would be more reasonable to show rows_total instead of plan_rows for Parallel Hash nodes?
For this example:
  ->  Parallel Hash  (rows=20000)
        ->  Parallel Seq Scan on simple s  (rows=8333)
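The proposed display rule could be sketched roughly as follows. This is only an illustration under stated assumptions: HashNodeModel and rows_to_display are hypothetical names, not existing PostgreSQL structures or functions, and the real change would live in the EXPLAIN code.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the fields involved on a Hash node. */
typedef struct HashNodeModel
{
    bool   parallel_aware;  /* true for a Parallel Hash node */
    double plan_rows;       /* per-process estimate, same as the subplan */
    double rows_total;      /* estimate for the whole shared hash table */
} HashNodeModel;

/* Pick the row count to print: the total for Parallel Hash,
 * the ordinary plan_rows for everything else. */
static double rows_to_display(const HashNodeModel *node)
{
    return node->parallel_aware ? node->rows_total : node->plan_rows;
}
```

A non-parallel Hash node would keep showing plan_rows, so existing serial plans would be unchanged under this rule.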