Re: Re: parallel distinct union and aggregate support patch

From	bucoo@sohu.com
Subject	Re: Re: parallel distinct union and aggregate support patch
Date	23 October 2020, 03:20:31
Msg-id	2020102311203106263310@sohu.com
In reply to	parallel distinct union and aggregate support patch ("bucoo@sohu.com" <bucoo@sohu.com>)
List	pgsql-hackers

> If I understood correctly, the tuples emitted by Parallel Batch Sort
> in each process are ordered by (hash(key, ...) % npartitions, key,
> ...), but the path is claiming to be ordered by (key, ...), no?
> That's enough for Unique and Aggregate to give the correct answer,
> because they really only require equal keys to be consecutive (and in
> the same process), but maybe some other plan could break?

The path does not claim to be ordered by (key, ...): it saves its PathKey(s) in BatchSortPath::batchkeys, not in Path::pathkeys.

I don't understand "but maybe some other plan could break". Do you mean some other path that uses this path? No: BatchSortPath is only used for certain special paths (Unique, GroupAgg, ...).

bucoo@sohu.com

From: Thomas Munro
Date: 2020-10-21 12:27
To: bucoo@sohu.com
CC: pgsql-hackers
Subject: Re: parallel distinct union and aggregate support patch
On Tue, Oct 20, 2020 at 3:49 AM bucoo@sohu.com <bucoo@sohu.com> wrote:
> I wrote a patch to support parallel distinct, union and aggregate using batch sort.
> steps:
> 1. generate a hash value from the group clause values, and use the hash value modulo the number of batches to pick the batch each tuple is saved to
> 2. at the end of the outer plan, wait for all other workers to finish writing to the batches
> 3. each worker gets a unique batch number and calls tuplesort_performsort() to finish sorting that batch
> 4. return the rows of this batch
> 5. if not all batches are done, go to step 3
>
> The BatchSort plan makes sure the same tuples (by group clause) are returned in the same range, so the Unique (or GroupAggregate) plan can work.
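
The steps quoted above can be sketched roughly as follows. This is a single-process Python illustration, not the patch's C code: `batch_of`, `batch_sort` and `unique` are hypothetical names, `zlib.crc32` is only a stand-in for the hash of the group clause values, and the wait for other workers in step 2 is omitted because nothing runs in parallel here.

```python
import zlib
from itertools import groupby


def batch_of(key, nbatches):
    # Stand-in for hashing the group clause values (assumption: any
    # deterministic hash works for the illustration).
    return zlib.crc32(repr(key).encode()) % nbatches


def batch_sort(rows, nbatches):
    # Step 1: route every row to a batch by hash value modulo nbatches.
    batches = [[] for _ in range(nbatches)]
    for row in rows:
        batches[batch_of(row, nbatches)].append(row)
    # Steps 3-5: take one batch at a time, sort it, and return its rows.
    for batch in batches:
        batch.sort()
        yield from batch


def unique(rows):
    # Like a Unique node: only collapses runs of adjacent equal rows.
    return [key for key, _ in groupby(rows)]


distinct = unique(batch_sort([3, 1, 2, 3, 1, 5, 2, 5, 4], nbatches=4))
```

Equal rows always hash to the same batch and are adjacent after that batch is sorted, so `distinct` holds each value exactly once, although not necessarily in ascending order.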

Hi!

Interesting work! In the past a few people have speculated about a
Parallel Repartition operator that could partition tuples a bit like
this, so that each process gets a different set of partitions. Here
you combine that with a sort. By doing both things in one node, you
avoid a lot of overheads (writing into a tuplestore once in the
repartitioning node, and then once again in the sort node, with tuples
being copied one-by-one between the two nodes).

If I understood correctly, the tuples emitted by Parallel Batch Sort
in each process are ordered by (hash(key, ...) % npartitions, key,
...), but the path is claiming to be ordered by (key, ...), no?
That's enough for Unique and Aggregate to give the correct answer,
because they really only require equal keys to be consecutive (and in
the same process), but maybe some other plan could break?
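
A toy illustration of the ordering described above, under loud assumptions: integer keys, with `key % npartitions` standing in for `hash(key, ...) % npartitions`, and a hypothetical helper `emit_order` that is not part of the patch.

```python
def emit_order(keys, npartitions):
    # Order by (partition number, key), using the key itself as a toy hash.
    return sorted(keys, key=lambda k: (k % npartitions, k))


keys = [5, 0, 3, 2, 1, 4, 2, 5]
out = emit_order(keys, 2)
print(out)  # [0, 2, 2, 4, 1, 3, 5, 5]
```

Equal keys end up next to each other, which is all that Unique and Aggregate need, but the stream as a whole is not sorted by key, so a consumer that relied on a full (key, ...) ordering would see out-of-order input.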
