Re: Re: parallel distinct union and aggregate support patch
From | bucoo@sohu.com |
---|---|
Subject | Re: Re: parallel distinct union and aggregate support patch |
Date | |
Msg-id | 2020102311203106263310@sohu.com |
In response to | parallel distinct union and aggregate support patch ("bucoo@sohu.com" <bucoo@sohu.com>) |
List | pgsql-hackers |
> If I understood correctly, the tuples emitted by Parallel Batch Sort
> in each process are ordered by (hash(key, ...) % npartitions, key,
> ...), but the path is claiming to be ordered by (key, ...), no?
> That's enough for Unique and Aggregate to give the correct answer,
> because they really only require equal keys to be consecutive (and in
> the same process), but maybe some other plan could break?
The path does not claim to be ordered by (key, ...): it saves its PathKey(s) in BatchSortPath::batchkeys, not in Path::pathkeys.
I don't understand "but maybe some other plan could break". Do you mean some other path using this path? No, BatchSortPath is only used for a few special paths (Unique, GroupAgg, ...).
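
To illustrate the ordering in question, here is a minimal Python sketch (not code from the patch). It assumes an identity function as a stand-in for the real hash and a made-up list of integer keys. It shows that rows ordered by (hash(key) % npartitions, key) are not globally ordered by key, yet equal keys stay consecutive, which is all a Unique-style consumer needs.

```python
def stand_in_hash(key: int) -> int:
    # Assumption: identity hash keeps the example deterministic;
    # PostgreSQL would use its own hash function for the key type.
    return key

def batch_sort_order(keys, npartitions):
    # Order rows by (hash(key) % npartitions, key), as in the quoted question.
    return sorted(keys, key=lambda k: (stand_in_hash(k) % npartitions, k))

def unique(rows):
    # Unique-style consumer: drops a row that equals the previous row.
    out = []
    for row in rows:
        if not out or out[-1] != row:
            out.append(row)
    return out

keys = [4, 1, 3, 2, 4, 1, 2, 3]
rows = batch_sort_order(keys, npartitions=2)

print(rows)          # batch 0 (even keys) first, then batch 1 (odd keys)
print(unique(rows))  # one row per distinct key
```

A consumer that relied on a global (key, ...) order, such as a merge join, would be misled by this stream, which is why it matters that the order is not advertised in Path::pathkeys.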
bucoo@sohu.com
From: Thomas Munro
Date: 2020-10-21 12:27
To: bucoo@sohu.com
CC: pgsql-hackers
Subject: Re: parallel distinct union and aggregate support patch

On Tue, Oct 20, 2020 at 3:49 AM bucoo@sohu.com <bucoo@sohu.com> wrote:
> I wrote a patch to support parallel distinct, union and aggregate using batch sort.
> steps:
> 1. generate a hash value for the group clause values, and use the hash value modulo the number of batches to save the tuple to a batch
> 2. at the end of the outer plan, wait for all other workers to finish writing to the batches
> 3. each worker gets a unique batch number and calls tuplesort_performsort() to finish sorting this batch
> 4. return the rows for this batch
> 5. if not at the end of all batches, go to step 3
>
> The BatchSort plan makes sure the same tuples (group clause) are returned in the same range, so the Unique (or GroupAggregate) plan can work.

Hi!

Interesting work! In the past a few people have speculated about a
Parallel Repartition operator that could partition tuples a bit like
this, so that each process gets a different set of partitions. Here
you combine that with a sort. By doing both things in one node, you
avoid a lot of overheads (writing into a tuplestore once in the
repartitioning node, and then once again in the sort node, with tuples
being copied one-by-one between the two nodes).

If I understood correctly, the tuples emitted by Parallel Batch Sort
in each process are ordered by (hash(key, ...) % npartitions, key,
...), but the path is claiming to be ordered by (key, ...), no?
That's enough for Unique and Aggregate to give the correct answer,
because they really only require equal keys to be consecutive (and in
the same process), but maybe some other plan could break?
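
The five steps quoted above can be sketched as a small single-process Python simulation. This is only an illustration under stated assumptions, not the patch's implementation: the identity hash, the round-robin batch claiming, the plain `sorted()` call standing in for tuplesort_performsort(), and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import groupby

def stand_in_hash(key: int) -> int:
    # Assumption: identity hash, chosen only to keep the example deterministic.
    return key

def fill_batches(worker_inputs, nbatches):
    # Steps 1-2: each worker routes every tuple to batch hash(key) % nbatches.
    # Returning only after all inputs are consumed plays the role of the
    # "wait for all other workers" barrier.
    batches = defaultdict(list)
    for rows in worker_inputs:
        for key in rows:
            batches[stand_in_hash(key) % nbatches].append(key)
    return batches

def run_workers(batches, nbatches, nworkers):
    # Steps 3-5: workers claim batch numbers one at a time (round-robin here,
    # an assumption), sort the claimed batch, and return its rows, looping
    # until every batch has been handed out.
    outputs = [[] for _ in range(nworkers)]
    for batch_no in range(nbatches):
        worker = batch_no % nworkers
        outputs[worker].extend(sorted(batches[batch_no]))
    return outputs

def group_aggregate(rows):
    # GroupAggregate-style consumer: counts each run of equal consecutive keys.
    return [(key, len(list(run))) for key, run in groupby(rows)]

worker_inputs = [[3, 1, 4, 1], [5, 2, 3, 4], [2, 5, 0, 0]]
batches = fill_batches(worker_inputs, nbatches=4)
outputs = run_workers(batches, nbatches=4, nworkers=2)
groups = [g for out in outputs for g in group_aggregate(out)]
print(groups)
```

Because every occurrence of a key lands in the same batch, and a batch is sorted and returned by exactly one worker, each key appears as a single group in the combined result even though no worker's output is globally sorted.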