Re: Parallel Seq Scan
From | Haribabu Kommi |
---|---|
Subject | Re: Parallel Seq Scan |
Date | |
Msg-id | CAJrrPGd28BLMhD_yQTWdRcap8TW_Nf=yJKEJF+RS3GWRm0cfrQ@mail.gmail.com |
In response to | Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Sat, Feb 21, 2015 at 12:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Feb 18, 2015 at 6:44 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> On 2015-02-18 16:59:26 +0530, Amit Kapila wrote:
>>
>> > There could be some cases where it could be beneficial for worker
>> > to process a sub-tree, but I think there will be more cases where
>> > it will just work on a part of node and send the result back to either
>> > master backend or another worker for further processing.
>>
>> I think many parallelism projects start out that way, and then notice
>> that it doesn't parallelize very efficiently.
>>
>> The most extreme example, but common, is aggregation over large amounts
>> of data - unless you want to ship huge amounts of data between processes
>> to parallelize it you have to do the sequential scan and the
>> pre-aggregate step (that e.g. selects count() and sum() to implement an
>> avg over all the workers) inside one worker.
>>
>
> OTOH if someone wants to parallelize scan (including expensive qual) and
> sort then it will be better to perform scan (or part of scan by one worker)
> and sort by other worker.

There is a performance problem if we perform the SCAN in one worker and the
SORT in another: every tuple has to be transferred twice, once from the
scanning worker to the sorting worker and again from there to another worker
or the backend. That is a costly operation.

It is better to combine the SCAN and SORT operations into a single worker's
job. This can be targeted once the parallel scan code is stable.

Regards,
Hari Babu
Fujitsu Australia
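To make the two cost arguments in this thread concrete, here is a toy model in Python. It is only a sketch and not PostgreSQL code: the function names (`partial_avg_state`, `combine_avg`, `parallel_avg`, `tuples_shipped`) are hypothetical, threads stand in for worker processes, and the transfer count assumes every tuple that survives the scan is passed on unchanged.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_avg_state(chunk):
    """Worker-side pre-aggregate: reduce a chunk of values to (count, sum)."""
    return (len(chunk), sum(chunk))


def combine_avg(states):
    """Leader-side step: merge per-worker (count, sum) pairs into one avg."""
    total_count = sum(count for count, _ in states)
    total_sum = sum(partial_sum for _, partial_sum in states)
    return total_sum / total_count if total_count else None


def parallel_avg(values, n_workers=4):
    """Compute avg(values) by shipping one small state per worker.

    Returns (avg, number_of_states_shipped). Only the (count, sum) pairs
    cross the worker boundary, not the scanned values themselves.
    """
    # Split the "table" into one contiguous chunk per worker (ceiling division).
    step = max(1, -(-len(values) // n_workers))
    chunks = [values[i:i + step] for i in range(0, len(values), step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        states = list(pool.map(partial_avg_state, chunks))
    return combine_avg(states), len(states)


def tuples_shipped(n_tuples, scan_and_sort_in_same_worker):
    """Count inter-process tuple transfers for a scan followed by a sort.

    Assumption: n_tuples is the number of tuples that pass the scan's qual.
    """
    if scan_and_sort_in_same_worker:
        # Only the sorted output leaves the worker.
        return n_tuples
    # Scan worker -> sort worker, then sort worker -> backend.
    return 2 * n_tuples


if __name__ == "__main__":
    avg, shipped_states = parallel_avg(list(range(1, 101)), n_workers=4)
    print(avg, shipped_states)
    print(tuples_shipped(1000, True), tuples_shipped(1000, False))
```

In this model the aggregate case ships one small state per worker regardless of table size, while splitting scan and sort across workers doubles the number of tuple transfers compared with doing both in one worker.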