Re: Parallel Seq Scan

Поиск

Список

Период

Сортировка

От	Haribabu Kommi
Тема	Re: Parallel Seq Scan
Дата	20 февраля 2015 г. 21:48:58
Msg-id	CAJrrPGd28BLMhD_yQTWdRcap8TW_Nf=yJKEJF+RS3GWRm0cfrQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Sat, Feb 21, 2015 at 12:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Feb 18, 2015 at 6:44 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
>> On 2015-02-18 16:59:26 +0530, Amit Kapila wrote:
>>
>> > There could be some cases where it could be beneficial for worker
>> > to process a sub-tree, but I think there will be more cases where
>> > it will just work on a part of node and send the result back to either
>> > master backend or another worker for further processing.
>>
>> I think many parallelism projects start out that way, and then notice
>> that it doesn't parallelize very efficiently.
>>
>> The most extreme example, but common, is aggregation over large amounts
>> of data - unless you want to ship huge amounts of data between processes
>> eto parallize it you have to do the sequential scan and the
>> pre-aggregate step (that e.g. selects count() and sum() to implement a
>> avg over all the workers) inside one worker.
>>
>
> OTOH if someone wants to parallelize scan (including expensive qual) and
> sort then it will be better to perform scan (or part of scan by one worker)
> and sort by other worker.

There exists a performance problem if we perform SCAN in one worker
and SORT operation in another worker,
because there is a need of twice tuple transfer between worker to
worker/backend. This is a costly operation.
It is better to combine SCAN and SORT operation into a one worker job.
This can be targeted once the parallel scan
code is stable.

Regards,
Hari Babu
Fujitsu Australia

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Parallel Seq Scan