Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)

From	Peter Geoghegan
Subject	Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
Date	May 6, 2017, 00:42:32
Msg-id	CAH2-Wzkhso9LTHKHW+KxZt=CEV1=T0wHpptkSz29oJcpDs02UQ@mail.gmail.com
In reply to	Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

On Fri, May 5, 2017 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> One idea that crossed my mind is to just have workers write all of
> their output tuples to a temp file and have the leader read them back
> in.  At some cost in I/O, this would completely eliminate the overhead
> of workers waiting for the leader.  In some cases, it might be worth
> it.  At the least, it could be interesting to try a prototype
> implementation of this with different queries (TPC-H, maybe) and see
> what happens.  It would give us some idea how much of a problem
> stalling on the leader is in practice.  Wait event monitoring could
> possibly also be used to figure out an answer to that question.

The use of temp files in all cases was effective in my parallel
external sort patch, relative to what I imagine an approach built on a
Gather node would get you, but not because of the inherent slowness of
a Gather node. I'm not so sure that Gather is actually inherently
slow, given the interface it supports.
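
To make the temp file idea concrete, here is a minimal sketch of the general shape: each worker sorts its share of the input and writes the whole run to a temp file, and the leader only starts reading once the workers have finished. This is not code from the parallel external sort patch. The function names (worker_sort_to_file, read_run, parallel_sort) are made up for illustration, and threads stand in for real worker processes purely to keep the example short.

```python
import heapq
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def worker_sort_to_file(chunk, directory):
    """Worker: sort its share of the input and write the whole run to a temp file."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for value in sorted(chunk):
            f.write(f"{value}\n")
    return path


def read_run(path):
    """Leader: stream one sorted run back from disk."""
    with open(path) as f:
        for line in f:
            yield int(line)


def parallel_sort(values, n_workers=4):
    """Split the input, let workers finish independently, then merge in the leader."""
    # Hypothetical partitioning: round-robin slices of the input list.
    chunks = [values[i::n_workers] for i in range(n_workers)]
    with tempfile.TemporaryDirectory() as directory:
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            paths = list(pool.map(lambda c: worker_sort_to_file(c, directory), chunks))
        # All workers are done here; none of them ever waited on the leader.
        return list(heapq.merge(*(read_run(p) for p in paths)))
```

The point of the sketch is only that the workers never block on a consumer: the cost is the extra I/O for writing and re-reading every run.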

While incremental, retail processing of each tuple is flexible and
composable, it will tend to be slow compared to an approach based on
batch processing (for tasks where you happen to be able to get away
with batch processing). This is true for all the usual reasons --
better locality of access, better branch prediction properties, lower
"effective instruction count" due to having very tight inner loops,
and so on.
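
A rough sketch of the structural difference, assuming a toy pull-based node: RetailScan, sum_retail and sum_batch are hypothetical names, not PostgreSQL APIs, and Python will not show the locality or branch prediction effects themselves. It only shows where the per-tuple call and end-of-input check sit in one style and disappear from the inner loop in the other.

```python
class RetailScan:
    """Hands out one tuple per call, like a pull-based executor node."""

    def __init__(self, rows):
        self._it = iter(rows)

    def next_tuple(self):
        # None signals end of input.
        return next(self._it, None)


def sum_retail(node):
    """Retail style: one call through the interface and one check per tuple."""
    total = 0
    while True:
        row = node.next_tuple()
        if row is None:
            break
        total += row[0]
    return total


def sum_batch(rows, batch_size=1024):
    """Batch style: the interface is crossed once per batch, then a tight loop runs."""
    total = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        total += sum(row[0] for row in batch)
    return total
```

Both functions compute the same result; the retail version stays composable with any node that exposes next_tuple(), while the batch version needs the whole input (or a whole batch) up front.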

I agree with Andres that we shouldn't put too much effort into
modelling concurrency ahead of optimizing serial performance. The
machine's *aggregate* memory bandwidth should be used as efficiently
as possible, and parallelism is just one (very important) tool for
making that happen.

-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/
