Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
From | Peter Geoghegan |
---|---|
Subject | Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) |
Date | |
Msg-id | CAH2-Wzkhso9LTHKHW+KxZt=CEV1=T0wHpptkSz29oJcpDs02UQ@mail.gmail.com |
In reply to | Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Fri, May 5, 2017 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> One idea that crossed my mind is to just have workers write all of
> their output tuples to a temp file and have the leader read them back
> in. At some cost in I/O, this would completely eliminate the overhead
> of workers waiting for the leader. In some cases, it might be worth
> it. At the least, it could be interesting to try a prototype
> implementation of this with different queries (TPC-H, maybe) and see
> what happens. It would give us some idea how much of a problem
> stalling on the leader is in practice. Wait event monitoring could
> possibly also be used to figure out an answer to that question.

The use of temp files in all cases was effective in my parallel external sort patch, relative to what I imagine an approach built on a gather node would get you, but not because of the inherent slowness of a Gather node. I'm not so sure that Gather is actually inherently slow, given the interface it supports.

While incremental, retail processing of each tuple is flexible and composable, it will tend to be slow compared to an approach based on batch processing (for tasks where you happen to be able to get away with batch processing). This is true for all the usual reasons -- better locality of access, better branch prediction properties, lower "effective instruction count" due to having very tight inner loops, and so on.

I agree with Andres that we shouldn't put too much effort into modelling concurrency ahead of optimizing serial performance. The machine's *aggregate* memory bandwidth should be used as efficiently as possible, and parallelism is just one (very important) tool for making that happen.

--
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/
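To make the retail-versus-batch contrast above concrete, here is a minimal sketch in C. It is not PostgreSQL code: `Source`, `next_value`, `sum_retail`, and `sum_batch` are hypothetical names made up for illustration, and plain integers stand in for tuples. The first consumer pulls one value per call through a small interface, roughly the shape of tuple-at-a-time processing; the second walks a contiguous array in a single tight loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "retail" source: hands out one value per call. */
typedef struct Source {
    const int32_t *vals;  /* backing array (stand-in for tuples) */
    size_t n;             /* number of values in the array */
    size_t pos;           /* index of the next value to return */
} Source;

/* Stores the next value in *out and returns 1, or returns 0 when exhausted. */
static int next_value(Source *src, int32_t *out)
{
    if (src->pos >= src->n)
        return 0;
    *out = src->vals[src->pos++];
    return 1;
}

/* Retail consumer: one call, one bounds check, and one state update per value. */
static int64_t sum_retail(Source *src)
{
    int64_t total = 0;
    int32_t v;

    while (next_value(src, &v))
        total += v;
    return total;
}

/* Batch consumer: a single tight loop over a contiguous array. */
static int64_t sum_batch(const int32_t *vals, size_t n)
{
    int64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += vals[i];
    return total;
}
```

Both functions compute the same result. In a toy like this the compiler may well inline `next_value` and erase the difference, so the sketch only shows the shape of the two interfaces; it says nothing about how large the gap is in a real executor, where the per-tuple call typically crosses node boundaries and cannot be inlined.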