Re: design for parallel backup

From	Amit Kapila
Subject	Re: design for parallel backup
Date	April 21, 2020, 04:50:01
Msg-id	CAA4eK1LQuN6LCaiHduaF0L47DUQ2e5cL_QNMytPRVUNidOA0dw@mail.gmail.com
In response to	Re: design for parallel backup (Andres Freund <andres@anarazel.de>)
Responses	Re: design for parallel backup
List	pgsql-hackers

On Tue, Apr 21, 2020 at 2:40 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2020-04-20 16:36:16 -0400, Robert Haas wrote:
>
> > If a backup client - either current or hypothetical - is compressing
> > and encrypting, then it doesn't have either a filesystem I/O or a
> > network I/O in progress while it's doing so. You take not only the hit
> > of the time required for compression and/or encryption, but also use
> > that much less of the available network and/or I/O capacity.
>
> I don't think it's really the time for network/file I/O that's the
> issue. Sure memcpy()'ing from the kernel takes time, but compared to
> encryption/compression it's not that much.  Especially for compression,
> it's not really lack of cycles for networking that prevent a higher
> throughput, it's that after buffering a few MB there's just no point
> buffering more, given compression will plod along with 20-100MB/s.
>

Compression is mostly a CPU-intensive operation, so it is quite likely
to benefit more from parallelism than network I/O does, but I am not
sure we can simply ignore the benefit of utilizing the network
bandwidth.  In our case, after copying the data from the network we
write it to disk, so while one worker is busy with filesystem I/O the
network can be used by another parallel worker processing other parts
of the data.
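
To make the overlap concrete, here is a toy sketch, not taken from the
patch.  It assumes each chunk costs a fixed network delay followed by a
fixed disk delay, both simulated with sleeps, and that neither the
network nor the disk is saturated.  The names copy_chunk and run_backup
and the delay values are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NET_DELAY = 0.02   # assumed time to receive one chunk from the network
DISK_DELAY = 0.02  # assumed time to write one chunk to disk


def copy_chunk(chunk_id):
    """Receive one chunk, then write it; both steps are simulated by sleeps."""
    time.sleep(NET_DELAY)   # this worker uses the network, its disk side is idle
    time.sleep(DISK_DELAY)  # this worker uses the disk, its network side is idle
    return chunk_id


def run_backup(num_chunks, num_workers):
    """Copy all chunks with num_workers workers; return (elapsed_seconds, chunk_ids)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        done = list(pool.map(copy_chunk, range(num_chunks)))
    return time.monotonic() - start, done


if __name__ == "__main__":
    serial_time, _ = run_backup(num_chunks=8, num_workers=1)
    parallel_time, _ = run_backup(num_chunks=8, num_workers=4)
    print(f"1 worker:  {serial_time:.2f}s")
    print(f"4 workers: {parallel_time:.2f}s")
```

With a single worker the network sits idle during every disk write.
With several workers one can be receiving while another is writing.
Sleeps do not model contention, so this says nothing about how much a
real backup would gain.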

Also, there may be some users who don't want their data to be
compressed for some reason, for example because the overhead of
decompression is so high that restore takes more time, and for them a
faster restore is much more critical than a compressed or fast backup.
So, for such cases, the parallelism during backup being discussed in
this thread will still be helpful.

OTOH, I think without some measurements it is difficult to say that we
get a significant benefit by parallelizing the backup without
compression.  I have scanned the other thread [1] where the patch for
parallel backup was discussed and didn't find any performance numbers,
so having some performance data with that patch would probably give us
a better understanding of the value of introducing parallelism in the
backup.

[1] - https://www.postgresql.org/message-id/CADM=JehKgobEknb+_nab9179HzGj=9EiTzWMOd2mpqr_rifm0Q@mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
