Re: design for parallel backup
From | Amit Kapila |
---|---|
Subject | Re: design for parallel backup |
Date | |
Msg-id | CAA4eK1LQuN6LCaiHduaF0L47DUQ2e5cL_QNMytPRVUNidOA0dw@mail.gmail.com |
In reply to | Re: design for parallel backup (Andres Freund <andres@anarazel.de>) |
Responses | Re: design for parallel backup |
List | pgsql-hackers |
On Tue, Apr 21, 2020 at 2:40 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2020-04-20 16:36:16 -0400, Robert Haas wrote:
>
> > If a backup client - either current or hypothetical - is compressing
> > and encrypting, then it doesn't have either a filesystem I/O or a
> > network I/O in progress while it's doing so. You take not only the hit
> > of the time required for compression and/or encryption, but also use
> > that much less of the available network and/or I/O capacity.
>
> I don't think it's really the time for network/file I/O that's the
> issue. Sure memcpy()'ing from the kernel takes time, but compared to
> encryption/compression it's not that much. Especially for compression,
> it's not really lack of cycles for networking that prevent a higher
> throughput, it's that after buffering a few MB there's just no point
> buffering more, given compression will plod along with 20-100MB/s.
>

It is quite likely that compression can benefit more from parallelism than network I/O can, as it is mostly a CPU-intensive operation, but I am not sure we can just ignore the benefit of utilizing the network bandwidth. In our case, after copying from the network we write that data to disk, so during filesystem I/O the network can be used if some other parallel worker is processing other parts of the data.

Also, there may be some users who don't want their data to be compressed, for example because the overhead of decompression is so high that restore takes more time, and they are not comfortable with that because for them a faster restore is much more critical than a compressed or fast backup. So, for such cases, the parallelism during backup being discussed in this thread will still be helpful.

OTOH, I think without some measurements it is difficult to say that we get a significant benefit from parallelizing the backup without compression.
I have scanned the other thread [1] where the patch for parallel backup was discussed and didn't find any performance numbers, so having some performance data with that patch might give us a better understanding of the benefit of introducing parallelism in the backup.

[1] - https://www.postgresql.org/message-id/CADM=JehKgobEknb+_nab9179HzGj=9EiTzWMOd2mpqr_rifm0Q@mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
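
The overlap argued for in the message (one worker writing to disk while another receives from the network) can be illustrated with a toy simulation. This is only a sketch and not taken from the patch under discussion: the names `copy_chunk` and `run_backup`, the fixed per-chunk delays, and the use of one lock per resource to stand in for a single network link and a single disk are all assumptions made for illustration. It says nothing about real throughput, which still needs the measurements the message asks for.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed, made-up costs per chunk; real values would have to be measured.
NET_SECONDS = 0.02   # time to receive one chunk from the network
DISK_SECONDS = 0.02  # time to write one chunk to the backup target

# One lock per resource: at most one worker uses the link or the disk at a time.
net_lock = threading.Lock()
disk_lock = threading.Lock()


def copy_chunk(_chunk_id):
    """Receive one chunk, then write it out (both simulated with sleep)."""
    with net_lock:
        time.sleep(NET_SECONDS)   # stand-in for reading from the socket
    with disk_lock:
        time.sleep(DISK_SECONDS)  # stand-in for writing to the filesystem


def run_backup(n_chunks, n_workers):
    """Copy n_chunks using n_workers threads; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(copy_chunk, range(n_chunks)))
    return time.perf_counter() - start


if __name__ == "__main__":
    serial = run_backup(8, 1)
    parallel = run_backup(8, 2)
    print(f"1 worker:  {serial:.3f}s")
    print(f"2 workers: {parallel:.3f}s")
```

With a single worker the link sits idle during every disk write. With two workers the second one can hold the network lock while the first holds the disk lock, so the simulated run finishes sooner even though neither resource got faster.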