Re: WIP/PoC for parallel backup

Поиск
Список
Период
Сортировка
От Asif Rehman
Тема Re: WIP/PoC for parallel backup
Дата
Msg-id CADM=JehpCq0sdXbk0NuesBE8mxnpyWSW07D=yGoUPOcVAgb=iA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WIP/PoC for parallel backup  (Rushabh Lathia <rushabh.lathia@gmail.com>)
Ответы Re: WIP/PoC for parallel backup  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers


On Mon, Oct 7, 2019 at 1:52 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
Thanks Asif for the patch.  I am opting this for a review.  Patch is
bit big, so here are very initial comments to make the review process
easier.

Thanks Rushabh for reviewing the patch.


1) Patch seems doing lot of code shuffling, I think it would be easy
to review if you can break the clean up patch separately.

Example:
a: setup_throttle
b: include_wal_files

2) As I can see this patch basically have three major phase.

a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
STOP_BACKUP.
b) Implementation of actual parallel backup.
c) Testcase

I would suggest, if you can break out in three as a separate patch that
would be nice.  It will benefit in reviewing the patch.

Sure, why not. I will break them into multiple patches.
 

3) In your patch you are preparing the backup manifest (file which
giving the information about the data files). Robert Haas, submitted
the backup manifests patch on another thread [1], and I think we
should use that patch to get the backup manifests for parallel backup.

Sure. Though the backup manifest patch calculates and includes the checksum of backup files and is done
while the file is being transferred to the frontend-end. The manifest file itself is copied at the
very end of the backup. In parallel backup, I need the list of filenames before file contents are transferred, in
order to divide them into multiple workers. For that, the manifest file has to be available when START_BACKUP
 is called. 

That means, backup manifest should support its creation while excluding the checksum during START_BACKUP().
I also need the directory information as well for two reasons:

- In plain format, base path has to exist before we can write the file. we can extract the base path from the file
but doing that for all files does not seem a good idea.
- base backup does not include the content of some directories but those directories although empty, are still
expected in PGDATA.

I can make these changes part of parallel backup (which would be on top of backup manifest patch) or
these changes can be done as part of manifest patch and then parallel can use them. 

Robert what do you suggest?


-- 
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexey Kondratov
Дата:
Сообщение: Re: Two pg_rewind patches (auto generate recovery conf and ensureclean shutdown)
Следующее
От: Robert Haas
Дата:
Сообщение: Re: stress test for parallel workers