Re: refactoring basebackup.c

From Dipesh Pandit
Subject Re: refactoring basebackup.c
Date
Msg-id CAN1g5_H+8ptP9NLzAn1Qx1tu=0Y6Ohp_v5xgYJbpWBC7CwCL8Q@mail.gmail.com
In response to Re: refactoring basebackup.c  (Jeevan Ladhe <jeevanladhe.os@gmail.com>)
Responses walmethods.c is kind of a mess (was Re: refactoring basebackup.c)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

Hi,

> > It will be good if we can also fix
> > CreateWalTarMethod to support LZ4 and ZSTD.
> Ok we will see, either Dipesh or I will take care of it.

I took a look at CreateWalTarMethod with an eye to supporting LZ4 compression
for WAL files. The current implementation involves a three-step process to back up
a WAL file to a tar archive. For each file:
  1. It first writes the header in the function tar_open_for_write, flushes the contents of the tar archive to disk and stores the header offset.
  2. Next, the contents of the WAL file are written to the tar archive.
  3. Finally, it recalculates the checksum in tar_close() and overwrites the header at the offset stored in step #1.
The need to overwrite the header in CreateWalTarMethod arises mainly from
partial WAL files, where the size of the WAL file is less than WalSegSize. Such a file
is padded, and the checksum is recalculated after the pad bytes have been added.
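
For illustration, here is a minimal standalone sketch of that write-then-patch
pattern using plain stdio; the fill_header() helper, the 512-byte header and the
demo payload are assumptions of the example, not code taken from walmethods.c:

/*
 * Minimal sketch of the three-step pattern described above, using stdio
 * instead of the real tar method.  fill_header() is a placeholder, not a
 * function from walmethods.c.
 */
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK_SIZE 512

static void fill_header(char *hdr, size_t data_len)
{
    memset(hdr, 0, TAR_BLOCK_SIZE);
    /* the real code fills in name, size, checksum, ... */
    snprintf(hdr, TAR_BLOCK_SIZE, "size=%zu", data_len);
}

int main(void)
{
    FILE   *tar = fopen("demo.tar", "wb");
    char    hdr[TAR_BLOCK_SIZE];
    char    wal_data[8192];         /* stand-in for (partial) WAL contents */
    long    hdr_offset;

    memset(wal_data, 'x', sizeof(wal_data));

    /* step 1: write a provisional header and remember its offset */
    hdr_offset = ftell(tar);
    fill_header(hdr, 0);
    fwrite(hdr, 1, TAR_BLOCK_SIZE, tar);

    /* step 2: write the WAL contents */
    fwrite(wal_data, 1, sizeof(wal_data), tar);

    /*
     * step 3: seek back and overwrite the header with the final size (and,
     * in the real code, the recalculated checksum) -- this is the in-place
     * rewrite that a linked compressed stream cannot tolerate
     */
    fill_header(hdr, sizeof(wal_data));
    fseek(tar, hdr_offset, SEEK_SET);
    fwrite(hdr, 1, TAR_BLOCK_SIZE, tar);

    fclose(tar);
    return 0;
}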

If we go ahead and implement LZ4 support for CreateWalTarMethod, we
have a problem at step #3. To achieve a better compression ratio,
compressed LZ4 blocks are linked to each other by default and are
decoded sequentially. If we overwrite the header as part of step #3,
it corrupts the link between compressed LZ4 blocks. LZ4 does provide
an option to write compressed blocks independently (with the blockMode
option set to LZ4F_blockIndependent), but that is still a problem, because we cannot
guarantee that overwriting the header after recalculating the checksum will not overlap
the boundary of the next block.
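
For reference, this is roughly how the block mode is selected with the LZ4
frame API; the one-shot LZ4F_compressFrame() call and the buffer sizes below
are purely illustrative, a real walmethods.c implementation would use the
streaming functions instead:

/*
 * Sketch of selecting independent vs. linked blocks with the LZ4 frame API.
 * Even with LZ4F_blockIndependent, the block boundaries inside the frame are
 * not known to the caller, so overwriting a byte range in the middle of the
 * frame is still unsafe.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lz4frame.h>

int main(void)
{
    char                src[8192];
    LZ4F_preferences_t  prefs;
    size_t              bound, n;
    char               *dst;

    memset(src, 'x', sizeof(src));
    memset(&prefs, 0, sizeof(prefs));

    /* default is LZ4F_blockLinked: each block may reference the previous one */
    prefs.frameInfo.blockMode = LZ4F_blockIndependent;

    bound = LZ4F_compressFrameBound(sizeof(src), &prefs);
    dst = malloc(bound);

    n = LZ4F_compressFrame(dst, bound, src, sizeof(src), &prefs);
    if (LZ4F_isError(n))
    {
        fprintf(stderr, "LZ4 error: %s\n", LZ4F_getErrorName(n));
        return 1;
    }
    printf("compressed %zu bytes into %zu\n", sizeof(src), n);

    free(dst);
    return 0;
}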

GZIP manages to overcome this problem because the zlib library function
deflateParams() allows compression to be turned on and off on the fly while
writing a compressed archive. The current gzip implementation of
CreateWalTarMethod uses this function to turn compression off just before
step #1 and writes the uncompressed header with size equal to TAR_BLOCK_SIZE.
It uses the same function to turn compression back on for writing the contents
of the WAL file as part of step #2, and turns compression off again just before step
#3 to overwrite the header. The header is overwritten at the same offset, again with
size equal to TAR_BLOCK_SIZE.
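
A minimal sketch of that on-the-fly toggle is below. It only shows
deflateParams() switching a single deflate stream between level 0 (stored,
effectively uncompressed) and a normal compression level; the buffer sizes,
the gzip windowBits setting and the demo header/payload are assumptions of the
example, not the actual walmethods.c code:

/*
 * Sketch: toggle compression on and off mid-stream with zlib's
 * deflateParams().  Level 0 writes the data essentially uncompressed,
 * which is what makes the later in-place header rewrite possible.
 */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define OUTBUF_SIZE 16384

static unsigned char outbuf[OUTBUF_SIZE];

/* feed one buffer through deflate() and append the output to the file */
static void
deflate_write(z_stream *zs, FILE *out, const void *buf, size_t len, int flush)
{
    zs->next_in = (Bytef *) buf;
    zs->avail_in = (uInt) len;
    do
    {
        zs->next_out = outbuf;
        zs->avail_out = OUTBUF_SIZE;
        (void) deflate(zs, flush);
        fwrite(outbuf, 1, OUTBUF_SIZE - zs->avail_out, out);
    } while (zs->avail_in > 0 || (flush == Z_FINISH && zs->avail_out == 0));
}

/* change the compression level, flushing anything deflateParams() emits */
static void
set_level(z_stream *zs, FILE *out, int level)
{
    zs->avail_in = 0;
    zs->next_out = outbuf;
    zs->avail_out = OUTBUF_SIZE;
    (void) deflateParams(zs, level, Z_DEFAULT_STRATEGY);
    fwrite(outbuf, 1, OUTBUF_SIZE - zs->avail_out, out);
}

int main(void)
{
    char        header[512];        /* stands in for TAR_BLOCK_SIZE */
    char        payload[8192];      /* stands in for the WAL contents */
    z_stream    zs;
    FILE       *out = fopen("demo.tar.gz", "wb");

    memset(header, 0, sizeof(header));
    memset(payload, 'x', sizeof(payload));
    memset(&zs, 0, sizeof(zs));

    /* windowBits 15 + 16 asks zlib to emit a gzip wrapper */
    deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 + 16, 8,
                 Z_DEFAULT_STRATEGY);

    set_level(&zs, out, 0);                            /* compression off */
    deflate_write(&zs, out, header, sizeof(header), Z_NO_FLUSH);

    set_level(&zs, out, Z_DEFAULT_COMPRESSION);        /* compression on */
    deflate_write(&zs, out, payload, sizeof(payload), Z_FINISH);

    deflateEnd(&zs);
    fclose(out);
    return 0;
}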

Since GZIP provides this option to enable/disable compression, it is possible to
control the size of the data we are writing to the compressed archive. Even if we
overwrite an already-written block in the compressed archive, there is no risk of it
overlapping the boundary of the next block. This mechanism is not available in LZ4 or ZSTD.

In order to support LZ4 and ZSTD compression in CreateWalTarMethod, we may
need to refactor this code, unless I am missing something. We need to somehow
add the padding bytes for a partial WAL file before we send it to the compressed
archive. This would make sure that every file being compressed requires no
padding, because its size is always equal to WalSegSize. There is then no need to
recalculate the checksum, and we can avoid overwriting the header as part of
step #3.
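
Roughly what I have in mind, as a minimal sketch: the pad_wal_segment() helper
and the hard-coded 16MB segment size are hypothetical, not existing
walmethods.c code:

/*
 * Sketch of padding a partial WAL segment to WalSegSize before it is handed
 * to the compressor, so the tar header (size, checksum) is final up front
 * and never needs to be rewritten inside the compressed stream.
 * pad_wal_segment() is a hypothetical helper, not an existing function.
 */
#include <stdlib.h>
#include <string.h>

#define WAL_SEG_SIZE (16 * 1024 * 1024)     /* default wal_segment_size */

/*
 * Return a WAL_SEG_SIZE buffer containing 'len' bytes of WAL data followed
 * by zero padding.  The caller can then write a tar header with the final
 * size/checksum and stream header + buffer straight into LZ4/ZSTD.
 */
static char *
pad_wal_segment(const char *data, size_t len)
{
    char *buf;

    if (len > WAL_SEG_SIZE)
        return NULL;

    buf = malloc(WAL_SEG_SIZE);
    if (buf == NULL)
        return NULL;

    memcpy(buf, data, len);
    memset(buf + len, 0, WAL_SEG_SIZE - len);
    return buf;
}

int main(void)
{
    char    partial[8192];          /* stand-in for a partial WAL file */
    char   *seg;

    memset(partial, 'x', sizeof(partial));
    seg = pad_wal_segment(partial, sizeof(partial));
    if (seg == NULL)
        return 1;
    /* ...build the tar header once, then compress header + seg... */
    free(seg);
    return 0;
}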

Thoughts?

Thanks,
Dipesh
