Discussion: Making WAL archiving faster — multi-file support and async ideas

Making WAL archiving faster — multi-file support and async ideas

From: Stepan Neretin
Date:

Hi hackers,

We’ve been thinking about how to make WAL archiving faster.

This topic was previously discussed in [1], and we’ve taken a first step by implementing the attached patch, which adds support for archiving multiple WAL files in one go.

The idea is straightforward: instead of invoking the archive command or callback once per WAL file, we allow passing a batch of files. The patch introduces support for new placeholders:

  • %F – list of WAL file names

  • %P – list of their full paths

  • %N – number of files

Since PostgreSQL already reads multiple files into memory and caches them before archiving, this change avoids repeated fork() calls and reduces overhead in high-throughput setups.
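
For illustration, a batched archive_command could then look roughly like this (a sketch only: it assumes %P expands to a space-separated list of full paths and %F/%N behave analogously, which is ultimately up to the patch's semantics):

    # hypothetical postgresql.conf setting using the proposed batch placeholders
    archive_command = 'cp %P /mnt/server/archivedir/'

    # or hand the whole batch to a custom script that receives the count and names
    # archive_command = '/usr/local/bin/archive_batch.sh %N %F'

With such a command the shell and the target tool are started once per batch instead of once per segment.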

Of course, there are trade-offs. After discussing with Andrey Borodin, we noted that if even one file in the batch fails to archive, we currently have to retry the whole batch. While it’s technically possible to return a list of successfully archived files, that would complicate the API and introduce messy edge cases.

So we’re also exploring a more flexible idea: an asynchronous archiver mode.

The idea is to have PostgreSQL write WAL file names (marked .ready) into a FIFO or pipe, and let an archive process or library asynchronously consume and archive them. It would send back confirmations (or failures) through another pipe, allowing PostgreSQL to retry failed files as needed. This could decouple archiving from the archiver loop and open the door to more efficient and parallel implementations.
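
To make that protocol a bit more concrete, here is a rough sketch of what an external consumer could look like (purely illustrative: the FIFO paths, the one-file-per-line request format, and the "OK"/"FAIL" confirmation format are assumptions of ours, not something the patch defines):

    #!/bin/sh
    # Hypothetical asynchronous archiver consumer: read WAL file paths from a
    # request FIFO, archive each one, and report the result on a status FIFO
    # so that PostgreSQL could retry failed files.
    REQUEST_FIFO=/var/run/postgresql/archive_requests
    STATUS_FIFO=/var/run/postgresql/archive_status
    ARCHIVE_DIR=/mnt/server/archivedir

    while read -r wal_path; do
        fname=$(basename "$wal_path")
        # copy under a temporary name and rename, so a partially written
        # segment is never visible under its final name
        if cp "$wal_path" "$ARCHIVE_DIR/$fname.tmp" &&
           mv "$ARCHIVE_DIR/$fname.tmp" "$ARCHIVE_DIR/$fname"; then
            echo "OK $fname" > "$STATUS_FIFO"
        else
            echo "FAIL $fname" > "$STATUS_FIFO"
        fi
    done < "$REQUEST_FIFO"

A real implementation would presumably also need to fsync the archived file before confirming it.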

We’d appreciate feedback on both directions:

  • Do you think the idea in the attached patch — batching WAL files for archiving — is viable? Is it something worth pursuing?

  • What do you think about the async archiver concept? Would it fit PostgreSQL’s architecture and operational expectations?

Thanks,
Stepan Neretin

[1] https://www.postgresql.org/message-id/flat/BC335D75-105B-403F-9473-976C8BBC32E3%40yandex-team.ru#d45caa9d1075734567164f73371baf00

Attachments

Re: Making WAL archiving faster — multi-file support and async ideas

From: Daniil Davydov
Date:
Hi,

On Tue, Jul 29, 2025 at 3:56 PM Stepan Neretin <slpmcf@gmail.com> wrote:
>
> We’ve been thinking about how to make WAL archiving faster.
> This topic was previously discussed in [1], and we’ve taken a first step by implementing the attached patch, which adds support for archiving multiple WAL files in one go.
> The idea is straightforward: instead of invoking the archive command or callback once per WAL file, we allow passing a batch of files.
>

Thanks for the patch!
My first comments concern code style and naming. I believe that
fixing them will make the review easier.

1)
Please try to keep the diff against the vanilla code as small as possible.
If you believe that the formatting of the code "around" your change
can be improved, please provide a separate .patch file containing
such corrections.

2)
You have added "archive_files_cb". Maybe it should be renamed to
something like "archive_multiple_files_cb"? During review I regularly
confused the callbacks and functions whose names differ only in
...file vs. ...files.

3)
shell_archive.c: I think the "shell_archive_file" code should be placed
above the "run_archive_command" code. Again, that will noticeably reduce the diff.

4)
pgarch.c: the definitions of the ArchiveXlogArg and ArchiveFilesArg structures
and of the ArchiveCallbackFn type should be placed at the top of the file.

5)
"pgarch_ArchiverCopyLoopMulti" function :
I think the "ArchiveCallbacks->check_configured_cb != NULL" check should be
placed at the beginning of the while loop (by analogy with
pgarch_ArchiverCopyLoop).

6)
Please run pgindent on your patch.

--
Best regards,
Daniil Davydov



Re: Making WAL archiving faster — multi-file support and async ideas

From: Alyona Vinter
Date:
Hi,
I have some concerns about the parallel archiver due to the requirement for sequential WAL archiving. The fundamental rule is that WAL segments must be archived in strict sequential order for a successful restore. Consider a scenario where PostgreSQL has segments 1, 2, 3, and 4 ready, and the parallel archiver successfully copies segments 1, 2, and 4, but misses segment 3. The user might be unaware of the gap and could attempt a restore using the incomplete archive. While we hope this would cause a clear error during recovery, there is a risk that partial application of non-sequential segments might lead to silent corruption or other unforeseen issues.
 
As far as I know, tools like cp process files in the order they are given, so the archiving order is preserved. It seems easier and more reliable to build on that than to introduce a new parallel paradigm. Given that the archiver already uses a priority queue, this should not be difficult to implement.

Thanks for considering my feedback.
--
Best regards,
Alyona Vinter


Re: Making WAL archiving faster — multi-file support and async ideas

From: Greg Sabino Mullane
Date:
On Mon, Aug 25, 2025 at 4:31 AM Alyona Vinter <dlaaren8@gmail.com> wrote:
> ... could attempt a restore using the incomplete archive. While we hope this would cause a clear error during recovery, there is a risk that partial application of non-sequential segments might lead to silent corruption or other unforeseen issues.

Can you expand on how that could happen? Postgres knows the name of the next WAL to look for, so it's not going to ever jump over a missing file.

Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Re: Making WAL archiving faster — multi-file support and async ideas

From: Alyona Vinter
Date:
Hi Greg!

Thanks for your question; it made me take a closer look at the recovery process. You're absolutely right, and I appreciate you pointing that out.
Postgres requests history files from the archive, which helps it determine whether to wait for the next segment or whether the timeline has ended. If Postgres detects that a segment isn't in the archive yet, it simply waits for it to appear. Let me know if I've missed anything here.
So I see no fundamental problem with the parallel archiver =)

Best wishes,
Alyona Vinter