Discussion: WAL segment size
Hi,
I see very little advice on tuning WAL segment size.
One of my clients has a few data warehouses at around 8 - 16 TB.
On one of the nodes there are approx 15000 WAL segments of 16MB each, totalling approx 230GB. The archiver is archiving approx one per second, so approx 4 hours to clear.
Would we gain anything by bumping the WAL segment size?
Thanks,
Colin
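[A quick sanity check of the figures above -- the 15000 segments, 16MB size, and one-per-second rate come from the mail itself; the rest is arithmetic:]

```python
# Back-of-the-envelope check of the archive backlog quoted above.
segments = 15_000
segment_size_mib = 16        # the default wal_segment_size
rate_per_second = 1          # observed archiver throughput

backlog_gib = segments * segment_size_mib / 1024
hours_to_clear = segments / rate_per_second / 3600

print(f"{backlog_gib:.0f} GiB backlog")    # ~234 GiB, matching "approx 230GB"
print(f"{hours_to_clear:.1f} hours")       # ~4.2 hours, matching "approx 4 hours"
```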
On Wed, 2025-12-17 at 16:13 +0100, Colin 't Hart wrote:
> I see very little advice on tuning WAL segment size.
>
> One of my clients has a few datawarehouses at around 8 - 16 TB
>
> On one of the nodes there are approx 15000 WAL segments of 16MB each, totalling
> approx 230GB. The archiver is archiving approx one per second, so approx 4 hours to clear.
>
> Would we gain anything by bumping the WAL segment size?

Very likely yes, if the problem is the overhead of starting the archive_command.

Another thing that can slow down archiving is if you compress these segments
too aggressively.

Yours,
Laurenz Albe
On Wed, Dec 17, 2025 at 10:13 AM Colin 't Hart <colinthart@gmail.com> wrote:
> Hi,
>
> I see very little advice on tuning WAL segment size.
>
> One of my clients has a few datawarehouses at around 8 - 16 TB
>
> On one of the nodes there are approx 15000 WAL segments of 16MB each, totalling
> approx 230GB. The archiver is archiving approx one per second, so approx 4 hours to clear.
One second to archive 16 measly MB seems really darned slow. Another vote for that being the problem.
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
Thanks Laurenz, that confirms what I was assuming. Archiving is via pgbackrest to a backup server, over SSH. Approx 750ms to archive each segment is crazy -- I'll check compression parameters too.
Any reason not to bump it up to 1GB? Or is that overkill?
/Colin
On Wed, 17 Dec 2025 at 16:25, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2025-12-17 at 16:13 +0100, Colin 't Hart wrote:
> I see very little advice on tuning WAL segment size.
>
> One of my clients has a few datawarehouses at around 8 - 16 TB
>
> On one of the nodes there are approx 15000 WAL segments of 16MB each, totalling
> approx 230GB. The archiver is archiving approx one per second, so approx 4 hours to clear.
>
> Would we gain anything by bumping the WAL segment size?
Very likely yes, if the problem is the overhead of starting the archive_command.
Another thing that can slow down archiving is if you compress these segments
too aggressively.
Yours,
Laurenz Albe
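[The attraction of a bigger wal_segment_size is fewer files for the archiver to push. A quick sketch using the figures from this thread -- the candidate sizes below are just illustrative powers of two, which is what pg_resetwal accepts:]

```python
# The same ~234 GiB backlog expressed as file counts (i.e. archive_command
# invocations) at a few candidate segment sizes.
backlog_mib = 15_000 * 16   # figures from the first mail in the thread

file_counts = {size_mib: backlog_mib // size_mib
               for size_mib in (16, 64, 256, 1024)}
for size_mib, files in file_counts.items():
    print(f"{size_mib:>5} MiB segments -> {files:>6} files")
```

If a fixed per-file overhead (process start, SSH handshake) dominates, the total archiving time shrinks roughly in proportion to the file count.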
On 12/17/25 08:10, Colin 't Hart wrote:
> Thanks Laurenz, that confirms what I was assuming. Archiving is via
> pgbackrest to a backup server, over SSH. Approx 750ms to archive each
> segment is crazy -- I'll check compression parameters too.

How much of that time is network travel?

What are the configuration settings for the archiving portion of pgBackRest?

> Any reason not to bump it up to 1GB? Or is that overkill?
>
> /Colin
>
> On Wed, 17 Dec 2025 at 16:25, Laurenz Albe <laurenz.albe@cybertec.at
> <mailto:laurenz.albe@cybertec.at>> wrote:
>
>     On Wed, 2025-12-17 at 16:13 +0100, Colin 't Hart wrote:
>     > I see very little advice on tuning WAL segment size.
>     >
>     > One of my clients has a few datawarehouses at around 8 - 16 TB
>     >
>     > On one of the nodes there are approx 15000 WAL segments of 16MB each, totalling
>     > approx 230GB. The archiver is archiving approx one per second, so approx 4 hours to clear.
>     >
>     > Would we gain anything by bumping the WAL segment size?
>
>     Very likely yes, if the problem is the overhead of starting the archive_command.
>
>     Another thing that can slow down archiving is if you compress these segments
>     too aggressively.
>
>     Yours,
>     Laurenz Albe

--
Adrian Klaver
adrian.klaver@aklaver.com
On Wed, Dec 17, 2025 at 11:10 AM Colin 't Hart <colinthart@gmail.com> wrote:
Thanks Laurenz, that confirms what I was assuming. Archiving is via pgbackrest to a backup server, over SSH. Approx 750ms to archive each segment is crazy -- I'll check compression parameters too.
Switch to archive-async = on. When doing that, the typical time drops to 10ms or less. Also use a compress-type of lz4 or zst, which perform way better than the default gz. If you are encrypting, that's a bottleneck you just have to deal with, no shortcuts there. :)
tl;dr try other things before messing with the WAL size. The current size can work very well even on very large and very, very busy systems.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
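[Greg's suggestions map onto pgbackrest.conf roughly like this. A sketch only, not a drop-in config: the host name and paths are placeholders, and note that asynchronous archiving additionally requires a spool-path:]

```ini
[global]
repo1-host=backup.example.com      # placeholder backup server
compress-type=lz4                  # lz4 or zst: far cheaper than the default gz
archive-async=y                    # acknowledge archive_command from a local spool
spool-path=/var/spool/pgbackrest   # required when archive-async is enabled

[global:archive-push]
process-max=4                      # parallel workers for pushing WAL
```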
Adding this to ~/.ssh/config will almost totally eliminate the cost of ssh authentication:
Host *
ControlMaster auto
#ControlPath /run/user/%i/%L_%r_at_%n:%p
ControlPath ~/.ssh/%L_%r_at_%n:%p
ControlPersist 5m
Pointing ControlPath to /run/user/%i is even faster, but it doesn't always exist if you sudo into the postgres service account.
On Wed, Dec 17, 2025 at 11:10 AM Colin 't Hart <colinthart@gmail.com> wrote:
Thanks Laurenz, that confirms what I was assuming. Archiving is via pgbackrest to a backup server, over SSH. Approx 750ms to archive each segment is crazy -- I'll check compression parameters too.

Any reason not to bump it up to 1GB? Or is that overkill?

/Colin

On Wed, 17 Dec 2025 at 16:25, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2025-12-17 at 16:13 +0100, Colin 't Hart wrote:
> I see very little advice on tuning WAL segment size.
>
> One of my clients has a few datawarehouses at around 8 - 16 TB
>
> On one of the nodes there are approx 15000 WAL segments of 16MB each, totalling
> approx 230GB. The archiver is archiving approx one per second, so approx 4 hours to clear.
>
> Would we gain anything by bumping the WAL segment size?
Very likely yes, if the problem is the overhead of starting the archive_command.
Another thing that can slow down archiving is if you compress these segments
too aggressively.
Yours,
Laurenz Albe
On Wed, 2025-12-17 at 12:21 -0500, Greg Sabino Mullane wrote:
> On Wed, Dec 17, 2025 at 11:10 AM Colin 't Hart <colinthart@gmail.com> wrote:
> > Thanks Laurenz, that confirms what I was assuming. Archiving is via pgbackrest
> > to a backup server, over SSH. Approx 750ms to archive each segment is crazy --
> > I'll check compression parameters too.
>
> Switch to archive-async = on. When doing that, the typical time drops to 10ms or less.
> Also use a compress-type of lz4 or zst, which perform way better than the default gz.
> If you are encrypting, that's a bottleneck you just have to deal with, no shortcuts there. :)

I second that. Asynchronous archiving in pgBackRest tends to work around the problem.

> tl;dr try other things before messing with the WAL size. The current size can work very
> well even on very large and very, very busy systems.

On the other hand, 16MB on a very busy system is somewhat ridiculous.
A somewhat bigger segment size may be appropriate.

Yours,
Laurenz Albe
As an Oracle DBA new to Postgres, I'm used to the concept of context switches and latch issues
with regards to transaction log switches. Does Postgres have a similar mechanism with latching
etc. when it switches to a new WAL segment that is alleviated by increasing the size of the
WAL segments?

Regards
Andrew

Sent from my iPhone

> On 17 Dec 2025, at 18:58, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Wed, 2025-12-17 at 12:21 -0500, Greg Sabino Mullane wrote:
>> On Wed, Dec 17, 2025 at 11:10 AM Colin 't Hart <colinthart@gmail.com> wrote:
>>> Thanks Laurenz, that confirms what I was assuming. Archiving is via pgbackrest
>>> to a backup server, over SSH. Approx 750ms to archive each segment is crazy --
>>> I'll check compression parameters too.
>>
>> Switch to archive-async = on. When doing that, the typical time drops to 10ms or less.
>> Also use a compress-type of lz4 or zst, which perform way better than the default gz.
>> If you are encrypting, that's a bottleneck you just have to deal with, no shortcuts there. :)
>
> I second that. Asynchronous archiving in pgBackRest tends to work around the problem.
>
>> tl;dr try other things before messing with the WAL size. The current size can work very
>> well even on very large and very, very busy systems.
>
> On the other hand, 16MB on a very busy system is somewhat ridiculous.
> A somewhat bigger segment size may be appropriate.
>
> Yours,
> Laurenz Albe
On Fri, 2025-12-19 at 09:26 +0100, Andrew wrote:
> As an Oracle DBA new to Postgres, I'm used to the concept of context switches and latch issues
> with regards to transaction log switches. Does Postgres have a similar mechanism with latching
> etc. when it switches to a new WAL segment that is alleviated by increasing the size of the
> WAL segments?

Not really. PostgreSQL doesn't reuse WAL segments in a circular fashion like Oracle does.
At the end of a checkpoint, it creates new, empty WAL segments for future use, so if there
is a need to switch to a new segment, there is no need to wait for anything.

Yours,
Laurenz Albe
What's the behaviour when pg_resetwal is used to change the WAL segment size?
This note is worrying to me:
--
While pg_resetwal will set the WAL starting address beyond the latest existing WAL segment file, some segment size changes can cause previous WAL file names to be reused. It is recommended to use -l together with this option to manually set the WAL starting address if WAL file name overlap will cause problems with your archiving strategy.
--
Why can a segment size change cause previous WAL file names to be reused?
Do we need to take a new backup immediately after changing the WAL segment size?
Thanks,
Colin
On Fri, 19 Dec 2025 at 14:09, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2025-12-19 at 09:26 +0100, Andrew wrote:
> As an oracle dba new to Postgres, I’m used to the concept of context switches and latch issues
> with regards to transaction log switches. Does Postgres have a similar mechanism with latching
> etc when it switches to a new wal segment that is alleviated when increasing the size of the
> wal segments?
Not really. PostgreSQL doesn't reuse WAL segments in a circular fashion like Oracle does.
At the end of a checkpoint, it creates new, empty WAL segments for future use, so if there
is a need to switch to a new segment, there is no need to wait for anything.
Yours,
Laurenz Albe
On Fri, 2025-12-19 at 14:25 +0100, Colin 't Hart wrote:
> What's the behaviour when pg_resetwal is used to change the WAL segment size?
>
> This note is worrying to me:
> --
> While pg_resetwal will set the WAL starting address beyond the latest existing WAL segment
> file, some segment size changes can cause previous WAL file names to be reused. It is
> recommended to use -l together with this option to manually set the WAL starting address
> if WAL file name overlap will cause problems with your archiving strategy.
> --
> Why can a segment size change cause previous WAL file names to be reused?
>
> Do we need to take a new backup immediately after changing the WAL segment size?

I think that is supposed to mean that the new WAL numbering scheme might produce the same
WAL segment name as a WAL segment name had long ago, so you might overwrite that earlier
segment in the WAL archive, which could prevent you from recovering from an old backup that
needs the overwritten WAL segment to recover.

I'm not sure how likely that is to happen. It never harms to run an extra backup.

The main thing is that you shut down PostgreSQL cleanly before running "pg_resetwal" and
that you only change the WAL segment size, nothing else.

Yours,
Laurenz Albe
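[That name reuse can be made concrete. As I understand PostgreSQL's XLogFileName logic, a segment name is "%08X%08X%08X" built from the timeline, segno / (0x100000000 / segsize), and segno % (0x100000000 / segsize). A sketch, with timeline 1 and example LSNs chosen for illustration:]

```python
# Sketch of PostgreSQL's segment-naming logic: map an LSN (byte position
# in the WAL stream) to a segment file name for a given segment size.
def wal_file_name(tli: int, lsn: int, seg_size: int) -> str:
    segno = lsn // seg_size
    segments_per_xlogid = 0x1_0000_0000 // seg_size
    return (f"{tli:08X}"
            f"{segno // segments_per_xlogid:08X}"
            f"{segno % segments_per_xlogid:08X}")

MiB = 1024 * 1024
# With 16 MiB segments, the file covering LSNs 16..32 MiB was named:
old = wal_file_name(1, 16 * MiB, 16 * MiB)
# After switching to 1 GiB segments, the file covering LSNs 1..2 GiB
# (an address well beyond the old cluster's early segments) is named:
new = wal_file_name(1, 1280 * MiB, 1024 * MiB)
print(old, new, old == new)   # the same name -- it would overwrite the
                              # old segment in the archive
```

So an archive written under the old 16MB size can contain a file whose name the new, larger-segment cluster generates again for a much later LSN, which is exactly why the docs suggest -l to move the starting address clear of the old names.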