Обсуждение: explosive WAL growth

Поиск
Список
Период
Сортировка

explosive WAL growth

От
Scott Ribe
Дата:
Something is causing our WAL to grow 160GB/hour *faster* than archiving. (Archiving appears to be working normally.)
Thisstarted in the past couple of days. 

I am having some trouble finding the cause of this. I am looking at pg_stat_statements, # calls, time, shared blocks
written.I am also looking at recent client app deployments. 

My next step might be to use something like pg_waldump to see what's in this WAL.

Any suggestions?

--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/






Re: explosive WAL growth

От
Alvaro Herrera
Дата:
On 2024-Feb-27, Scott Ribe wrote:

> Something is causing our WAL to grow 160GB/hour *faster* than archiving. (Archiving appears to be working normally.)
Thisstarted in the past couple of days.
 
> 
> I am having some trouble finding the cause of this. I am looking at pg_stat_statements, # calls, time, shared blocks
written.I am also looking at recent client app deployments.
 

It could be autovacuum processing large tables that haven't been
processed in a long time, in emergency mode.  This happens if you turn
it off and do nothing to substitute it.

Some backup tools implement parallel archiving of WAL files, which helps
you deal with massive accumulation of them.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: explosive WAL growth

От
Lucio Chiessi
Дата:
Has any autovacuum, vacuum, or reindex occurred?
These operations generate lots of WAL depending on the size of the data related.

Lucio Chiessi

Senior Database Administrator

Trustly, Inc.

M: +55 27 996360276

  

    



On Tue, Feb 27, 2024 at 3:10 PM Scott Ribe <scott_ribe@elevated-dev.com> wrote:
Something is causing our WAL to grow 160GB/hour *faster* than archiving. (Archiving appears to be working normally.) This started in the past couple of days.

I am having some trouble finding the cause of this. I am looking at pg_stat_statements, # calls, time, shared blocks written. I am also looking at recent client app deployments.

My next step might be to use something like pg_waldump to see what's in this WAL.

Any suggestions?

--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/






Please read our privacy policy here on how we process your personal data in accordance with the General Data Protection Regulation (EU) 2016/679 (the “GDPR”) and other applicable data protection legislation.

Re: explosive WAL growth

От
Scott Ribe
Дата:
Found it. Kubernetes cronjob task.

> On Feb 27, 2024, at 11:22 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> It could be autovacuum processing large tables that haven't been
> processed in a long time, in emergency mode.  This happens if you turn
> it off and do nothing to substitute it.

We have autovacuum on, and pretty well tuned.

> Some backup tools implement parallel archiving of WAL files, which helps
> you deal with massive accumulation of them.

We'll get that in an upgrade we're getting ready to deploy. Ironically, it might have just hidden this problem ;-)


Re: explosive WAL growth

От
Karthik Yellapragada
Дата:
Just out of curiosity, what was the job doing?

On Tue, Feb 27, 2024 at 10:49 AM Scott Ribe <scott_ribe@elevated-dev.com> wrote:
Found it. Kubernetes cronjob task.

> On Feb 27, 2024, at 11:22 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> It could be autovacuum processing large tables that haven't been
> processed in a long time, in emergency mode.  This happens if you turn
> it off and do nothing to substitute it.

We have autovacuum on, and pretty well tuned.

> Some backup tools implement parallel archiving of WAL files, which helps
> you deal with massive accumulation of them.

We'll get that in an upgrade we're getting ready to deploy. Ironically, it might have just hidden this problem ;-)

--
Thanks & Regards
Karthik Yellapragada
+1 860 830 5235

Re: explosive WAL growth

От
Scott Ribe
Дата:
> On Feb 27, 2024, at 11:58 AM, Karthik Yellapragada <karthik.yellapragada@gmail.com> wrote:
>
> Just out of curiosity, what was the job doing?

Large batch upsert. But something is wrong for it to be generating WAL that quickly. Having identified it, now we can
drilldown to see what's wrong with it. 


Re: explosive WAL growth

От
vignesh kumar
Дата:
Check the Wal_keep_size or wal_keep_segments if they are too huge occupying the wal space.

Take a waldump and see what's really going on. Usually the blob will have wals split into multiple chunks to accomodate the filemaps. 

Also use slot based replication so that primary need not maintain all the wals .. instead it maintains only wals required for the slot to have it replicated.

Check the min wal size and max wal size..

If none of them helps.. your application is shooting up load.


From: Scott Ribe <scott_ribe@elevated-dev.com>
Sent: Tuesday, February 27, 2024 11:39:58 PM
To: Pgsql-admin <pgsql-admin@lists.postgresql.org>
Subject: explosive WAL growth
 
Something is causing our WAL to grow 160GB/hour *faster* than archiving. (Archiving appears to be working normally.) This started in the past couple of days.

I am having some trouble finding the cause of this. I am looking at pg_stat_statements, # calls, time, shared blocks written. I am also looking at recent client app deployments.

My next step might be to use something like pg_waldump to see what's in this WAL.

Any suggestions?

--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/