Обсуждение: pg_xlog no longer rotating out

Поиск
Список
Период
Сортировка

pg_xlog no longer rotating out

От
Mark Steben
Дата:
Good afternoon,
We run postgres 9.4 with hot standby streaming replication.  We have been running abnormally high updates and the pg_xlogs have queued to about 80 percent of disk capacity.  This often happens and the network is not fast enough to handle the transfer of logs to the replicated server.
What I have done in the past, and done again today is to change the archive_command from the script that scp's the logs to '/bin/true'.  This stops the logship and allows the normal pg_xlog rotation to catch up.

This has not happened this time.  Logs have been stored on pg_xlog since June 30 and the number of logs keep climbing.  I am now manually moving some of the logs that were processed on June 30 off to another disk to alleviate the space. PLEASE HELP if you can.

It looks like archiving is still occurring despite '/bin/true' being set.  Can I safely kill -term the archive thread?

Thank you for your prompt response.

--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com





Re: pg_xlog no longer rotating out

От
Alvaro Herrera
Дата:
On 2019-Jul-02, Mark Steben wrote:

> Good afternoon,
> We run postgres 9.4 with hot standby streaming replication.  We have been
> running abnormally high updates and the pg_xlogs have queued to about 80
> percent of disk capacity.  This often happens and the network is not fast
> enough to handle the transfer of logs to the replicated server.
> What I have done in the past, and done again today is to change the
> archive_command from the script that scp's the logs to '/bin/true'.  This
> stops the logship and allows the normal pg_xlog rotation to catch up.

Did you "pg_ctl reload" after changing the config file?

Maybe you have an inactive (or otherwise not advancing) replication
slot?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_xlog no longer rotating out

От
Johannes Truschnigg
Дата:
Hi Mark,

On Tue, Jul 02, 2019 at 04:30:25PM -0400, Mark Steben wrote:
> Good afternoon,
> We run postgres 9.4 with hot standby streaming replication.  We have been
> running abnormally high updates and the pg_xlogs have queued to about 80
> percent of disk capacity.  This often happens and the network is not fast
> enough to handle the transfer of logs to the replicated server.
> What I have done in the past, and done again today is to change the
> archive_command from the script that scp's the logs to '/bin/true'.  This
> stops the logship and allows the normal pg_xlog rotation to catch up.

Postgres cares only about the exit code of your "archive_command", so even
when you're just trying to get rid of WAL segments (without actually moving
them into an archive), it's best to set it to something that gives some kind
of indication that something's actually happening with the WAL. I'd recommend
replacing your `scp` invocation with a roughly equivalent command line,
prefixed with `echo`, so you can check whether or not your archive_command is
executed at all, and which WAL segment it's targeting.

What is your *normal* archive_command setting, with `scp` involved? Do you
have any "special" ssh_config flags in effect for its target host?


> This has not happened this time.  Logs have been stored on pg_xlog since
> June 30 and the number of logs keep climbing.  I am now manually moving
> some of the logs that were processed on June 30 off to another disk to
> alleviate the space. PLEASE HELP if you can.

Manually touching pg_xlog (or pg_wal on newer releases) is not a good idea, as
I'm sure you're aware. If I were in your position, my next steps would be:

1.) Create a number (think maybe five or six) files that are a few hundred
MBytes in size on the filesystem hosting your database. The idea is that, IF
your filesystems fills up, you have a number of shots to restart the database
without breaking a sweat after removing one of these "insurance files".

2.) Figure out what's the root cause for your WAL not being
archived/recognized as successfully processed, and what WAL you postmaster is
currently trying to deal with, as detailed above, and fix it - or return to
the list with your findings if you can't see what's wrong.


> It looks like archiving is still occurring despite '/bin/true' being set.
> Can I safely kill -term the archive thread?

How exactly do you figure that there's still actual archiving going on? (Are
you monitoring the remote host's "auth" log or anything like that, which would
probably trigger upon scp-incurred logins?)

Which process (full output from `ps -fp <that-particular-PID>`) are you
talking about killing in particular?



With all that said, you shouldn't really be using use `scp` as an
archive_command in the first place. Maybe take a look at
https://johannes.truschnigg.info/code/pg_archive_wal_segment-2.0.0/ (and its
README file) for properties to consider when choosing an archive_command
that's fit for the job. (In my view, it's always better to perform archiving
to a *host-local* filesystem, and have its consumers pick it up from there,
instead of shipping directly onto a consumer's filesystem over the network.)

--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.

Вложения

Re: pg_xlog no longer rotating out

От
Mark Steben
Дата:
Thank you both your prompt response - much appreciated.
Alvaro, you were correct.  There was a replication slot that was inactive. We had to kill
replication on a slave server over the weekend and I never deleted the replication slot.
Once I deleted it, logs also cleared out.
Johannes, I will also strongly consider your insight/recommendations.

On Tue, Jul 2, 2019 at 5:09 PM Johannes Truschnigg <johannes@truschnigg.info> wrote:
Hi Mark,

On Tue, Jul 02, 2019 at 04:30:25PM -0400, Mark Steben wrote:
> Good afternoon,
> We run postgres 9.4 with hot standby streaming replication.  We have been
> running abnormally high updates and the pg_xlogs have queued to about 80
> percent of disk capacity.  This often happens and the network is not fast
> enough to handle the transfer of logs to the replicated server.
> What I have done in the past, and done again today is to change the
> archive_command from the script that scp's the logs to '/bin/true'.  This
> stops the logship and allows the normal pg_xlog rotation to catch up.

Postgres cares only about the exit code of your "archive_command", so even
when you're just trying to get rid of WAL segments (without actually moving
them into an archive), it's best to set it to something that gives some kind
of indication that something's actually happening with the WAL. I'd recommend
replacing your `scp` invocation with a roughly equivalent command line,
prefixed with `echo`, so you can check whether or not your archive_command is
executed at all, and which WAL segment it's targeting.

What is your *normal* archive_command setting, with `scp` involved? Do you
have any "special" ssh_config flags in effect for its target host?


> This has not happened this time.  Logs have been stored on pg_xlog since
> June 30 and the number of logs keep climbing.  I am now manually moving
> some of the logs that were processed on June 30 off to another disk to
> alleviate the space. PLEASE HELP if you can.

Manually touching pg_xlog (or pg_wal on newer releases) is not a good idea, as
I'm sure you're aware. If I were in your position, my next steps would be:

1.) Create a number (think maybe five or six) files that are a few hundred
MBytes in size on the filesystem hosting your database. The idea is that, IF
your filesystems fills up, you have a number of shots to restart the database
without breaking a sweat after removing one of these "insurance files".

2.) Figure out what's the root cause for your WAL not being
archived/recognized as successfully processed, and what WAL you postmaster is
currently trying to deal with, as detailed above, and fix it - or return to
the list with your findings if you can't see what's wrong.


> It looks like archiving is still occurring despite '/bin/true' being set.
> Can I safely kill -term the archive thread?

How exactly do you figure that there's still actual archiving going on? (Are
you monitoring the remote host's "auth" log or anything like that, which would
probably trigger upon scp-incurred logins?)

Which process (full output from `ps -fp <that-particular-PID>`) are you
talking about killing in particular?



With all that said, you shouldn't really be using use `scp` as an
archive_command in the first place. Maybe take a look at
https://johannes.truschnigg.info/code/pg_archive_wal_segment-2.0.0/ (and its
README file) for properties to consider when choosing an archive_command
that's fit for the job. (In my view, it's always better to perform archiving
to a *host-local* filesystem, and have its consumers pick it up from there,
instead of shipping directly onto a consumer's filesystem over the network.)

--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.


--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com