Обсуждение: PostgreSQL switchover process

Поиск
Список
Период
Сортировка

PostgreSQL switchover process

От
rams nalabolu
Дата:
Hi All,

I'm trying to do the Switchover process in postgres.
1) I have my primary cluster running on port 5432 on server A
2)  secondary cluster on the port 5432 on server B
I did setup streaming replication b/w primary and secondary using replication slots.

I did the following steps to perform switchover
1) ran a checkpoint on primary.
2) verified secondary status and it is up to date with primary.
3) stopped primary using pg_ctl -mf i.e force.
4) promoted primary using trigger file and verified the status of the new primary; it is not in recovery mode and running on new timeline ID 2.
5) I did create the recovery.conf file on old primary i.e new standby with 
      primaray_coninfo: new primary(B)
      recovery_target_timeline='latest'
      primary_slot_name: "slot I created in new_standby"
6) I started the new standby. It is up and running but it is not in recovery mode.

But the wal sender process on primary and receiver process on secondary are not running. i.e it looks like my replication is broken.
what steps am I missing here without rebuilding a new standby i.e old primary I want the replication happen b/w my new primary and secondary.

Could you suggest any good documentation?

thanks in advance.

Thanks,
Veeru.
 
 



Re: PostgreSQL switchover process

От
Michael Paquier
Дата:
On Mon, Aug 24, 2020 at 03:56:43PM -0400, rams nalabolu wrote:
> I did the following steps to perform switchover
> 1) ran a checkpoint on primary.
> 2) verified secondary status and it is up to date with primary.
> 3) stopped primary using pg_ctl -mf i.e force.

This is not a force mode, but the fast mode, where all existing
connections are forcibly stopped, and that the shutdown is clean, with
a shutdown checkpoint generated before finishing the shutdown
sequence.  During this shutdown, the primary makes sure that all
standbys have flushed WAL up to the point of the shutdown checkpoint.
Note that this makes rather unnecessary the checkpoint you ran on the
primary in step 1.

> 4) promoted primary using trigger file and verified the status of the new
> primary; it is not in recovery mode and running on new timeline ID 2.

I think that you mean promotion of the standby here.

> 5) I did create the recovery.conf file on old primary i.e new standby with
>       primaray_coninfo: new primary(B)
>       recovery_target_timeline='latest'
>       primary_slot_name: "slot I created in new_standby"

Typo here.  You mean primary_conninfo.

> 6) I started the new standby. It is up and running but it is not in
> recovery mode.

This switchover flow is a good base, so it should be possible to reuse
your previous primary as a standby.

> But the wal sender process on primary and receiver process on secondary are
> not running. i.e it looks like my replication is broken.
> what steps am I missing here without rebuilding a new standby i.e old
> primary I want the replication happen b/w my new primary and
> secondary.

It may be many things without more information.  Could you check
pg_stat_replication on the primary and pg_stat_wal_receiver on the
standby?  Most likely something is wrong with primary_conninfo, but
the logs of the standby should have enough information to let you know
what happened.  Another thing you are not telling is the version of
PostgreSQL you are using here.  In 12 and newer versions,
support for recovery.conf has been removed, requiring roughly the
creation of standby.signal with all recovery parameters set in
postgresql.conf if you want to set up a standby.
--
Michael

Вложения

Re: PostgreSQL switchover process

От
Fujii Masao
Дата:

On 2020/08/25 11:22, Michael Paquier wrote:
> On Mon, Aug 24, 2020 at 03:56:43PM -0400, rams nalabolu wrote:
>> I did the following steps to perform switchover
>> 1) ran a checkpoint on primary.
>> 2) verified secondary status and it is up to date with primary.
>> 3) stopped primary using pg_ctl -mf i.e force.
> 
> This is not a force mode, but the fast mode, where all existing
> connections are forcibly stopped, and that the shutdown is clean, with
> a shutdown checkpoint generated before finishing the shutdown
> sequence.  During this shutdown, the primary makes sure that all
> standbys have flushed WAL up to the point of the shutdown checkpoint.
> Note that this makes rather unnecessary the checkpoint you ran on the
> primary in step 1.
> 
>> 4) promoted primary using trigger file and verified the status of the new
>> primary; it is not in recovery mode and running on new timeline ID 2.
> 
> I think that you mean promotion of the standby here.
> 
>> 5) I did create the recovery.conf file on old primary i.e new standby with
>>        primaray_coninfo: new primary(B)
>>        recovery_target_timeline='latest'
>>        primary_slot_name: "slot I created in new_standby"

Did you enable standby_mode?


> 
> Typo here.  You mean primary_conninfo.
> 
>> 6) I started the new standby. It is up and running but it is not in
>> recovery mode.
> 
> This switchover flow is a good base, so it should be possible to reuse
> your previous primary as a standby.
> 
>> But the wal sender process on primary and receiver process on secondary are
>> not running. i.e it looks like my replication is broken.
>> what steps am I missing here without rebuilding a new standby i.e old
>> primary I want the replication happen b/w my new primary and
>> secondary.
> 
> It may be many things without more information.  Could you check
> pg_stat_replication on the primary and pg_stat_wal_receiver on the
> standby?  Most likely something is wrong with primary_conninfo, but
> the logs of the standby should have enough information to let you know
> what happened.  Another thing you are not telling is the version of
> PostgreSQL you are using here.  In 12 and newer versions,
> support for recovery.conf has been removed, requiring roughly the
> creation of standby.signal with all recovery parameters set in
> postgresql.conf if you want to set up a standby.
> --
> Michael
> 

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: PostgreSQL switchover process

От
rams nalabolu
Дата:
Yes, I did. 

On Mon, Aug 24, 2020 at 10:43 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:


On 2020/08/25 11:22, Michael Paquier wrote:
> On Mon, Aug 24, 2020 at 03:56:43PM -0400, rams nalabolu wrote:
>> I did the following steps to perform switchover
>> 1) ran a checkpoint on primary.
>> 2) verified secondary status and it is up to date with primary.
>> 3) stopped primary using pg_ctl -mf i.e force.
>
> This is not a force mode, but the fast mode, where all existing
> connections are forcibly stopped, and that the shutdown is clean, with
> a shutdown checkpoint generated before finishing the shutdown
> sequence.  During this shutdown, the primary makes sure that all
> standbys have flushed WAL up to the point of the shutdown checkpoint.
> Note that this makes rather unnecessary the checkpoint you ran on the
> primary in step 1.
>
>> 4) promoted primary using trigger file and verified the status of the new
>> primary; it is not in recovery mode and running on new timeline ID 2.
>
> I think that you mean promotion of the standby here.
>
>> 5) I did create the recovery.conf file on old primary i.e new standby with
>>        primaray_coninfo: new primary(B)
>>        recovery_target_timeline='latest'
>>        primary_slot_name: "slot I created in new_standby"

Did you enable standby_mode?


>
> Typo here.  You mean primary_conninfo.
>
>> 6) I started the new standby. It is up and running but it is not in
>> recovery mode.
>
> This switchover flow is a good base, so it should be possible to reuse
> your previous primary as a standby.
>
>> But the wal sender process on primary and receiver process on secondary are
>> not running. i.e it looks like my replication is broken.
>> what steps am I missing here without rebuilding a new standby i.e old
>> primary I want the replication happen b/w my new primary and
>> secondary.
>
> It may be many things without more information.  Could you check
> pg_stat_replication on the primary and pg_stat_wal_receiver on the
> standby?  Most likely something is wrong with primary_conninfo, but
> the logs of the standby should have enough information to let you know
> what happened.  Another thing you are not telling is the version of
> PostgreSQL you are using here.  In 12 and newer versions,
> support for recovery.conf has been removed, requiring roughly the
> creation of standby.signal with all recovery parameters set in
> postgresql.conf if you want to set up a standby.
> --
> Michael
>

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION