Thread: New recovery_target_timeline=primary option

New recovery_target_timeline=primary option

From
"Efrain J. Berdecia"
Date:

One-line Summary: This new recovery_target_timeline option would ensure that, when rebuilding a replica cluster, recovery stays on the primary cluster's timeline, making the rebuild foolproof and avoiding recovery timeline inconsistencies.


Business Use-case: Reduce human interaction when rebuilding replicas where unwanted timelines might have been archived in the repo and speed up recovery.


User impact with the change: New parameter option available 


Implementation details: I would need a subject matter expert to please make this feature a reality 

Estimated Development Time: unknown 


Category: Include the text: Restore, replication


Thanks in advance 
Efrain J Berdecia 

Re: New recovery_target_timeline=primary option

From
"Euler Taveira"
Date:
On Thu, Sep 11, 2025, at 9:17 PM, Efrain J. Berdecia wrote:
> *One-line Summary:* This new recovery_target_timeline option would 
> ensure that when rebuilding a replica cluster, the recovery stays in 
> the primary cluster's timeline making it fool proof and avoiding 
> recovery timeline inconsistencies.
>

Do you understand what the timeline is for? [1] You are proposing to implement
exactly what it is protecting you from: overwriting previously archived WAL
after a recovery.


[1] https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-TIMELINES


-- 
Euler Taveira
EDB   https://www.enterprisedb.com/



Re: New recovery_target_timeline=primary option

From
"Efrain J. Berdecia"
Date:
This option would apply only when the standby.signal file is present, that is, when a cluster is being restored in order to establish a standby replica.



Re: New recovery_target_timeline=primary option

From
"David G. Johnston"
Date:
On Thursday, September 11, 2025, Efrain J. Berdecia <ejberdecia@yahoo.com> wrote:

> One-line Summary: This new recovery_target_timeline option would ensure that when rebuilding a replica cluster, the recovery stays in the primary cluster's timeline making it fool proof and avoiding recovery timeline inconsistencies.
>
> Business Use-case: Reduce human interaction when rebuilding replicas where unwanted timelines might have been archived in the repo and speed up recovery.
>
> User impact with the change: New parameter option available
>
> Implementation details: I would need a subject matter expert to please make this feature a reality
>
> Estimated Development Time: unknown
>
> Category: Include the text: Restore, replication


Feature requests with this little info are probably better discussed on the -general list to garner support for the idea.

David J.
 

Re: New recovery_target_timeline=primary option

From
"Efrain J. Berdecia"
Date:
The error I would like to address with this feature is the following:

FATAL: highest timeline xxx of the primary is behind timeline yyy

This happens when the restored standby has, for some reason, applied WAL files that took it beyond the primary's current timeline.

It seems to me that Postgres already has more than enough logic to keep the restored standby's timeline in sync with the primary, but it chooses to raise a fatal error instead. This forces human intervention, because someone has to specify the exact timeline needed to match the primary. I think the proposed option could cover this case.
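
The mismatch described above can be checked by hand before starting the standby. What follows is only a minimal sketch, not anything PostgreSQL or an HA tool ships: it assumes you have captured the text output of `pg_controldata` on both servers, and it reads the "Latest checkpoint's TimeLineID" line from each. The function names are hypothetical, and the checkpoint timeline is only an approximation of the timeline the server is actually on.

```python
import re

# Matches the timeline line printed by pg_controldata, for example:
# "Latest checkpoint's TimeLineID:       3"
_TLI_RE = re.compile(r"^Latest checkpoint's TimeLineID:\s+(\d+)\s*$", re.MULTILINE)


def checkpoint_timeline(controldata_output: str) -> int:
    """Extract the latest checkpoint's timeline ID from pg_controldata output."""
    match = _TLI_RE.search(controldata_output)
    if match is None:
        raise ValueError("no \"Latest checkpoint's TimeLineID\" line found")
    return int(match.group(1))


def standby_is_ahead(primary_output: str, standby_output: str) -> bool:
    """True when the restored standby is on a higher timeline than the primary."""
    return checkpoint_timeline(standby_output) > checkpoint_timeline(primary_output)
```

If `standby_is_ahead` returns true, the standby has replayed WAL from a timeline the primary never reached, which is the situation the FATAL message reports.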



Re: New recovery_target_timeline=primary option

From
"Euler Taveira"
Date:
On Thu, Sep 11, 2025, at 10:07 PM, Efrain J. Berdecia wrote:
> The error I would like to address with this feature is the following:
>
> FATAL: highest timeline xxx of the primary is behind timeline yyy
>

It seems your procedure to set up a standby is incorrect. See [1]. You are not
using the base backup from the primary server.

You didn't describe the whole procedure so it is hard to point out where the
problem is.


[1] https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-SETUP


-- 
Euler Taveira
EDB   https://www.enterprisedb.com/



Re: New recovery_target_timeline=primary option

From
"Efrain J. Berdecia"
Date:
A typical scenario is a high-availability setup with two replicated clusters, a primary and a standby, with Patroni managing automatic failover.

Suppose we use a backup solution like pgBackRest to take full backups and archive the WAL files. Now suppose Patroni starts flapping between the clusters and promotes both of them several times, then finally settles on running the primary on an older timeline than the newest timeline in the pgBackRest repo. When we then try to reinit or restore the standby, it will by default attempt to restore to the latest timeline.

That leaves the admins to figure out the correct timeline to restore to, which at the end of the day needs to match the primary's timeline anyway, regardless of the latest timeline files in the pgBackRest repo.

It is either that, or the admins have to go into the archive repo and manually delete the WAL files of the timeline that doesn't match the primary to prevent conflicts.

This is a common scenario.
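
To make the scenario concrete, here is a rough sketch of how one might spot archived timelines that are newer than the one the primary settled on. It relies on the fact that PostgreSQL names timeline history files as eight hexadecimal digits followed by `.history`. Everything else is an assumption for illustration: the function names are hypothetical, the input is a plain list of file names you obtained from the archive by whatever means, and "highest ID in the archive" is a simplification of how `latest` actually picks a timeline.

```python
import re

# Timeline history files are named as eight hex digits plus ".history".
_HISTORY_RE = re.compile(r"^([0-9A-Fa-f]{8})\.history$")


def archived_timelines(file_names):
    """Return the sorted timeline IDs that have a history file in the listing."""
    ids = set()
    for name in file_names:
        match = _HISTORY_RE.match(name)
        if match:
            # The file name encodes the timeline ID in hexadecimal.
            ids.add(int(match.group(1), 16))
    return sorted(ids)


def timelines_ahead_of_primary(file_names, primary_timeline):
    """Timelines present in the archive that are newer than the primary's."""
    return [tli for tli in archived_timelines(file_names) if tli > primary_timeline]
```

A non-empty result from `timelines_ahead_of_primary` suggests that a restore using the default `latest` setting may end up on a timeline the primary is not on.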



Re: New recovery_target_timeline=primary option

From
"Efrain J. Berdecia"
Date:
Even the documentation states/warns:

"Set restore_command to a simple command to copy files from the WAL archive. If you plan to have multiple standby servers for high availability purposes, make sure that recovery_target_timeline is set to latest (the default), to make the standby server follow the timeline change that occurs at failover to another standby."

By default, recovery_target_timeline is set to latest. What I'm recommending is an option that makes the standby simply follow, or stay within, the primary's timeline, without hitting the fatal error mentioned earlier, which ends up stopping the recovery of the standby.

Suppose we have timelines 1-3 archived in our backup repo and our streaming replication setup is currently running on timeline 3. Now we need to restore the primary to timeline 2. We can specify recovery_target_timeline=2 to restore the primary initially. But when I go to reinit or rebuild the standby, why not add a new option, recovery_target_timeline=primary, that forces the standby to stay on the primary's timeline, so that nobody has to work out the correct timeline for the standby?

Without this new parameter, or without specifying the timeline when restoring the standby, the restore will take the standby to timeline 3 and hit the fatal error. This happens a lot on setups using tools like Patroni.

I'm just trying to make life a little easier for administrators and HA tools when setting up a standby.
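
Until something like the proposed option exists, the workaround described above is to pin the standby to an explicit numeric timeline. The snippet below is a small sketch of how a script could render that setting once the primary's timeline is known. The function name is hypothetical, and how the timeline number is obtained is left to the caller. `recovery_target_timeline` itself is a real setting that accepts a numeric timeline ID as well as `current` and `latest`.

```python
def timeline_setting(primary_timeline: int) -> str:
    """Render a postgresql.conf line that pins recovery to one timeline."""
    if primary_timeline < 1:
        raise ValueError("timeline IDs start at 1")
    # Decimal form is used here; the value is quoted as in postgresql.conf.
    return f"recovery_target_timeline = '{primary_timeline}'"
```

For the example in this message, `timeline_setting(2)` produces the line to place in the standby's configuration before starting recovery with standby.signal present.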

