Обсуждение: Resetting recovery target parameters in pg_createsubscriber

Поиск
Список
Период
Сортировка

Resetting recovery target parameters in pg_createsubscriber

От
D Laaren
Дата:
Hi Hackers,

I noticed that pg_createsubscriber sets recovery target params for
correct recovery before converting a physical replica to a logical
one but does not reset them afterward. It can lead to recovery
failures in certain scenarios.
For example, if recovery begins from a checkpoint where no WAL records
need to be applied, the system might incorrectly determine that the
recovery target was never reached because these parameters remain
active.

I’ve attached a TAP test to reproduce the issue.
The proposed patch ensures all recovery parameters are reset after
conversion to prevent such edge cases.

I would appreciate any feedback.
--
Regards,
Alyona Vinter
Вложения

RE: Resetting recovery target parameters in pg_createsubscriber

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Laaren,

> I noticed that pg_createsubscriber sets recovery target params for
> correct recovery before converting a physical replica to a logical
> one but does not reset them afterward. It can lead to recovery
> failures in certain scenarios.
> For example, if recovery begins from a checkpoint where no WAL records
> need to be applied, the system might incorrectly determine that the
> recovery target was never reached because these parameters remain
> active.

Thanks for reporting.
I have known that parameters won't be overwritten, but I didn't recognize the
case that recovery fails.

> I’ve attached a TAP test to reproduce the issue.
> The proposed patch ensures all recovery parameters are reset after
> conversion to prevent such edge cases.

WriteRecoveryConfig() has been used to setup the recovery parameters. Can we
follow the way to restore them?

Also, can we add a test to 040_pg_createsubscriber? IIUC it is enough to check
one of recovery parameter is reset after the conversion.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Resetting recovery target parameters in pg_createsubscriber

От
Michael Paquier
Дата:
On Mon, Sep 01, 2025 at 02:06:34AM +0000, Hayato Kuroda (Fujitsu) wrote:
> WriteRecoveryConfig() has been used to setup the recovery parameters. Can we
> follow the way to restore them?
>
> Also, can we add a test to 040_pg_createsubscriber? IIUC it is enough to check
> one of recovery parameter is reset after the conversion.

Yeah, we'd want some tests to check the behaviors and expectations in
this tool.  This tool is complex enough that this is going to be
mandatory, and making a test cheaper is always nicer.

FWIW, I find the proposed patch a bit dangerous.  It updates
pg_createsubscriber.c so as an ALTER SYSTEM is used to reset the
parameters, but the recovery parameters are updated via
WriteRecoveryConfig() which is the code path holding the knowledge
that postgresql.auto.conf is used to hold the recovery parameters.

I don't like much the fact this creates a duplication with
setup_recovery() for the list of parameters handled.  All the recovery
parameters are forced to a hardcoded value, except
recovery_target_lsn.  So perhaps it would be better to maintain in
pg_createsubscriber.c a list made of (GUC names, values), with the LSN
part handled as an exception for the value to assign.

GenerateRecoveryConfig() can work with a replication connection,
relying on ALTER SYSTEM would not be able to do that properly, so
perhaps we should just invent a new routine that resets a portion of
the file on disk because recovery_gen.c's code already assumes that it
has write access to a data folder to do its work?
--
Michael

Вложения

Re: Resetting recovery target parameters in pg_createsubscriber

От
Alyona Vinter
Дата:
Dear Michael and Hayato,

Thank you both for your valuable feedback on the previous patch version.

I've reworked the patch based on your suggestions - the new version should address the concerns about ALTER SYSTEM and follows the same patterns as the 'setup_recovery' code.

I kept primary_conninfo as-is for now since I'm not totally sure if we need to touch it

I look forward to your feedback! ;)

Best regards,
Alyona Vinter
Вложения

RE: Resetting recovery target parameters in pg_createsubscriber

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Alyona,

Thanks for updating the patch!
Sadly, your patch cannot be applied cleanly. Even after the manual merge, it could not
be built. Maybe `dbinfo` should be `dbinfos.dbinfo`. Obtained message is written in [1].
(cfbot seemed not to run correctly)

Regarding patch content, your patch restores the postgresql.auto.conf after the
command runs. Initially I felt that it is enough to set below GUCs becasue only
they are changed from the default. Is there a reason why you fully restore them?

```
recovery_target_inclusive    true
recovery_target_action        pause
recovery_target_lsn            ""
```

[1]
```
../postgres/src/bin/pg_basebackup/pg_createsubscriber.c: In function ‘main’:
../postgres/src/bin/pg_basebackup/pg_createsubscriber.c:2526:31: error: ‘dbinfo’ undeclared (first use in this
function);did you mean ‘dbinfos’?
 
 2526 |         reset_recovery_params(dbinfo, subscriber_dir);
      |                               ^~~~~~
      |                               dbinfos
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Resetting recovery target parameters in pg_createsubscriber

От
Alyona Vinter
Дата:
Dear Hayato, 

Thank you for the review! My apologies for the error in the patch -- it looks like I accidentally modified it before sending =(. I've attached the fixed versions below. 

> Regarding patch content, your patch restores the postgresql.auto.conf after the
> command runs. Initially I felt that it is enough to set below GUCs becasue only
> they are changed from the default. Is there a reason why you fully restore them?

I just found it easier to restore the original state of 'postgresql.auto.conf', as removing parameters from the file resets them to their default values. This approach achieves the same final state without having to explicitly set each one.
Вложения

Re: Resetting recovery target parameters in pg_createsubscriber

От
Alyona Vinter
Дата:
Hi,

CFbot indicated some issues with the patch. I've attached rebased versions of the patches, so hopefully everything will be ok this time.

Best regards,
Alyona Vinter

On Fri, 5 Sept 2025 at 12:51, Alyona Vinter <dlaaren8@gmail.com> wrote:
Dear Hayato, 

Thank you for the review! My apologies for the error in the patch -- it looks like I accidentally modified it before sending =(. I've attached the fixed versions below. 

> Regarding patch content, your patch restores the postgresql.auto.conf after the
> command runs. Initially I felt that it is enough to set below GUCs becasue only
> they are changed from the default. Is there a reason why you fully restore them?

I just found it easier to restore the original state of 'postgresql.auto.conf', as removing parameters from the file resets them to their default values. This approach achieves the same final state without having to explicitly set each one.
Вложения

Re: Resetting recovery target parameters in pg_createsubscriber

От
Alyona Vinter
Дата:
Sorry, wrong patches again. Here are the correct ones.

Best regards,
Alyona Vinter
Вложения

Re: Resetting recovery target parameters in pg_createsubscriber

От
Alexander Korotkov
Дата:
Hello, Alyona!

On Mon, Sep 8, 2025 at 8:35 AM Alyona Vinter <dlaaren8@gmail.com> wrote:
>
> Sorry, wrong patches again. Here are the correct ones.

I went though this patches.
1) I've removed the array of parameters.  I see it was proposed by
Michael upthread.  But I think his proposal came from the fact we walk
trough the parameters twice.  But we end up walking trough the
parameter once in setup_recovery(), while reset_recovery_params() just
restores the previous contents.  I think it makes sense to keep the
changes minimal.
2) I reordered patches so that helper function goes first.  I think it
essential to order commit in the way that every commit leaves our tree
in working state.
3) I make pgpreltidy run over  040_pg_createsubscriber.pl.
Any thought?

------
Regards,
Alexander Korotkov
Supabase

Вложения

Re: Resetting recovery target parameters in pg_createsubscriber

От
Michael Paquier
Дата:
On Mon, Sep 15, 2025 at 10:29:47AM +0300, Alexander Korotkov wrote:
> I went though this patches.
> 1) I've removed the array of parameters.  I see it was proposed by
> Michael upthread.  But I think his proposal came from the fact we walk
> trough the parameters twice.  But we end up walking trough the
> parameter once in setup_recovery(), while reset_recovery_params() just
> restores the previous contents.  I think it makes sense to keep the
> changes minimal.

Yeah, my concern was about the duplication of the list.  As long as a
fix does not do any of that, I'm OK.  Sorry if my idea of a list of
parameters felt misguided if we make recovery_gen.c smarter with the
handling of the on-disk files.

> 2) I reordered patches so that helper function goes first.  I think it
> essential to order commit in the way that every commit leaves our tree
> in working state.

Yep.  That would create some noise if one bisects for example.  These
are always annoying because they make analysis of a range of commits
longer with more false positives.  If you have a large range of
commits, the odds are usually very low, but who knows..

> 3) I make pgpreltidy run over  040_pg_createsubscriber.pl.
> Any thought?

GetRecoveryConfig() and ReplaceRecoveryConfig() should have some
documentation, regarding what the callers of these functions can
expect from them.

+    use_recovery_conf =
+        PQserverVersion(pgconn) < MINIMUM_VERSION_FOR_RECOVERY_GUC;
+
+    snprintf(tmp_filename, MAXPGPATH, "%s/%s.tmp", target_dir,
+             use_recovery_conf ? "recovery.conf" : "postgresql.auto.conf");
+
+    snprintf(filename, MAXPGPATH, "%s/%s", target_dir,
+             use_recovery_conf ? "recovery.conf" : "postgresql.auto.conf"

No need for use_recovery_conf.  You could just set a pointer to the
file name instead and avoid the duplication.


+    cf = fopen(tmp_filename, "w");
+    if (cf == NULL)
+        pg_fatal("could not open file \"%s\": %m", tmp_filename);

"a" is used in fopen() when calling WriteRecoveryConfig() when under
use_recovery_conf.  Perhaps this inconsistency should be specified as
a comment because we are generating a temporary file from scratch with
the new recovery GUC contents?

This patch also means that pg_createsubscriber is written so as the
contents added to recovery.conf/postgresql.auto.conf by
setup_recovery() are never reset if there is a failure in-flight.  Is
that OK or should we also have an exit callback to do the cleanup work
in such cases?

Perhaps these internal manipulations should be documented as well, to
make the users of this tool aware of steps they may need to take in
the event of an in-flight failure?  pg_createsubscriber includes a
"How it works" section that explains how the tool works, including the
part about the recovery parameters.  The changes of this patch become
implied facts, and are not reflected in the docs.  That sounds like a
problem to me because we are hiding some of the the internal logic,
but the docs are written so as they explain all these details.
--
Michael

Вложения

Re: Resetting recovery target parameters in pg_createsubscriber

От
Alyona Vinter
Дата:
Hi Michael and Alexander,

Thank you both for your help! I really appreciate it.
As a newcomer here, I might make some mistakes, but I hope with your guidance I can avoid making them in the future =)

> Yeah, my concern was about the duplication of the list.  As long as a
> fix does not do any of that, I'm OK.  Sorry if my idea of a list of
> parameters felt misguided if we make recovery_gen.c smarter with the
> handling of the on-disk files.

I got your concern about avoiding duplication. I thought that defining all parameters explicitly in the file header would lead to clearer and nicer code, which is why I left it that way (even without duplicating). But now I agree with Alexander's point about keeping the changes minimal.

> This patch also means that pg_createsubscriber is written so as the
> contents added to recovery.conf/postgresql.auto.conf by
> setup_recovery() are never reset if there is a failure in-flight.  Is
> that OK or should we also have an exit callback to do the cleanup work
> in such cases?

It's a good idea to add an exit callback. Additionally, I'd like to propose adding a pre-flight check at the start. This check would look for any existing recovery configuration that might be an artifact from a previous aborted run and warn the user or handle it appropriately. What do you think about implementing both the exit callback and the pre-flight check?

> pg_createsubscriber includes a
> "How it works" section that explains how the tool works, including the
> part about the recovery parameters.   

I looked through the `pg_createsubscriber.c` file but wasn't able to locate a "How it works" section. Could you please point me to the specific file or line number you are referring to? Or do you mean all the descriptive comments? For context, I'm currently working on the version where my patch is being tested in CI.

I will work on improving the code and will also add the documentation notes that Michael has pointed out ASAP.

Best regards,
Alyona Vinter

Re: Resetting recovery target parameters in pg_createsubscriber

От
Michael Paquier
Дата:
On Tue, Sep 16, 2025 at 05:27:43PM +0700, Alyona Vinter wrote:
>> This patch also means that pg_createsubscriber is written so as the
>> contents added to recovery.conf/postgresql.auto.conf by
>> setup_recovery() are never reset if there is a failure in-flight.  Is
>> that OK or should we also have an exit callback to do the cleanup work
>> in such cases?
>
> It's a good idea to add an exit callback. Additionally, I'd like to propose
> adding a pre-flight check at the start. This check would look for any
> existing recovery configuration that might be an artifact from a previous
> aborted run and warn the user or handle it appropriately. What do you think
> about implementing both the exit callback and the pre-flight check?

I am not sure how much a pre-flight check would help if we have an
exit callback that would make sure that things are cleaned up on exit.
Is there any need to worry about a kill(9) that would cause the exit
cleanup callback to not be called?  We don't bother about that
usually, so I don't see a strong case for it here, either.  :)

Alexander may have a different opinion.

> I will work on improving the code and will also add the documentation notes
> that Michael has pointed out ASAP.

Thanks.
--
Michael

Вложения

Re: Resetting recovery target parameters in pg_createsubscriber

От
Alena Vinter
Дата:
Hi,

I'm back with improvements :)
I've added code comments in `recovery_gen.c` and expanded the documentation in `pg_createsubscriber.sgml`.

About the recovery parameters cleanup: I thought about adding an exit callback, but it doesn't really make sense because once the target server gets promoted (which happens soon after we set the parameters), there's no point in cleaning up - the server is already promoted and can't be used as a replica again and must be recreated. Also, `reset_recovery_params()` might call `exit()` itself, which could cause problems with the cleanup callback.
So I think it's better to just warn users about leftover parameters and let them handle the cleanup manually if needed.

By the way, is it ok that the second patch includes both code and test changes together, or should I split them into separate commits?

I look forward to your feedback!

Regards,
Alena Vinter
Вложения

Re: Resetting recovery target parameters in pg_createsubscriber

От
Michael Paquier
Дата:
On Tue, Sep 23, 2025 at 12:04:04PM +0700, Alena Vinter wrote:
> About the recovery parameters cleanup: I thought about adding an exit
> callback, but it doesn't really make sense because once the target server
> gets promoted (which happens soon after we set the parameters), there's no
> point in cleaning up - the server is already promoted and can't be used as
> a replica again and must be recreated. Also, `reset_recovery_params()`
> might call `exit()` itself, which could cause problems with the cleanup
> callback.

Your argument does not consider one case, which is very common:
pg_rewind.  Even if the standby finishes recovery and is promoted with
its new recovery parameters, we could rewind it rather than recreate a
new standby from scratch.  That's cheaper than recreating a new
physical replica from scratch.  Keeping the recovery parameters added
by pg_createsubscriber around would make pg_rewind's work more
complicated, because it does similar manipulations, for different
requirements.

The tipping point where we would not be able to reuse the promoted
standby happens as the last step of pg_createsuscriber in
modify_subscriber_sysid() where its system ID is changed.  Before
that, the code also makes an effort of cleaning up anything that's
been created in-betwee.

Even the system ID argument is not entirely true, actually.  One could
also decide to switch the system ID back to what it was previously to
match with the primary.  That requires a bit more magic, but that's
not impossible.

> So I think it's better to just warn users about leftover parameters and let
> them handle the cleanup manually if needed.

Warnings tend to be ignored and missed, especially these days where
vendors automate these actions.  It is true that there could be an
argument about requiring extra implementation steps on each vendor
side, but they would also need to keep up with any new GUCs that
pg_createsubscriber may add in the future when setting up its recovery
parameters, which would mean extra work for everybody, increasing the
range of problems for some logic that's isolated to
pg_createsubscriber.

In short, I disagree with what you are doing here: we should take the
extra step and clean up anything that's been created by the tool when
we know we can safely do so (aka adding a static flag that the
existing cleanup callback should rely on, which is already what your
patch 0003 does to show a warning).

> By the way, is it ok that the second patch includes both code and test
> changes together, or should I split them into separate commits?

The tests and the fix touch entirely separate code paths, keeping them
together is no big deal.
--
Michael

Вложения

Re: Resetting recovery target parameters in pg_createsubscriber

От
Alena Vinter
Дата:
Hi,

> In short, I disagree with what you are doing here: we should take the
> extra step and clean up anything that's been created by the tool when
> we know we can safely do so

I got your point, thanks for pointing to the `pg_rewind` case. I've attached a new version of the patches. I've changed `ReplaceRecoveryConfig` a little bit -- now it returns false in case of an error instead of exiting.

Best wishes,
Alena Vinter
Вложения

Re: Resetting recovery target parameters in pg_createsubscriber

От
Michael Paquier
Дата:
On Mon, Sep 29, 2025 at 04:57:09PM +0700, Alena Vinter wrote:
> I got your point, thanks for pointing to the `pg_rewind` case. I've
> attached a new version of the patches. I've changed `ReplaceRecoveryConfig`
> a little bit -- now it returns false in case of an error instead of exiting.

 #include "common/logging.h"
+#include "common/file_utils.h"

Incorrect include file ordering.

+GetRecoveryConfig(PGconn *pgconn, const char *target_dir)
[...]
+    char        data[1024];
[...]
+    while ((bytes_read = fread(data, 1, sizeof(data), cf)) > 0)
+    {
+        data[bytes_read] = '\0';
+        appendPQExpBufferStr(contents, data);
+    }

You are assuming that this will never overflow.  However, recovery
parameters could include commands, which are mostly limited to
MAXPGPATH, itself 1024.  So that's unsafe.  The in-core routine
pg_get_line(), or the rest in pg_get_line.c is safer to use, relying
on malloc() for the frontend and the lines fetched.

+           pg_log_warning_hint("Manual removal of recovery parameters is required from 'postgresql.auto.conf'
(PostgreSQL%d+) or 'recovery.conf' (older versions)", +                               MINIMUM_VERSION_FOR_RECOVERY_GUC
/10000);
 

Hmm, okay here.  You would need that hint anyway if you cannot connect
to determine to which file the recovery parameters need to go to, the
other code paths failures in ReplaceRecoveryConfig() would include the
file name, which offers a sufficient hint about the version, but a
connect_database() failure does not.

+static bool recovery_params_set = false;
+static bool recovery_params_reset = false;

Hmm.  We may need an explanation about these, in the shape of a
comment, to document what's expected from them.  Rather than two
booleans, using an enum tracking the state of the parameters would be
cleaner?  And actually, you do not need two flags.  Why not just
switch recovery_params_set to false once ReplaceRecoveryConfig() is
called?

+reset_recovery_params(const struct LogicalRepInfo *dbinfo, const char
*datadir)
[...]
+   recoveryconfcontents = GenerateRecoveryConfig(conn, NULL, NULL);

Why do we need to call again GenerateRecoveryConfig() when resetting
recovery.conf/postgresql.conf.sample with its original contents before
switching the system ID of the new replica?  I may be missing
something, of course, but we're done with recovery so I don't quite
see the point in appending the recovery config generated with the
original contents.  If this is justified (don't think it is), this
deserves a comment to explain the reason behind this logic.
--
Michael

Вложения

Re: Resetting recovery target parameters in pg_createsubscriber

От
Alena Vinter
Дата:
HI Michael,

Thank you for the review!

> Why not just
> switch recovery_params_set to false once ReplaceRecoveryConfig() is
> called?

Stupid me!

> Why do we need to call again GenerateRecoveryConfig() when resetting
> recovery.conf/postgresql.conf.sample with its original contents before
> switching the system ID of the new replica?  I may be missing
> something, of course, but we're done with recovery so I don't quite
> see the point in appending the recovery config generated with the
> original contents.  If this is justified (don't think it is), this
> deserves a comment to explain the reason behind this logic.

This relates to the point I mentioned earlier about being unsure whether we should preserve `primary_conninfo`:
> I kept primary_conninfo as-is for now since I'm not totally sure if we need to touch it

The reason I called `GenerateRecoveryConfig()` was to regenerate the `primary_conninfo` string in the recovery configuration file. If we should remove it, then the reset function can be much simpler. Could you please help me to clarify should we regenerate `primary_conninfo` or we can safely purge it?

Best regards,
Alena Vinter

Re: Resetting recovery target parameters in pg_createsubscriber

От
Michael Paquier
Дата:
On Tue, Sep 30, 2025 at 12:22:08PM +0700, Alena Vinter wrote:
> This relates to the point I mentioned earlier about being unsure whether we
> should preserve `primary_conninfo`:
> > I kept primary_conninfo as-is for now since I'm not totally sure if we
> need to touch it
>
> The reason I called `GenerateRecoveryConfig()` was to regenerate the
> `primary_conninfo` string in the recovery configuration file. If we should
> remove it, then the reset function can be much simpler. Could you please
> help me to clarify should we regenerate `primary_conninfo` or we can safely
> purge it?

Based on the contents of the latest patch, we reset the parameters
after promoting the node, and primary_conninfo only matters while we
are in recovery, for a standby recovery WAL using the streaming
replication protocol.
--
Michael

Вложения

RE: Resetting recovery target parameters in pg_createsubscriber

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Alena,

Thanks for updating the patch. Few comments.

```
+    /* Before setting up the recovery parameters save the original content. */
+    savedrecoveryconfcontents = GetRecoveryConfig(conn, datadir);
```

To confirm, you put the connection to the primary/publisher instead of standby/subscriber.
But it is harmless because streaming replication requires that both instances
have the same major version. Is it correct?

```
+            pg_log_warning_hint("Manual removal of recovery parameters is required from 'postgresql.auto.conf'
(PostgreSQL%d+) or 'recovery.conf' (older versions)",
 
+                                MINIMUM_VERSION_FOR_RECOVERY_GUC / 10000);
```

Can we cache the version info when we firstly connect to the primary node to
print appropriate filename? Or is it hacky?

```
+    if (dry_run)
+    {
+        appendPQExpBufferStr(savedrecoveryconfcontents, "# dry run mode");
+    }
```

Per my understanding, setup_recovery() puts the indicator becasue the content
can be printed. I think it is not needed since reset_recovery_params() does not
have that, or we can even print the parameters.

```
+sub test_param_absent
+{
+    my ($node, $param) = @_;
+    my $auto_conf = $node->data_dir . '/postgresql.auto.conf';
+
+    return 1 unless -e $auto_conf;
+
+    my $content = slurp_file($auto_conf);
+    return $content !~ /^\s*$param\s*=/m;
+}
```

Can you add a short comment atop the function? Something like:
"Check whether the given parameter is specified in postgresql.auto.conf"

Best regards,
Hayato Kuroda
FUJITSU LIMITED