Обсуждение: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication

Поиск
Список
Период
Сортировка

pg_recvlogical: Prevent flushed data from being re-sent after restarting replication

От
Fujii Masao
Дата:
Hi,

When pg_recvlogical loses connection, it reconnects and restarts
replication unless
--no-loop option is used. I noticed that in this scenario, data that
has already been
flushed can be re-sent after restarting replication. This happens
because the replication
start position used when restarting replication is taken from the write position
in the last status update message, which may be older than the actual
position of
the last flushed data. As a result, some flushed data could exist newer than
the replication start position and be re-sent. Is this a bug?

To fix this issue, I'd like to propose the attached patch that fixes
this by ensuring
all written data is flushed to disk before restarting replication and by using
the last flushed position as the replication start point. This prevents already
flushed data from being re-sent.

Additionally, when the --no-loop option is used, I found that pg_recvlogical
could previously exit without flushing written data, risking data loss.
The attached patch fixes this issue by also ensuring that all data is flushed
to disk before exiting with --no-loop.

Thought?

Regards,


-- 
Fujii Masao

Вложения
Hi Fujii,


I see what you mean. For reviewing I started with writing a test that 
just reproduces the bug and documents the current behaviour.

I expected that by applying your patch the test would fail, but then we 
would just update the test accordingly. Surprisingly the test continues 
to pass.

I attached the test for your consideration. I'll have a look again 
tomorrow at both.


Kind regards,

Mircea Cadariu

Вложения
Hi,


Update: I added another test to the attached test-only patch. This new 
test uses pg_terminate_backend to trigger reconnection.

Assuming the tests are fully correct (your input appreciated on this) we 
can use them to validate the solution.


Kind regards,

Mircea Cadariu


Вложения
On Wed, Nov 19, 2025 at 11:36 PM Mircea Cadariu
<cadariu.mircea@gmail.com> wrote:
>
> Hi,
>
>
> Update: I added another test to the attached test-only patch. This new
> test uses pg_terminate_backend to trigger reconnection.
>
> Assuming the tests are fully correct (your input appreciated on this) we
> can use them to validate the solution.

Thanks for testing!

BTW, I reproduced the issue as follows:

#1. Start the server

#2. Start pg_recvlogical:
$ pg_recvlogical -S myslot -d postgres --create-slot --start -f test.out

#3. Insert data every second:
$ psql
=# create table t(i serial);
=# insert into t values(default);
=# \watch 1

#4. In a separate session, terminate walsender to force pg_recvlogical
to restart replication:
$ psql
=# select pg_terminate_backend(pid) from pg_stat_replication;

#5. Wait for pg_recvlogical to restart replication

#6. You will see duplicate records written to the output file, for example:
$ cat test.out
BEGIN 798
table public.t: INSERT: i[integer]:42
COMMIT 798
BEGIN 799
table public.t: INSERT: i[integer]:43
COMMIT 799
BEGIN 792
table public.t: INSERT: i[integer]:36
COMMIT 792
BEGIN 793
table public.t: INSERT: i[integer]:37
COMMIT 793
BEGIN 794
table public.t: INSERT: i[integer]:38
COMMIT 794
BEGIN 795
table public.t: INSERT: i[integer]:39
COMMIT 795
BEGIN 796
table public.t: INSERT: i[integer]:40
COMMIT 796
BEGIN 797
table public.t: INSERT: i[integer]:41
COMMIT 797
BEGIN 798
table public.t: INSERT: i[integer]:42
COMMIT 798
BEGIN 799
table public.t: INSERT: i[integer]:43
COMMIT 799


With the patch applied, these duplicate records no longer appear in
the pg_recvlogical output.

Regards,

--
Fujii Masao



On 19/11/2025 14:54, Fujii Masao wrote:

> With the patch applied, these duplicate records no longer appear in
> the pg_recvlogical output.

Thanks! Works like a charm. I confirm duplicates no longer appear with 
the patch applied.

You can consider adding a test about this in "030_pg_recvlogical.pl", 
proposal below:


use IPC::Run qw(start);
my $outfile = $node->basedir . '/reconnect.out';

$node->command_ok(
     [
         'pg_recvlogical',
         '--slot' => 'reconnect_test',
         '--dbname' => $node->connstr('postgres'),
         '--create-slot',
     ],
     'slot created for reconnection test');

$node->safe_psql('postgres', 'CREATE TABLE t(x int);');
$node->safe_psql('postgres', 'INSERT INTO t VALUES (1);');

my $recv = start [
     'pg_recvlogical',
     '--slot', 'reconnect_test',
     '--dbname', $node->connstr('postgres'),
     '--start',
     '--file', $outfile,
     '--fsync-interval', '1',
     '--status-interval', '60',
     '--verbose'
], '>', \my $out, '2>', \my $err;

sleep(3);

my $backend_pid = $node->safe_psql('postgres',
     "SELECT active_pid FROM pg_replication_slots WHERE slot_name = 
'reconnect_test'");

if ($backend_pid ne '')
{
     $node->safe_psql('postgres', "SELECT 
pg_terminate_backend($backend_pid)");
}

sleep(6);

$node->safe_psql('postgres', 'INSERT INTO t VALUES (2);');

sleep(3);

$recv->signal('TERM');
$recv->finish();

open(my $file, '<', $outfile);
my $count = 0;
while (<$file>) {
     if (/INSERT/) {
         $count = $count + 1;
     }
}
close($file);

cmp_ok($count, '==', 2, 'two INSERTs');

$node->command_ok(
     [
         'pg_recvlogical',
         '--slot' => 'reconnect_test',
         '--dbname' => $node->connstr('postgres'),
         '--drop-slot'
     ],
     'reconnect_test slot dropped');


-- 
Regards,
Mircea Cadariu




Hi,


An update: I have two topics from the review perspective.

On the test I proposed to be added to 030_pg_recvlogical.pl, I found a 
way to write it without using sleeps (which risk flakyness in CI). I've 
attached it as a patch for your consideration. I checked the test in the 
following way: on master it fails, but with your patch it passes.

Secondly I noticed in function sendFeedback at line 166, the startpos is 
set to output_written_lsn. This seems to counter conceptually the change 
you made in the patch, however it seems to not affect correctness. Shall 
we remove this line as to avoid confusion?

Вложения
On Mon, Nov 24, 2025 at 6:03 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:
>
> Hi,
>
>
> An update: I have two topics from the review perspective.
>
> On the test I proposed to be added to 030_pg_recvlogical.pl, I found a
> way to write it without using sleeps (which risk flakyness in CI). I've
> attached it as a patch for your consideration. I checked the test in the
> following way: on master it fails, but with your patch it passes.

Thanks for writing the test case and turning it into a patch. I agree that
we should add a regression test to ensure the reported issue doesn't recur.

It looks like the v1 patch you attached accidentally includes
the patch file itself. Could you remove that?

After applying the patch, git diff --check reported trailing whitespace.
Could you fix that?

+    '--fsync-interval', '1',
+    '--status-interval', '1',

Wouldn't it be safer to use a larger value (e.g., 100) for --status-interval?
With a very small interval, the status feedback might happen before
the walsender is terminated and pg_recvlogical reconnects, which could
prevent the duplicate data from appearing even without the patch.

+use IPC::Run qw(start);
<snip>
+my $recv = start [

For simplicity, would it be better to avoid "use IPC::Run qw(start)" and
just call "IPC::Run::start" directly?

+# Wait only for initial connection
+$node->poll_query_until('postgres',
+    "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
slot_name = 'reconnect_test'");

This is unlikely, but pg_recvlogical's connection could be terminated
immediately after connecting, before receiving any data. If that happens,
the test might behave unexpectedly. To make the test more robust,
should we instead poll on:

        SELECT pg_read_file('$outfile') ~ 'INSERT'

instead, to ensure that some data has actually been received before
terminating the backend?


> Secondly I noticed in function sendFeedback at line 166, the startpos is
> set to output_written_lsn. This seems to counter conceptually the change
> you made in the patch, however it seems to not affect correctness. Shall
> we remove this line as to avoid confusion?

Isn't this necessary when - is specified for --file, causing OutputFsync() to
be skipped?

Regards,

--
Fujii Masao



Hi,

On 25/11/2025 17:16, Fujii Masao wrote:
> Thanks for writing the test case and turning it into a patch. I agree that
> we should add a regression test to ensure the reported issue doesn't recur.
Thanks for your feedback, updated patch is attached. Again, I checked 
that it fails in master, but passes with your patch.
> It looks like the v1 patch you attached accidentally includes
> the patch file itself. Could you remove that?
Whoops, not sure what happened there, fixed.
> +    '--fsync-interval', '1',
> +    '--status-interval', '1',
>
> Wouldn't it be safer to use a larger value (e.g., 100) for --status-interval?
> With a very small interval, the status feedback might happen before
> the walsender is terminated and pg_recvlogical reconnects, which could
> prevent the duplicate data from appearing even without the patch.
Yes indeed good one. I actually had it set to 60 in the previous version 
I sent earlier.

> +use IPC::Run qw(start);
> <snip>
> +my $recv = start [
>
> For simplicity, would it be better to avoid "use IPC::Run qw(start)" and
> just call "IPC::Run::start" directly?
Indeed, done.

> +# Wait only for initial connection
> +$node->poll_query_until('postgres',
> +    "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
> slot_name = 'reconnect_test'");
>
> This is unlikely, but pg_recvlogical's connection could be terminated
> immediately after connecting, before receiving any data. If that happens,
> the test might behave unexpectedly. To make the test more robust,
> should we instead poll on:
>
>          SELECT pg_read_file('$outfile') ~ 'INSERT'
>
> instead, to ensure that some data has actually been received before
> terminating the backend?

> +# Wait only for initial connection
> +$node->poll_query_until('postgres',
> +    "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
> slot_name = 'reconnect_test'");
>
> This is unlikely, but pg_recvlogical's connection could be terminated
> immediately after connecting, before receiving any data. If that happens,
> the test might behave unexpectedly. To make the test more robust,
> should we instead poll on:
>
>          SELECT pg_read_file('$outfile') ~ 'INSERT'
>
> instead, to ensure that some data has actually been received before
> terminating the backend?
>
>
>
>> Secondly I noticed in function sendFeedback at line 166, the startpos is
>> set to output_written_lsn. This seems to counter conceptually the change
>> you made in the patch, however it seems to not affect correctness. Shall
>> we remove this line as to avoid confusion?
> Isn't this necessary when - is specified for --file, causing 
> OutputFsync() to be skipped? Regards, 
Yes, for sure. Would really like to avoid introducing flake in CI due to 
this test.

> Isn't this necessary when - is specified for --file, causing 
> OutputFsync() to be skipped? 
Upon another look, indeed. When writing to a regular file (--file -) 
that assignment is redundant but harmless. But like you said, when 
writing to stdout, without that line, startpos would never be updated.

> Additionally, when the --no-loop option is used, I found that 
> pg_recvlogical
> could previously exit without flushing written data, risking data loss.
> The attached patch fixes this issue by also ensuring that all data is 
> flushed
> to disk before exiting with --no-loop.

Should we think of some kind of test also for this part?

-- 
Thanks,
Mircea Cadariu

Вложения
On Wed, Nov 26, 2025 at 10:25 PM Mircea Cadariu
<cadariu.mircea@gmail.com> wrote:
>
> Hi,
>
> On 25/11/2025 17:16, Fujii Masao wrote:
> > Thanks for writing the test case and turning it into a patch. I agree that
> > we should add a regression test to ensure the reported issue doesn't recur.
> Thanks for your feedback, updated patch is attached. Again, I checked
> that it fails in master, but passes with your patch.

Thanks for updating the patch and testing!

I've made a few minor adjustments to the test patch.
The updated version is attached.

Changes include:
- Tweaked and added some comments in the test.
- Ran pgperltidy to clean up the formatting of 030_pg_recvlogical.pl.
- Reused the existing table test_table instead of creating a new table t.
    (While considering a better name for t, I noticed test_table was
already available)
- Used the "option => value" style in IPC::Run::start() for
consistency with other tests.
- Simplified the SQL used to wait for INSERT to appear in
pg_recvlogical's output file.
- Switched from open() to slurp_file(), since other tests use
slurp_file() for reading files.

Thought?


> > Additionally, when the --no-loop option is used, I found that
> > pg_recvlogical
> > could previously exit without flushing written data, risking data loss.
> > The attached patch fixes this issue by also ensuring that all data is
> > flushed
> > to disk before exiting with --no-loop.
>
> Should we think of some kind of test also for this part?

I'm not sure if it's really worth adding such test...

Regards,

--
Fujii Masao

Вложения

Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication

От
Mircea Cadariu
Дата:

Hi,

On 28/11/2025 02:15, Fujii Masao wrote:
I've made a few minor adjustments to the test patch.
The updated version is attached.

Thanks for the updated patch! Nice improvements.

Two futher proposals for the current version of the test. 

Shall we use slurp_file then everywhere we need file reads? (instead of pg_read_file)

The following can be seen as nits for your consideration. 

We can consider making the string provided in the "or die" to be consistent with the comment. We can pick one of the options below and specify the same for each.  

* receive and write the first INSERT / receive first INSERT

* establish a new connection / to reconnect

* receive and write  / receive

If we are mentioning multiple INSERTs instead of just one, might read better if we add the "s" at the end. This might be just my preference though, I leave it up to you. 

-- 
Thanks,
Mircea Cadariu
On 28/11/2025 02:15, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> I've made a few minor adjustments to the test patch.
> The updated version is attached.

Hi,
I was reading your code and had a question about the new code you added in the main() function of pg_recvlogical.c:
  if (outfd != -1 && strcmp(outfile, "-") != 0)
   OutputFsync(feGetCurrentTimestamp());
In the stream loop, the StreamLogicalLog() function already contains similar code:
  if (outfd != -1 &&
   feTimestampDifferenceExceeds(output_last_fsync, now,
           fsync_interval))
  {
   if (!OutputFsync(now))
    goto error;
  }

If the outfile becomes unwritable due to external reasons, would the error reporting here be redundant with the error handling in StreamLogicalLog()?

Best regards,
--
Yilin Zhang