Обсуждение: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
От
Fujii Masao
Дата:
Hi, When pg_recvlogical loses connection, it reconnects and restarts replication unless --no-loop option is used. I noticed that in this scenario, data that has already been flushed can be re-sent after restarting replication. This happens because the replication start position used when restarting replication is taken from the write position in the last status update message, which may be older than the actual position of the last flushed data. As a result, some flushed data could exist newer than the replication start position and be re-sent. Is this a bug? To fix this issue, I'd like to propose the attached patch that fixes this by ensuring all written data is flushed to disk before restarting replication and by using the last flushed position as the replication start point. This prevents already flushed data from being re-sent. Additionally, when the --no-loop option is used, I found that pg_recvlogical could previously exit without flushing written data, risking data loss. The attached patch fixes this issue by also ensuring that all data is flushed to disk before exiting with --no-loop. Thought? Regards, -- Fujii Masao
Вложения
Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
От
Mircea Cadariu
Дата:
Hi Fujii, I see what you mean. For reviewing I started with writing a test that just reproduces the bug and documents the current behaviour. I expected that by applying your patch the test would fail, but then we would just update the test accordingly. Surprisingly the test continues to pass. I attached the test for your consideration. I'll have a look again tomorrow at both. Kind regards, Mircea Cadariu
Вложения
Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
От
Mircea Cadariu
Дата:
Hi, Update: I added another test to the attached test-only patch. This new test uses pg_terminate_backend to trigger reconnection. Assuming the tests are fully correct (your input appreciated on this) we can use them to validate the solution. Kind regards, Mircea Cadariu
Вложения
On Wed, Nov 19, 2025 at 11:36 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote: > > Hi, > > > Update: I added another test to the attached test-only patch. This new > test uses pg_terminate_backend to trigger reconnection. > > Assuming the tests are fully correct (your input appreciated on this) we > can use them to validate the solution. Thanks for testing! BTW, I reproduced the issue as follows: #1. Start the server #2. Start pg_recvlogical: $ pg_recvlogical -S myslot -d postgres --create-slot --start -f test.out #3. Insert data every second: $ psql =# create table t(i serial); =# insert into t values(default); =# \watch 1 #4. In a separate session, terminate walsender to force pg_recvlogical to restart replication: $ psql =# select pg_terminate_backend(pid) from pg_stat_replication; #5. Wait for pg_recvlogical to restart replication #6. You will see duplicate records written to the output file, for example: $ cat test.out BEGIN 798 table public.t: INSERT: i[integer]:42 COMMIT 798 BEGIN 799 table public.t: INSERT: i[integer]:43 COMMIT 799 BEGIN 792 table public.t: INSERT: i[integer]:36 COMMIT 792 BEGIN 793 table public.t: INSERT: i[integer]:37 COMMIT 793 BEGIN 794 table public.t: INSERT: i[integer]:38 COMMIT 794 BEGIN 795 table public.t: INSERT: i[integer]:39 COMMIT 795 BEGIN 796 table public.t: INSERT: i[integer]:40 COMMIT 796 BEGIN 797 table public.t: INSERT: i[integer]:41 COMMIT 797 BEGIN 798 table public.t: INSERT: i[integer]:42 COMMIT 798 BEGIN 799 table public.t: INSERT: i[integer]:43 COMMIT 799 With the patch applied, these duplicate records no longer appear in the pg_recvlogical output. Regards, -- Fujii Masao
Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
От
Mircea Cadariu
Дата:
On 19/11/2025 14:54, Fujii Masao wrote:
> With the patch applied, these duplicate records no longer appear in
> the pg_recvlogical output.
Thanks! Works like a charm. I confirm duplicates no longer appear with
the patch applied.
You can consider adding a test about this in "030_pg_recvlogical.pl",
proposal below:
use IPC::Run qw(start);
my $outfile = $node->basedir . '/reconnect.out';
$node->command_ok(
[
'pg_recvlogical',
'--slot' => 'reconnect_test',
'--dbname' => $node->connstr('postgres'),
'--create-slot',
],
'slot created for reconnection test');
$node->safe_psql('postgres', 'CREATE TABLE t(x int);');
$node->safe_psql('postgres', 'INSERT INTO t VALUES (1);');
my $recv = start [
'pg_recvlogical',
'--slot', 'reconnect_test',
'--dbname', $node->connstr('postgres'),
'--start',
'--file', $outfile,
'--fsync-interval', '1',
'--status-interval', '60',
'--verbose'
], '>', \my $out, '2>', \my $err;
sleep(3);
my $backend_pid = $node->safe_psql('postgres',
"SELECT active_pid FROM pg_replication_slots WHERE slot_name =
'reconnect_test'");
if ($backend_pid ne '')
{
$node->safe_psql('postgres', "SELECT
pg_terminate_backend($backend_pid)");
}
sleep(6);
$node->safe_psql('postgres', 'INSERT INTO t VALUES (2);');
sleep(3);
$recv->signal('TERM');
$recv->finish();
open(my $file, '<', $outfile);
my $count = 0;
while (<$file>) {
if (/INSERT/) {
$count = $count + 1;
}
}
close($file);
cmp_ok($count, '==', 2, 'two INSERTs');
$node->command_ok(
[
'pg_recvlogical',
'--slot' => 'reconnect_test',
'--dbname' => $node->connstr('postgres'),
'--drop-slot'
],
'reconnect_test slot dropped');
--
Regards,
Mircea Cadariu
Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
От
Mircea Cadariu
Дата:
Hi, An update: I have two topics from the review perspective. On the test I proposed to be added to 030_pg_recvlogical.pl, I found a way to write it without using sleeps (which risk flakyness in CI). I've attached it as a patch for your consideration. I checked the test in the following way: on master it fails, but with your patch it passes. Secondly I noticed in function sendFeedback at line 166, the startpos is set to output_written_lsn. This seems to counter conceptually the change you made in the patch, however it seems to not affect correctness. Shall we remove this line as to avoid confusion?
Вложения
On Mon, Nov 24, 2025 at 6:03 PM Mircea Cadariu <cadariu.mircea@gmail.com> wrote:
>
> Hi,
>
>
> An update: I have two topics from the review perspective.
>
> On the test I proposed to be added to 030_pg_recvlogical.pl, I found a
> way to write it without using sleeps (which risk flakyness in CI). I've
> attached it as a patch for your consideration. I checked the test in the
> following way: on master it fails, but with your patch it passes.
Thanks for writing the test case and turning it into a patch. I agree that
we should add a regression test to ensure the reported issue doesn't recur.
It looks like the v1 patch you attached accidentally includes
the patch file itself. Could you remove that?
After applying the patch, git diff --check reported trailing whitespace.
Could you fix that?
+ '--fsync-interval', '1',
+ '--status-interval', '1',
Wouldn't it be safer to use a larger value (e.g., 100) for --status-interval?
With a very small interval, the status feedback might happen before
the walsender is terminated and pg_recvlogical reconnects, which could
prevent the duplicate data from appearing even without the patch.
+use IPC::Run qw(start);
<snip>
+my $recv = start [
For simplicity, would it be better to avoid "use IPC::Run qw(start)" and
just call "IPC::Run::start" directly?
+# Wait only for initial connection
+$node->poll_query_until('postgres',
+ "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
slot_name = 'reconnect_test'");
This is unlikely, but pg_recvlogical's connection could be terminated
immediately after connecting, before receiving any data. If that happens,
the test might behave unexpectedly. To make the test more robust,
should we instead poll on:
SELECT pg_read_file('$outfile') ~ 'INSERT'
instead, to ensure that some data has actually been received before
terminating the backend?
> Secondly I noticed in function sendFeedback at line 166, the startpos is
> set to output_written_lsn. This seems to counter conceptually the change
> you made in the patch, however it seems to not affect correctness. Shall
> we remove this line as to avoid confusion?
Isn't this necessary when - is specified for --file, causing OutputFsync() to
be skipped?
Regards,
--
Fujii Masao
Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
От
Mircea Cadariu
Дата:
Hi,
On 25/11/2025 17:16, Fujii Masao wrote:
> Thanks for writing the test case and turning it into a patch. I agree that
> we should add a regression test to ensure the reported issue doesn't recur.
Thanks for your feedback, updated patch is attached. Again, I checked
that it fails in master, but passes with your patch.
> It looks like the v1 patch you attached accidentally includes
> the patch file itself. Could you remove that?
Whoops, not sure what happened there, fixed.
> + '--fsync-interval', '1',
> + '--status-interval', '1',
>
> Wouldn't it be safer to use a larger value (e.g., 100) for --status-interval?
> With a very small interval, the status feedback might happen before
> the walsender is terminated and pg_recvlogical reconnects, which could
> prevent the duplicate data from appearing even without the patch.
Yes indeed good one. I actually had it set to 60 in the previous version
I sent earlier.
> +use IPC::Run qw(start);
> <snip>
> +my $recv = start [
>
> For simplicity, would it be better to avoid "use IPC::Run qw(start)" and
> just call "IPC::Run::start" directly?
Indeed, done.
> +# Wait only for initial connection
> +$node->poll_query_until('postgres',
> + "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
> slot_name = 'reconnect_test'");
>
> This is unlikely, but pg_recvlogical's connection could be terminated
> immediately after connecting, before receiving any data. If that happens,
> the test might behave unexpectedly. To make the test more robust,
> should we instead poll on:
>
> SELECT pg_read_file('$outfile') ~ 'INSERT'
>
> instead, to ensure that some data has actually been received before
> terminating the backend?
> +# Wait only for initial connection
> +$node->poll_query_until('postgres',
> + "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
> slot_name = 'reconnect_test'");
>
> This is unlikely, but pg_recvlogical's connection could be terminated
> immediately after connecting, before receiving any data. If that happens,
> the test might behave unexpectedly. To make the test more robust,
> should we instead poll on:
>
> SELECT pg_read_file('$outfile') ~ 'INSERT'
>
> instead, to ensure that some data has actually been received before
> terminating the backend?
>
>
>
>> Secondly I noticed in function sendFeedback at line 166, the startpos is
>> set to output_written_lsn. This seems to counter conceptually the change
>> you made in the patch, however it seems to not affect correctness. Shall
>> we remove this line as to avoid confusion?
> Isn't this necessary when - is specified for --file, causing
> OutputFsync() to be skipped? Regards,
Yes, for sure. Would really like to avoid introducing flake in CI due to
this test.
> Isn't this necessary when - is specified for --file, causing
> OutputFsync() to be skipped?
Upon another look, indeed. When writing to a regular file (--file -)
that assignment is redundant but harmless. But like you said, when
writing to stdout, without that line, startpos would never be updated.
> Additionally, when the --no-loop option is used, I found that
> pg_recvlogical
> could previously exit without flushing written data, risking data loss.
> The attached patch fixes this issue by also ensuring that all data is
> flushed
> to disk before exiting with --no-loop.
Should we think of some kind of test also for this part?
--
Thanks,
Mircea Cadariu
Вложения
On Wed, Nov 26, 2025 at 10:25 PM Mircea Cadariu
<cadariu.mircea@gmail.com> wrote:
>
> Hi,
>
> On 25/11/2025 17:16, Fujii Masao wrote:
> > Thanks for writing the test case and turning it into a patch. I agree that
> > we should add a regression test to ensure the reported issue doesn't recur.
> Thanks for your feedback, updated patch is attached. Again, I checked
> that it fails in master, but passes with your patch.
Thanks for updating the patch and testing!
I've made a few minor adjustments to the test patch.
The updated version is attached.
Changes include:
- Tweaked and added some comments in the test.
- Ran pgperltidy to clean up the formatting of 030_pg_recvlogical.pl.
- Reused the existing table test_table instead of creating a new table t.
(While considering a better name for t, I noticed test_table was
already available)
- Used the "option => value" style in IPC::Run::start() for
consistency with other tests.
- Simplified the SQL used to wait for INSERT to appear in
pg_recvlogical's output file.
- Switched from open() to slurp_file(), since other tests use
slurp_file() for reading files.
Thought?
> > Additionally, when the --no-loop option is used, I found that
> > pg_recvlogical
> > could previously exit without flushing written data, risking data loss.
> > The attached patch fixes this issue by also ensuring that all data is
> > flushed
> > to disk before exiting with --no-loop.
>
> Should we think of some kind of test also for this part?
I'm not sure if it's really worth adding such test...
Regards,
--
Fujii Masao
Вложения
Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
От
Mircea Cadariu
Дата:
Hi,
On 28/11/2025 02:15, Fujii Masao wrote:
I've made a few minor adjustments to the test patch. The updated version is attached.
Thanks for the updated patch! Nice improvements.
Two futher proposals for the current version of the test.
Shall we use slurp_file then everywhere we need file reads? (instead of pg_read_file)
The following can be seen as nits for your consideration.
We can consider making the string provided in the "or die" to be consistent with the comment. We can pick one of the options below and specify the same for each.
* receive and write the first INSERT / receive first INSERT
* establish a new connection / to reconnect
* receive and write / receive
If we are mentioning multiple INSERTs instead of just one, might read better if we add the "s" at the end. This might be just my preference though, I leave it up to you.
-- Thanks, Mircea Cadariu
Re:Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
От
"Yilin Zhang"
Дата:
On 28/11/2025 02:15, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> I've made a few minor adjustments to the test patch.
> The updated version is attached.
Hi,
I was reading your code and had a question about the new code you added in the main() function of pg_recvlogical.c:
if (outfd != -1 && strcmp(outfile, "-") != 0)
OutputFsync(feGetCurrentTimestamp());
In the stream loop, the StreamLogicalLog() function already contains similar code:
if (outfd != -1 &&
feTimestampDifferenceExceeds(output_last_fsync, now,
fsync_interval))
{
if (!OutputFsync(now))
goto error;
}
If the outfile becomes unwritable due to external reasons, would the error reporting here be redundant with the error handling in StreamLogicalLog()?
Best regards,
--
Yilin Zhang