Обсуждение: Excessive LOG messages from replication slot sync worker

Поиск

Список

Период

Сортировка

Excessive LOG messages from replication slot sync worker

От

Fujii Masao

Дата:

12 августа, 06:22:11

Hi,

While testing the replication slot synchronization feature, I set up
three instances on the same machine:

- Physical replication primary (also Logical replication publisher)
- Physical replication standby
- Logical replication subscriber

----------------------------------------
initdb -D pri --encoding=UTF8 --locale=C
cat <<EOF >> pri/postgresql.conf
wal_level = logical
log_line_prefix = '%m [pri][%b] '
logging_collector = on
EOF
pg_ctl -D pri start
pgbench -i
cat <<EOF | psql
CREATE PUBLICATION mypub FOR ALL TABLES;
SELECT pg_create_physical_replication_slot('pslot');
EOF
cat <<EOF >> pri/postgresql.conf
synchronized_standby_slots = 'pslot'
EOF
pg_ctl -D pri reload

pg_basebackup -D sby -c fast
touch sby/standby.signal
cat <<EOF >> sby/postgresql.conf
port = 5433
primary_conninfo = 'host=localhost port=5432 dbname=postgres'
primary_slot_name = 'pslot'
sync_replication_slots = on
hot_standby_feedback = on
log_line_prefix = '%m [sby][%b] '
EOF
pg_ctl -D sby start

initdb -D sub --encoding=UTF8 --locale=C
cat <<EOF >> sub/postgresql.conf
port = 5434
wal_level = logical
log_line_prefix = '%m [sub][%b] '
logging_collector = on
EOF
pg_ctl -D sub start
pgbench -i -Itp -p 5434
cat <<EOF | psql -p 5434
CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION mypub WITH (failover);
EOF
----------------------------------------


I noticed that the replication slot sync worker on the standby generates
a large number of LOG messages, about two per 200–300 ms, during
pgbench runs (pgbench -c 10 -j 10 -T 60). For example,

----------------------------------------
2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG:  starting
logical decoding for slot "mysub"
2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL:  Streaming
transactions committing after 0/04F317B8, reading WAL from 0/04006FC0.
2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG:  logical
decoding found consistent point at 0/04006FC0
2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL:  There are
no running transactions.
2025-08-12 11:48:21.296 JST [sby][slotsync worker] LOG:  starting
logical decoding for slot "mysub"
2025-08-12 11:48:21.296 JST [sby][slotsync worker] DETAIL:  Streaming
transactions committing after 0/04F9DC90, reading WAL from 0/04006FC0.
2025-08-12 11:48:21.297 JST [sby][slotsync worker] LOG:  logical
decoding found consistent point at 0/04006FC0
2025-08-12 11:48:21.297 JST [sby][slotsync worker] DETAIL:  There are
no running transactions.
2025-08-12 11:48:21.526 JST [sby][slotsync worker] LOG:  starting
logical decoding for slot "mysub"
2025-08-12 11:48:21.526 JST [sby][slotsync worker] DETAIL:  Streaming
transactions committing after 0/0506A5C0, reading WAL from 0/04006FC0.
2025-08-12 11:48:21.527 JST [sby][slotsync worker] LOG:  logical
decoding found consistent point at 0/04006FC0
2025-08-12 11:48:21.527 JST [sby][slotsync worker] DETAIL:  There are
no running transactions.
2025-08-12 11:48:21.762 JST [sby][slotsync worker] LOG:  starting
logical decoding for slot "mysub"
2025-08-12 11:48:21.762 JST [sby][slotsync worker] DETAIL:  Streaming
transactions committing after 0/050C7F70, reading WAL from 0/04006FC0.
2025-08-12 11:48:21.763 JST [sby][slotsync worker] LOG:  logical
decoding found consistent point at 0/04006FC0
2025-08-12 11:48:21.763 JST [sby][slotsync worker] DETAIL:  There are
no running transactions.
2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG:  starting
logical decoding for slot "mysub"
2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL:  Streaming
transactions committing after 0/05111420, reading WAL from 0/04006FC0.
2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG:  logical
decoding found consistent point at 0/04006FC0
2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL:  There are
no running transactions.
----------------------------------------


 Is this expected behavior, a configuration mistake on my side, or perhaps
 an oversight in the commit of the replication slot synchronization feature?

This seems more like debug output left over from development rather than
something useful for users. If this is true, would it make sense to change
these messages from LOG to DEBUG?

Regards,


--
Fujii Masao

Re: Excessive LOG messages from replication slot sync worker

От

shveta malik

Дата:

12 августа, 09:54:10

On Tue, Aug 12, 2025 at 8:52 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> Hi,
>
> While testing the replication slot synchronization feature, I set up
> three instances on the same machine:
>
> - Physical replication primary (also Logical replication publisher)
> - Physical replication standby
> - Logical replication subscriber
>
> ----------------------------------------
> initdb -D pri --encoding=UTF8 --locale=C
> cat <<EOF >> pri/postgresql.conf
> wal_level = logical
> log_line_prefix = '%m [pri][%b] '
> logging_collector = on
> EOF
> pg_ctl -D pri start
> pgbench -i
> cat <<EOF | psql
> CREATE PUBLICATION mypub FOR ALL TABLES;
> SELECT pg_create_physical_replication_slot('pslot');
> EOF
> cat <<EOF >> pri/postgresql.conf
> synchronized_standby_slots = 'pslot'
> EOF
> pg_ctl -D pri reload
>
> pg_basebackup -D sby -c fast
> touch sby/standby.signal
> cat <<EOF >> sby/postgresql.conf
> port = 5433
> primary_conninfo = 'host=localhost port=5432 dbname=postgres'
> primary_slot_name = 'pslot'
> sync_replication_slots = on
> hot_standby_feedback = on
> log_line_prefix = '%m [sby][%b] '
> EOF
> pg_ctl -D sby start
>
> initdb -D sub --encoding=UTF8 --locale=C
> cat <<EOF >> sub/postgresql.conf
> port = 5434
> wal_level = logical
> log_line_prefix = '%m [sub][%b] '
> logging_collector = on
> EOF
> pg_ctl -D sub start
> pgbench -i -Itp -p 5434
> cat <<EOF | psql -p 5434
> CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432
> dbname=postgres' PUBLICATION mypub WITH (failover);
> EOF
> ----------------------------------------
>
>
> I noticed that the replication slot sync worker on the standby generates
> a large number of LOG messages, about two per 200–300 ms, during
> pgbench runs (pgbench -c 10 -j 10 -T 60). For example,
>
> ----------------------------------------
> 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG:  starting
> logical decoding for slot "mysub"
> 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL:  Streaming
> transactions committing after 0/04F317B8, reading WAL from 0/04006FC0.
> 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG:  logical
> decoding found consistent point at 0/04006FC0
> 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL:  There are
> no running transactions.
> 2025-08-12 11:48:21.296 JST [sby][slotsync worker] LOG:  starting
> logical decoding for slot "mysub"
> 2025-08-12 11:48:21.296 JST [sby][slotsync worker] DETAIL:  Streaming
> transactions committing after 0/04F9DC90, reading WAL from 0/04006FC0.
> 2025-08-12 11:48:21.297 JST [sby][slotsync worker] LOG:  logical
> decoding found consistent point at 0/04006FC0
> 2025-08-12 11:48:21.297 JST [sby][slotsync worker] DETAIL:  There are
> no running transactions.
> 2025-08-12 11:48:21.526 JST [sby][slotsync worker] LOG:  starting
> logical decoding for slot "mysub"
> 2025-08-12 11:48:21.526 JST [sby][slotsync worker] DETAIL:  Streaming
> transactions committing after 0/0506A5C0, reading WAL from 0/04006FC0.
> 2025-08-12 11:48:21.527 JST [sby][slotsync worker] LOG:  logical
> decoding found consistent point at 0/04006FC0
> 2025-08-12 11:48:21.527 JST [sby][slotsync worker] DETAIL:  There are
> no running transactions.
> 2025-08-12 11:48:21.762 JST [sby][slotsync worker] LOG:  starting
> logical decoding for slot "mysub"
> 2025-08-12 11:48:21.762 JST [sby][slotsync worker] DETAIL:  Streaming
> transactions committing after 0/050C7F70, reading WAL from 0/04006FC0.
> 2025-08-12 11:48:21.763 JST [sby][slotsync worker] LOG:  logical
> decoding found consistent point at 0/04006FC0
> 2025-08-12 11:48:21.763 JST [sby][slotsync worker] DETAIL:  There are
> no running transactions.
> 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG:  starting
> logical decoding for slot "mysub"
> 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL:  Streaming
> transactions committing after 0/05111420, reading WAL from 0/04006FC0.
> 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG:  logical
> decoding found consistent point at 0/04006FC0
> 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL:  There are
> no running transactions.
> ----------------------------------------
>
>
>  Is this expected behavior, a configuration mistake on my side, or perhaps
>  an oversight in the commit of the replication slot synchronization feature?
>
> This seems more like debug output left over from development rather than
> something useful for users. If this is true, would it make sense to change
> these messages from LOG to DEBUG?
>

Thank You for reporting the issue.

It looks like commit 2633dae (mistakenly) introduced a change ([1]) in
SnapBuildSnapshotExists(), altering the format used for snapshot file
names during the search. However, SnapBuildSerialize still uses the
old format ("%s/%X-%X.snap"), which led to the slot-sync worker being
unable to locate existing snapshot files. Reverting the format
resolved the issue for me. Could you please check if the attached
patch works for you as well?

[1]:
SnapBuildSnapshotExists()
- sprintf(path, "%s/%X-%X.snap",
+ sprintf(path, "%s/%08X-%08X.snap",

thanks
Shveta

Вложения

0001-Revert-wrong-change-in-SnapBuildSnapshotExists.patch

Re: Excessive LOG messages from replication slot sync worker

От

Japin Li

Дата:

12 августа, 10:38:34

On Tue, Aug 12, 2025 at 12:24:10PM +0530, shveta malik wrote:
> On Tue, Aug 12, 2025 at 8:52 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > Hi,
> >
> > While testing the replication slot synchronization feature, I set up
> > three instances on the same machine:
> >
> > - Physical replication primary (also Logical replication publisher)
> > - Physical replication standby
> > - Logical replication subscriber
> >
> > ----------------------------------------
> > initdb -D pri --encoding=UTF8 --locale=C
> > cat <<EOF >> pri/postgresql.conf
> > wal_level = logical
> > log_line_prefix = '%m [pri][%b] '
> > logging_collector = on
> > EOF
> > pg_ctl -D pri start
> > pgbench -i
> > cat <<EOF | psql
> > CREATE PUBLICATION mypub FOR ALL TABLES;
> > SELECT pg_create_physical_replication_slot('pslot');
> > EOF
> > cat <<EOF >> pri/postgresql.conf
> > synchronized_standby_slots = 'pslot'
> > EOF
> > pg_ctl -D pri reload
> >
> > pg_basebackup -D sby -c fast
> > touch sby/standby.signal
> > cat <<EOF >> sby/postgresql.conf
> > port = 5433
> > primary_conninfo = 'host=localhost port=5432 dbname=postgres'
> > primary_slot_name = 'pslot'
> > sync_replication_slots = on
> > hot_standby_feedback = on
> > log_line_prefix = '%m [sby][%b] '
> > EOF
> > pg_ctl -D sby start
> >
> > initdb -D sub --encoding=UTF8 --locale=C
> > cat <<EOF >> sub/postgresql.conf
> > port = 5434
> > wal_level = logical
> > log_line_prefix = '%m [sub][%b] '
> > logging_collector = on
> > EOF
> > pg_ctl -D sub start
> > pgbench -i -Itp -p 5434
> > cat <<EOF | psql -p 5434
> > CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432
> > dbname=postgres' PUBLICATION mypub WITH (failover);
> > EOF
> > ----------------------------------------
> >
> >
> > I noticed that the replication slot sync worker on the standby generates
> > a large number of LOG messages, about two per 200–300 ms, during
> > pgbench runs (pgbench -c 10 -j 10 -T 60). For example,
> >
> > ----------------------------------------
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG:  starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL:  Streaming
> > transactions committing after 0/04F317B8, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG:  logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL:  There are
> > no running transactions.
> > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] LOG:  starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] DETAIL:  Streaming
> > transactions committing after 0/04F9DC90, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] LOG:  logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] DETAIL:  There are
> > no running transactions.
> > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] LOG:  starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] DETAIL:  Streaming
> > transactions committing after 0/0506A5C0, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] LOG:  logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] DETAIL:  There are
> > no running transactions.
> > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] LOG:  starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] DETAIL:  Streaming
> > transactions committing after 0/050C7F70, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] LOG:  logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] DETAIL:  There are
> > no running transactions.
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG:  starting
> > logical decoding for slot "mysub"
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL:  Streaming
> > transactions committing after 0/05111420, reading WAL from 0/04006FC0.
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG:  logical
> > decoding found consistent point at 0/04006FC0
> > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL:  There are
> > no running transactions.
> > ----------------------------------------
> >
> >
> >  Is this expected behavior, a configuration mistake on my side, or perhaps
> >  an oversight in the commit of the replication slot synchronization feature?
> >
> > This seems more like debug output left over from development rather than
> > something useful for users. If this is true, would it make sense to change
> > these messages from LOG to DEBUG?
> >
> 
> Thank You for reporting the issue.
> 
> It looks like commit 2633dae (mistakenly) introduced a change ([1]) in
> SnapBuildSnapshotExists(), altering the format used for snapshot file
> names during the search. However, SnapBuildSerialize still uses the
> old format ("%s/%X-%X.snap"), which led to the slot-sync worker being
> unable to locate existing snapshot files. Reverting the format
> resolved the issue for me. Could you please check if the attached
> patch works for you as well?
> 
> [1]:
> SnapBuildSnapshotExists()
> - sprintf(path, "%s/%X-%X.snap",
> + sprintf(path, "%s/%08X-%08X.snap",
> 

Sorry, it's my fault.  I forgot to update the format in SnapBuildSerialize().

I'd prefer to update the format in SnapBuildSerialize() instead of reverting
this change.

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8532bfd27e5..db7051170fc 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1529,7 +1529,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
      * unless the user used pg_resetwal or similar. If a user did so, there's
      * no hope continuing to decode anyway.
      */
-    sprintf(path, "%s/%X-%X.snap",
+    sprintf(path, "%s/%08X-%08X.snap",
             PG_LOGICAL_SNAPSHOTS_DIR,
             LSN_FORMAT_ARGS(lsn));

@@ -1573,7 +1573,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
     elog(DEBUG1, "serializing snapshot to %s", path);

     /* to make sure only we will write to this tempfile, include pid */
-    sprintf(tmppath, "%s/%X-%X.snap.%d.tmp",
+    sprintf(tmppath, "%s/%08X-%08X.snap.%d.tmp",
             PG_LOGICAL_SNAPSHOTS_DIR,
             LSN_FORMAT_ARGS(lsn), MyProcPid);

-- 
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.

Re: Excessive LOG messages from replication slot sync worker

От

Fujii Masao

Дата:

12 августа, 11:57:45

On Tue, Aug 12, 2025 at 4:38 PM Japin Li <japinli@hotmail.com> wrote:
>
> On Tue, Aug 12, 2025 at 12:24:10PM +0530, shveta malik wrote:
> > It looks like commit 2633dae (mistakenly) introduced a change ([1]) in
> > SnapBuildSnapshotExists(), altering the format used for snapshot file
> > names during the search. However, SnapBuildSerialize still uses the
> > old format ("%s/%X-%X.snap"), which led to the slot-sync worker being
> > unable to locate existing snapshot files.

Thanks for the investigation!


> Sorry, it's my fault.  I forgot to update the format in SnapBuildSerialize().
>
> I'd prefer to update the format in SnapBuildSerialize() instead of reverting
> this change.

+1.
If we make this change, we should also update other places using "%X-%X"?

$ git grep -E "%X-%X.(snap|tmp)"
contrib/pg_logicalinspect/pg_logicalinspect.c:  if (sscanf(filename,
"%X-%X.snap", &hi, &lo) != 2)
contrib/pg_logicalinspect/pg_logicalinspect.c:  sprintf(tmpfname,
"%X-%X.snap", hi, lo);
src/backend/replication/logical/snapbuild.c:    sprintf(path, "%s/%X-%X.snap",
src/backend/replication/logical/snapbuild.c:    sprintf(tmppath,
"%s/%X-%X.snap.%d.tmp",
src/backend/replication/logical/snapbuild.c:    sprintf(path, "%s/%X-%X.snap",
src/backend/replication/logical/snapbuild.c:            if
(sscanf(snap_de->d_name, "%X-%X.snap", &hi, &lo) != 2)

Regards,

--
Fujii Masao

Re: Excessive LOG messages from replication slot sync worker

От

Japin Li

Дата:

12 августа, 12:25:30

On Tue, Aug 12, 2025 at 05:57:45PM +0900, Fujii Masao wrote:
> On Tue, Aug 12, 2025 at 4:38 PM Japin Li <japinli@hotmail.com> wrote:
> >
> > On Tue, Aug 12, 2025 at 12:24:10PM +0530, shveta malik wrote:
> > > It looks like commit 2633dae (mistakenly) introduced a change ([1]) in
> > > SnapBuildSnapshotExists(), altering the format used for snapshot file
> > > names during the search. However, SnapBuildSerialize still uses the
> > > old format ("%s/%X-%X.snap"), which led to the slot-sync worker being
> > > unable to locate existing snapshot files.
> 
> Thanks for the investigation!
> 
> 
> > Sorry, it's my fault.  I forgot to update the format in SnapBuildSerialize().
> >
> > I'd prefer to update the format in SnapBuildSerialize() instead of reverting
> > this change.
> 
> +1.
> If we make this change, we should also update other places using "%X-%X"?
> 
Agreed.

> $ git grep -E "%X-%X.(snap|tmp)"
> contrib/pg_logicalinspect/pg_logicalinspect.c:  if (sscanf(filename,
> "%X-%X.snap", &hi, &lo) != 2)
> contrib/pg_logicalinspect/pg_logicalinspect.c:  sprintf(tmpfname,
> "%X-%X.snap", hi, lo);
> src/backend/replication/logical/snapbuild.c:    sprintf(path, "%s/%X-%X.snap",
> src/backend/replication/logical/snapbuild.c:    sprintf(tmppath,
> "%s/%X-%X.snap.%d.tmp",
> src/backend/replication/logical/snapbuild.c:    sprintf(path, "%s/%X-%X.snap",
> src/backend/replication/logical/snapbuild.c:            if
> (sscanf(snap_de->d_name, "%X-%X.snap", &hi, &lo) != 2)
> 

I believe that the format %X-%X also works with sscanf(). However, to maintain
consistency, the format for sscanf() has been updated as well.

-- 
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.

Вложения

v2-0001-Standardize-snapshot-filename-formatting.patch

Re: Excessive LOG messages from replication slot sync worker

От

Fujii Masao

Дата:

12 августа, 13:14:38

On Tue, Aug 12, 2025 at 6:25 PM Japin Li <japinli@hotmail.com> wrote:
> I believe that the format %X-%X also works with sscanf(). However, to maintain
> consistency, the format for sscanf() has been updated as well.

Yes.
Thanks for the patch!

Since we're changing the first "%X" in "%X-%X" to "%08X", the example
file names in the docs should be updated too. For example:

$ git grep "\.snap" | grep pglogicalinspect.sgml
doc/src/sgml/pglogicalinspect.sgml:name         | 0-40796E18.snap
doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM
pg_get_logical_snapshot_meta('0-40796E18.snap');
doc/src/sgml/pglogicalinspect.sgml:name     | 0-40796E18.snap
doc/src/sgml/pglogicalinspect.sgml:name         | 0-40796E18.snap
doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM
pg_get_logical_snapshot_info('0-40796E18.snap');
doc/src/sgml/pglogicalinspect.sgml:name                     | 0-40796E18.snap

I also noticed that the regression tests for pg_logicalinspect use file
names in the old format. This doesn't cause test failures, but should
we update them to match the new format?

...
contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
pg_get_logical_snapshot_info('0/40796E18.snap');
contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
pg_get_logical_snapshot_info('../snapshots/0-40796E18.snap');
contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
pg_get_logical_snapshot_meta('0-40796E18.foo.snap');
contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
pg_get_logical_snapshot_meta('0/40796E18.snap');
...

Regards,

--
Fujii Masao

Re: Excessive LOG messages from replication slot sync worker

От

Fujii Masao

Дата:

12 августа, 13:58:25

On Tue, Aug 12, 2025 at 7:14 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Tue, Aug 12, 2025 at 6:25 PM Japin Li <japinli@hotmail.com> wrote:
> > I believe that the format %X-%X also works with sscanf(). However, to maintain
> > consistency, the format for sscanf() has been updated as well.
>
> Yes.
> Thanks for the patch!

With the patch, the excessive LOG messages from the replication slot sync
worker are gone. However, it still logged periodically, about two messages
every 30 seconds, for example:

2025-08-12 19:49:03.288 JST [sby][slotsync worker] LOG:  starting
logical decoding for slot "mysub"
2025-08-12 19:49:03.288 JST [sby][slotsync worker] DETAIL:  Streaming
transactions committing after 0/0FAD3940, reading WAL from 0/0D4C9CE8.
2025-08-12 19:49:03.289 JST [sby][slotsync worker] LOG:  logical
decoding found consistent point at 0/0D4C9CE8
2025-08-12 19:49:03.289 JST [sby][slotsync worker] DETAIL:  Logical
decoding will begin using saved snapshot.
2025-08-12 19:49:46.048 JST [sby][slotsync worker] LOG:  starting
logical decoding for slot "mysub"
2025-08-12 19:49:46.048 JST [sby][slotsync worker] DETAIL:  Streaming
transactions committing after 0/10A60230, reading WAL from 0/0FAEE120.
2025-08-12 19:49:46.049 JST [sby][slotsync worker] LOG:  logical
decoding found consistent point at 0/0FAEE120
2025-08-12 19:49:46.049 JST [sby][slotsync worker] DETAIL:  Logical
decoding will begin using saved snapshot.

I'm not sure these messages are useful for end users, and LOG might not be
the right level. They seem more like debug information, so should we consider
changing them to DEBUG, even with the proposed patch applied?

Regards,

--
Fujii Masao

Re: Excessive LOG messages from replication slot sync worker

От

Japin Li

Дата:

12 августа, 14:05:56

On Tue, Aug 12, 2025 at 07:14:38PM +0900, Fujii Masao wrote:
> On Tue, Aug 12, 2025 at 6:25 PM Japin Li <japinli@hotmail.com> wrote:
> > I believe that the format %X-%X also works with sscanf(). However, to maintain
> > consistency, the format for sscanf() has been updated as well.
> 
> Yes.
> Thanks for the patch!
> 
> Since we're changing the first "%X" in "%X-%X" to "%08X", the example
> file names in the docs should be updated too. For example:
> 
> $ git grep "\.snap" | grep pglogicalinspect.sgml
> doc/src/sgml/pglogicalinspect.sgml:name         | 0-40796E18.snap
> doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM
> pg_get_logical_snapshot_meta('0-40796E18.snap');
> doc/src/sgml/pglogicalinspect.sgml:name     | 0-40796E18.snap
> doc/src/sgml/pglogicalinspect.sgml:name         | 0-40796E18.snap
> doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM
> pg_get_logical_snapshot_info('0-40796E18.snap');
> doc/src/sgml/pglogicalinspect.sgml:name                     | 0-40796E18.snap
> 
> I also noticed that the regression tests for pg_logicalinspect use file
> names in the old format. This doesn't cause test failures, but should
> we update them to match the new format?
> 
> ...
> contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
> pg_get_logical_snapshot_info('0/40796E18.snap');
> contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
> pg_get_logical_snapshot_info('../snapshots/0-40796E18.snap');
> contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
> pg_get_logical_snapshot_meta('0-40796E18.foo.snap');
> contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
> pg_get_logical_snapshot_meta('0/40796E18.snap');
> ...
> 

Update in v3 patch.

OTOH, I also update reorder buffer spill file path.

$ git grep -E 'xid-.*-lsn.*spill'
src/backend/replication/logical/reorderbuffer.c: snprintf(path, MAXPGPATH, "%s/%s/xid-%u-lsn-%X-%X.spill",

-- 
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.

Вложения

v3-0001-Standardize-snapshot-filename-formatting.patch

Re: Excessive LOG messages from replication slot sync worker

От

Japin Li

Дата:

12 августа, 14:13:51

On Tue, Aug 12, 2025 at 07:58:25PM +0900, Fujii Masao wrote:
> On Tue, Aug 12, 2025 at 7:14 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Tue, Aug 12, 2025 at 6:25 PM Japin Li <japinli@hotmail.com> wrote:
> > > I believe that the format %X-%X also works with sscanf(). However, to maintain
> > > consistency, the format for sscanf() has been updated as well.
> >
> > Yes.
> > Thanks for the patch!
> 
> With the patch, the excessive LOG messages from the replication slot sync
> worker are gone. However, it still logged periodically, about two messages
> every 30 seconds, for example:
> 
> 2025-08-12 19:49:03.288 JST [sby][slotsync worker] LOG:  starting
> logical decoding for slot "mysub"
> 2025-08-12 19:49:03.288 JST [sby][slotsync worker] DETAIL:  Streaming
> transactions committing after 0/0FAD3940, reading WAL from 0/0D4C9CE8.
> 2025-08-12 19:49:03.289 JST [sby][slotsync worker] LOG:  logical
> decoding found consistent point at 0/0D4C9CE8
> 2025-08-12 19:49:03.289 JST [sby][slotsync worker] DETAIL:  Logical
> decoding will begin using saved snapshot.
> 2025-08-12 19:49:46.048 JST [sby][slotsync worker] LOG:  starting
> logical decoding for slot "mysub"
> 2025-08-12 19:49:46.048 JST [sby][slotsync worker] DETAIL:  Streaming
> transactions committing after 0/10A60230, reading WAL from 0/0FAEE120.
> 2025-08-12 19:49:46.049 JST [sby][slotsync worker] LOG:  logical
> decoding found consistent point at 0/0FAEE120
> 2025-08-12 19:49:46.049 JST [sby][slotsync worker] DETAIL:  Logical
> decoding will begin using saved snapshot.
> 
>
I think this is not related to the logical snapshot name.

> 
> I'm not sure these messages are useful for end users, and LOG might not be
> the right level. They seem more like debug information, so should we consider
> changing them to DEBUG, even with the proposed patch applied?

+1.

-- 
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.

Re: Excessive LOG messages from replication slot sync worker

От

Fujii Masao

Дата:

12 августа, 16:02:32

On Tue, Aug 12, 2025 at 8:06 PM Japin Li <japinli@hotmail.com> wrote:
>
> On Tue, Aug 12, 2025 at 07:14:38PM +0900, Fujii Masao wrote:
> > On Tue, Aug 12, 2025 at 6:25 PM Japin Li <japinli@hotmail.com> wrote:
> > > I believe that the format %X-%X also works with sscanf(). However, to maintain
> > > consistency, the format for sscanf() has been updated as well.
> >
> > Yes.
> > Thanks for the patch!
> >
> > Since we're changing the first "%X" in "%X-%X" to "%08X", the example
> > file names in the docs should be updated too. For example:
> >
> > $ git grep "\.snap" | grep pglogicalinspect.sgml
> > doc/src/sgml/pglogicalinspect.sgml:name         | 0-40796E18.snap
> > doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM
> > pg_get_logical_snapshot_meta('0-40796E18.snap');
> > doc/src/sgml/pglogicalinspect.sgml:name     | 0-40796E18.snap
> > doc/src/sgml/pglogicalinspect.sgml:name         | 0-40796E18.snap
> > doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM
> > pg_get_logical_snapshot_info('0-40796E18.snap');
> > doc/src/sgml/pglogicalinspect.sgml:name                     | 0-40796E18.snap
> >
> > I also noticed that the regression tests for pg_logicalinspect use file
> > names in the old format. This doesn't cause test failures, but should
> > we update them to match the new format?
> >
> > ...
> > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
> > pg_get_logical_snapshot_info('0/40796E18.snap');
> > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
> > pg_get_logical_snapshot_info('../snapshots/0-40796E18.snap');
> > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
> > pg_get_logical_snapshot_meta('0-40796E18.foo.snap');
> > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT
> > pg_get_logical_snapshot_meta('0/40796E18.snap');
> > ...
> >
>
> Update in v3 patch.

Thanks for updating the patch!


> OTOH, I also update reorder buffer spill file path.
>
> $ git grep -E 'xid-.*-lsn.*spill'
> src/backend/replication/logical/reorderbuffer.c: snprintf(path, MAXPGPATH, "%s/%s/xid-%u-lsn-%X-%X.spill",

Should we also update LOGICAL_REWRITE_FORMAT to use %08X?

That said, the number of places to change is growing, and we're drifting
away from the original issue.... So I started thinking it might be better
to simply applying Shveta's patch to revert the incorrect format change
to fix the problem, and then discussing LSN-based filename standardization
in a separate thread.

Regards,

--
Fujii Masao

Re: Excessive LOG messages from replication slot sync worker

От

Japin Li

Дата:

13 августа, 03:59:21

On Tue, Aug 12, 2025 at 10:02:32PM +0900, Fujii Masao wrote:
> On Tue, Aug 12, 2025 at 8:06 PM Japin Li <japinli@hotmail.com> wrote:
> > OTOH, I also update reorder buffer spill file path.
> >
> > $ git grep -E 'xid-.*-lsn.*spill'
> > src/backend/replication/logical/reorderbuffer.c: snprintf(path, MAXPGPATH, "%s/%s/xid-%u-lsn-%X-%X.spill",
> 
> Should we also update LOGICAL_REWRITE_FORMAT to use %08X?
> 
> That said, the number of places to change is growing, and we're drifting
> away from the original issue.... So I started thinking it might be better
> to simply applying Shveta's patch to revert the incorrect format change
> to fix the problem, and then discussing LSN-based filename standardization
> in a separate thread.
> 

Agreed.  See the new thread [1] for a discussion on this.

[1]
https://www.postgresql.org/message-id/ME0P300MB0445795E3FF4CA5A1143164FB62AA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM

-- 
Best regards,
Japin Li
ChengDu WenWu Information Technology Co., LTD.

Re: Excessive LOG messages from replication slot sync worker

От

Amit Kapila

Дата:

13 августа, 05:53:59

On Tue, Aug 12, 2025 at 6:32 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> Should we also update LOGICAL_REWRITE_FORMAT to use %08X?
>
> That said, the number of places to change is growing, and we're drifting
> away from the original issue.... So I started thinking it might be better
> to simply applying Shveta's patch to revert the incorrect format change
> to fix the problem, and then discussing LSN-based filename standardization
> in a separate thread.
>

+1.

--
With Regards,
Amit Kapila.

Re: Excessive LOG messages from replication slot sync worker

От

Amit Kapila

Дата:

13 августа, 07:00:23

On Tue, Aug 12, 2025 at 4:28 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> I'm not sure these messages are useful for end users, and LOG might not be
> the right level. They seem more like debug information, so should we consider
> changing them to DEBUG, even with the proposed patch applied?
>

Yeah, we can do that for slotsync path but it should be okay to keep
at LOG level for logical replication/decoding path as there it
shouldn't hit as often as in slotsync path. But OTH, there is a
argument that it can be DEBUG for logical replication/decoding path as
well. From my experience, I can tell that these messages have been
helpful in finding BF failures and debugging bugs from user reports,
so there is value in keeping them at LOG level.

--
With Regards,
Amit Kapila.

Re: Excessive LOG messages from replication slot sync worker

От

Fujii Masao

Дата:

14 августа, 07:08:40

On Wed, Aug 13, 2025 at 11:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Aug 12, 2025 at 6:32 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > Should we also update LOGICAL_REWRITE_FORMAT to use %08X?
> >
> > That said, the number of places to change is growing, and we're drifting
> > away from the original issue.... So I started thinking it might be better
> > to simply applying Shveta's patch to revert the incorrect format change
> > to fix the problem, and then discussing LSN-based filename standardization
> > in a separate thread.
> >
>
> +1.

I've pushed Shveta's patch. Thanks!

Regards,

--
Fujii Masao

Re: Excessive LOG messages from replication slot sync worker

От

Fujii Masao

Дата:

14 августа, 07:12:47

On Wed, Aug 13, 2025 at 1:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Aug 12, 2025 at 4:28 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > I'm not sure these messages are useful for end users, and LOG might not be
> > the right level. They seem more like debug information, so should we consider
> > changing them to DEBUG, even with the proposed patch applied?
> >
>
> Yeah, we can do that for slotsync path but it should be okay to keep
> at LOG level for logical replication/decoding path as there it
> shouldn't hit as often as in slotsync path.

It looks like those two LOG messages are also generated by
pg_logical_slot_get_binary_changes(). When this function is
called every few seconds to capture changes in real time,
it can produce a large number of these messages. So shouldn't
we consider changing their log level to DEBUG also in such cases?

> But OTH, there is a
> argument that it can be DEBUG for logical replication/decoding path as
> well. From my experience, I can tell that these messages have been
> helpful in finding BF failures and debugging bugs from user reports,
> so there is value in keeping them at LOG level.

I understand these messages are useful for debugging,
but if they're now not intended for end users, perhaps they
should be DEBUG-level instead? We could also introduce
a new GUC to control whether these messages are logged for
debugging purposes and use it in regression tests for logical
decoding, or just set log_min_messages to DEBUG in those tests.
Thought?

Regards,

--
Fujii Masao

Re: Excessive LOG messages from replication slot sync worker

От

Amit Kapila

Дата:

14 августа, 12:09:53

On Thu, Aug 14, 2025 at 9:43 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Wed, Aug 13, 2025 at 1:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Aug 12, 2025 at 4:28 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> > >
> > > I'm not sure these messages are useful for end users, and LOG might not be
> > > the right level. They seem more like debug information, so should we consider
> > > changing them to DEBUG, even with the proposed patch applied?
> > >
> >
> > Yeah, we can do that for slotsync path but it should be okay to keep
> > at LOG level for logical replication/decoding path as there it
> > shouldn't hit as often as in slotsync path.
>
> It looks like those two LOG messages are also generated by
> pg_logical_slot_get_binary_changes(). When this function is
> called every few seconds to capture changes in real time,
> it can produce a large number of these messages.
>

I am not sure how plugin developers or users are using these
functions, but for built-in logical replication, these messages should
appear at the start of the walsender once. This area is relatively
complex so it becomes easier for us to understand the bug reports with
the help of these messages. TBH, I don't know if they are really
helpful for users so I see the merits of your point to change the
level of these LOGs. However, as a selfish developer who works on
various bugs in this area I found these and other logical decoding
messages quite useful to analyze the bug reports. I haven't checked
all but there is another message as follows which would fall in same
category:
elog(LOG, "%X/%08X has been already streamed, forwarding to %X/%08X",

BTW, I have not seen user reports where they complain about these
LOGs. Now, it is possible that users may not have noticed or maybe I
missed to spot such reports.

--
With Regards,
Amit Kapila.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Excessive LOG messages from replication slot sync worker

Вложения

Вложения

Вложения