Обсуждение: Excessive LOG messages from replication slot sync worker
Hi, While testing the replication slot synchronization feature, I set up three instances on the same machine: - Physical replication primary (also Logical replication publisher) - Physical replication standby - Logical replication subscriber ---------------------------------------- initdb -D pri --encoding=UTF8 --locale=C cat <<EOF >> pri/postgresql.conf wal_level = logical log_line_prefix = '%m [pri][%b] ' logging_collector = on EOF pg_ctl -D pri start pgbench -i cat <<EOF | psql CREATE PUBLICATION mypub FOR ALL TABLES; SELECT pg_create_physical_replication_slot('pslot'); EOF cat <<EOF >> pri/postgresql.conf synchronized_standby_slots = 'pslot' EOF pg_ctl -D pri reload pg_basebackup -D sby -c fast touch sby/standby.signal cat <<EOF >> sby/postgresql.conf port = 5433 primary_conninfo = 'host=localhost port=5432 dbname=postgres' primary_slot_name = 'pslot' sync_replication_slots = on hot_standby_feedback = on log_line_prefix = '%m [sby][%b] ' EOF pg_ctl -D sby start initdb -D sub --encoding=UTF8 --locale=C cat <<EOF >> sub/postgresql.conf port = 5434 wal_level = logical log_line_prefix = '%m [sub][%b] ' logging_collector = on EOF pg_ctl -D sub start pgbench -i -Itp -p 5434 cat <<EOF | psql -p 5434 CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub WITH (failover); EOF ---------------------------------------- I noticed that the replication slot sync worker on the standby generates a large number of LOG messages, about two per 200–300 ms, during pgbench runs (pgbench -c 10 -j 10 -T 60). For example, ---------------------------------------- 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG: starting logical decoding for slot "mysub" 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL: Streaming transactions committing after 0/04F317B8, reading WAL from 0/04006FC0. 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG: logical decoding found consistent point at 0/04006FC0 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL: There are no running transactions. 2025-08-12 11:48:21.296 JST [sby][slotsync worker] LOG: starting logical decoding for slot "mysub" 2025-08-12 11:48:21.296 JST [sby][slotsync worker] DETAIL: Streaming transactions committing after 0/04F9DC90, reading WAL from 0/04006FC0. 2025-08-12 11:48:21.297 JST [sby][slotsync worker] LOG: logical decoding found consistent point at 0/04006FC0 2025-08-12 11:48:21.297 JST [sby][slotsync worker] DETAIL: There are no running transactions. 2025-08-12 11:48:21.526 JST [sby][slotsync worker] LOG: starting logical decoding for slot "mysub" 2025-08-12 11:48:21.526 JST [sby][slotsync worker] DETAIL: Streaming transactions committing after 0/0506A5C0, reading WAL from 0/04006FC0. 2025-08-12 11:48:21.527 JST [sby][slotsync worker] LOG: logical decoding found consistent point at 0/04006FC0 2025-08-12 11:48:21.527 JST [sby][slotsync worker] DETAIL: There are no running transactions. 2025-08-12 11:48:21.762 JST [sby][slotsync worker] LOG: starting logical decoding for slot "mysub" 2025-08-12 11:48:21.762 JST [sby][slotsync worker] DETAIL: Streaming transactions committing after 0/050C7F70, reading WAL from 0/04006FC0. 2025-08-12 11:48:21.763 JST [sby][slotsync worker] LOG: logical decoding found consistent point at 0/04006FC0 2025-08-12 11:48:21.763 JST [sby][slotsync worker] DETAIL: There are no running transactions. 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG: starting logical decoding for slot "mysub" 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL: Streaming transactions committing after 0/05111420, reading WAL from 0/04006FC0. 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG: logical decoding found consistent point at 0/04006FC0 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL: There are no running transactions. ---------------------------------------- Is this expected behavior, a configuration mistake on my side, or perhaps an oversight in the commit of the replication slot synchronization feature? This seems more like debug output left over from development rather than something useful for users. If this is true, would it make sense to change these messages from LOG to DEBUG? Regards, -- Fujii Masao
On Tue, Aug 12, 2025 at 8:52 AM Fujii Masao <masao.fujii@gmail.com> wrote: > > Hi, > > While testing the replication slot synchronization feature, I set up > three instances on the same machine: > > - Physical replication primary (also Logical replication publisher) > - Physical replication standby > - Logical replication subscriber > > ---------------------------------------- > initdb -D pri --encoding=UTF8 --locale=C > cat <<EOF >> pri/postgresql.conf > wal_level = logical > log_line_prefix = '%m [pri][%b] ' > logging_collector = on > EOF > pg_ctl -D pri start > pgbench -i > cat <<EOF | psql > CREATE PUBLICATION mypub FOR ALL TABLES; > SELECT pg_create_physical_replication_slot('pslot'); > EOF > cat <<EOF >> pri/postgresql.conf > synchronized_standby_slots = 'pslot' > EOF > pg_ctl -D pri reload > > pg_basebackup -D sby -c fast > touch sby/standby.signal > cat <<EOF >> sby/postgresql.conf > port = 5433 > primary_conninfo = 'host=localhost port=5432 dbname=postgres' > primary_slot_name = 'pslot' > sync_replication_slots = on > hot_standby_feedback = on > log_line_prefix = '%m [sby][%b] ' > EOF > pg_ctl -D sby start > > initdb -D sub --encoding=UTF8 --locale=C > cat <<EOF >> sub/postgresql.conf > port = 5434 > wal_level = logical > log_line_prefix = '%m [sub][%b] ' > logging_collector = on > EOF > pg_ctl -D sub start > pgbench -i -Itp -p 5434 > cat <<EOF | psql -p 5434 > CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 > dbname=postgres' PUBLICATION mypub WITH (failover); > EOF > ---------------------------------------- > > > I noticed that the replication slot sync worker on the standby generates > a large number of LOG messages, about two per 200–300 ms, during > pgbench runs (pgbench -c 10 -j 10 -T 60). For example, > > ---------------------------------------- > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG: starting > logical decoding for slot "mysub" > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL: Streaming > transactions committing after 0/04F317B8, reading WAL from 0/04006FC0. > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG: logical > decoding found consistent point at 0/04006FC0 > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL: There are > no running transactions. > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] LOG: starting > logical decoding for slot "mysub" > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] DETAIL: Streaming > transactions committing after 0/04F9DC90, reading WAL from 0/04006FC0. > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] LOG: logical > decoding found consistent point at 0/04006FC0 > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] DETAIL: There are > no running transactions. > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] LOG: starting > logical decoding for slot "mysub" > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] DETAIL: Streaming > transactions committing after 0/0506A5C0, reading WAL from 0/04006FC0. > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] LOG: logical > decoding found consistent point at 0/04006FC0 > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] DETAIL: There are > no running transactions. > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] LOG: starting > logical decoding for slot "mysub" > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] DETAIL: Streaming > transactions committing after 0/050C7F70, reading WAL from 0/04006FC0. > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] LOG: logical > decoding found consistent point at 0/04006FC0 > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] DETAIL: There are > no running transactions. > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG: starting > logical decoding for slot "mysub" > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL: Streaming > transactions committing after 0/05111420, reading WAL from 0/04006FC0. > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG: logical > decoding found consistent point at 0/04006FC0 > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL: There are > no running transactions. > ---------------------------------------- > > > Is this expected behavior, a configuration mistake on my side, or perhaps > an oversight in the commit of the replication slot synchronization feature? > > This seems more like debug output left over from development rather than > something useful for users. If this is true, would it make sense to change > these messages from LOG to DEBUG? > Thank You for reporting the issue. It looks like commit 2633dae (mistakenly) introduced a change ([1]) in SnapBuildSnapshotExists(), altering the format used for snapshot file names during the search. However, SnapBuildSerialize still uses the old format ("%s/%X-%X.snap"), which led to the slot-sync worker being unable to locate existing snapshot files. Reverting the format resolved the issue for me. Could you please check if the attached patch works for you as well? [1]: SnapBuildSnapshotExists() - sprintf(path, "%s/%X-%X.snap", + sprintf(path, "%s/%08X-%08X.snap", thanks Shveta
Вложения
On Tue, Aug 12, 2025 at 12:24:10PM +0530, shveta malik wrote: > On Tue, Aug 12, 2025 at 8:52 AM Fujii Masao <masao.fujii@gmail.com> wrote: > > > > Hi, > > > > While testing the replication slot synchronization feature, I set up > > three instances on the same machine: > > > > - Physical replication primary (also Logical replication publisher) > > - Physical replication standby > > - Logical replication subscriber > > > > ---------------------------------------- > > initdb -D pri --encoding=UTF8 --locale=C > > cat <<EOF >> pri/postgresql.conf > > wal_level = logical > > log_line_prefix = '%m [pri][%b] ' > > logging_collector = on > > EOF > > pg_ctl -D pri start > > pgbench -i > > cat <<EOF | psql > > CREATE PUBLICATION mypub FOR ALL TABLES; > > SELECT pg_create_physical_replication_slot('pslot'); > > EOF > > cat <<EOF >> pri/postgresql.conf > > synchronized_standby_slots = 'pslot' > > EOF > > pg_ctl -D pri reload > > > > pg_basebackup -D sby -c fast > > touch sby/standby.signal > > cat <<EOF >> sby/postgresql.conf > > port = 5433 > > primary_conninfo = 'host=localhost port=5432 dbname=postgres' > > primary_slot_name = 'pslot' > > sync_replication_slots = on > > hot_standby_feedback = on > > log_line_prefix = '%m [sby][%b] ' > > EOF > > pg_ctl -D sby start > > > > initdb -D sub --encoding=UTF8 --locale=C > > cat <<EOF >> sub/postgresql.conf > > port = 5434 > > wal_level = logical > > log_line_prefix = '%m [sub][%b] ' > > logging_collector = on > > EOF > > pg_ctl -D sub start > > pgbench -i -Itp -p 5434 > > cat <<EOF | psql -p 5434 > > CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 > > dbname=postgres' PUBLICATION mypub WITH (failover); > > EOF > > ---------------------------------------- > > > > > > I noticed that the replication slot sync worker on the standby generates > > a large number of LOG messages, about two per 200–300 ms, during > > pgbench runs (pgbench -c 10 -j 10 -T 60). For example, > > > > ---------------------------------------- > > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG: starting > > logical decoding for slot "mysub" > > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL: Streaming > > transactions committing after 0/04F317B8, reading WAL from 0/04006FC0. > > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG: logical > > decoding found consistent point at 0/04006FC0 > > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL: There are > > no running transactions. > > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] LOG: starting > > logical decoding for slot "mysub" > > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] DETAIL: Streaming > > transactions committing after 0/04F9DC90, reading WAL from 0/04006FC0. > > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] LOG: logical > > decoding found consistent point at 0/04006FC0 > > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] DETAIL: There are > > no running transactions. > > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] LOG: starting > > logical decoding for slot "mysub" > > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] DETAIL: Streaming > > transactions committing after 0/0506A5C0, reading WAL from 0/04006FC0. > > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] LOG: logical > > decoding found consistent point at 0/04006FC0 > > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] DETAIL: There are > > no running transactions. > > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] LOG: starting > > logical decoding for slot "mysub" > > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] DETAIL: Streaming > > transactions committing after 0/050C7F70, reading WAL from 0/04006FC0. > > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] LOG: logical > > decoding found consistent point at 0/04006FC0 > > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] DETAIL: There are > > no running transactions. > > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG: starting > > logical decoding for slot "mysub" > > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL: Streaming > > transactions committing after 0/05111420, reading WAL from 0/04006FC0. > > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG: logical > > decoding found consistent point at 0/04006FC0 > > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL: There are > > no running transactions. > > ---------------------------------------- > > > > > > Is this expected behavior, a configuration mistake on my side, or perhaps > > an oversight in the commit of the replication slot synchronization feature? > > > > This seems more like debug output left over from development rather than > > something useful for users. If this is true, would it make sense to change > > these messages from LOG to DEBUG? > > > > Thank You for reporting the issue. > > It looks like commit 2633dae (mistakenly) introduced a change ([1]) in > SnapBuildSnapshotExists(), altering the format used for snapshot file > names during the search. However, SnapBuildSerialize still uses the > old format ("%s/%X-%X.snap"), which led to the slot-sync worker being > unable to locate existing snapshot files. Reverting the format > resolved the issue for me. Could you please check if the attached > patch works for you as well? > > [1]: > SnapBuildSnapshotExists() > - sprintf(path, "%s/%X-%X.snap", > + sprintf(path, "%s/%08X-%08X.snap", > Sorry, it's my fault. I forgot to update the format in SnapBuildSerialize(). I'd prefer to update the format in SnapBuildSerialize() instead of reverting this change. diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 8532bfd27e5..db7051170fc 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -1529,7 +1529,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn) * unless the user used pg_resetwal or similar. If a user did so, there's * no hope continuing to decode anyway. */ - sprintf(path, "%s/%X-%X.snap", + sprintf(path, "%s/%08X-%08X.snap", PG_LOGICAL_SNAPSHOTS_DIR, LSN_FORMAT_ARGS(lsn)); @@ -1573,7 +1573,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn) elog(DEBUG1, "serializing snapshot to %s", path); /* to make sure only we will write to this tempfile, include pid */ - sprintf(tmppath, "%s/%X-%X.snap.%d.tmp", + sprintf(tmppath, "%s/%08X-%08X.snap.%d.tmp", PG_LOGICAL_SNAPSHOTS_DIR, LSN_FORMAT_ARGS(lsn), MyProcPid); -- Best regards, Japin Li ChengDu WenWu Information Technology Co., LTD.
On Tue, Aug 12, 2025 at 4:38 PM Japin Li <japinli@hotmail.com> wrote: > > On Tue, Aug 12, 2025 at 12:24:10PM +0530, shveta malik wrote: > > It looks like commit 2633dae (mistakenly) introduced a change ([1]) in > > SnapBuildSnapshotExists(), altering the format used for snapshot file > > names during the search. However, SnapBuildSerialize still uses the > > old format ("%s/%X-%X.snap"), which led to the slot-sync worker being > > unable to locate existing snapshot files. Thanks for the investigation! > Sorry, it's my fault. I forgot to update the format in SnapBuildSerialize(). > > I'd prefer to update the format in SnapBuildSerialize() instead of reverting > this change. +1. If we make this change, we should also update other places using "%X-%X"? $ git grep -E "%X-%X.(snap|tmp)" contrib/pg_logicalinspect/pg_logicalinspect.c: if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) contrib/pg_logicalinspect/pg_logicalinspect.c: sprintf(tmpfname, "%X-%X.snap", hi, lo); src/backend/replication/logical/snapbuild.c: sprintf(path, "%s/%X-%X.snap", src/backend/replication/logical/snapbuild.c: sprintf(tmppath, "%s/%X-%X.snap.%d.tmp", src/backend/replication/logical/snapbuild.c: sprintf(path, "%s/%X-%X.snap", src/backend/replication/logical/snapbuild.c: if (sscanf(snap_de->d_name, "%X-%X.snap", &hi, &lo) != 2) Regards, -- Fujii Masao
On Tue, Aug 12, 2025 at 05:57:45PM +0900, Fujii Masao wrote: > On Tue, Aug 12, 2025 at 4:38 PM Japin Li <japinli@hotmail.com> wrote: > > > > On Tue, Aug 12, 2025 at 12:24:10PM +0530, shveta malik wrote: > > > It looks like commit 2633dae (mistakenly) introduced a change ([1]) in > > > SnapBuildSnapshotExists(), altering the format used for snapshot file > > > names during the search. However, SnapBuildSerialize still uses the > > > old format ("%s/%X-%X.snap"), which led to the slot-sync worker being > > > unable to locate existing snapshot files. > > Thanks for the investigation! > > > > Sorry, it's my fault. I forgot to update the format in SnapBuildSerialize(). > > > > I'd prefer to update the format in SnapBuildSerialize() instead of reverting > > this change. > > +1. > If we make this change, we should also update other places using "%X-%X"? > Agreed. > $ git grep -E "%X-%X.(snap|tmp)" > contrib/pg_logicalinspect/pg_logicalinspect.c: if (sscanf(filename, > "%X-%X.snap", &hi, &lo) != 2) > contrib/pg_logicalinspect/pg_logicalinspect.c: sprintf(tmpfname, > "%X-%X.snap", hi, lo); > src/backend/replication/logical/snapbuild.c: sprintf(path, "%s/%X-%X.snap", > src/backend/replication/logical/snapbuild.c: sprintf(tmppath, > "%s/%X-%X.snap.%d.tmp", > src/backend/replication/logical/snapbuild.c: sprintf(path, "%s/%X-%X.snap", > src/backend/replication/logical/snapbuild.c: if > (sscanf(snap_de->d_name, "%X-%X.snap", &hi, &lo) != 2) > I believe that the format %X-%X also works with sscanf(). However, to maintain consistency, the format for sscanf() has been updated as well. -- Best regards, Japin Li ChengDu WenWu Information Technology Co., LTD.
Вложения
On Tue, Aug 12, 2025 at 6:25 PM Japin Li <japinli@hotmail.com> wrote: > I believe that the format %X-%X also works with sscanf(). However, to maintain > consistency, the format for sscanf() has been updated as well. Yes. Thanks for the patch! Since we're changing the first "%X" in "%X-%X" to "%08X", the example file names in the docs should be updated too. For example: $ git grep "\.snap" | grep pglogicalinspect.sgml doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM pg_get_logical_snapshot_meta('0-40796E18.snap'); doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM pg_get_logical_snapshot_info('0-40796E18.snap'); doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap I also noticed that the regression tests for pg_logicalinspect use file names in the old format. This doesn't cause test failures, but should we update them to match the new format? ... contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT pg_get_logical_snapshot_info('0/40796E18.snap'); contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT pg_get_logical_snapshot_info('../snapshots/0-40796E18.snap'); contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT pg_get_logical_snapshot_meta('0-40796E18.foo.snap'); contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT pg_get_logical_snapshot_meta('0/40796E18.snap'); ... Regards, -- Fujii Masao
On Tue, Aug 12, 2025 at 7:14 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > On Tue, Aug 12, 2025 at 6:25 PM Japin Li <japinli@hotmail.com> wrote: > > I believe that the format %X-%X also works with sscanf(). However, to maintain > > consistency, the format for sscanf() has been updated as well. > > Yes. > Thanks for the patch! With the patch, the excessive LOG messages from the replication slot sync worker are gone. However, it still logged periodically, about two messages every 30 seconds, for example: 2025-08-12 19:49:03.288 JST [sby][slotsync worker] LOG: starting logical decoding for slot "mysub" 2025-08-12 19:49:03.288 JST [sby][slotsync worker] DETAIL: Streaming transactions committing after 0/0FAD3940, reading WAL from 0/0D4C9CE8. 2025-08-12 19:49:03.289 JST [sby][slotsync worker] LOG: logical decoding found consistent point at 0/0D4C9CE8 2025-08-12 19:49:03.289 JST [sby][slotsync worker] DETAIL: Logical decoding will begin using saved snapshot. 2025-08-12 19:49:46.048 JST [sby][slotsync worker] LOG: starting logical decoding for slot "mysub" 2025-08-12 19:49:46.048 JST [sby][slotsync worker] DETAIL: Streaming transactions committing after 0/10A60230, reading WAL from 0/0FAEE120. 2025-08-12 19:49:46.049 JST [sby][slotsync worker] LOG: logical decoding found consistent point at 0/0FAEE120 2025-08-12 19:49:46.049 JST [sby][slotsync worker] DETAIL: Logical decoding will begin using saved snapshot. I'm not sure these messages are useful for end users, and LOG might not be the right level. They seem more like debug information, so should we consider changing them to DEBUG, even with the proposed patch applied? Regards, -- Fujii Masao
On Tue, Aug 12, 2025 at 07:14:38PM +0900, Fujii Masao wrote: > On Tue, Aug 12, 2025 at 6:25 PM Japin Li <japinli@hotmail.com> wrote: > > I believe that the format %X-%X also works with sscanf(). However, to maintain > > consistency, the format for sscanf() has been updated as well. > > Yes. > Thanks for the patch! > > Since we're changing the first "%X" in "%X-%X" to "%08X", the example > file names in the docs should be updated too. For example: > > $ git grep "\.snap" | grep pglogicalinspect.sgml > doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap > doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM > pg_get_logical_snapshot_meta('0-40796E18.snap'); > doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap > doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap > doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM > pg_get_logical_snapshot_info('0-40796E18.snap'); > doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap > > I also noticed that the regression tests for pg_logicalinspect use file > names in the old format. This doesn't cause test failures, but should > we update them to match the new format? > > ... > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT > pg_get_logical_snapshot_info('0/40796E18.snap'); > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT > pg_get_logical_snapshot_info('../snapshots/0-40796E18.snap'); > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT > pg_get_logical_snapshot_meta('0-40796E18.foo.snap'); > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT > pg_get_logical_snapshot_meta('0/40796E18.snap'); > ... > Update in v3 patch. OTOH, I also update reorder buffer spill file path. $ git grep -E 'xid-.*-lsn.*spill' src/backend/replication/logical/reorderbuffer.c: snprintf(path, MAXPGPATH, "%s/%s/xid-%u-lsn-%X-%X.spill", -- Best regards, Japin Li ChengDu WenWu Information Technology Co., LTD.
Вложения
On Tue, Aug 12, 2025 at 07:58:25PM +0900, Fujii Masao wrote: > On Tue, Aug 12, 2025 at 7:14 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > > > On Tue, Aug 12, 2025 at 6:25 PM Japin Li <japinli@hotmail.com> wrote: > > > I believe that the format %X-%X also works with sscanf(). However, to maintain > > > consistency, the format for sscanf() has been updated as well. > > > > Yes. > > Thanks for the patch! > > With the patch, the excessive LOG messages from the replication slot sync > worker are gone. However, it still logged periodically, about two messages > every 30 seconds, for example: > > 2025-08-12 19:49:03.288 JST [sby][slotsync worker] LOG: starting > logical decoding for slot "mysub" > 2025-08-12 19:49:03.288 JST [sby][slotsync worker] DETAIL: Streaming > transactions committing after 0/0FAD3940, reading WAL from 0/0D4C9CE8. > 2025-08-12 19:49:03.289 JST [sby][slotsync worker] LOG: logical > decoding found consistent point at 0/0D4C9CE8 > 2025-08-12 19:49:03.289 JST [sby][slotsync worker] DETAIL: Logical > decoding will begin using saved snapshot. > 2025-08-12 19:49:46.048 JST [sby][slotsync worker] LOG: starting > logical decoding for slot "mysub" > 2025-08-12 19:49:46.048 JST [sby][slotsync worker] DETAIL: Streaming > transactions committing after 0/10A60230, reading WAL from 0/0FAEE120. > 2025-08-12 19:49:46.049 JST [sby][slotsync worker] LOG: logical > decoding found consistent point at 0/0FAEE120 > 2025-08-12 19:49:46.049 JST [sby][slotsync worker] DETAIL: Logical > decoding will begin using saved snapshot. > > I think this is not related to the logical snapshot name. > > I'm not sure these messages are useful for end users, and LOG might not be > the right level. They seem more like debug information, so should we consider > changing them to DEBUG, even with the proposed patch applied? +1. -- Best regards, Japin Li ChengDu WenWu Information Technology Co., LTD.
On Tue, Aug 12, 2025 at 8:06 PM Japin Li <japinli@hotmail.com> wrote: > > On Tue, Aug 12, 2025 at 07:14:38PM +0900, Fujii Masao wrote: > > On Tue, Aug 12, 2025 at 6:25 PM Japin Li <japinli@hotmail.com> wrote: > > > I believe that the format %X-%X also works with sscanf(). However, to maintain > > > consistency, the format for sscanf() has been updated as well. > > > > Yes. > > Thanks for the patch! > > > > Since we're changing the first "%X" in "%X-%X" to "%08X", the example > > file names in the docs should be updated too. For example: > > > > $ git grep "\.snap" | grep pglogicalinspect.sgml > > doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap > > doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM > > pg_get_logical_snapshot_meta('0-40796E18.snap'); > > doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap > > doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap > > doc/src/sgml/pglogicalinspect.sgml:postgres=# SELECT * FROM > > pg_get_logical_snapshot_info('0-40796E18.snap'); > > doc/src/sgml/pglogicalinspect.sgml:name | 0-40796E18.snap > > > > I also noticed that the regression tests for pg_logicalinspect use file > > names in the old format. This doesn't cause test failures, but should > > we update them to match the new format? > > > > ... > > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT > > pg_get_logical_snapshot_info('0/40796E18.snap'); > > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT > > pg_get_logical_snapshot_info('../snapshots/0-40796E18.snap'); > > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT > > pg_get_logical_snapshot_meta('0-40796E18.foo.snap'); > > contrib/pg_logicalinspect/sql/pg_logicalinspect.sql:SELECT > > pg_get_logical_snapshot_meta('0/40796E18.snap'); > > ... > > > > Update in v3 patch. Thanks for updating the patch! > OTOH, I also update reorder buffer spill file path. > > $ git grep -E 'xid-.*-lsn.*spill' > src/backend/replication/logical/reorderbuffer.c: snprintf(path, MAXPGPATH, "%s/%s/xid-%u-lsn-%X-%X.spill", Should we also update LOGICAL_REWRITE_FORMAT to use %08X? That said, the number of places to change is growing, and we're drifting away from the original issue.... So I started thinking it might be better to simply applying Shveta's patch to revert the incorrect format change to fix the problem, and then discussing LSN-based filename standardization in a separate thread. Regards, -- Fujii Masao
On Tue, Aug 12, 2025 at 10:02:32PM +0900, Fujii Masao wrote: > On Tue, Aug 12, 2025 at 8:06 PM Japin Li <japinli@hotmail.com> wrote: > > OTOH, I also update reorder buffer spill file path. > > > > $ git grep -E 'xid-.*-lsn.*spill' > > src/backend/replication/logical/reorderbuffer.c: snprintf(path, MAXPGPATH, "%s/%s/xid-%u-lsn-%X-%X.spill", > > Should we also update LOGICAL_REWRITE_FORMAT to use %08X? > > That said, the number of places to change is growing, and we're drifting > away from the original issue.... So I started thinking it might be better > to simply applying Shveta's patch to revert the incorrect format change > to fix the problem, and then discussing LSN-based filename standardization > in a separate thread. > Agreed. See the new thread [1] for a discussion on this. [1] https://www.postgresql.org/message-id/ME0P300MB0445795E3FF4CA5A1143164FB62AA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM -- Best regards, Japin Li ChengDu WenWu Information Technology Co., LTD.
On Tue, Aug 12, 2025 at 6:32 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > Should we also update LOGICAL_REWRITE_FORMAT to use %08X? > > That said, the number of places to change is growing, and we're drifting > away from the original issue.... So I started thinking it might be better > to simply applying Shveta's patch to revert the incorrect format change > to fix the problem, and then discussing LSN-based filename standardization > in a separate thread. > +1. -- With Regards, Amit Kapila.
On Tue, Aug 12, 2025 at 4:28 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > I'm not sure these messages are useful for end users, and LOG might not be > the right level. They seem more like debug information, so should we consider > changing them to DEBUG, even with the proposed patch applied? > Yeah, we can do that for slotsync path but it should be okay to keep at LOG level for logical replication/decoding path as there it shouldn't hit as often as in slotsync path. But OTH, there is a argument that it can be DEBUG for logical replication/decoding path as well. From my experience, I can tell that these messages have been helpful in finding BF failures and debugging bugs from user reports, so there is value in keeping them at LOG level. -- With Regards, Amit Kapila.
On Wed, Aug 13, 2025 at 11:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Aug 12, 2025 at 6:32 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > > > Should we also update LOGICAL_REWRITE_FORMAT to use %08X? > > > > That said, the number of places to change is growing, and we're drifting > > away from the original issue.... So I started thinking it might be better > > to simply applying Shveta's patch to revert the incorrect format change > > to fix the problem, and then discussing LSN-based filename standardization > > in a separate thread. > > > > +1. I've pushed Shveta's patch. Thanks! Regards, -- Fujii Masao
On Wed, Aug 13, 2025 at 1:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Aug 12, 2025 at 4:28 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > > > I'm not sure these messages are useful for end users, and LOG might not be > > the right level. They seem more like debug information, so should we consider > > changing them to DEBUG, even with the proposed patch applied? > > > > Yeah, we can do that for slotsync path but it should be okay to keep > at LOG level for logical replication/decoding path as there it > shouldn't hit as often as in slotsync path. It looks like those two LOG messages are also generated by pg_logical_slot_get_binary_changes(). When this function is called every few seconds to capture changes in real time, it can produce a large number of these messages. So shouldn't we consider changing their log level to DEBUG also in such cases? > But OTH, there is a > argument that it can be DEBUG for logical replication/decoding path as > well. From my experience, I can tell that these messages have been > helpful in finding BF failures and debugging bugs from user reports, > so there is value in keeping them at LOG level. I understand these messages are useful for debugging, but if they're now not intended for end users, perhaps they should be DEBUG-level instead? We could also introduce a new GUC to control whether these messages are logged for debugging purposes and use it in regression tests for logical decoding, or just set log_min_messages to DEBUG in those tests. Thought? Regards, -- Fujii Masao
On Thu, Aug 14, 2025 at 9:43 AM Fujii Masao <masao.fujii@gmail.com> wrote: > > On Wed, Aug 13, 2025 at 1:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Aug 12, 2025 at 4:28 PM Fujii Masao <masao.fujii@gmail.com> wrote: > > > > > > I'm not sure these messages are useful for end users, and LOG might not be > > > the right level. They seem more like debug information, so should we consider > > > changing them to DEBUG, even with the proposed patch applied? > > > > > > > Yeah, we can do that for slotsync path but it should be okay to keep > > at LOG level for logical replication/decoding path as there it > > shouldn't hit as often as in slotsync path. > > It looks like those two LOG messages are also generated by > pg_logical_slot_get_binary_changes(). When this function is > called every few seconds to capture changes in real time, > it can produce a large number of these messages. > I am not sure how plugin developers or users are using these functions, but for built-in logical replication, these messages should appear at the start of the walsender once. This area is relatively complex so it becomes easier for us to understand the bug reports with the help of these messages. TBH, I don't know if they are really helpful for users so I see the merits of your point to change the level of these LOGs. However, as a selfish developer who works on various bugs in this area I found these and other logical decoding messages quite useful to analyze the bug reports. I haven't checked all but there is another message as follows which would fall in same category: elog(LOG, "%X/%08X has been already streamed, forwarding to %X/%08X", BTW, I have not seen user reports where they complain about these LOGs. Now, it is possible that users may not have noticed or maybe I missed to spot such reports. -- With Regards, Amit Kapila.