Discussion: Records count mismatch with logical replication
Hi Team,
I have publication and subscription servers. Data replication seems to be running with minimal lag, but the record counts differ by more than 10 thousand records between the source and destination tables.
Could you please help in resolving this issue?
Regards,
Durga Mahesh
On Tue, Jan 21, 2025 at 9:24 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/21/25 04:08, Durgamahesh Manne wrote:
> Hi Team,
>
> I have publication and subscription servers. Data replication seems to be
> running with minimal lag, but the record counts differ by more than 10
> thousand records between the source and destination tables.
>
> Could you please help in resolving this issue?
Not without a good deal more information:
1) Postgres version on either side of the replication.
2) The replication configuration.
3) Define how lag is being calculated and what 'minimal' is.
4) Define how the record counts are being derived.
5) The network distance between the servers.
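For reference, a minimal sketch of how items 1, 3 and 4 can be gathered (assuming a logical slot on the publisher; the table name is a placeholder):

-- item 1: server version, run on each side
select version();
-- item 3: WAL the slot is holding back, run on the publisher
select slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as behind
from pg_replication_slots;
-- item 4: row counts, run on both sides (placeholder table name)
select count(*) from some_schema.some_table;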
>
> Regards,
> Durga Mahesh
>
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
Hi Adrian Klaver
Thanks very much for your quick response.
This happened during a repack: lag went to more than 350 GB, then gradually decreased to minimal lag after running pg_repack.
1) Postgres version on either side of the replication.
Source (publication): 16.4
Destination (subscription): 14.11
2) The replication configuration.
OLAP archiving workload (32 GB RAM, 8 vCPUs)
Source : wal_level = logical
wal_sender_timeout = 30s
max_wal_senders = 40
max_replication_slots = 20
max_logical_replication_workers = 4
wal_buffers = 64MB
commit_delay = 2000
commit_siblings = 12
wal_writer_delay = 300
wal_writer_flush_after = 1MB
bgwriter_delay = 20
min_wal_size = 8GB
max_wal_size = 32GB
Destination: 128 GB RAM, 32 vCPUs
wal_level = logical
wal_receiver_timeout = 30s
max_wal_senders = 40
max_replication_slots = 60
max_logical_replication_workers = 23
wal_buffers = 64MB
commit_delay = default
commit_siblings = default
wal_writer_delay = default
wal_writer_flush_after = default
bgwriter_delay = 20
min_wal_size = 8GB
max_wal_size = 32GB
3) Define how lag is being calculated and what 'minimal' is.
postgres=> select slot_name,pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn) as bytes_behind,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as behind_size, active from pg_replication_slots;
slot_name | bytes_behind | behind_size | active
-------------+--------------+-------------+--------
cls_eva_msa | 22906216 | 22 MB | t
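Note: restart_lsn only marks the oldest WAL the slot may still need, so the query above can overstate how far the subscriber actually is behind. For logical slots, a comparison against confirmed_flush_lsn (the position the subscriber has confirmed) may be closer; a minimal sketch:

select slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) as confirmed_behind
from pg_replication_slots
where slot_type = 'logical';  -- confirmed_flush_lsn is only set for logical slots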
4) Define how the record counts are being derived.
Source : archiving=> select count(*) from archiving.events_archive ;
count
---------
1262908
(1 row)
Destination : archiving=> select count(*) from archiving.events_archive ;
count
---------
1252062
(1 row)
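Note: count(*) taken at different moments on each side compares different snapshots, so some difference is expected while writes continue. A steadier comparison is possible if the table has a monotonically increasing key; a sketch, assuming a hypothetical column named id:

-- capture a boundary once on the source:
select max(id) from archiving.events_archive;   -- suppose this returns 1262908
-- then run on BOTH sides, excluding rows still in flight above the boundary:
select count(*) from archiving.events_archive where id <= 1262908;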
5) The network distance between the servers.
Both are under same vpc security groups
Regards
Durga Mahesh
On Tue, Jan 21, 2025 at 11:26 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/21/25 09:38, Durgamahesh Manne wrote:
>
>
>
> Hi Adrian Klaver
>
> Thanks very much for your quick response.
>
> This happened during a repack: lag went to more than 350 GB, then gradually
> decreased to minimal lag after running pg_repack.
I don't use pg_repack so I don't know what effect it would have on the
process.
>
> 3) Define how lag is being calculated and what 'minimal' is.
>
> postgres=> select
> slot_name,pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn) as bytes_behind,
> pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as
> behind_size, active from pg_replication_slots;
> slot_name | bytes_behind | behind_size | active
> -------------+--------------+-------------+--------
> cls_eva_msa | 22906216 | 22 MB | t
>
> 4) Define how the record counts are being derived.
> Source : archiving=> select count(*) from archiving.events_archive ;
> count
> ---------
> 1262908
> (1 row)
>
> Destination : archiving=> select count(*) from archiving.events_archive ;
> count
> ---------
> 1252062
> (1 row)
22,906,216 bytes/10,846 rows works out to 2112 bytes per row.
Is that a reasonable per row estimate?
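One way to sanity-check that figure against the table itself, as a rough sketch based on planner statistics (approximate by nature):

select pg_size_pretty((pg_relation_size(c.oid)
       / greatest(c.reltuples::bigint, 1))::bigint) as approx_row_bytes
from pg_class c
where c.oid = 'archiving.events_archive'::regclass;  -- run on the publisher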
>
> 5) The network distance between the servers.
>
> Both are under same vpc security groups
>
> Regards
> Durga Mahesh
--
Adrian Klaver
adrian.klaver@aklaver.com
Hi Adrian Klaver
22,906,216 bytes/10,846 rows works out to 2112 bytes per row.
Is that a reasonable per row estimate?
Yes, though it varies at times.
Regards,
Durga Mahesh
On Wed, 22 Jan, 2025, 00:22 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 1/21/25 10:06 AM, Durgamahesh Manne wrote:
>
> Hi Adrian Klaver
>
> 22,906,216 bytes/10,846 rows works out to 2112 bytes per row.
>
> Is that a reasonable per row estimate?
>
> Yes, though it varies at times.
If I am following the lag went from 350GB behind to 22MB.
Is the issue that the lag has stalled at 22MB?
>
> Regards,
> Durga Mahesh
>
--
Adrian Klaver
adrian.klaver@aklaver.com
Hi Adrian Klaver
Is the issue that the lag has stalled at 22MB?
That depends on the load on the source.
The lag either decreases or increases by KBs and MBs (not GBs).
It is not constant, as data is continuously being replicated to the target.
But the record counts still differ by more than 10 thousand.
What is the simplest way to mitigate this issue?
Regards
Durga Mahesh
On Wed, 22 Jan, 2025, 03:11 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 1/21/25 11:40, Durgamahesh Manne wrote:
>
>
> On Wed, 22 Jan, 2025, 00:22 Adrian Klaver, <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>
>
> On 1/21/25 10:06 AM, Durgamahesh Manne wrote:
>
> >
> > Hi Adrian Klaver
> >
> > 22,906,216 bytes/10,846 rows works out to 2112 bytes per row.
> >
> > Is that a reasonable per row estimate?
> >
> > Yes, though it varies at times.
>
> If I am following the lag went from 350GB behind to 22MB.
>
> Is the issue that the lag has stalled at 22MB?
>
> >
> > Regards,
> > Durga Mahesh
> >
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>
> Hi Adrian Klaver
>
> Is the issue that the lag has stalled at 22MB?
>
> That depends on the load on the source.
> The lag either decreases or increases by KBs and MBs (not GBs).
> It is not constant, as data is continuously being replicated to the target.
Previously you stated:
"Both are under same vpc security groups"
Does this refer to AWS?
If so per:
https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html
"The following diagram shows an example VPC. The VPC has one subnet in
each of the Availability Zones in the Region, EC2 instances in each
subnet, and an internet gateway to allow communication between the
resources in your VPC and the internet."
So where are the two Postgres instances physically located relative to
each other?
>
> But the record counts still differ by more than 10 thousand.
Have you looked at the I/O statistics between the Postgres instances?
>
> What is the simplest way to mitigate this issue?
Until it is determined what is causing the lag there is no way to deal
with it.
>
> Regards
> Durga Mahesh
--
Adrian Klaver
adrian.klaver@aklaver.com
Hi
So where are the two Postgres instances physically located relative to
each other?
Both are in Ireland, under the same VPC security groups.
>
> But the record counts still differ by more than 10 thousand.
Have you looked at the I/O statistics between the Postgres instances?
Seems everything looks good with pg replication slots
Does the pg_logical_slot_get_changes() function help to pull the pending changes to the subscription so that it can be synced with the publication server in real time?
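A note on that: pg_logical_slot_get_changes() consumes the changes it returns, which would desynchronize a live subscription, so it is not a safe way to pull pending changes. The peek variants only inspect; a minimal sketch, usable only while the slot is not in active use by a walsender ('my_pub' is a placeholder publication name):

select * from pg_logical_slot_peek_binary_changes(
    'cls_eva_msa', null, 10,     -- slot, up-to LSN (null = no limit), max changes
    'proto_version', '1',        -- options required by the pgoutput plugin
    'publication_names', 'my_pub');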
Regards,
Durgamahesh
On Thu, Jan 23, 2025 at 10:08 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/22/25 18:53, Durgamahesh Manne wrote:
>
>
>
> > But the record counts still differ by more than 10 thousand.
>
> Have you looked at the I/O statistics between the Postgres instances?
>
> Seems everything looks good with pg replication slots
Except the subscriber is lagging behind the publisher.
'... everything looks good' is an opinion not actual data.
>
> Does the pg_logical_slot_get_changes() function help to pull the pending
> changes to the subscription so that it can be synced with the publication
> server in real time?
Are you referring to this?:
https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION
Though I am not sure you want to do this as from above:
"When requesting synchronous replication, each commit of a write
transaction will wait until confirmation is received that the commit has
been written to the write-ahead log on disk of both the primary and
standby server. The only possibility that data can be lost is if both
the primary and the standby suffer crashes at the same time. This can
provide a much higher level of durability, though only if the sysadmin
is cautious about the placement and management of the two servers.
Waiting for confirmation increases the user's confidence that the
changes will not be lost in the event of server crashes but it also
necessarily increases the response time for the requesting transaction.
The minimum wait time is the round-trip time between primary and standby."
If you are not referring to above then you will need to explain further.
>
> Regards,
> Durgamahesh
>
--
Adrian Klaver
adrian.klaver@aklaver.com
Hi
Source Publication Side:
archiving=> select * from pg_replication_slots ;
-[ RECORD 1 ]-------+--------------
slot_name | cls_eva_msa
plugin | pgoutput
slot_type | logical
datoid | 16601
database | archiving
temporary | f
active | t
active_pid | 3237
xmin |
catalog_xmin | 2935229621
restart_lsn | 16C8/40CEC600
confirmed_flush_lsn | 16C8/440FFF50
wal_status | reserved
safe_wal_size |
two_phase | f
conflicting | f
-[ RECORD 2 ]-------+--------------
slot_name | cle_clm_mka
plugin | pgoutput
slot_type | logical
datoid | 16601
database | archiving
temporary | f
active | t
active_pid | 3501
xmin |
catalog_xmin | 2935229621
restart_lsn | 16C8/40CEC600
confirmed_flush_lsn | 16C8/440FFF50
wal_status | reserved
safe_wal_size |
two_phase | f
conflicting | f
archiving=> select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
client_hostname |
client_port | 52506
backend_start | 2025-01-23 16:58:04.697304+00
backend_xmin |
state | streaming
sent_lsn | 16C7/BDE4BB48
write_lsn | 16C7/BDE4BB48
flush_lsn | 16C7/BDE4BB48
replay_lsn | 16C7/BDE4BB48
write_lag | 00:00:00.002271
flush_lag | 00:00:00.002271
replay_lag | 00:00:00.002271
sync_priority | 0
sync_state | async
reply_time | 2025-01-23 17:34:39.901979+00
-[ RECORD 2 ]----+------------------------------
pid | 3501
usesysid | 14604130
usename | archiving
application_name | cle_clm_mka
client_addr | 10.80.0.168
client_hostname |
client_port | 55412
backend_start | 2025-01-22 09:31:11.83963+00
backend_xmin |
state | streaming
sent_lsn | 16C7/BDE4BB48
write_lsn | 16C7/BDE4BB48
flush_lsn | 16C7/BDE4BB48
replay_lsn | 16C7/BDE4BB48
write_lag | 00:00:00.001642
flush_lag | 00:00:00.023143
replay_lag | 00:00:00.001642
sync_priority | 0
sync_state | async
reply_time | 2025-01-23 17:34:39.903052+00
Subscription Side:
archiving=> select * from pg_stat_subscription where subname = 'cls_eva_msa';
-[ RECORD 1 ]---------+------------------------------
subid | 1936652827
subname | cls_eva_msa
pid | 18746
relid |
received_lsn | 16C7/FB48DFE0
last_msg_send_time | 2025-01-23 17:41:11.924562+00
last_msg_receipt_time | 2025-01-23 17:41:11.933344+00
latest_end_lsn | 16C7/FB48DFE0
latest_end_time | 2025-01-23 17:41:11.924562+00
archiving=> select * from pg_stat_subscription where subname = 'cle_clm_mka';
-[ RECORD 1 ]---------+------------------------------
subid | 1892055116
subname | cle_clm_mka
pid | 507
relid |
received_lsn | 16C7/FB8CDF68
last_msg_send_time | 2025-01-23 17:41:17.375879+00
last_msg_receipt_time | 2025-01-23 17:41:17.378932+00
latest_end_lsn | 16C7/FB8CDF68
latest_end_time | 2025-01-23 17:41:17.375879+00
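For completeness, the subscriber's received_lsn can be compared against the publisher's current WAL position to express lag in bytes from this side; a sketch, substituting an LSN captured separately on the publisher:

-- on the publisher: select pg_current_wal_lsn();   -- e.g. 16C8/440FFF50
-- then on the subscriber, using that value:
select subname,
       pg_size_pretty(pg_wal_lsn_diff('16C8/440FFF50', received_lsn)) as behind
from pg_stat_subscription;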
If you need to see more stats, let me know and I will provide them based on your input.
Your response in this regard is valuable.
Are you referring to this?:
https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION
No. I am using logical replication in asynchronous mode.
Except the subscriber is lagging behind the publisher.
'... everything looks good' is an opinion not actual data.
Correct
Regards
Durga Mahesh
On Thu, Jan 23, 2025 at 11:24 PM Durgamahesh Manne <maheshpostgres9@gmail.com> wrote:
Hi Adrian Klaver
At the subscription side I am getting this error repeatedly. Do we have any
solution that does not require resyncing the full data again from scratch?
2025-01-23 18:11:46 UTC::@:[507]:DEBUG: logical replication did not find row to be updated in replication target relation's partition "markets_archive_p20250118"
Regards,
Durga Mahesh
On 1/23/25 09:54, Durgamahesh Manne wrote:

See comments in line below.

> Source Publication Side:
> archiving=> select * from pg_stat_replication;

There is missing information here. Am I right in assuming this is for
slot cls_eva_msa? And that it is going to the same client_addr 10.80.0.168?

> client_hostname |
> client_port | 52506
> backend_start | 2025-01-23 16:58:04.697304+00
> backend_xmin |
> state | streaming
> sent_lsn | 16C7/BDE4BB48
> write_lsn | 16C7/BDE4BB48
> flush_lsn | 16C7/BDE4BB48
> replay_lsn | 16C7/BDE4BB48
> write_lag | 00:00:00.002271
> flush_lag | 00:00:00.002271
> replay_lag | 00:00:00.002271
> sync_priority | 0
> sync_state | async
> reply_time | 2025-01-23 17:34:39.901979+00
> -[ RECORD 2 ]----+------------------------------
> pid | 3501
> usesysid | 14604130
> usename | archiving
> application_name | cle_clm_mka
> client_addr | 10.80.0.168
> client_hostname |
> client_port | 55412
> backend_start | 2025-01-22 09:31:11.83963+00
> backend_xmin |
> state | streaming
> sent_lsn | 16C7/BDE4BB48
> write_lsn | 16C7/BDE4BB48
> flush_lsn | 16C7/BDE4BB48
> replay_lsn | 16C7/BDE4BB48
> write_lag | 00:00:00.001642
> flush_lag | 00:00:00.023143
> replay_lag | 00:00:00.001642
> sync_priority | 0
> sync_state | async
> reply_time | 2025-01-23 17:34:39.903052+00

The lag times are minimal. Were the queries below done at a later time
than those above?

> Subscription Side:
> archiving=> select * from pg_stat_subscription where subname = 'cls_eva_msa';
> -[ RECORD 1 ]---------+------------------------------
> subid | 1936652827
> subname | cls_eva_msa
> pid | 18746
> relid |
> received_lsn | 16C7/FB48DFE0
> last_msg_send_time | 2025-01-23 17:41:11.924562+00
> last_msg_receipt_time | 2025-01-23 17:41:11.933344+00
> latest_end_lsn | 16C7/FB48DFE0
> latest_end_time | 2025-01-23 17:41:11.924562+00
>
> archiving=> select * from pg_stat_subscription where subname = 'cle_clm_mka';
> -[ RECORD 1 ]---------+------------------------------
> subid | 1892055116
> subname | cle_clm_mka
> pid | 507
> relid |
> received_lsn | 16C7/FB8CDF68
> last_msg_send_time | 2025-01-23 17:41:17.375879+00
> last_msg_receipt_time | 2025-01-23 17:41:17.378932+00
> latest_end_lsn | 16C7/FB8CDF68
> latest_end_time | 2025-01-23 17:41:17.375879+00
>
> '... everything looks good' is an opinion not actual data.
> Correct

So what does the AWS dashboard show for the I/O between the servers?

> Regards
> Durga Mahesh

--
Adrian Klaver
adrian.klaver@aklaver.com
On 1/23/25 10:21, Durgamahesh Manne wrote:
> At the subscription side I am getting this error repeatedly. Do we have any
> solution that does not require resyncing the full data again from scratch?
>
> 2025-01-23 18:11:46 UTC::@:[507]:DEBUG: logical replication did not
> find row to be updated in replication target relation's partition
> "markets_archive_p20250118"

That looks like a different issue.

On the subscriber side, is someone/something changing the data?

Are there more log reports related to this at something above the DEBUG
level?

I'm thinking this is related to this, from here:

https://www.postgresql.org/docs/current/logical-replication-conflicts.html

"When replicating UPDATE or DELETE operations, missing data will not
produce a conflict and such operations will simply be skipped."

I don't have time at the moment to dig any further.

> Regards,
> Durga Mahesh

--
Adrian Klaver
adrian.klaver@aklaver.com
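If a per-table resync turns out to be unavoidable, one common approach re-copies only the affected table rather than the whole data set. A sketch, assuming the publication is named my_pub (a placeholder), that archiving.markets_archive is the partitioned parent of the partition in the error, and that subscription cle_clm_mka is the one involved (its apply worker pid 507 matches the log line); verify on a test system first:

-- on the publisher: stop publishing the table
ALTER PUBLICATION my_pub DROP TABLE archiving.markets_archive;
-- on the subscriber: drop it from the subscription's table set
ALTER SUBSCRIPTION cle_clm_mka REFRESH PUBLICATION;
-- on the subscriber: discard the diverged copy (truncates all partitions)
TRUNCATE archiving.markets_archive;
-- on the publisher: publish it again
ALTER PUBLICATION my_pub ADD TABLE archiving.markets_archive;
-- on the subscriber: re-copy just this table
ALTER SUBSCRIPTION cle_clm_mka REFRESH PUBLICATION WITH (copy_data = true);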