Thread: Records count mismatch with logical replication


Records count mismatch with logical replication

From
Durgamahesh Manne
Date:
Hi Team,

I have publication and subscription servers. Data replication appears to be running with minimal lag, but the record counts differ by more than 10 thousand rows between the source and destination tables.

Could you please help in resolving this issue?

Regards,
Durga Mahesh 



Re: Records count mismatch with logical replication

From
Adrian Klaver
Date:
On 1/21/25 04:08, Durgamahesh Manne wrote:
> Hi  Team,
> 
> I have publication and subscription servers .So seems data replication 
> running with minimal lag but records count mismatch with more than 10 
> thousand records between source and destination tables
> 
> Could you please help in resolving this issue?

Not without a good deal more information:

1) Postgres version on either side of the replication.

2) The replication configuration.

3) Define how lag is being calculated and what 'minimal' is.

4) Define how the record counts are being derived.

5) The network distance between the servers.

> 
> Regards,
> Durga Mahesh
> 
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Records count mismatch with logical replication

From
Durgamahesh Manne
Date:


On Tue, Jan 21, 2025 at 9:24 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/21/25 04:08, Durgamahesh Manne wrote:
> Hi  Team,
>
> I have publication and subscription servers .So seems data replication
> running with minimal lag but records count mismatch with more than 10
> thousand records between source and destination tables
>
> Could you please help in resolving this issue?

Not without a good deal more information:

1) Postgres version on either side of the replication.

2) The replication configuration.

3) Define how lag is being calculated and what 'minimal' is.

4) Define how the record counts are being derived.

5) The network distance between the servers.

>
> Regards,
> Durga Mahesh
>
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

Hi Adrian Klaver,

Thanks for your quick response.

This happened during a pg_repack run: the lag grew to more than 350 GB and then gradually decreased back to a minimal level after pg_repack finished.

1) Postgres version on either side of the replication.

 Source (publication): 16.4
 Destination (subscription): 14.11

2) The replication configuration.

Source (OLAP workload, archiving; 32 GB RAM, 8 vCPUs):
wal_level = logical
wal_sender_timeout = 30s
max_wal_senders = 40
max_replication_slots = 20
max_logical_replication_workers = 4
wal_buffers = 64MB
commit_delay = 2000
commit_siblings = 12
wal_writer_delay = 300
wal_writer_flush_after = 1MB
bgwriter_delay = 20
min_wal_size = 8GB
max_wal_size = 32GB

Destination (128 GB RAM, 32 vCPUs):
wal_level = logical
wal_receiver_timeout = 30s
max_wal_senders = 40
max_replication_slots = 60
max_logical_replication_workers = 23
wal_buffers = 64MB
commit_delay = default
commit_siblings = default
wal_writer_delay = default
wal_writer_flush_after = default
bgwriter_delay = 20
min_wal_size = 8GB
max_wal_size = 32GB




3) Define how lag is being calculated and what 'minimal' is.

postgres=> select slot_name,pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn) as bytes_behind,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as behind_size, active from pg_replication_slots;
  slot_name  | bytes_behind | behind_size | active
-------------+--------------+-------------+--------
 cls_eva_msa |     22906216 | 22 MB       | t

4) Define how the record counts are being derived.
  Source : archiving=> select count(*) from archiving.events_archive ;
  count
---------
 1262908
(1 row)

Destination : archiving=> select count(*) from archiving.events_archive ;
  count
---------
 1252062
(1 row)

5) The network distance between the servers.

 Both are under same vpc security groups
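
A note on the counts in 4): counting the two sides at different moments on a busy system will always show some drift, so a like-for-like comparison needs both counts anchored to the same point in the WAL. A sketch, assuming the subscription is named cls_eva_msa and the source table is quiet during the check:

 Publisher : SELECT pg_current_wal_lsn();
 Subscriber: SELECT latest_end_lsn FROM pg_stat_subscription WHERE subname = 'cls_eva_msa';
             -- once latest_end_lsn has passed the publisher value, count on both sides
             SELECT count(*) FROM archiving.events_archive;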

Regards
Durga Mahesh
 

Re: Records count mismatch with logical replication

From
Adrian Klaver
Date:
On 1/21/25 09:38, Durgamahesh Manne wrote:
> 
> 

> 
> Hi Adrian Klaver
> 
> Really Thanks for your quick response
> 
> This happened during repack lag went to more than 350Gb then gradually 
> decreased to minimal lag after running pg_repack

I don't use pg_repack so I don't know what effect it would have on the 
process.
> 

> 3) Define how lag is being calculated and what 'minimal' is.
> 
> postgres=> select 
> slot_name,pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn) as bytes_behind,
> pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as 
> behind_size, active from pg_replication_slots;
>    slot_name  | bytes_behind | behind_size | active
> -------------+--------------+-------------+--------
>   cls_eva_msa |     22906216 | 22 MB       | t
> 
> 4) Define how the record counts are being derived.
>    Source : archiving=> select count(*) from archiving.events_archive ;
>    count
> ---------
>   1262908
> (1 row)
> 
> Destination : archiving=> select count(*) from archiving.events_archive ;
>    count
> ---------
>   1252062
> (1 row)

22,906,216 bytes/10,846 rows  works out to 2112 bytes per row.

Is that a reasonable per row estimate?
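
Also keep in mind that restart_lsn marks how far back WAL must be retained for the slot, not how much the subscriber has yet to confirm, so a restart_lsn-based figure can overstate the real lag. confirmed_flush_lsn is usually the better baseline; a sketch, run on the publisher:

SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS unconfirmed,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained
FROM pg_replication_slots;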

> 
> 5) The network distance between the servers.
> 
>   Both are under same vpc security groups
> 
> Regards
> Durga Mahesh

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Records count mismatch with logical replication

From
Durgamahesh Manne
Date:


On Tue, Jan 21, 2025 at 11:26 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/21/25 09:38, Durgamahesh Manne wrote:
>
>

>
> Hi Adrian Klaver
>
> Really Thanks for your quick response
>
> This happened during repack lag went to more than 350Gb then gradually
> decreased to minimal lag after running pg_repack

I don't use pg_repack so I don't know what effect it would have on the
process.
>

> 3) Define how lag is being calculated and what 'minimal' is.
>
> postgres=> select
> slot_name,pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn) as bytes_behind,
> pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as
> behind_size, active from pg_replication_slots;
>    slot_name  | bytes_behind | behind_size | active
> -------------+--------------+-------------+--------
>   cls_eva_msa |     22906216 | 22 MB       | t
>
> 4) Define how the record counts are being derived.
>    Source : archiving=> select count(*) from archiving.events_archive ;
>    count
> ---------
>   1262908
> (1 row)
>
> Destination : archiving=> select count(*) from archiving.events_archive ;
>    count
> ---------
>   1252062
> (1 row)

22,906,216 bytes/10,846 rows  works out to 2112 bytes per row.

Is that a reasonable per row estimate?

>
> 5) The network distance between the servers.
>
>   Both are under same vpc security groups
>
> Regards
> Durga Mahesh

--
Adrian Klaver
adrian.klaver@aklaver.com

 
Hi Adrian Klaver 

22,906,216 bytes/10,846 rows  works out to 2112 bytes per row.

Is that a reasonable per row estimate?

Yes, though it varies at times.

Regards,
Durga Mahesh

 

Re: Records count mismatch with logical replication

From
Adrian Klaver
Date:

On 1/21/25 10:06 AM, Durgamahesh Manne wrote:

> 
> Hi Adrian Klaver
> 
> 22,906,216 bytes/10,846 rows  works out to 2112 bytes per row.
> 
> Is that a reasonable per row estimate?
> 
> Yes  sometimes would be vary

If I am following, the lag went from 350GB behind to 22MB.

Is the issue that the lag has stalled at 22MB?

> 
> Regards,
> Durga Mahesh
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Records count mismatch with logical replication

From
Durgamahesh Manne
Date:


On Wed, 22 Jan, 2025, 00:22 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:


On 1/21/25 10:06 AM, Durgamahesh Manne wrote:

>
> Hi Adrian Klaver
>
> 22,906,216 bytes/10,846 rows  works out to 2112 bytes per row.
>
> Is that a reasonable per row estimate?
>
> Yes  sometimes would be vary

If I am following the lag went from 350GB behind to 22MB.

Is the issue that the lag has stalled at 22MB?

>
> Regards,
> Durga Mahesh
>

--
Adrian Klaver
adrian.klaver@aklaver.com

Hi Adrian Klaver

Is the issue that the lag has stalled at 22MB?

It depends on the load on the source.
The lag fluctuates up and down by KBs or MBs (not GBs); it is not constant while data is being replicated to the target.

But the record counts still differ by more than 10 thousand rows.

What is the simplest way to mitigate this issue?

Regards 
Durga Mahesh 

Re: Records count mismatch with logical replication

From
Adrian Klaver
Date:
On 1/21/25 11:40, Durgamahesh Manne wrote:
> 
> 
> On Wed, 22 Jan, 2025, 00:22 Adrian Klaver, <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
> 
> 
>     On 1/21/25 10:06 AM, Durgamahesh Manne wrote:
> 
>      >
>      > Hi Adrian Klaver
>      >
>      > 22,906,216 bytes/10,846 rows  works out to 2112 bytes per row.
>      >
>      > Is that a reasonable per row estimate?
>      >
>      > Yes  sometimes would be vary
> 
>     If I am following the lag went from 350GB behind to 22MB.
> 
>     Is the issue that the lag has stalled at 22MB?
> 
>      >
>      > Regards,
>      > Durga Mahesh
>      >
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 
> 
> Hi Adrian Klaver
> 
> Is the issue that the lag has stalled at 22MB?
> 
> Depends on load of source
> The lag would be either decrease or increase in Kb 's and Mb's (not Gb's)
> It s not constant as Data being replicated to target

Previously you stated:

"Both are under same vpc security groups"

Does this refer to AWS?

If so per:

https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html

"The following diagram shows an example VPC. The VPC has one subnet in 
each of the Availability Zones in the Region, EC2 instances in each 
subnet, and an internet gateway to allow communication between the 
resources in your VPC and the internet."

So where are the two Postgres instances physically located relative to 
each other?

> 
> But records count varies with difference of more than 10 thousand

Have you looked at the I/O statistics between the Postgres instances?

> 
> How to mitigate this issue in simplest way ?

Until it is determined what is causing the lag there is no way to deal 
with it.

> 
> Regards
> Durga Mahesh

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Records count mismatch with logical replication

From
Durgamahesh Manne
Date:



On Wed, 22 Jan, 2025, 03:11 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 1/21/25 11:40, Durgamahesh Manne wrote:
>
>
> On Wed, 22 Jan, 2025, 00:22 Adrian Klaver, <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>
>
>     On 1/21/25 10:06 AM, Durgamahesh Manne wrote:
>
>      >
>      > Hi Adrian Klaver
>      >
>      > 22,906,216 bytes/10,846 rows  works out to 2112 bytes per row.
>      >
>      > Is that a reasonable per row estimate?
>      >
>      > Yes  sometimes would be vary
>
>     If I am following the lag went from 350GB behind to 22MB.
>
>     Is the issue that the lag has stalled at 22MB?
>
>      >
>      > Regards,
>      > Durga Mahesh
>      >
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>
> Hi Adrian Klaver
>
> Is the issue that the lag has stalled at 22MB?
>
> Depends on load of source
> The lag would be either decrease or increase in Kb 's and Mb's (not Gb's)
> It s not constant as Data being replicated to target

Previously you stated:

"Both are under same vpc security groups"

Does this refer to AWS?

If so per:

https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html

"The following diagram shows an example VPC. The VPC has one subnet in
each of the Availability Zones in the Region, EC2 instances in each
subnet, and an internet gateway to allow communication between the
resources in your VPC and the internet."

So where are the two Postgres instances physically located relative to
each other?

>
> But records count varies with difference of more than 10 thousand

Have you looked at the I/0 statistics between the Postgres instances?

>
> How to mitigate this issue in simplest way ?

Until it is determined what is causing the lag there is no way to deal
with it.

>
> Regards
> Durga Mahesh

--
Adrian Klaver
adrian.klaver@aklaver.com



Hi,

So where are the two Postgres instances physically located relative to
each other?
Both are in Ireland, under the same VPC and security groups.


>
> But records count varies with difference of more than 10 thousand

Have you looked at the I/O statistics between the Postgres instances?

It seems everything looks good with the replication slots.

Does the pg_logical_slot_get_changes() function help pull pending changes to the subscription so that it can stay in sync with the publication server in near real time?

Regards,
Durgamahesh 

Re: Records count mismatch with logical replication

From
Adrian Klaver
Date:
On 1/22/25 18:53, Durgamahesh Manne wrote:
> 
> 
> 

>  > But records count varies with difference of more than 10 thousand
> 
> Have you looked at the I/0 statistics between the Postgres instances?
> 
> Seems everything looks good with pg replication slots

Except the subscriber is lagging behind the publisher.

'... everything looks good' is an opinion not actual data.

> 
> Does this pg logical slot get changes function help to pull pending 
> changes to subscription that can be sync with publication server for 
> real time sync ?

Are you referring to this?:

https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION

Though I am not sure you want to do this as from above:

"When requesting synchronous replication, each commit of a write 
transaction will wait until confirmation is received that the commit has 
been written to the write-ahead log on disk of both the primary and 
standby server. The only possibility that data can be lost is if both 
the primary and the standby suffer crashes at the same time. This can 
provide a much higher level of durability, though only if the sysadmin 
is cautious about the placement and management of the two servers. 
Waiting for confirmation increases the user's confidence that the 
changes will not be lost in the event of server crashes but it also 
necessarily increases the response time for the requesting transaction. 
The minimum wait time is the round-trip time between primary and standby."

If you are not referring to above then you will need to explain further.
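
If you mean pg_logical_slot_get_changes(): that function consumes changes from the slot and advances its position, so it is not something to point at a slot an active subscription owns. While the walsender is connected the call will fail because the slot is in use, and any changes consumed out of band would simply never reach the subscriber. There is a non-consuming variant, pg_logical_slot_peek_binary_changes(), which for a pgoutput slot needs the publication options passed in and a free slot (i.e. the subscription disabled first). A sketch, with placeholder slot and publication names:

-- Run on the publisher, only while the subscription is disabled:
SELECT lsn, xid
FROM pg_logical_slot_peek_binary_changes(
         'cls_eva_msa', NULL, 10,
         'proto_version', '1',
         'publication_names', 'cls_eva_msa_pub');

It does not, however, push anything to the subscriber; only the apply worker on the subscription side does that.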

> 
> Regards,
> Durgamahesh
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Records count mismatch with logical replication

From
Durgamahesh Manne
Date:


On Thu, Jan 23, 2025 at 10:08 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/22/25 18:53, Durgamahesh Manne wrote:
>
>
>

>  > But records count varies with difference of more than 10 thousand
>
> Have you looked at the I/0 statistics between the Postgres instances?
>
> Seems everything looks good with pg replication slots

Except the subscriber is lagging behind the publisher.

'... everything looks good' is an opinion not actual data.

>
> Does this pg logical slot get changes function help to pull pending
> changes to subscription that can be sync with publication server for
> real time sync ?

Are you referring to this?:

https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION

Though I am not sure you want to do this as from above:

"When requesting synchronous replication, each commit of a write
transaction will wait until confirmation is received that the commit has
been written to the write-ahead log on disk of both the primary and
standby server. The only possibility that data can be lost is if both
the primary and the standby suffer crashes at the same time. This can
provide a much higher level of durability, though only if the sysadmin
is cautious about the placement and management of the two servers.
Waiting for confirmation increases the user's confidence that the
changes will not be lost in the event of server crashes but it also
necessarily increases the response time for the requesting transaction.
The minimum wait time is the round-trip time between primary and standby."

If you are not referring to above then you will need to explain further.

>
> Regards,
> Durgamahesh
>

--
Adrian Klaver
adrian.klaver@aklaver.com


Hi 

Source Publication Side:
archiving=> select * from pg_replication_slots ;
-[ RECORD 1 ]-------+--------------
slot_name           | cls_eva_msa
plugin              | pgoutput
slot_type           | logical
datoid              | 16601
database            | archiving
temporary           | f
active              | t
active_pid          | 3237
xmin                |
catalog_xmin        | 2935229621
restart_lsn         | 16C8/40CEC600
confirmed_flush_lsn | 16C8/440FFF50
wal_status          | reserved
safe_wal_size       |
two_phase           | f
conflicting         | f
-[ RECORD 2 ]-------+--------------
slot_name           | cle_clm_mka
plugin              | pgoutput
slot_type           | logical
datoid              | 16601
database            | archiving
temporary           | f
active              | t
active_pid          | 3501
xmin                |
catalog_xmin        | 2935229621
restart_lsn         | 16C8/40CEC600
confirmed_flush_lsn | 16C8/440FFF50
wal_status          | reserved
safe_wal_size       |
two_phase           | f
conflicting         | f
archiving=> select * from pg_stat_replication;
client_hostname  |
client_port      | 52506
backend_start    | 2025-01-23 16:58:04.697304+00
backend_xmin     |
state            | streaming
sent_lsn         | 16C7/BDE4BB48
write_lsn        | 16C7/BDE4BB48
flush_lsn        | 16C7/BDE4BB48
replay_lsn       | 16C7/BDE4BB48
write_lag        | 00:00:00.002271
flush_lag        | 00:00:00.002271
replay_lag       | 00:00:00.002271
sync_priority    | 0
sync_state       | async
reply_time       | 2025-01-23 17:34:39.901979+00
-[ RECORD 2 ]----+------------------------------
pid              | 3501
usesysid         | 14604130
usename          | archiving
application_name | cle_clm_mka
client_addr      | 10.80.0.168
client_hostname  |
client_port      | 55412
backend_start    | 2025-01-22 09:31:11.83963+00
backend_xmin     |
state            | streaming
sent_lsn         | 16C7/BDE4BB48
write_lsn        | 16C7/BDE4BB48
flush_lsn        | 16C7/BDE4BB48
replay_lsn       | 16C7/BDE4BB48
write_lag        | 00:00:00.001642
flush_lag        | 00:00:00.023143
replay_lag       | 00:00:00.001642
sync_priority    | 0
sync_state       | async
reply_time       | 2025-01-23 17:34:39.903052+00
 
Subscription Side : archiving=> select * from pg_stat_subscription where subname = 'cls_eva_msa';
-[ RECORD 1 ]---------+------------------------------
subid                 | 1936652827
subname               | cls_eva_msa
pid                   | 18746
relid                 |
received_lsn          | 16C7/FB48DFE0
last_msg_send_time    | 2025-01-23 17:41:11.924562+00
last_msg_receipt_time | 2025-01-23 17:41:11.933344+00
latest_end_lsn        | 16C7/FB48DFE0
latest_end_time       | 2025-01-23 17:41:11.924562+00

archiving=> select * from pg_stat_subscription where subname = 'cle_clm_mka';
-[ RECORD 1 ]---------+------------------------------
subid                 | 1892055116
subname               | cle_clm_mka
pid                   | 507
relid                 |
received_lsn          | 16C7/FB8CDF68
last_msg_send_time    | 2025-01-23 17:41:17.375879+00
last_msg_receipt_time | 2025-01-23 17:41:17.378932+00
latest_end_lsn        | 16C7/FB8CDF68
latest_end_time       | 2025-01-23 17:41:17.375879+00
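
To relate the two sides directly, the publisher's current LSN can be compared with what the subscription has applied; the LSN literal below is only a placeholder that has to be copied over by hand from the first query:

-- On the publisher:
SELECT pg_current_wal_lsn();

-- On the subscriber, substituting the value returned above:
SELECT subname,
       pg_size_pretty(pg_wal_lsn_diff('16C8/40CEC600'::pg_lsn, latest_end_lsn)) AS apply_gap
FROM pg_stat_subscription;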

If you need more stats, let me know which ones and I will provide them.
Your response in this regard is valuable.


No, I am using logical replication in asynchronous mode.

Except the subscriber is lagging behind the publisher.

'... everything looks good' is an opinion not actual data.
Correct 

Regards
Durga Mahesh

Re: Records count mismatch with logical replication

From
Durgamahesh Manne
Date:


On Thu, Jan 23, 2025 at 11:24 PM Durgamahesh Manne <maheshpostgres9@gmail.com> wrote:


On Thu, Jan 23, 2025 at 10:08 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/22/25 18:53, Durgamahesh Manne wrote:
>
>
>

>  > But records count varies with difference of more than 10 thousand
>
> Have you looked at the I/0 statistics between the Postgres instances?
>
> Seems everything looks good with pg replication slots

Except the subscriber is lagging behind the publisher.

'... everything looks good' is an opinion not actual data.

>
> Does this pg logical slot get changes function help to pull pending
> changes to subscription that can be sync with publication server for
> real time sync ?

Are you referring to this?:

https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION

Though I am not sure you want to do this as from above:

"When requesting synchronous replication, each commit of a write
transaction will wait until confirmation is received that the commit has
been written to the write-ahead log on disk of both the primary and
standby server. The only possibility that data can be lost is if both
the primary and the standby suffer crashes at the same time. This can
provide a much higher level of durability, though only if the sysadmin
is cautious about the placement and management of the two servers.
Waiting for confirmation increases the user's confidence that the
changes will not be lost in the event of server crashes but it also
necessarily increases the response time for the requesting transaction.
The minimum wait time is the round-trip time between primary and standby."

If you are not referring to above then you will need to explain further.

>
> Regards,
> Durgamahesh
>

--
Adrian Klaver
adrian.klaver@aklaver.com


Hi 

Source Publication Side:
archiving=> select * from pg_replication_slots ;
-[ RECORD 1 ]-------+--------------
slot_name           | cls_eva_msa
plugin              | pgoutput
slot_type           | logical
datoid              | 16601
database            | archiving
temporary           | f
active              | t
active_pid          | 3237
xmin                |
catalog_xmin        | 2935229621
restart_lsn         | 16C8/40CEC600
confirmed_flush_lsn | 16C8/440FFF50
wal_status          | reserved
safe_wal_size       |
two_phase           | f
conflicting         | f
-[ RECORD 2 ]-------+--------------
slot_name           | cle_clm_mka
plugin              | pgoutput
slot_type           | logical
datoid              | 16601
database            | archiving
temporary           | f
active              | t
active_pid          | 3501
xmin                |
catalog_xmin        | 2935229621
restart_lsn         | 16C8/40CEC600
confirmed_flush_lsn | 16C8/440FFF50
wal_status          | reserved
safe_wal_size       |
two_phase           | f
conflicting         | f
archiving=> select * from pg_stat_replication;
client_hostname  |
client_port      | 52506
backend_start    | 2025-01-23 16:58:04.697304+00
backend_xmin     |
state            | streaming
sent_lsn         | 16C7/BDE4BB48
write_lsn        | 16C7/BDE4BB48
flush_lsn        | 16C7/BDE4BB48
replay_lsn       | 16C7/BDE4BB48
write_lag        | 00:00:00.002271
flush_lag        | 00:00:00.002271
replay_lag       | 00:00:00.002271
sync_priority    | 0
sync_state       | async
reply_time       | 2025-01-23 17:34:39.901979+00
-[ RECORD 2 ]----+------------------------------
pid              | 3501
usesysid         | 14604130
usename          | archiving
application_name | cle_clm_mka
client_addr      | 10.80.0.168
client_hostname  |
client_port      | 55412
backend_start    | 2025-01-22 09:31:11.83963+00
backend_xmin     |
state            | streaming
sent_lsn         | 16C7/BDE4BB48
write_lsn        | 16C7/BDE4BB48
flush_lsn        | 16C7/BDE4BB48
replay_lsn       | 16C7/BDE4BB48
write_lag        | 00:00:00.001642
flush_lag        | 00:00:00.023143
replay_lag       | 00:00:00.001642
sync_priority    | 0
sync_state       | async
reply_time       | 2025-01-23 17:34:39.903052+00
 
Subscription Side : archiving=> select * from pg_stat_subscription where subname = 'cls_eva_msa';
-[ RECORD 1 ]---------+------------------------------
subid                 | 1936652827
subname               | cls_eva_msa
pid                   | 18746
relid                 |
received_lsn          | 16C7/FB48DFE0
last_msg_send_time    | 2025-01-23 17:41:11.924562+00
last_msg_receipt_time | 2025-01-23 17:41:11.933344+00
latest_end_lsn        | 16C7/FB48DFE0
latest_end_time       | 2025-01-23 17:41:11.924562+00

archiving=> select * from pg_stat_subscription where subname = 'cle_clm_mka';
-[ RECORD 1 ]---------+------------------------------
subid                 | 1892055116
subname               | cle_clm_mka
pid                   | 507
relid                 |
received_lsn          | 16C7/FB8CDF68
last_msg_send_time    | 2025-01-23 17:41:17.375879+00
last_msg_receipt_time | 2025-01-23 17:41:17.378932+00
latest_end_lsn        | 16C7/FB8CDF68
latest_end_time       | 2025-01-23 17:41:17.375879+00

If you need to see more stats based on your info will give you 
Your response in this regard is valuable 


No.I am using logical replication in asynchronous mode 

Except the subscriber is lagging behind the publisher.

'... everything looks good' is an opinion not actual data.
Correct 

Regards
Durga Mahesh

Hi Adrian Klaver,

On the subscription side I am getting this error repeatedly. Is there a solution that does not require re-syncing the full data from scratch?

  • 2025-01-23 18:11:46 UTC::@:[507]:DEBUG: logical replication did not find row to be updated in replication target relation's partition "markets_archive_p20250118"

Regards,
Durga Mahesh

Re: Records count mismatch with logical replication

From
Adrian Klaver
Date:
On 1/23/25 09:54, Durgamahesh Manne wrote:


See comments in line below.

> 
> Source Publication Side:

> archiving=> select * from pg_stat_replication;

There is missing information here.

Am I right in assuming this is for slot cls_eva_msa?

And that it going to same client_addr 10.80.0.168?

> client_hostname  |
> client_port      | 52506
> backend_start    | 2025-01-23 16:58:04.697304+00
> backend_xmin     |
> state            | streaming
> sent_lsn         | 16C7/BDE4BB48
> write_lsn        | 16C7/BDE4BB48
> flush_lsn        | 16C7/BDE4BB48
> replay_lsn       | 16C7/BDE4BB48
> write_lag        | 00:00:00.002271
> flush_lag        | 00:00:00.002271
> replay_lag       | 00:00:00.002271
> sync_priority    | 0
> sync_state       | async
> reply_time       | 2025-01-23 17:34:39.901979+00
> -[ RECORD 2 ]----+------------------------------
> pid              | 3501
> usesysid         | 14604130
> usename          | archiving
> application_name | cle_clm_mka
> client_addr      | 10.80.0.168
> client_hostname  |
> client_port      | 55412
> backend_start    | 2025-01-22 09:31:11.83963+00
> backend_xmin     |
> state            | streaming
> sent_lsn         | 16C7/BDE4BB48
> write_lsn        | 16C7/BDE4BB48
> flush_lsn        | 16C7/BDE4BB48
> replay_lsn       | 16C7/BDE4BB48
> write_lag        | 00:00:00.001642
> flush_lag        | 00:00:00.023143
> replay_lag       | 00:00:00.001642
> sync_priority    | 0
> sync_state       | async
> reply_time       | 2025-01-23 17:34:39.903052+00

The lag times are minimal.

Were the queries below done at a later time than those above?

> Subscription Side : archiving=> select * from pg_stat_subscription where 
> subname = 'cls_eva_msa';
> -[ RECORD 1 ]---------+------------------------------
> subid                 | 1936652827
> subname               | cls_eva_msa
> pid                   | 18746
> relid                 |
> received_lsn          | 16C7/FB48DFE0
> last_msg_send_time    | 2025-01-23 17:41:11.924562+00
> last_msg_receipt_time | 2025-01-23 17:41:11.933344+00
> latest_end_lsn        | 16C7/FB48DFE0
> latest_end_time       | 2025-01-23 17:41:11.924562+00
> 
> archiving=> select * from pg_stat_subscription where subname = 
> 'cle_clm_mka';
> -[ RECORD 1 ]---------+------------------------------
> subid                 | 1892055116
> subname               | cle_clm_mka
> pid                   | 507
> relid                 |
> received_lsn          | 16C7/FB8CDF68
> last_msg_send_time    | 2025-01-23 17:41:17.375879+00
> last_msg_receipt_time | 2025-01-23 17:41:17.378932+00
> latest_end_lsn        | 16C7/FB8CDF68
> latest_end_time       | 2025-01-23 17:41:17.375879+00
> 

> '... everything looks good' is an opinion not actual data.
> Correct

So what does the AWS dashboard show for the I/O between the servers?

> 
> Regards
> Durga Mahesh

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Records count mismatch with logical replication

From
Adrian Klaver
Date:
On 1/23/25 10:21, Durgamahesh Manne wrote:

> At subscription side
> getting this error repeatedly  So Do we have any solution without resync 
> full data again from scratch
> 
>   * |2025-01-23 18:11:46 UTC::@:[507]:DEBUG: logical replication did not
>     find row to be updated in replication target relation's partition
>     "markets_archive_p20250118"|

That looks like a different issue.

On the subscriber side is someone/something changing the data?

Are there more log entries related to this at a level above DEBUG?

I'm thinking this is related to from here:

https://www.postgresql.org/docs/current/logical-replication-conflicts.html

"When replicating UPDATE or DELETE operations, missing data will not 
produce a conflict and such operations will simply be skipped."

I don't have time at the moment to dig any further.
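
If it does turn out that UPDATEs/DELETEs were skipped on that table, one way to resynchronize just the affected table, rather than the whole subscription, is the usual drop/re-add sequence. A sketch, assuming the publication lists tables explicitly (the publication and table names below are placeholders):

-- On the publisher:
ALTER PUBLICATION cls_eva_msa_pub DROP TABLE archiving.markets_archive;
-- On the subscriber (removes the table from the subscription):
ALTER SUBSCRIPTION cls_eva_msa REFRESH PUBLICATION;
TRUNCATE archiving.markets_archive;
-- On the publisher:
ALTER PUBLICATION cls_eva_msa_pub ADD TABLE archiving.markets_archive;
-- On the subscriber; copy_data = true re-runs the initial copy for the newly added table only:
ALTER SUBSCRIPTION cls_eva_msa REFRESH PUBLICATION WITH (copy_data = true);

This is not available for a FOR ALL TABLES publication, and with partitioned tables the behaviour depends on publish_via_partition_root, so treat it as a starting point rather than a recipe.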

> 
>   Regards,
> Durga Mahesh

-- 
Adrian Klaver
adrian.klaver@aklaver.com