Discussion: Data Out of Sync with Physical Streaming Replication
Hi All,
We have run into an issue where the data on a primary and its standby Postgres instance is out of sync. We are using streaming physical replication. We have encountered this issue several times over the last year on different Postgres clusters. Postgres indicates everything is in sync, but we can clearly see that is not the case.
The essentials:
PG Version: 14.7
OS: Rocky Linux 8.10 (kernel 4.18.0-553.30.1.el8_10.x86_64)
Server: VMware VM
Storage: Enterprise level SSD
We are starting to push Postgres 17, but it will be a while before we upgrade our existing PG 14 DBs; therefore, I need to figure out why this occurs in PG 14. At a minimum, I plan on upgrading to PG 14.15.
I have a cluster set up to show this. We are using the default asynchronous replication. The primary and standby are in the same data center. I haven’t been able to reproduce this on demand. Initially, I thought it might be a storage issue, but we are not seeing any errors in dmesg, the kernel log, journalctl, or on the storage controller.
I’m hoping someone has seen something similar and has some guidance on further troubleshooting this.
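For reference, here is roughly how I confirm the replication-related settings on the primary (just a sketch; the parameter list is only what seemed relevant to me):

-- With default asynchronous replication, synchronous_standby_names is expected to be empty
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'synchronous_commit', 'synchronous_standby_names',
               'hot_standby', 'archive_mode', 'max_wal_senders');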
On the PRIMARY, we can see that the replication connection is streaming:
-- DB is in read-write mode:
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
-- We have a single replication connection; the sent, write, flush, and replay LSNs are all the same:
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 2457681
usesysid | 16384
usename | xxxxxxxxxx
application_name | xxxxxxxxxxxxxxxx
client_addr | yyy.yyy.yyy.yyy
client_hostname |
client_port | 58522
backend_start | 2025-01-15 11:03:26.99902-06
backend_xmin |
state | streaming
sent_lsn | 0/1E0001B8
write_lsn | 0/1E0001B8
flush_lsn | 0/1E0001B8
replay_lsn | 0/1E0001B8
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2025-01-15 11:04:27.149521-06
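If the standby streams from a physical replication slot, the slot itself can be cross-checked as well; something along these lines (a sketch; wal_status is available on PG 13 and later):

-- The slot should be active, with restart_lsn advancing and wal_status = 'reserved'
SELECT slot_name, slot_type, active, restart_lsn, wal_status
FROM pg_replication_slots;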
-- Primary shows no database conflicts:
postgres=# select * from pg_stat_database_conflicts;
-[ RECORD 1 ]----+----------
datid | 1
datname | template1
confl_tablespace | 0
confl_lock | 0
confl_snapshot | 0
confl_bufferpin | 0
confl_deadlock | 0
-[ RECORD 2 ]----+----------
datid | 13747
datname | template0
confl_tablespace | 0
confl_lock | 0
confl_snapshot | 0
confl_bufferpin | 0
confl_deadlock | 0
-[ RECORD 3 ]----+----------
datid | 16390
datname | issuedb
confl_tablespace | 0
confl_lock | 0
confl_snapshot | 0
confl_bufferpin | 0
confl_deadlock | 0
-[ RECORD 4 ]----+----------
datid | 13748
datname | postgres
confl_tablespace | 0
confl_lock | 0
confl_snapshot | 0
confl_bufferpin | 0
confl_deadlock | 0
On the STANDBY server, we can see that there is no lag:
-- Instance is in recovery mode:
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
-- Received and replayed LSNs are the same. Note that these also match the LSN from the primary.
postgres=# select pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();
-[ RECORD 1 ]-----------+-----------
pg_last_wal_receive_lsn | 0/1E0001B8
pg_last_wal_replay_lsn | 0/1E0001B8
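The WAL receiver on the standby can be cross-checked as well; roughly (a sketch):

-- flushed_lsn and latest_end_lsn should track the LSNs reported by the primary
SELECT status, received_tli, flushed_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_wal_receiver;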
-- no conflicts:
postgres=# select * from pg_stat_database_conflicts;
-[ RECORD 1 ]----+----------
datid | 1
datname | template1
confl_tablespace | 0
confl_lock | 0
confl_snapshot | 0
confl_bufferpin | 0
confl_deadlock | 0
-[ RECORD 2 ]----+----------
datid | 13747
datname | template0
confl_tablespace | 0
confl_lock | 0
confl_snapshot | 0
confl_bufferpin | 0
confl_deadlock | 0
-[ RECORD 3 ]----+----------
datid | 16390
datname | issuedb
confl_tablespace | 0
confl_lock | 0
confl_snapshot | 0
confl_bufferpin | 0
confl_deadlock | 0
-[ RECORD 4 ]----+----------
datid | 13748
datname | postgres
confl_tablespace | 0
confl_lock | 0
confl_snapshot | 0
confl_bufferpin | 0
confl_deadlock | 0
I've performed a switchover multiple times, meaning the standby has become the primary and the primary has become the standby.
Not all tables are out of sync. In this case, three of them are:
PRIMARY:
issuedb=# select count(1) from issue;
count
-------
2
(1 row)
issuedb=# select count(1) from comment;
count
-------
1
(1 row)
issuedb=# select count(1) from troubleshoot_issue;
count
-------
2
(1 row)
STANDBY:
issuedb=# select count(1) from issue;
count
-------
3
(1 row)
issuedb=# select count(1) from comment;
count
-------
4
(1 row)
issuedb=# select count(1) from troubleshoot_issue;
count
-------
3
(1 row)
Both are on timeline 21:
PRIMARY:
postgres=# SELECT timeline_id FROM pg_control_checkpoint();
timeline_id
-------------
21
STANDBY:
postgres=# SELECT timeline_id FROM pg_control_checkpoint();
timeline_id
-------------
21
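Since there have been multiple switchovers, the timeline history can also be read from SQL to see where each switch happened (a sketch, assuming superuser access; 21 decimal is 0x15, hence the file name):

-- Each line of the history file lists a parent timeline and the LSN where the switch occurred
SELECT pg_read_file('pg_wal/00000015.history');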
Archive mode is on. I ran pg_amcheck and everything came back clean. I know synchronous replication is an option, but it feels like something else is going on, and I would like to get to the bottom of it.
Any ideas on what could be causing this? I’m planning on turning on data_checksums and running pg_checksums to see if any corruption is found.
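In the meantime, whether checksums are already enabled, and whether any failures have been reported, can be checked from SQL (a sketch; the counters are NULL while checksums are off, and pg_checksums --enable requires a cleanly shut down cluster):

-- Currently expected to be 'off' on this cluster
SHOW data_checksums;

-- Per-database checksum failure counters (PG 12 and later)
SELECT datname, checksum_failures, checksum_last_failure
FROM pg_stat_database;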
Thanks for the time.
Best Regards,
Tim
On 1/15/25 19:50, Tim Gerber wrote:
> Hi All,
>
>...
>
> Archive mode is on. I ran pg_amcheck and everything came back clean. I
> know synchronous replication is an option, but it feels like something
> else is going on, and I would like to get to the bottom of it.
>
> Any ideas on what could be causing this? I’m planning on turning on
> data_checksums and running pg_checksums to see if any corruption is found.
>
No idea. If the storage has some issues (which is the only thing data
checksums could catch, in some cases), then anything is possible.

I think it'd be interesting to compare the tuples returned on primary
vs. standby, including system columns etc. Ideally using pageinspect.

That might tell you what exactly is the difference, and perhaps also say
which XID to look at.

regards

--
Tomas Vondra
Thanks for the thoughts, Tomas.
I suspected storage only because data was missing, but nothing I've looked at shows any indication of a storage problem.
Here's the tuple data for each. It looks normal to me.
PRIMARY:
issuedb=# SELECT ctid, xmin, xmax, issue_id as issue_id from issue;
ctid | xmin | xmax | issue_id
-------+------+------+--------------------------------------
(0,1) | 3439 | 3443 | 0192d4c1-7220-7780-be76-e67f955a719a
(0,2) | 3439 | 3439 | 0192d4c1-7233-725c-a97a-0acc82fa2c2a
(0,5) | 3724 | 3724 | 01934472-b052-7ce2-9c7d-35a809bcb3f6
(3 rows)
issuedb=# SELECT t_xmin, t_xmax, tuple_data_split('issue'::regclass, t_data, t_infomask, t_infomask2, t_bits)
issuedb-# FROM heap_page_items(get_raw_page('issue', 0));
t_xmin | t_xmax | tuple_data_split
--------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3439 | 3443 | {"\\x0192d4c172207780be76e67f955a719a","\\x0d3534333231","\\x1b3031322d31313131312d3031","\\x01",NULL,NULL,"\\x01",NULL,NULL,NULL,"\\xbc6d6ed88dc80200","\\x1d62756b6174796e6963686f6c61",NULL,NULL}
3439 | 3439 | {"\\x0192d4c17233725ca97a0acc82fa2c2a","\\x0f363534333231","\\x1b3031322d31313131312d3032","\\x01",NULL,NULL,"\\x00",NULL,NULL,NULL,"\\xb5986ed88dc80200","\\x1d62756b6174796e6963686f6c61",NULL,NULL}
3698 | 3711 | {"\\x019342bde8217af59ebbb1315860c475","\\x0531","\\x0532","\\x01",NULL,NULL,"\\x00","\\x01",NULL,NULL,"\\xf5f19a7a3bca0200","\\x1767696c6c696c616e646a",NULL,NULL}
3717 | 3719 | {"\\x0193446fa7de79c6a6c4bc508ab31f30","\\x15334231323334353637","\\x1b3031322d30303030302d3030","\\x01",NULL,NULL,"\\x00","\\x01",NULL,NULL,"\\x66eeef1842ca0200","\\x1767696c6c696c616e646a",NULL,NULL}
3724 | 3724 | {"\\x01934472b0527ce29c7d35a809bcb3f6","\\x133141313233343536","\\x1b3031322d30303030302d3030","\\x01",NULL,NULL,"\\x00","\\x01",NULL,NULL,"\\x67f1c82442ca0200","\\x1767696c6c696c616e646a",NULL,NULL}
(5 rows)
STANDBY:
issuedb=# SELECT ctid, xmin, xmax, issue_id as issue_id from issue;
 ctid  | xmin | xmax |               issue_id
-------+------+------+--------------------------------------
(0,1) | 3439 | 3443 | 0192d4c1-7220-7780-be76-e67f955a719a
(0,2) | 3439 | 3439 | 0192d4c1-7233-725c-a97a-0acc82fa2c2a
(2 rows)
issuedb=# SELECT t_xmin, t_xmax, tuple_data_split('issue'::regclass, t_data, t_infomask, t_infomask2, t_bits)
issuedb-# FROM heap_page_items(get_raw_page('issue', 0));
t_xmin | t_xmax | tuple_data_split
--------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3439 | 3443 | {"\\x0192d4c172207780be76e67f955a719a","\\x0d3534333231","\\x1b3031322d31313131312d3031","\\x01",NULL,NULL,"\\x01",NULL,NULL,NULL,"\\xbc6d6ed88dc80200","\\x1d62756b6174796e6963686f6c61",NULL,NULL}
3439 | 3439 | {"\\x0192d4c17233725ca97a0acc82fa2c2a","\\x0f363534333231","\\x1b3031322d31313131312d3032","\\x01",NULL,NULL,"\\x00",NULL,NULL,NULL,"\\xb5986ed88dc80200","\\x1d62756b6174796e6963686f6c61",NULL,NULL}
(2 rows)
PRIMARY:
 ctid  | xmin | xmax |              comment_id
-------+------+------+--------------------------------------
(0,1) | 3448 | 0 | 0192d528-f0f8-7ab9-b709-035bf31b5a0c
(0,2) | 3737 | 0 | 019345d3-7451-7483-a8e4-ec60a4c12442
(0,3) | 3738 | 0 | 019345d4-2a00-7e18-874c-1c8b9e4f36f9
(0,4) | 3739 | 0 | 019345d5-2c5a-7aac-8d58-614272fe0e60
(4 rows)
issuedb=# SELECT t_xmin, t_xmax, tuple_data_split('comment'::regclass, t_data, t_infomask, t_infomask2, t_bits)
issuedb-# FROM heap_page_items(get_raw_page('comment', 0));
t_xmin | t_xmax | tuple_data_split
--------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   3448 |      0 | {"\\x0192d528f0f87ab9b709035bf31b5a0c","\\x2554686973206973206120636f6d6d656e74","\\x4b30313932633535612d646533372d373064642d626263312d353337366232346130373036","\\x0b554e4954","\\x1d62756b6174796e6963686f6c61","\\xc1acb56c8fc80200",NULL}
   3737 |      0 | {"\\x019345d374517483a8e4ec60a4c12442","\\x3954686973206973206120646966666572656e7420636f6d6d656e74","\\x4b30313932633535612d646533372d373064642d626263312d353337366232346130373036","\\x0b554e4954","\\x1d62756b6174796e6963686f6c61","\\x5a8ac68647ca0200",NULL}
   3738 |      0 | {"\\x019345d42a007e18874c1c8b9e4f36f9","\\x3d54686973206973206120636f6d6d656e74206f6e206120726570616972","\\x4b30313932643463312d373232302d373738302d626537362d653637663935356137313961","\\x21415353454d424c595f524550414952","\\x1d62756b6174796e6963686f6c61","\\x52328c8947ca0200",NULL}
   3739 |      0 | {"\\x019345d52c5a7aac8d58614272fe0e60","\\x4954686973206973206120636f6d6d656e74206f6e20612074726f75626c6573686f6f74","\\x4b30313932643364332d313131662d376330302d623031382d366161636639363739386336","\\x1b54524f55424c4553484f4f54","\\x1d62756b6174796e6963686f6c61","\\xa76c7d8d47ca0200",NULL}
(4 rows)
STANDBY:
ctid | xmin | xmax | comment_id
-------+------+------+--------------------------------------
(0,1) | 3448 | 0 | 0192d528-f0f8-7ab9-b709-035bf31b5a0c
(1 row)
issuedb=# SELECT t_xmin, t_xmax, tuple_data_split('comment'::regclass, t_data, t_infomask, t_infomask2, t_bits)
issuedb-# FROM heap_page_items(get_raw_page('comment', 0));
t_xmin | t_xmax | tuple_data_split
--------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   3448 |      0 | {"\\x0192d528f0f87ab9b709035bf31b5a0c","\\x2554686973206973206120636f6d6d656e74","\\x4b30313932633535612d646533372d373064642d626263312d353337366232346130373036","\\x0b554e4954","\\x1d62756b6174796e6963686f6c61","\\xc1acb56c8fc80200",NULL}
(1 row)
PRIMARY:
 ctid  | xmin | xmax |        troubleshoot_issue_id
-------+------+------+--------------------------------------
(0,1) | 3439 | 0 | d80a394a-9568-11ef-90aa-5bb5a149f782
(0,2) | 3439 | 0 | d80a5dda-9568-11ef-90aa-5b4cba555710
(0,5) | 3724 | 0 | d38a3624-a673-11ef-bb77-c3e784e70250
(3 rows)
issuedb=# SELECT t_xmin, t_xmax, tuple_data_split('troubleshoot_issue'::regclass, t_data, t_infomask, t_infomask2, t_bits)
issuedb-# FROM heap_page_items(get_raw_page('troubleshoot_issue', 0));
t_xmin | t_xmax | tuple_data_split
--------+--------+---------------------------------------------------------------------------------------------------------------------
3439 | 0 | {"\\xd80a394a956811ef90aa5bb5a149f782","\\x0192d3d3111f7c00b0186aacf96798c6","\\x0192d4c172207780be76e67f955a719a"}
3439 | 0 | {"\\xd80a5dda956811ef90aa5b4cba555710","\\x0192d3d3111f7c00b0186aacf96798c6","\\x0192d4c17233725ca97a0acc82fa2c2a"}
3698 | 3711 | {"\\x2dbecb16a63111efa2a30f7da92c7fb4","\\x019342bdc3617d28ae08db06da805986","\\x019342bde8217af59ebbb1315860c475"}
3717 | 3719 | {"\\x5d106810a67311ef84c19fcba7f0ea8d","\\x0193446c0c297714a349af8702c16dd5","\\x0193446fa7de79c6a6c4bc508ab31f30"}
3724 | 0 | {"\\xd38a3624a67311efbb77c3e784e70250","\\x019344723f7778b6b34305157fb9a339","\\x01934472b0527ce29c7d35a809bcb3f6"}
(5 rows)
STANDBY:
 ctid  | xmin | xmax |        troubleshoot_issue_id
-------+------+------+--------------------------------------
(0,1) | 3439 | 0 | d80a394a-9568-11ef-90aa-5bb5a149f782
(0,2) | 3439 | 0 | d80a5dda-9568-11ef-90aa-5b4cba555710
(2 rows)
issuedb=# SELECT t_xmin, t_xmax, tuple_data_split('troubleshoot_issue'::regclass, t_data, t_infomask, t_infomask2, t_bits)
issuedb-# FROM heap_page_items(get_raw_page('troubleshoot_issue', 0));
t_xmin | t_xmax | tuple_data_split
--------+--------+---------------------------------------------------------------------------------------------------------------------
3439 | 0 | {"\\xd80a394a956811ef90aa5bb5a149f782","\\x0192d3d3111f7c00b0186aacf96798c6","\\x0192d4c172207780be76e67f955a719a"}
3439 | 0 | {"\\xd80a5dda956811ef90aa5b4cba555710","\\x0192d3d3111f7c00b0186aacf96798c6","\\x0192d4c17233725ca97a0acc82fa2c2a"}
(2 rows)