Обсуждение: replication consistency checking
In the MySQL world there is the percona-toolkit with pt-table-checksum that does this job:
https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html
Thank you
After setting up streaming replication, is it possible to check whether the slave has the same data as the master?
On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:After setting up streaming replication, is it possible to check whether the slave has the same data as the master?Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.Regards,Venkata Balaji NFujitsu Australia
If you're using Streaming Replication, then the internal PostgreSQL code ensures consistency... but if you wanted to make sure that your standby is applying logs, then you can use something like:postgres@HOST2:~$ psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
7/68002388 | 7/68002388 | t
(1 row)
... or as suggested previously, use repmgr, which has a "repmgrd" daemon that keeps things monitored for you:postgres@HOST1:~$ psql repmgr repmgr -c 'select * from repmgr_test.repl_monitor;'
primary_node | standby_node | last_monitor_time | last_apply_time | last_wal_primary_location | last_wal_standby_location | replication_lag | apply_lag
--------------+--------------+-------------------------------+-------------------------------+---------------------------+---------------------------+-----------------+-----------
1 | 2 | 2015-06-04 09:09:37.920809+00 | 2015-06-04 09:09:37.712235+00 | 30/F85D7280 | 30/F85D7280 | 0 | 0
1 | 2 | 2015-06-04 09:09:40.45123+00 | 2015-06-04 09:09:38.251625+00 | 30/F85D73C0 | 30/F85D73C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:42.966728+00 | 2015-06-04 09:09:40.75668+00 | 30/F85D74C0 | 30/F85D74C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:45.480471+00 | 2015-06-04 09:09:43.273428+00 | 30/F85D75C0 | 30/F85D75C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:47.997739+00 | 2015-06-04 09:09:45.785806+00 | 30/F85D76C0 | 30/F85D76C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:50.515236+00 | 2015-06-04 09:09:48.303822+00 | 30/F85D77C0 | 30/F85D77C0 | 0 | 0
Greg Clough
Database Administrator | CWT Digital
t. 0845 456 0070 w. cwtdigital.com
Units A/B, Level 8 North, New England House,
New England Street, Brighton, East Sussex, BN1 4GHOn 4 June 2015 at 09:47, hydra <hydrapolic@gmail.com> wrote:On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <nag1010@gmail.com> wrote:On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:After setting up streaming replication, is it possible to check whether the slave has the same data as the master?Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.Regards,Venkata Balaji NFujitsu AustraliaThanks for the answer, however I'm looking for a data consistency check.That means, you deploy your master standby replication, you're happily running it and after 3 months you decide to do a failover. How can one be sure the data are the same?I'm running PostgreSQL 9.4.
Thanks Greg, this looks nice.However my original question still remains. You know, every software has bugs, every bits and pieces can break, hardware can be misbehaving. Really, checking the data and counting the checksum is the only way to be sure.
On Thu, Jun 4, 2015 at 11:12 AM, Greg Clough <greg.clough@cwtdigital.co.uk> wrote:If you're using Streaming Replication, then the internal PostgreSQL code ensures consistency... but if you wanted to make sure that your standby is applying logs, then you can use something like:postgres@HOST2:~$ psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
7/68002388 | 7/68002388 | t
(1 row)
... or as suggested previously, use repmgr, which has a "repmgrd" daemon that keeps things monitored for you:postgres@HOST1:~$ psql repmgr repmgr -c 'select * from repmgr_test.repl_monitor;'
primary_node | standby_node | last_monitor_time | last_apply_time | last_wal_primary_location | last_wal_standby_location | replication_lag | apply_lag
--------------+--------------+-------------------------------+-------------------------------+---------------------------+---------------------------+-----------------+-----------
1 | 2 | 2015-06-04 09:09:37.920809+00 | 2015-06-04 09:09:37.712235+00 | 30/F85D7280 | 30/F85D7280 | 0 | 0
1 | 2 | 2015-06-04 09:09:40.45123+00 | 2015-06-04 09:09:38.251625+00 | 30/F85D73C0 | 30/F85D73C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:42.966728+00 | 2015-06-04 09:09:40.75668+00 | 30/F85D74C0 | 30/F85D74C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:45.480471+00 | 2015-06-04 09:09:43.273428+00 | 30/F85D75C0 | 30/F85D75C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:47.997739+00 | 2015-06-04 09:09:45.785806+00 | 30/F85D76C0 | 30/F85D76C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:50.515236+00 | 2015-06-04 09:09:48.303822+00 | 30/F85D77C0 | 30/F85D77C0 | 0 | 0
Greg Clough
Database Administrator | CWT Digital
t. 0845 456 0070 w. cwtdigital.com
Units A/B, Level 8 North, New England House,
New England Street, Brighton, East Sussex, BN1 4GHOn 4 June 2015 at 09:47, hydra <hydrapolic@gmail.com> wrote:On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <nag1010@gmail.com> wrote:On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:After setting up streaming replication, is it possible to check whether the slave has the same data as the master?Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.Regards,Venkata Balaji NFujitsu AustraliaThanks for the answer, however I'm looking for a data consistency check.That means, you deploy your master standby replication, you're happily running it and after 3 months you decide to do a failover. How can one be sure the data are the same?I'm running PostgreSQL 9.4.
--
2015-06-04 13:03 GMT+02:00 hydra <hydrapolic@gmail.com>:Thanks Greg, this looks nice.However my original question still remains. You know, every software has bugs, every bits and pieces can break, hardware can be misbehaving. Really, checking the data and counting the checksum is the only way to be sure.There is no such tool available as far as I know. Writing one that does that should not be quite difficult. The main issue, AFAICT, would be to stop writing on both while you do the check. I know many users wouldn't be happy with this. And if you do not stop them from writing, you'll get quite a lot of false positives on a busy system.
On Thu, Jun 4, 2015 at 11:12 AM, Greg Clough <greg.clough@cwtdigital.co.uk> wrote:If you're using Streaming Replication, then the internal PostgreSQL code ensures consistency... but if you wanted to make sure that your standby is applying logs, then you can use something like:postgres@HOST2:~$ psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
7/68002388 | 7/68002388 | t
(1 row)
... or as suggested previously, use repmgr, which has a "repmgrd" daemon that keeps things monitored for you:postgres@HOST1:~$ psql repmgr repmgr -c 'select * from repmgr_test.repl_monitor;'
primary_node | standby_node | last_monitor_time | last_apply_time | last_wal_primary_location | last_wal_standby_location | replication_lag | apply_lag
--------------+--------------+-------------------------------+-------------------------------+---------------------------+---------------------------+-----------------+-----------
1 | 2 | 2015-06-04 09:09:37.920809+00 | 2015-06-04 09:09:37.712235+00 | 30/F85D7280 | 30/F85D7280 | 0 | 0
1 | 2 | 2015-06-04 09:09:40.45123+00 | 2015-06-04 09:09:38.251625+00 | 30/F85D73C0 | 30/F85D73C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:42.966728+00 | 2015-06-04 09:09:40.75668+00 | 30/F85D74C0 | 30/F85D74C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:45.480471+00 | 2015-06-04 09:09:43.273428+00 | 30/F85D75C0 | 30/F85D75C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:47.997739+00 | 2015-06-04 09:09:45.785806+00 | 30/F85D76C0 | 30/F85D76C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:50.515236+00 | 2015-06-04 09:09:48.303822+00 | 30/F85D77C0 | 30/F85D77C0 | 0 | 0
Greg Clough
Database Administrator | CWT Digital
t. 0845 456 0070 w. cwtdigital.com
Units A/B, Level 8 North, New England House,
New England Street, Brighton, East Sussex, BN1 4GHOn 4 June 2015 at 09:47, hydra <hydrapolic@gmail.com> wrote:On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <nag1010@gmail.com> wrote:On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:After setting up streaming replication, is it possible to check whether the slave has the same data as the master?Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.Regards,Venkata Balaji NFujitsu AustraliaThanks for the answer, however I'm looking for a data consistency check.That means, you deploy your master standby replication, you're happily running it and after 3 months you decide to do a failover. How can one be sure the data are the same?I'm running PostgreSQL 9.4.
--
On Thu, Jun 4, 2015 at 1:41 PM, Guillaume Lelarge <guillaume@lelarge.info> wrote:2015-06-04 13:03 GMT+02:00 hydra <hydrapolic@gmail.com>:Thanks Greg, this looks nice.However my original question still remains. You know, every software has bugs, every bits and pieces can break, hardware can be misbehaving. Really, checking the data and counting the checksum is the only way to be sure.There is no such tool available as far as I know. Writing one that does that should not be quite difficult. The main issue, AFAICT, would be to stop writing on both while you do the check. I know many users wouldn't be happy with this. And if you do not stop them from writing, you'll get quite a lot of false positives on a busy system.I have little experience with PostgreSQL, but that is not true for MySQL. You can have your system running and doing consistency checking online.
On Thu, Jun 4, 2015 at 11:12 AM, Greg Clough <greg.clough@cwtdigital.co.uk> wrote:If you're using Streaming Replication, then the internal PostgreSQL code ensures consistency... but if you wanted to make sure that your standby is applying logs, then you can use something like:postgres@HOST2:~$ psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
7/68002388 | 7/68002388 | t
(1 row)
... or as suggested previously, use repmgr, which has a "repmgrd" daemon that keeps things monitored for you:postgres@HOST1:~$ psql repmgr repmgr -c 'select * from repmgr_test.repl_monitor;'
primary_node | standby_node | last_monitor_time | last_apply_time | last_wal_primary_location | last_wal_standby_location | replication_lag | apply_lag
--------------+--------------+-------------------------------+-------------------------------+---------------------------+---------------------------+-----------------+-----------
1 | 2 | 2015-06-04 09:09:37.920809+00 | 2015-06-04 09:09:37.712235+00 | 30/F85D7280 | 30/F85D7280 | 0 | 0
1 | 2 | 2015-06-04 09:09:40.45123+00 | 2015-06-04 09:09:38.251625+00 | 30/F85D73C0 | 30/F85D73C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:42.966728+00 | 2015-06-04 09:09:40.75668+00 | 30/F85D74C0 | 30/F85D74C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:45.480471+00 | 2015-06-04 09:09:43.273428+00 | 30/F85D75C0 | 30/F85D75C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:47.997739+00 | 2015-06-04 09:09:45.785806+00 | 30/F85D76C0 | 30/F85D76C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:50.515236+00 | 2015-06-04 09:09:48.303822+00 | 30/F85D77C0 | 30/F85D77C0 | 0 | 0
Greg Clough
Database Administrator | CWT Digital
t. 0845 456 0070 w. cwtdigital.com
Units A/B, Level 8 North, New England House,
New England Street, Brighton, East Sussex, BN1 4GHOn 4 June 2015 at 09:47, hydra <hydrapolic@gmail.com> wrote:On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <nag1010@gmail.com> wrote:On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:After setting up streaming replication, is it possible to check whether the slave has the same data as the master?Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.Regards,Venkata Balaji NFujitsu AustraliaThanks for the answer, however I'm looking for a data consistency check.That means, you deploy your master standby replication, you're happily running it and after 3 months you decide to do a failover. How can one be sure the data are the same?I'm running PostgreSQL 9.4.
--
--
On Thu, Jun 4, 2015 at 01:03:00PM +0200, hydra wrote: > Thanks Greg, this looks nice. > > However my original question still remains. You know, every software has bugs, > every bits and pieces can break, hardware can be misbehaving. Really, checking > the data and counting the checksum is the only way to be sure. I believe MySQL needed such a tool because it had known replication synchronization problems --- Postgres does not, so has no such tool. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Thu, Jun 4, 2015 at 01:03:00PM +0200, hydra wrote:
> Thanks Greg, this looks nice.
>
> However my original question still remains. You know, every software has bugs,
> every bits and pieces can break, hardware can be misbehaving. Really, checking
> the data and counting the checksum is the only way to be sure.
I believe MySQL needed such a tool because it had known replication
synchronization problems --- Postgres does not, so has no such tool.
> I believe MySQL needed such a tool because it had known replication > synchronization problems --- Postgres does not, so has no such tool. I agree with Bruce here. In DB2, for HADR (similar to PG replication) there is no tool which checks whether primary and standby are same so that there is no surprise after a failover. I am pretty same is true for Oracle Dataguard. If a RDBMS is offering a tool to check for consistency, there is a high likelihood it is because of inherent problems.
postgres@HOST2:~$ psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
7/68002388 | 7/68002388 | t
(1 row)
postgres@HOST1:~$ psql repmgr repmgr -c 'select * from repmgr_test.repl_monitor;'
primary_node | standby_node | last_monitor_time | last_apply_time | last_wal_primary_location | last_wal_standby_location | replication_lag | apply_lag
--------------+--------------+-------------------------------+-------------------------------+---------------------------+---------------------------+-----------------+-----------
1 | 2 | 2015-06-04 09:09:37.920809+00 | 2015-06-04 09:09:37.712235+00 | 30/F85D7280 | 30/F85D7280 | 0 | 0
1 | 2 | 2015-06-04 09:09:40.45123+00 | 2015-06-04 09:09:38.251625+00 | 30/F85D73C0 | 30/F85D73C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:42.966728+00 | 2015-06-04 09:09:40.75668+00 | 30/F85D74C0 | 30/F85D74C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:45.480471+00 | 2015-06-04 09:09:43.273428+00 | 30/F85D75C0 | 30/F85D75C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:47.997739+00 | 2015-06-04 09:09:45.785806+00 | 30/F85D76C0 | 30/F85D76C0 | 0 | 0
1 | 2 | 2015-06-04 09:09:50.515236+00 | 2015-06-04 09:09:48.303822+00 | 30/F85D77C0 | 30/F85D77C0 | 0 | 0
Greg Clough
Database Administrator | CWT Digital
t. 0845 456 0070 w. cwtdigital.com
Units A/B, Level 8 North, New England House,
New England Street, Brighton, East Sussex, BN1 4GH
On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <nag1010@gmail.com> wrote:On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:After setting up streaming replication, is it possible to check whether the slave has the same data as the master?Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.Regards,Venkata Balaji NFujitsu AustraliaThanks for the answer, however I'm looking for a data consistency check.That means, you deploy your master standby replication, you're happily running it and after 3 months you decide to do a failover. How can one be sure the data are the same?I'm running PostgreSQL 9.4.
Вложения
On Wed, Jun 3, 2015 at 9:49 PM, hydra <hydrapolic@gmail.com> wrote: > After setting up streaming replication, is it possible to check whether the > slave has the same data as the master? > > In the MySQL world there is the percona-toolkit with pt-table-checksum that > does this job: > https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html I believe you only need http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979 gray.ru@gmail.com
I believe you only needOn Wed, Jun 3, 2015 at 9:49 PM, hydra <hydrapolic@gmail.com> wrote:
> After setting up streaming replication, is it possible to check whether the
> slave has the same data as the master?
>
> In the MySQL world there is the percona-toolkit with pt-table-checksum that
> does this job:
> https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html
http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS.
--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray.ru@gmail.com
while looking for replication information I found this:
http://thebuild.com/presentations/worst-day-fosdem-2014.pdf
https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue
- are present on the master, but missing on the replica
- have been deleted on the master still appear to be visible on the replica
- have been updated, and their old versions appear alongside the new, updated versions on the replica
There is no known way to identify that the issue has affected a standby in the past but comparing the data from the primary with the standby.
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of hydra
Sent: Friday, June 05, 2015 12:33 AM
To: pgsql-admin
Subject: Re: [ADMIN] replication consistency checking
On Fri, Jun 5, 2015 at 4:40 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
On Wed, Jun 3, 2015 at 9:49 PM, hydra <hydrapolic@gmail.com> wrote:
> After setting up streaming replication, is it possible to check whether the
> slave has the same data as the master?
>
> In the MySQL world there is the percona-toolkit with pt-table-checksum that
> does this job:
> https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.htmlI believe you only need
http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS.
--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray.ru@gmail.com
Thank you all for replies,
while looking for replication information I found this:
http://thebuild.com/presentations/worst-day-fosdem-2014.pdf
It's a real life experience of hitting this replication bug:
https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue
The primary symptom of this corruption is rows that:
- are present on the master, but missing on the replica
- have been deleted on the master still appear to be visible on the replica
- have been updated, and their old versions appear alongside the new, updated versions on the replica
How can I verify whether I already have this corruption?
There is no known way to identify that the issue has affected a standby in the past but comparing the data from the primary with the standby.
So hands up who still think PostgreSQL doesn't need some way of checking the data consistency between master-standby? :)
My hand is up.
From the wiki page that you referenced you forgot to quote this:
“This is an issue, discovered Nov. 18, 2013., which can cause data corruption on a Hot-Standby replica when it is (re-)started, by marking committed transactions as uncommitted. This issue is fixed in the December 5th 2013 update releases.”
So, what’s your point? Yes, any software can (and does) have bugs. What’s important is how quickly the bug is discovered and fixed.
You wish to write a utility that compares data on 2 different clusters – sure, by all means, but I believe time could be better spent on something else.
But, how is to say that this utility will be 100% error-free?
Regards,
Igor Neyman
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of hydra
Sent: Friday, June 05, 2015 12:33 AM
To: pgsql-admin
Subject: Re: [ADMIN] replication consistency checking
On Fri, Jun 5, 2015 at 4:40 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
On Wed, Jun 3, 2015 at 9:49 PM, hydra <hydrapolic@gmail.com> wrote:
> After setting up streaming replication, is it possible to check whether the
> slave has the same data as the master?
>
> In the MySQL world there is the percona-toolkit with pt-table-checksum that
> does this job:
> https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.htmlI believe you only need
http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS.
--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray.ru@gmail.com
Thank you all for replies,
while looking for replication information I found this:
http://thebuild.com/presentations/worst-day-fosdem-2014.pdfIt's a real life experience of hitting this replication bug:
https://wiki.postgresql.org/wiki/Nov2013ReplicationIssueThe primary symptom of this corruption is rows that:
- are present on the master, but missing on the replica
- have been deleted on the master still appear to be visible on the replica
- have been updated, and their old versions appear alongside the new, updated versions on the replica
How can I verify whether I already have this corruption?
There is no known way to identify that the issue has affected a standby in the past but comparing the data from the primary with the standby.
So hands up who still think PostgreSQL doesn't need some way of checking the data consistency between master-standby? :)
My hand is up.
From the wiki page that you referenced you forgot to quote this:
“This is an issue, discovered Nov. 18, 2013., which can cause data corruption on a Hot-Standby replica when it is (re-)started, by marking committed transactions as uncommitted. This issue is fixed in the December 5th 2013 update releases.”
So, what’s your point? Yes, any software can (and does) have bugs. What’s important is how quickly the bug is discovered and fixed.
You wish to write a utility that compares data on 2 different clusters – sure, by all means, but I believe time could be better spent on something else.
But, how is to say that this utility will be 100% error-free?
Regards,
Igor Neyman
My point was to show that such a tool would be beneficial even for PostgreSQL. Yes, even the checking utility can have bugs, but at least you are better than now. Because now everybody relies on the fact that it should be ok. But as we see from the bug report, such problems exist here also. So from the situation where you believe everything should be ok you would have a tool reporting "checksums ok" and would allow your to trust your standby data more.
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of hydra
Sent: Friday, June 05, 2015 10:25 AM
To: pgsql-admin
Subject: Re: [ADMIN] replication consistency checking
On Fri, Jun 5, 2015 at 3:42 PM, Igor Neyman <ineyman@perceptron.com> wrote:
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of hydra
Sent: Friday, June 05, 2015 12:33 AM
To: pgsql-admin
Subject: Re: [ADMIN] replication consistency checking
On Fri, Jun 5, 2015 at 4:40 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
On Wed, Jun 3, 2015 at 9:49 PM, hydra <hydrapolic@gmail.com> wrote:
> After setting up streaming replication, is it possible to check whether the
> slave has the same data as the master?
>
> In the MySQL world there is the percona-toolkit with pt-table-checksum that
> does this job:
> https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.htmlI believe you only need
http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS.
--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray.ru@gmail.com
Thank you all for replies,
while looking for replication information I found this:
http://thebuild.com/presentations/worst-day-fosdem-2014.pdfIt's a real life experience of hitting this replication bug:
https://wiki.postgresql.org/wiki/Nov2013ReplicationIssueThe primary symptom of this corruption is rows that:
- are present on the master, but missing on the replica
- have been deleted on the master still appear to be visible on the replica
- have been updated, and their old versions appear alongside the new, updated versions on the replica
How can I verify whether I already have this corruption?
There is no known way to identify that the issue has affected a standby in the past but comparing the data from the primary with the standby.
So hands up who still think PostgreSQL doesn't need some way of checking the data consistency between master-standby? :)
My hand is up.
From the wiki page that you referenced you forgot to quote this:
“This is an issue, discovered Nov. 18, 2013., which can cause data corruption on a Hot-Standby replica when it is (re-)started, by marking committed transactions as uncommitted. This issue is fixed in the December 5th 2013 update releases.”
So, what’s your point? Yes, any software can (and does) have bugs. What’s important is how quickly the bug is discovered and fixed.
You wish to write a utility that compares data on 2 different clusters – sure, by all means, but I believe time could be better spent on something else.
But, how is to say that this utility will be 100% error-free?
Regards,
Igor Neyman
Hello Igor,
no code is without bugs, yes, we are humans. That is why I'm so shocked that everybody blindly trusts the replication (which is code again, plus network and hardware).
My point was to show that such a tool would be beneficial even for PostgreSQL. Yes, even the checking utility can have bugs, but at least you are better than now. Because now everybody relies on the fact that it should be ok. But as we see from the bug report, such problems exist here also. So from the situation where you believe everything should be ok you would have a tool reporting "checksums ok" and would allow your to trust your standby data more.
The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.
Igor Neyman
On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote: > > The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactionswhile it does its job. Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to comparedatabases without interfering with updates? One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Whichwould require repeatable reads.) But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing:if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master backup, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take masterdown all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them beforetaking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want,*IF* you can actually pull that off correctly ;-) -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ https://www.linkedin.com/in/scottribe/ (303) 722-0567 voice
On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.
Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?
One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)
But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)
Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.
I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.
As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.
I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.
Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?
One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)
But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)
Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.
I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.
As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.
I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
Ang Wei Shan
9 июня 2015 г., в 7:06, Wei Shan <weishan.ang@gmail.com> написал(а):Hi,Usually, for failover, we don't do data consistency check. because it's unplanned failover! In my opinion, it's quite safe to trust pg_stat_replication because it's a binary level replication so it's hard to get row missing unless it's a bug. You got higher chance of getting corruption. However, for switchover, we will do the following;1. Checking pg_stat_replication views2. Have a script that do a row count on all the data tables for a period of time. We should see that the 2 database have have very similar row count and maybe a slight delay of 1s?
I know that this is not perfect, but at least it's something to start with. The only tool out that that's doing this is Oracle VeriData but I'm not too sure how they are doing this.Cheers.On 6 June 2015 at 12:43, hydra <hydrapolic@gmail.com> wrote:On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.
Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?
One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)
But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)
Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.
I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.
As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.
I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice--Regards,
Ang Wei Shan
Hi,Usually, for failover, we don't do data consistency check. because it's unplanned failover! In my opinion, it's quite safe to trust pg_stat_replication because it's a binary level replication so it's hard to get row missing unless it's a bug. You got higher chance of getting corruption. However, for switchover, we will do the following;1. Checking pg_stat_replication views2. Have a script that do a row count on all the data tables for a period of time. We should see that the 2 database have have very similar row count and maybe a slight delay of 1s?I know that this is not perfect, but at least it's something to start with. The only tool out that that's doing this is Oracle VeriData but I'm not too sure how they are doing this.Cheers.
--On 6 June 2015 at 12:43, hydra <hydrapolic@gmail.com> wrote:On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.
Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?
One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)
But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)
Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.
I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.
As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.
I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voiceRegards,
Ang Wei Shan
On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.
Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?
One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)
But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)
Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.
I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.
As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.
I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.
> Am 05.06.2015 um 16:56 schrieb Scott Ribe <scott_ribe@elevated-dev.com>: > >> On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote: >> >> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactionswhile it does its job. > > Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to comparedatabases without interfering with updates? > Also, if I remember the Postgres SR bug correctly, this kind of check that Percona provides would not have helped with thiskind of bug. The corruption did not occur *during* replication but only if you restarted the slave because transactionswere falsely marked as commited or non-commited when the slave came up again. You might have noticed the corruptionearlier, though. > One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot.(Which would require repeatable reads.) I actually think this would a need thing to have (for pre-production) test environments, like alpha or beta testing. Jan
After setting up streaming replication, is it possible to check whether the slave has the same data as the master?
In the MySQL world there is the percona-toolkit with pt-table-checksum that does this job:
https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html
Thank you
InnoDB
tables and the transaction isolation level is READ COMMITTED
or READ UNCOMMITTED
, only row-based logging can be used. On Tue, Jun 9, 2015 at 6:06 AM, Wei Shan <weishan.ang@gmail.com> wrote:Hi,Usually, for failover, we don't do data consistency check. because it's unplanned failover! In my opinion, it's quite safe to trust pg_stat_replication because it's a binary level replication so it's hard to get row missing unless it's a bug. You got higher chance of getting corruption. However, for switchover, we will do the following;1. Checking pg_stat_replication views2. Have a script that do a row count on all the data tables for a period of time. We should see that the 2 database have have very similar row count and maybe a slight delay of 1s?I know that this is not perfect, but at least it's something to start with. The only tool out that that's doing this is Oracle VeriData but I'm not too sure how they are doing this.Cheers.Hello,there are situation you wish to do a failover and not just because it's unplanned. For example when migrating to a better hardware.
--On 6 June 2015 at 12:43, hydra <hydrapolic@gmail.com> wrote:On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.
Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?
One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)
But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)
Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.
I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.
As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.
I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voiceRegards,
Ang Wei Shan
Ang Wei Shan
I am entering this discussion a bit late, so maybe I am missing the point. But SR is using xlog and there is a crc32 checksum on each xlog record. So why would you need to compare the whole thing again when each record has been approved during replication ?On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.
Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?
One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)
But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)
Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.
I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.
As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.
I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.
> Am 05.06.2015 um 16:56 schrieb Scott Ribe <scott_ribe@elevated-dev.com>:
>
>> On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>>
>> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.
>
> Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?
>
Also, if I remember the Postgres SR bug correctly, this kind of check that Percona provides would not have helped with this kind of bug. The corruption did not occur *during* replication but only if you restarted the slave because transactions were falsely marked as commited or non-commited when the slave came up again. You might have noticed the corruption earlier, though.
> One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)
I actually think this would a need thing to have (for pre-production) test environments, like alpha or beta testing.
Jan
After setting up streaming replication, is it possible to check whether the slave has the same data as the master?
In the MySQL world there is the percona-toolkit with pt-table-checksum that does this job:
https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html
Thank youI have not digged to deep, but as far as I can see from your link:pt-table-checksum requires statement-based replication, and it sets binlog_format=STATEMENT on the masterAnd the mysql manual saysIf you are usingInnoDB
tables and the transaction isolation level isREAD COMMITTED
orREAD UNCOMMITTED
, only row-based logging can be used.So, if I am not misinterpreting, this plays in a different league.Jan
That's called switchover my friend...
So for you it's like this:
https://blog.mozilla.org/it/2013/12/16/upgrading-from-mysql-5-1-to-mariadb-5-5/
On 10 June 2015 at 02:57, hydra <hydrapolic@gmail.com> wrote:On Tue, Jun 9, 2015 at 6:06 AM, Wei Shan <weishan.ang@gmail.com> wrote:Hi,Usually, for failover, we don't do data consistency check. because it's unplanned failover! In my opinion, it's quite safe to trust pg_stat_replication because it's a binary level replication so it's hard to get row missing unless it's a bug. You got higher chance of getting corruption. However, for switchover, we will do the following;1. Checking pg_stat_replication views2. Have a script that do a row count on all the data tables for a period of time. We should see that the 2 database have have very similar row count and maybe a slight delay of 1s?I know that this is not perfect, but at least it's something to start with. The only tool out that that's doing this is Oracle VeriData but I'm not too sure how they are doing this.Cheers.Hello,there are situation you wish to do a failover and not just because it's unplanned. For example when migrating to a better hardware.
--On 6 June 2015 at 12:43, hydra <hydrapolic@gmail.com> wrote:On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.
Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?
One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)
But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)
Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.
I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.
As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.
I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voiceRegards,
Ang Wei Shan--Regards,
Ang Wei Shan
On Thu, Jun 11, 2015 at 07:12:36AM +0200, hydra wrote: > I am entering this discussion a bit late, so maybe I am missing the point. > But SR is using xlog and there is a crc32 checksum on each xlog record. So > why would you need to compare the whole thing again when each record has > been approved during replication ? > > > > Hello Jan, > you don't have it if you don't want to. However I'd like to have the > possibility to do so and thus I was asking - is it possible? Are you guys doing > it? > > The reasons are mentioned above, but still: > - bugs can appear anywhere, > - the bug report mentioned before also states to check the data, but does not > give and hints how to do it, that's why I asked here The answer is "no". -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Thu, Jun 11, 2015 at 08:39:05AM -0400, Bruce Momjian wrote: > On Thu, Jun 11, 2015 at 07:12:36AM +0200, hydra wrote: > > I am entering this discussion a bit late, so maybe I am missing the point. > > But SR is using xlog and there is a crc32 checksum on each xlog record. So > > why would you need to compare the whole thing again when each record has > > been approved during replication ? > > > > > > > > Hello Jan, > > you don't have it if you don't want to. However I'd like to have the > > possibility to do so and thus I was asking - is it possible? Are you guys doing > > it? > > > > The reasons are mentioned above, but still: > > - bugs can appear anywhere, > > - the bug report mentioned before also states to check the data, but does not > > give and hints how to do it, that's why I asked here > > The answer is "no". You are falling into a trap I see often. There is a flaw in another database product, and you want Postgres to fix it or monitor for it. Postgres just doesn't have that problem to the same level. We have different problems, and we allocate resources to add features based on our own problems, not the problems of other database products. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
El jue, 11-06-2015 a las 07:14 +0200, hydra escribió: > > > On Tue, Jun 9, 2015 at 9:47 PM, Jan Lentfer <Jan.Lentfer@web.de> > wrote: > > > > > > Am 05.06.2015 um 16:56 schrieb Scott Ribe > <scott_ribe@elevated-dev.com>: > > > >> On Jun 5, 2015, at 8:42 AM, Igor Neyman > <ineyman@perceptron.com> wrote: > >> > >> The problem I see with “checksum utility” is that for it to > work both compared servers should be “static”: not > transactions while it does its job. > > > > Indeed, and that was brought up before and OP seems to be > ignoring it. What magic does MySQL (supposedly) use to compare > databases without interfering with updates? > > > Also, if I remember the Postgres SR bug correctly, this kind > of check that Percona provides would not have helped with this > kind of bug. The corruption did not occur *during* replication > but only if you restarted the slave because transactions were > falsely marked as commited or non-commited when the slave came > up again. You might have noticed the corruption earlier, > though. > > > > Ok but we do restart our slaves from time to time (upgrades) so sooner > or later you would discover if that would be the problem. But maybe it > will discover bugs/problems that occur *during* replication. Are you or your company considering to fund "this tool"?, if yes, maybe you get some feedback at pgsql-hackers@postgresql.org > > > > > One could imagine a built-in feature in PG which depends on > using MVCC and having both sides look at the same snapshot. > (Which would require repeatable reads.) > I actually think this would a need thing to have (for > pre-production) test environments, like alpha or beta testing. > > Jan > >
On Thu, Jun 11, 2015 at 08:39:05AM -0400, Bruce Momjian wrote:
> On Thu, Jun 11, 2015 at 07:12:36AM +0200, hydra wrote:
> > I am entering this discussion a bit late, so maybe I am missing the point.
> > But SR is using xlog and there is a crc32 checksum on each xlog record. So
> > why would you need to compare the whole thing again when each record has
> > been approved during replication ?
> >
> >
> >
> > Hello Jan,
> > you don't have it if you don't want to. However I'd like to have the
> > possibility to do so and thus I was asking - is it possible? Are you guys doing
> > it?
> >
> > The reasons are mentioned above, but still:
> > - bugs can appear anywhere,
> > - the bug report mentioned before also states to check the data, but does not
> > give and hints how to do it, that's why I asked here
>
> The answer is "no".
You are falling into a trap I see often. There is a flaw in another
database product, and you want Postgres to fix it or monitor for it.
Postgres just doesn't have that problem to the same level. We have
different problems, and we allocate resources to add features based on
our own problems, not the problems of other database products.
I didn't bring up this topic because I would have problem with data replication consistency on MySQL. Indeed, I've been using different kinds of replication and after setting it up correctly (which took me some time), I've managed to live 3 years (while doing constant upgrades on the databases) without the data being different. It really helped me to know it's ok and many of the bugs that could have caused the data to differ could be spotted and monitored. Thankfully no data corruption occurred yet (to my best knowledge).
I've posted a real life experience of having a hard time doing a switchover and realizing something was wrong (data missing, old data present, etc.) with PostgreSQL. But it's general, not PostgreSQL specific, because:
- every software has bugs,
- replication involves software and hardware, both vulnerable to data corruption,
- data in the database is one of the most critical parts in IT.
Please don't blindly close your eyes saying this does not affect PostgreSQL. Besides, you are not checking, how can you know?
It's like and old joke:
A: "We never had a security incident"
B: "Are you monitoring it?"
A: "Err.. no."
:)
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
El jue, 11-06-2015 a las 07:14 +0200, hydra escribió:
>
>
> On Tue, Jun 9, 2015 at 9:47 PM, Jan Lentfer <Jan.Lentfer@web.de>
> wrote:
>
>
>
>
> > Am 05.06.2015 um 16:56 schrieb Scott Ribe
> <scott_ribe@elevated-dev.com>:
> >
> >> On Jun 5, 2015, at 8:42 AM, Igor Neyman
> <ineyman@perceptron.com> wrote:
> >>
> >> The problem I see with “checksum utility” is that for it to
> work both compared servers should be “static”: not
> transactions while it does its job.
> >
> > Indeed, and that was brought up before and OP seems to be
> ignoring it. What magic does MySQL (supposedly) use to compare
> databases without interfering with updates?
> >
> Also, if I remember the Postgres SR bug correctly, this kind
> of check that Percona provides would not have helped with this
> kind of bug. The corruption did not occur *during* replication
> but only if you restarted the slave because transactions were
> falsely marked as commited or non-commited when the slave came
> up again. You might have noticed the corruption earlier,
> though.
>
>
>
> Ok but we do restart our slaves from time to time (upgrades) so sooner
> or later you would discover if that would be the problem. But maybe it
> will discover bugs/problems that occur *during* replication.
Are you or your company considering to fund "this tool"?, if yes, maybe
you get some feedback at pgsql-hackers@postgresql.org
http://www.cri.ensmp.fr/~coelho/pg_comparator/pg_comparator.html
>
>
>
> > One could imagine a built-in feature in PG which depends on
> using MVCC and having both sides look at the same snapshot.
> (Which would require repeatable reads.)
> I actually think this would a need thing to have (for
> pre-production) test environments, like alpha or beta testing.
>
> Jan
>
>