Обсуждение: replication consistency checking

Поиск

Список

Период

Сортировка

replication consistency checking

От

hydra

Дата:

04 июня 2015 г., 07:49:36

After setting up streaming replication, is it possible to check whether the slave has the same data as the master?

In the MySQL world there is the percona-toolkit with pt-table-checksum that does this job:
https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html

Thank you

Re: replication consistency checking

От

Venkata Balaji N

Дата:

04 июня 2015 г., 09:27:54

On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:

After setting up streaming replication, is it possible to check whether the slave has the same data as the master?

Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?

Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.

You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.

Regards,

Venkata Balaji N

Fujitsu Australia

Re: replication consistency checking

От

hydra

Дата:

04 июня 2015 г., 11:47:09

On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <nag1010@gmail.com> wrote:

On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:
After setting up streaming replication, is it possible to check whether the slave has the same data as the master?

Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?

Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.

You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.

Regards,
Venkata Balaji N

Fujitsu Australia

Thanks for the answer, however I'm looking for a data consistency check.

That means, you deploy your master standby replication, you're happily running it and after 3 months you decide to do a failover. How can one be sure the data are the same?

I'm running PostgreSQL 9.4.

Re: replication consistency checking

От

hydra

Дата:

04 июня 2015 г., 14:03:07

Thanks Greg, this looks nice.

However my original question still remains. You know, every software has bugs, every bits and pieces can break, hardware can be misbehaving. Really, checking the data and counting the checksum is the only way to be sure.

On Thu, Jun 4, 2015 at 11:12 AM, Greg Clough <greg.clough@cwtdigital.co.uk> wrote:

If you're using Streaming Replication, then the internal PostgreSQL code ensures consistency... but if you wanted to make sure that your standby is applying logs, then you can use something like:

postgres@HOST2:~$ psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
7/68002388       | 7/68002388      | t
(1 row)

... or as suggested previously, use repmgr, which has a "repmgrd" daemon that keeps things monitored for you:

postgres@HOST1:~$ psql repmgr repmgr -c 'select * from repmgr_test.repl_monitor;'
primary_node | standby_node |       last_monitor_time       |        last_apply_time        | last_wal_primary_location | last_wal_standby_location | replication_lag | apply_lag
--------------+--------------+-------------------------------+-------------------------------+---------------------------+---------------------------+-----------------+-----------
           1 |            2 | 2015-06-04 09:09:37.920809+00 | 2015-06-04 09:09:37.712235+00 | 30/F85D7280               | 30/F85D7280               |               0 |         0
           1 |            2 | 2015-06-04 09:09:40.45123+00 | 2015-06-04 09:09:38.251625+00 | 30/F85D73C0               | 30/F85D73C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:42.966728+00 | 2015-06-04 09:09:40.75668+00 | 30/F85D74C0               | 30/F85D74C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:45.480471+00 | 2015-06-04 09:09:43.273428+00 | 30/F85D75C0               | 30/F85D75C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:47.997739+00 | 2015-06-04 09:09:45.785806+00 | 30/F85D76C0               | 30/F85D76C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:50.515236+00 | 2015-06-04 09:09:48.303822+00 | 30/F85D77C0               | 30/F85D77C0               |               0 |         0

Greg Clough
Database Administrator | CWT Digital
t. 0845 456 0070 w. cwtdigital.com

Units A/B, Level 8 North, New England House,
New England Street, Brighton, East Sussex, BN1 4GH


On 4 June 2015 at 09:47, hydra <hydrapolic@gmail.com> wrote:

On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <nag1010@gmail.com> wrote:

On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:
After setting up streaming replication, is it possible to check whether the slave has the same data as the master?

Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?

Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.

You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.

Regards,
Venkata Balaji N

Fujitsu Australia

Thanks for the answer, however I'm looking for a data consistency check.

That means, you deploy your master standby replication, you're happily running it and after 3 months you decide to do a failover. How can one be sure the data are the same?

I'm running PostgreSQL 9.4.

Re: replication consistency checking

От

Guillaume Lelarge

Дата:

04 июня 2015 г., 14:41:17

2015-06-04 13:03 GMT+02:00 hydra <hydrapolic@gmail.com>:

Thanks Greg, this looks nice.

However my original question still remains. You know, every software has bugs, every bits and pieces can break, hardware can be misbehaving. Really, checking the data and counting the checksum is the only way to be sure.

There is no such tool available as far as I know. Writing one that does that should not be quite difficult. The main issue, AFAICT, would be to stop writing on both while you do the check. I know many users wouldn't be happy with this. And if you do not stop them from writing, you'll get quite a lot of false positives on a busy system.

On Thu, Jun 4, 2015 at 11:12 AM, Greg Clough <greg.clough@cwtdigital.co.uk> wrote:
If you're using Streaming Replication, then the internal PostgreSQL code ensures consistency... but if you wanted to make sure that your standby is applying logs, then you can use something like:

postgres@HOST2:~$ psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
7/68002388       | 7/68002388      | t
(1 row)

... or as suggested previously, use repmgr, which has a "repmgrd" daemon that keeps things monitored for you:

postgres@HOST1:~$ psql repmgr repmgr -c 'select * from repmgr_test.repl_monitor;'
primary_node | standby_node |       last_monitor_time       |        last_apply_time        | last_wal_primary_location | last_wal_standby_location | replication_lag | apply_lag
--------------+--------------+-------------------------------+-------------------------------+---------------------------+---------------------------+-----------------+-----------
           1 |            2 | 2015-06-04 09:09:37.920809+00 | 2015-06-04 09:09:37.712235+00 | 30/F85D7280               | 30/F85D7280               |               0 |         0
           1 |            2 | 2015-06-04 09:09:40.45123+00 | 2015-06-04 09:09:38.251625+00 | 30/F85D73C0               | 30/F85D73C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:42.966728+00 | 2015-06-04 09:09:40.75668+00 | 30/F85D74C0               | 30/F85D74C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:45.480471+00 | 2015-06-04 09:09:43.273428+00 | 30/F85D75C0               | 30/F85D75C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:47.997739+00 | 2015-06-04 09:09:45.785806+00 | 30/F85D76C0               | 30/F85D76C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:50.515236+00 | 2015-06-04 09:09:48.303822+00 | 30/F85D77C0               | 30/F85D77C0               |               0 |         0

Greg Clough
Database Administrator | CWT Digital
t. 0845 456 0070 w. cwtdigital.com

Units A/B, Level 8 North, New England House,
New England Street, Brighton, East Sussex, BN1 4GH


On 4 June 2015 at 09:47, hydra <hydrapolic@gmail.com> wrote:

On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <nag1010@gmail.com> wrote:

On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:
After setting up streaming replication, is it possible to check whether the slave has the same data as the master?

Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?

Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.

You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.

Regards,
Venkata Balaji N

Fujitsu Australia

Thanks for the answer, however I'm looking for a data consistency check.

That means, you deploy your master standby replication, you're happily running it and after 3 months you decide to do a failover. How can one be sure the data are the same?

I'm running PostgreSQL 9.4.

--

Guillaume.

http://blog.guillaume.lelarge.info

http://www.dalibo.com

Re: replication consistency checking

От

hydra

Дата:

04 июня 2015 г., 15:04:34

On Thu, Jun 4, 2015 at 1:41 PM, Guillaume Lelarge <guillaume@lelarge.info> wrote:

2015-06-04 13:03 GMT+02:00 hydra <hydrapolic@gmail.com>:
Thanks Greg, this looks nice.

However my original question still remains. You know, every software has bugs, every bits and pieces can break, hardware can be misbehaving. Really, checking the data and counting the checksum is the only way to be sure.

There is no such tool available as far as I know. Writing one that does that should not be quite difficult. The main issue, AFAICT, would be to stop writing on both while you do the check. I know many users wouldn't be happy with this. And if you do not stop them from writing, you'll get quite a lot of false positives on a busy system.

I have little experience with PostgreSQL, but that is not true for MySQL. You can have your system running and doing consistency checking online.

On Thu, Jun 4, 2015 at 11:12 AM, Greg Clough <greg.clough@cwtdigital.co.uk> wrote:
If you're using Streaming Replication, then the internal PostgreSQL code ensures consistency... but if you wanted to make sure that your standby is applying logs, then you can use something like:

postgres@HOST2:~$ psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
7/68002388       | 7/68002388      | t
(1 row)

... or as suggested previously, use repmgr, which has a "repmgrd" daemon that keeps things monitored for you:

postgres@HOST1:~$ psql repmgr repmgr -c 'select * from repmgr_test.repl_monitor;'
primary_node | standby_node |       last_monitor_time       |        last_apply_time        | last_wal_primary_location | last_wal_standby_location | replication_lag | apply_lag
--------------+--------------+-------------------------------+-------------------------------+---------------------------+---------------------------+-----------------+-----------
           1 |            2 | 2015-06-04 09:09:37.920809+00 | 2015-06-04 09:09:37.712235+00 | 30/F85D7280               | 30/F85D7280               |               0 |         0
           1 |            2 | 2015-06-04 09:09:40.45123+00 | 2015-06-04 09:09:38.251625+00 | 30/F85D73C0               | 30/F85D73C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:42.966728+00 | 2015-06-04 09:09:40.75668+00 | 30/F85D74C0               | 30/F85D74C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:45.480471+00 | 2015-06-04 09:09:43.273428+00 | 30/F85D75C0               | 30/F85D75C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:47.997739+00 | 2015-06-04 09:09:45.785806+00 | 30/F85D76C0               | 30/F85D76C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:50.515236+00 | 2015-06-04 09:09:48.303822+00 | 30/F85D77C0               | 30/F85D77C0               |               0 |         0

Greg Clough
Database Administrator | CWT Digital
t. 0845 456 0070 w. cwtdigital.com

Units A/B, Level 8 North, New England House,
New England Street, Brighton, East Sussex, BN1 4GH


On 4 June 2015 at 09:47, hydra <hydrapolic@gmail.com> wrote:

On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <nag1010@gmail.com> wrote:

On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:
After setting up streaming replication, is it possible to check whether the slave has the same data as the master?

Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?

Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.

You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.

Regards,
Venkata Balaji N

Fujitsu Australia

Thanks for the answer, however I'm looking for a data consistency check.

That means, you deploy your master standby replication, you're happily running it and after 3 months you decide to do a failover. How can one be sure the data are the same?

I'm running PostgreSQL 9.4.

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

Re: replication consistency checking

От

Guillaume Lelarge

Дата:

04 июня 2015 г., 15:19:39

2015-06-04 14:04 GMT+02:00 hydra <hydrapolic@gmail.com>:

On Thu, Jun 4, 2015 at 1:41 PM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
2015-06-04 13:03 GMT+02:00 hydra <hydrapolic@gmail.com>:
Thanks Greg, this looks nice.

However my original question still remains. You know, every software has bugs, every bits and pieces can break, hardware can be misbehaving. Really, checking the data and counting the checksum is the only way to be sure.

There is no such tool available as far as I know. Writing one that does that should not be quite difficult. The main issue, AFAICT, would be to stop writing on both while you do the check. I know many users wouldn't be happy with this. And if you do not stop them from writing, you'll get quite a lot of false positives on a busy system.

I have little experience with PostgreSQL, but that is not true for MySQL. You can have your system running and doing consistency checking online.

I was only speaking about PostgreSQL. I have no experience with MySQL.

On Thu, Jun 4, 2015 at 11:12 AM, Greg Clough <greg.clough@cwtdigital.co.uk> wrote:
If you're using Streaming Replication, then the internal PostgreSQL code ensures consistency... but if you wanted to make sure that your standby is applying logs, then you can use something like:

postgres@HOST2:~$ psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
7/68002388       | 7/68002388      | t
(1 row)

... or as suggested previously, use repmgr, which has a "repmgrd" daemon that keeps things monitored for you:

postgres@HOST1:~$ psql repmgr repmgr -c 'select * from repmgr_test.repl_monitor;'
primary_node | standby_node |       last_monitor_time       |        last_apply_time        | last_wal_primary_location | last_wal_standby_location | replication_lag | apply_lag
--------------+--------------+-------------------------------+-------------------------------+---------------------------+---------------------------+-----------------+-----------
           1 |            2 | 2015-06-04 09:09:37.920809+00 | 2015-06-04 09:09:37.712235+00 | 30/F85D7280               | 30/F85D7280               |               0 |         0
           1 |            2 | 2015-06-04 09:09:40.45123+00 | 2015-06-04 09:09:38.251625+00 | 30/F85D73C0               | 30/F85D73C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:42.966728+00 | 2015-06-04 09:09:40.75668+00 | 30/F85D74C0               | 30/F85D74C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:45.480471+00 | 2015-06-04 09:09:43.273428+00 | 30/F85D75C0               | 30/F85D75C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:47.997739+00 | 2015-06-04 09:09:45.785806+00 | 30/F85D76C0               | 30/F85D76C0               |               0 |         0
           1 |            2 | 2015-06-04 09:09:50.515236+00 | 2015-06-04 09:09:48.303822+00 | 30/F85D77C0               | 30/F85D77C0               |               0 |         0

Greg Clough
Database Administrator | CWT Digital
t. 0845 456 0070 w. cwtdigital.com

Units A/B, Level 8 North, New England House,
New England Street, Brighton, East Sussex, BN1 4GH


On 4 June 2015 at 09:47, hydra <hydrapolic@gmail.com> wrote:

On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <nag1010@gmail.com> wrote:

On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:
After setting up streaming replication, is it possible to check whether the slave has the same data as the master?

Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?

Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.

You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.

Regards,
Venkata Balaji N

Fujitsu Australia

Thanks for the answer, however I'm looking for a data consistency check.

That means, you deploy your master standby replication, you're happily running it and after 3 months you decide to do a failover. How can one be sure the data are the same?

I'm running PostgreSQL 9.4.

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

--

Guillaume.

http://blog.guillaume.lelarge.info

http://www.dalibo.com

Re: replication consistency checking

От

Bruce Momjian

Дата:

04 июня 2015 г., 15:32:07

On Thu, Jun  4, 2015 at 01:03:00PM +0200, hydra wrote:
> Thanks Greg, this looks nice.
>
> However my original question still remains. You know, every software has bugs,
> every bits and pieces can break, hardware can be misbehaving. Really, checking
> the data and counting the checksum is the only way to be sure.

I believe MySQL needed such a tool because it had known replication
synchronization problems  ---  Postgres does not, so has no such tool.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Re: replication consistency checking

От

"David G. Johnston"

Дата:

04 июня 2015 г., 15:36:32

On Thu, Jun 4, 2015 at 8:32 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Jun 4, 2015 at 01:03:00PM +0200, hydra wrote:
> Thanks Greg, this looks nice.
>
> However my original question still remains. You know, every software has bugs,
> every bits and pieces can break, hardware can be misbehaving. Really, checking
> the data and counting the checksum is the only way to be sure.

I believe MySQL needed such a tool because it had known replication
synchronization problems --- Postgres does not, so has no such tool.

Its those unknown replication synchronization problems that this tool would be able to catch...

How to page checksums factor into this?

David J.

Re: replication consistency checking

От

Ravi Krishna

Дата:

04 июня 2015 г., 15:37:51

> I believe MySQL needed such a tool because it had known replication
> synchronization problems  ---  Postgres does not, so has no such tool.

I agree with Bruce here. In DB2, for HADR (similar to PG replication)
there is no tool which checks whether primary and standby are same so
that there is no surprise after a failover. I am pretty same is true
for Oracle Dataguard.

If a RDBMS is offering a tool to check for consistency, there is a
high likelihood it is because of inherent problems.

Re: replication consistency checking

От

Greg Clough

Дата:

04 июня 2015 г., 18:40:56

If you're using Streaming Replication, then the internal PostgreSQL code ensures consistency... but if you wanted to make sure that your standby is applying logs, then you can use something like:

postgres@HOST2:~$ psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'

receive_location | replay_location | recovery_status

------------------+-----------------+-----------------

7/68002388 | 7/68002388 | t

(1 row)

... or as suggested previously, use repmgr, which has a "repmgrd" daemon that keeps things monitored for you:

postgres@HOST1:~$ psql repmgr repmgr -c 'select * from repmgr_test.repl_monitor;'

primary_node | standby_node | last_monitor_time | last_apply_time | last_wal_primary_location | last_wal_standby_location | replication_lag | apply_lag

--------------+--------------+-------------------------------+-------------------------------+---------------------------+---------------------------+-----------------+-----------

1 | 2 | 2015-06-04 09:09:37.920809+00 | 2015-06-04 09:09:37.712235+00 | 30/F85D7280 | 30/F85D7280 | 0 | 0

1 | 2 | 2015-06-04 09:09:40.45123+00 | 2015-06-04 09:09:38.251625+00 | 30/F85D73C0 | 30/F85D73C0 | 0 | 0

1 | 2 | 2015-06-04 09:09:42.966728+00 | 2015-06-04 09:09:40.75668+00 | 30/F85D74C0 | 30/F85D74C0 | 0 | 0

1 | 2 | 2015-06-04 09:09:45.480471+00 | 2015-06-04 09:09:43.273428+00 | 30/F85D75C0 | 30/F85D75C0 | 0 | 0

1 | 2 | 2015-06-04 09:09:47.997739+00 | 2015-06-04 09:09:45.785806+00 | 30/F85D76C0 | 30/F85D76C0 | 0 | 0

1 | 2 | 2015-06-04 09:09:50.515236+00 | 2015-06-04 09:09:48.303822+00 | 30/F85D77C0 | 30/F85D77C0 | 0 | 0

Greg Clough

Database Administrator | CWT Digital

t. 0845 456 0070 w. cwtdigital.com

Units A/B, Level 8 North, New England House,
New England Street, Brighton, East Sussex, BN1 4GH

On 4 June 2015 at 09:47, hydra <hydrapolic@gmail.com> wrote:

On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <nag1010@gmail.com> wrote:

On Thu, Jun 4, 2015 at 2:49 PM, hydra <hydrapolic@gmail.com> wrote:
After setting up streaming replication, is it possible to check whether the slave has the same data as the master?

Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using by the way ?

Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor the status, this depends on the version of PG you are using.

You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.

Regards,
Venkata Balaji N

Fujitsu Australia

Thanks for the answer, however I'm looking for a data consistency check.

That means, you deploy your master standby replication, you're happily running it and after 3 months you decide to do a failover. How can one be sure the data are the same?

I'm running PostgreSQL 9.4.

Вложения

Re: replication consistency checking

От

Sergey Konoplev

Дата:

05 июня 2015 г., 05:50:22

On Wed, Jun 3, 2015 at 9:49 PM, hydra <hydrapolic@gmail.com> wrote:
> After setting up streaming replication, is it possible to check whether the
> slave has the same data as the master?
>
> In the MySQL world there is the percona-toolkit with pt-table-checksum that
> does this job:
> https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html

I believe you only need
http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray.ru@gmail.com

Re: replication consistency checking

От

hydra

Дата:

05 июня 2015 г., 07:33:28

On Fri, Jun 5, 2015 at 4:40 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:

On Wed, Jun 3, 2015 at 9:49 PM, hydra <hydrapolic@gmail.com> wrote:
> After setting up streaming replication, is it possible to check whether the
> slave has the same data as the master?
>
> In the MySQL world there is the percona-toolkit with pt-table-checksum that
> does this job:
> https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html

I believe you only need
http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray.ru@gmail.com

Thank you all for replies,
while looking for replication information I found this:
http://thebuild.com/presentations/worst-day-fosdem-2014.pdf

It's a real life experience of hitting this replication bug:
https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue

The primary symptom of this corruption is rows that:

are present on the master, but missing on the replica
have been deleted on the master still appear to be visible on the replica
have been updated, and their old versions appear alongside the new, updated versions on the replica

How can I verify whether I already have this corruption?
There is no known way to identify that the issue has affected a standby in the past but comparing the data from the primary with the standby.

So hands up who still think PostgreSQL doesn't need some way of checking the data consistency between master-standby? :)

Re: replication consistency checking

От

Igor Neyman

Дата:

05 июня 2015 г., 16:43:13

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of hydra
Sent: Friday, June 05, 2015 12:33 AM
To: pgsql-admin
Subject: Re: [ADMIN] replication consistency checking

On Fri, Jun 5, 2015 at 4:40 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:

On Wed, Jun 3, 2015 at 9:49 PM, hydra <hydrapolic@gmail.com> wrote:
> After setting up streaming replication, is it possible to check whether the
> slave has the same data as the master?
>
> In the MySQL world there is the percona-toolkit with pt-table-checksum that
> does this job:
> https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html
I believe you only need
http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray.ru@gmail.com

Thank you all for replies,
while looking for replication information I found this:
http://thebuild.com/presentations/worst-day-fosdem-2014.pdf

It's a real life experience of hitting this replication bug:
https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue

The primary symptom of this corruption is rows that:

are present on the master, but missing on the replica
have been deleted on the master still appear to be visible on the replica
have been updated, and their old versions appear alongside the new, updated versions on the replica

How can I verify whether I already have this corruption?
There is no known way to identify that the issue has affected a standby in the past but comparing the data from the primary with the standby.

So hands up who still think PostgreSQL doesn't need some way of checking the data consistency between master-standby? :)

My hand is up.

From the wiki page that you referenced you forgot to quote this:

“This is an issue, discovered Nov. 18, 2013., which can cause data corruption on a Hot-Standby replica when it is (re-)started, by marking committed transactions as uncommitted. This issue is fixed in the December 5th 2013 update releases.”

So, what’s your point? Yes, any software can (and does) have bugs. What’s important is how quickly the bug is discovered and fixed.

You wish to write a utility that compares data on 2 different clusters – sure, by all means, but I believe time could be better spent on something else.

But, how is to say that this utility will be 100% error-free?

Regards,

Igor Neyman

Re: replication consistency checking

От

hydra

Дата:

05 июня 2015 г., 17:25:08

On Fri, Jun 5, 2015 at 3:42 PM, Igor Neyman <ineyman@perceptron.com> wrote:

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of hydra
Sent: Friday, June 05, 2015 12:33 AM
To: pgsql-admin
Subject: Re: [ADMIN] replication consistency checking

On Fri, Jun 5, 2015 at 4:40 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
On Wed, Jun 3, 2015 at 9:49 PM, hydra <hydrapolic@gmail.com> wrote:
> After setting up streaming replication, is it possible to check whether the
> slave has the same data as the master?
>
> In the MySQL world there is the percona-toolkit with pt-table-checksum that
> does this job:
> https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html
I believe you only need
http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray.ru@gmail.com

Thank you all for replies,
while looking for replication information I found this:
http://thebuild.com/presentations/worst-day-fosdem-2014.pdf
It's a real life experience of hitting this replication bug:
https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue
The primary symptom of this corruption is rows that:
are present on the master, but missing on the replica
have been deleted on the master still appear to be visible on the replica
have been updated, and their old versions appear alongside the new, updated versions on the replica
How can I verify whether I already have this corruption?
There is no known way to identify that the issue has affected a standby in the past but comparing the data from the primary with the standby.

So hands up who still think PostgreSQL doesn't need some way of checking the data consistency between master-standby? :)

My hand is up.
From the wiki page that you referenced you forgot to quote this:

“This is an issue, discovered Nov. 18, 2013., which can cause data corruption on a Hot-Standby replica when it is (re-)started, by marking committed transactions as uncommitted. This issue is fixed in the December 5th 2013 update releases.”

So, what’s your point? Yes, any software can (and does) have bugs. What’s important is how quickly the bug is discovered and fixed.

You wish to write a utility that compares data on 2 different clusters – sure, by all means, but I believe time could be better spent on something else.
But, how is to say that this utility will be 100% error-free?

Regards,
Igor Neyman

Hello Igor,

no code is without bugs, yes, we are humans. That is why I'm so shocked that everybody blindly trusts the replication (which is code again, plus network and hardware).

My point was to show that such a tool would be beneficial even for PostgreSQL. Yes, even the checking utility can have bugs, but at least you are better than now. Because now everybody relies on the fact that it should be ok. But as we see from the bug report, such problems exist here also. So from the situation where you believe everything should be ok you would have a tool reporting "checksums ok" and would allow your to trust your standby data more.

Re: replication consistency checking

От

Igor Neyman

Дата:

05 июня 2015 г., 17:42:59

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of hydra
Sent: Friday, June 05, 2015 10:25 AM
To: pgsql-admin
Subject: Re: [ADMIN] replication consistency checking

On Fri, Jun 5, 2015 at 3:42 PM, Igor Neyman <ineyman@perceptron.com> wrote:

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of hydra
Sent: Friday, June 05, 2015 12:33 AM
To: pgsql-admin
Subject: Re: [ADMIN] replication consistency checking

On Fri, Jun 5, 2015 at 4:40 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
On Wed, Jun 3, 2015 at 9:49 PM, hydra <hydrapolic@gmail.com> wrote:
> After setting up streaming replication, is it possible to check whether the
> slave has the same data as the master?
>
> In the MySQL world there is the percona-toolkit with pt-table-checksum that
> does this job:
> https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html
I believe you only need
http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray.ru@gmail.com

Thank you all for replies,
while looking for replication information I found this:
http://thebuild.com/presentations/worst-day-fosdem-2014.pdf
It's a real life experience of hitting this replication bug:
https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue
The primary symptom of this corruption is rows that:
are present on the master, but missing on the replica
have been deleted on the master still appear to be visible on the replica
have been updated, and their old versions appear alongside the new, updated versions on the replica
How can I verify whether I already have this corruption?
There is no known way to identify that the issue has affected a standby in the past but comparing the data from the primary with the standby.

So hands up who still think PostgreSQL doesn't need some way of checking the data consistency between master-standby? :)

My hand is up.
From the wiki page that you referenced you forgot to quote this:

“This is an issue, discovered Nov. 18, 2013., which can cause data corruption on a Hot-Standby replica when it is (re-)started, by marking committed transactions as uncommitted. This issue is fixed in the December 5th 2013 update releases.”

So, what’s your point? Yes, any software can (and does) have bugs. What’s important is how quickly the bug is discovered and fixed.

You wish to write a utility that compares data on 2 different clusters – sure, by all means, but I believe time could be better spent on something else.
But, how is to say that this utility will be 100% error-free?

Regards,
Igor Neyman

Hello Igor,

no code is without bugs, yes, we are humans. That is why I'm so shocked that everybody blindly trusts the replication (which is code again, plus network and hardware).

My point was to show that such a tool would be beneficial even for PostgreSQL. Yes, even the checking utility can have bugs, but at least you are better than now. Because now everybody relies on the fact that it should be ok. But as we see from the bug report, such problems exist here also. So from the situation where you believe everything should be ok you would have a tool reporting "checksums ok" and would allow your to trust your standby data more.

The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.

Igor Neyman

Re: replication consistency checking

От

Scott Ribe

Дата:

05 июня 2015 г., 17:57:07

On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”:  not
transactionswhile it does its job. 

Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to
comparedatabases without interfering with updates? 

One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot.
(Whichwould require repeatable reads.) 

But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to
processing:if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master
backup, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take
masterdown all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them
beforetaking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you
want,*IF* you can actually pull that off correctly ;-) 

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice

Re: replication consistency checking

От

hydra

Дата:

06 июня 2015 г., 07:43:18

On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:

On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.

Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?

One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)

But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)

Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.

I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.

As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.

I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice

Re: replication consistency checking

От

Wei Shan

Дата:

09 июня 2015 г., 07:06:07

Hi,

Usually, for failover, we don't do data consistency check. because it's unplanned failover! In my opinion, it's quite safe to trust pg_stat_replication because it's a binary level replication so it's hard to get row missing unless it's a bug. You got higher chance of getting corruption. However, for switchover, we will do the following;

1. Checking pg_stat_replication views

2. Have a script that do a row count on all the data tables for a period of time. We should see that the 2 database have have very similar row count and maybe a slight delay of 1s?

I know that this is not perfect, but at least it's something to start with. The only tool out that that's doing this is Oracle VeriData but I'm not too sure how they are doing this.

Cheers.

On 6 June 2015 at 12:43, hydra <hydrapolic@gmail.com> wrote:

On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.

Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?

One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)

But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)

Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.

I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.

As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.

I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice

--

Regards,
Ang Wei Shan

Re: replication consistency checking

От

Vladimir Borodin

Дата:

09 июня 2015 г., 09:44:05

9 июня 2015 г., в 7:06, Wei Shan <weishan.ang@gmail.com> написал(а):

Hi,

Usually, for failover, we don't do data consistency check. because it's unplanned failover! In my opinion, it's quite safe to trust pg_stat_replication because it's a binary level replication so it's hard to get row missing unless it's a bug. You got higher chance of getting corruption. However, for switchover, we will do the following;

1. Checking pg_stat_replication views
2. Have a script that do a row count on all the data tables for a period of time. We should see that the 2 database have have very similar row count and maybe a slight delay of 1s?

Starting from 9.3 it is sufficient to gracefully stop master (with '-m falst’ or ‘-m smart’) and all changes would be guaranteed delivered to archive and replics (even if they are asynchronous). See this thread [0].

[0] http://www.postgresql.org/message-id/CADp-Sm7LWp-9Ngk5BHcfR1WSwnTtCNmdFkqROA0pD50fjGF5kw@mail.gmail.com

I know that this is not perfect, but at least it's something to start with. The only tool out that that's doing this is Oracle VeriData but I'm not too sure how they are doing this.

Cheers.

On 6 June 2015 at 12:43, hydra <hydrapolic@gmail.com> wrote:

On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.

Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?

One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)

But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)

Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.

I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.

As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.

I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice

--
Regards,
Ang Wei Shan

--

May the force be with you…

https://simply.name

Re: replication consistency checking

От

hydra

Дата:

09 июня 2015 г., 21:58:01

On Tue, Jun 9, 2015 at 6:06 AM, Wei Shan <weishan.ang@gmail.com> wrote:

Hi,

Usually, for failover, we don't do data consistency check. because it's unplanned failover! In my opinion, it's quite safe to trust pg_stat_replication because it's a binary level replication so it's hard to get row missing unless it's a bug. You got higher chance of getting corruption. However, for switchover, we will do the following;

1. Checking pg_stat_replication views
2. Have a script that do a row count on all the data tables for a period of time. We should see that the 2 database have have very similar row count and maybe a slight delay of 1s?

I know that this is not perfect, but at least it's something to start with. The only tool out that that's doing this is Oracle VeriData but I'm not too sure how they are doing this.

Cheers.

Hello,there are situation you wish to do a failover and not just because it's unplanned. For example when migrating to a better hardware.

On 6 June 2015 at 12:43, hydra <hydrapolic@gmail.com> wrote:

On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.

Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?

One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)

But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)

Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.

I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.

As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.

I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice

--
Regards,
Ang Wei Shan

Re: replication consistency checking

От

Jan Lentfer

Дата:

09 июня 2015 г., 22:21:11

Am 06.06.2015 um 06:43 schrieb hydra <hydrapolic@gmail.com>:

On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.

Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?

One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)

But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)

Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.

I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.

As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.

I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.

I am entering this discussion a bit late, so maybe I am missing the point. But SR is using xlog and there is a crc32 checksum on each xlog record. So why would you need to compare the whole thing again when each record has been approved during replication ?

Reference: http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/access/xlogrecord.h;h=4ef6c206d28437ffdecdd7097eb012055ac69c7d;hb=b94085920b016e64ee40a0f6c50199889565cc56

Jan

Re: replication consistency checking

От

Jan Lentfer

Дата:

09 июня 2015 г., 22:48:19




> Am 05.06.2015 um 16:56 schrieb Scott Ribe <scott_ribe@elevated-dev.com>:
>
>> On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>>
>> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”:  not
transactionswhile it does its job. 
>
> Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to
comparedatabases without interfering with updates? 
>
Also, if I remember the Postgres SR bug correctly, this kind of check that Percona provides would not have helped with
thiskind of bug. The corruption did not occur *during* replication but only if you restarted the slave because
transactionswere falsely marked as commited or non-commited when the slave came up again. You might have noticed the
corruptionearlier, though. 


> One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same
snapshot.(Which would require repeatable reads.) 
I actually think this would a need thing to have (for pre-production) test environments, like alpha or beta testing.

Jan

Re: replication consistency checking

От

Jan Lentfer

Дата:

09 июня 2015 г., 23:15:13

Am 04.06.2015 um 06:49 schrieb hydra <hydrapolic@gmail.com>:

After setting up streaming replication, is it possible to check whether the slave has the same data as the master?

In the MySQL world there is the percona-toolkit with pt-table-checksum that does this job:
https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html

Thank you

I have not digged to deep, but as far as I can see from your link:

pt-table-checksum requires statement-based replication, and it sets binlog_format=STATEMENT on the master

And the mysql manual says

If you are using InnoDB tables and the transaction isolation level is READ COMMITTED or READ UNCOMMITTED, only row-based logging can be used.

https://dev.mysql.com/doc/refman/5.6/en/binary-log-setting.html

So, if I am not misinterpreting, this plays in a different league.

Jan

Re: replication consistency checking

От

Wei Shan

Дата:

10 июня 2015 г., 09:56:53

That's called switchover my friend...

On 10 June 2015 at 02:57, hydra <hydrapolic@gmail.com> wrote:

On Tue, Jun 9, 2015 at 6:06 AM, Wei Shan <weishan.ang@gmail.com> wrote:
Hi,

Usually, for failover, we don't do data consistency check. because it's unplanned failover! In my opinion, it's quite safe to trust pg_stat_replication because it's a binary level replication so it's hard to get row missing unless it's a bug. You got higher chance of getting corruption. However, for switchover, we will do the following;

1. Checking pg_stat_replication views
2. Have a script that do a row count on all the data tables for a period of time. We should see that the 2 database have have very similar row count and maybe a slight delay of 1s?

I know that this is not perfect, but at least it's something to start with. The only tool out that that's doing this is Oracle VeriData but I'm not too sure how they are doing this.

Cheers.

Hello,there are situation you wish to do a failover and not just because it's unplanned. For example when migrating to a better hardware.

On 6 June 2015 at 12:43, hydra <hydrapolic@gmail.com> wrote:

On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.

Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?

One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)

But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)

Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.

I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.

As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.

I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice

--
Regards,
Ang Wei Shan

--

Regards,
Ang Wei Shan

Re: replication consistency checking

От

hydra

Дата:

11 июня 2015 г., 08:12:46

On Tue, Jun 9, 2015 at 9:20 PM, Jan Lentfer <Jan.Lentfer@web.de> wrote:

Am 06.06.2015 um 06:43 schrieb hydra <hydrapolic@gmail.com>:

On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.

Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?

One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)

But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)

Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.

I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.

As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.

I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.

I am entering this discussion a bit late, so maybe I am missing the point. But SR is using xlog and there is a crc32 checksum on each xlog record. So why would you need to compare the whole thing again when each record has been approved during replication ?

Hello Jan,

you don't have it if you don't want to. However I'd like to have the possibility to do so and thus I was asking - is it possible? Are you guys doing it?

The reasons are mentioned above, but still:

- bugs can appear anywhere,

- the bug report mentioned before also states to check the data, but does not give and hints how to do it, that's why I asked here

Reference: http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/access/xlogrecord.h;h=4ef6c206d28437ffdecdd7097eb012055ac69c7d;hb=b94085920b016e64ee40a0f6c50199889565cc56

Jan

Re: replication consistency checking

От

hydra

Дата:

11 июня 2015 г., 08:14:46

On Tue, Jun 9, 2015 at 9:47 PM, Jan Lentfer <Jan.Lentfer@web.de> wrote:

> Am 05.06.2015 um 16:56 schrieb Scott Ribe <scott_ribe@elevated-dev.com>:
>
>> On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>>
>> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.
>
> Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?
>
Also, if I remember the Postgres SR bug correctly, this kind of check that Percona provides would not have helped with this kind of bug. The corruption did not occur *during* replication but only if you restarted the slave because transactions were falsely marked as commited or non-commited when the slave came up again. You might have noticed the corruption earlier, though.

Ok but we do restart our slaves from time to time (upgrades) so sooner or later you would discover if that would be the problem. But maybe it will discover bugs/problems that occur *during* replication.

> One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)
I actually think this would a need thing to have (for pre-production) test environments, like alpha or beta testing.

Jan

Re: replication consistency checking

От

hydra

Дата:

11 июня 2015 г., 08:16:33

On Tue, Jun 9, 2015 at 10:15 PM, Jan Lentfer <Jan.Lentfer@web.de> wrote:

Am 04.06.2015 um 06:49 schrieb hydra <hydrapolic@gmail.com>:

After setting up streaming replication, is it possible to check whether the slave has the same data as the master?

In the MySQL world there is the percona-toolkit with pt-table-checksum that does this job:
https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html

Thank you

I have not digged to deep, but as far as I can see from your link:

pt-table-checksum requires statement-based replication, and it sets binlog_format=STATEMENT on the master

And the mysql manual says

If you are using InnoDB tables and the transaction isolation level is READ COMMITTED or READ UNCOMMITTED, only row-based logging can be used.

https://dev.mysql.com/doc/refman/5.6/en/binary-log-setting.html

So, if I am not misinterpreting, this plays in a different league.
Jan

The recommended binlog format for replication is ROW, so everything except the checks is using ROW and only those checks have STATEMENT set.

Re: replication consistency checking

От

hydra

Дата:

11 июня 2015 г., 08:24:15

On Wed, Jun 10, 2015 at 8:56 AM, Wei Shan <weishan.ang@gmail.com> wrote:

That's called switchover my friend...

Thanks for clarification.

So for you it's like this:

switchover - planned failover from master to slave (maybe because upgrades, better hardware)

master failover - UNplanned failover to slave because master has failed

If that is true for the whole PostgreSQL community then yes, let's call it switchover.

However needing an online tool for data verification when your master has failed doesn't give much sense anyway ;) The tool is needed when the operation is normal so you can periodically check the status.

For example Mozilla used to check the databases each 12 hours:
https://blog.mozilla.org/it/2013/12/16/upgrading-from-mysql-5-1-to-mariadb-5-5/

That is great. I have started to check at least once a day. Of course there are multiple layers of trusting the data / backups more:

- good hardware,

- having the db set with safety (not speed) if the data are critical (syncing data to disk, having data/binlog checkums),

- checking the replication consistency (because mostly we do backups from the slave?),

- periodically restoring the slave backup to the test environment (each day) so developers can check

This has worked for me so far, that's why I was asking for something similar in the PostgreSQL world.

Thanks guys for the tips / notes so far.

On 10 June 2015 at 02:57, hydra <hydrapolic@gmail.com> wrote:

On Tue, Jun 9, 2015 at 6:06 AM, Wei Shan <weishan.ang@gmail.com> wrote:
Hi,

Usually, for failover, we don't do data consistency check. because it's unplanned failover! In my opinion, it's quite safe to trust pg_stat_replication because it's a binary level replication so it's hard to get row missing unless it's a bug. You got higher chance of getting corruption. However, for switchover, we will do the following;

1. Checking pg_stat_replication views
2. Have a script that do a row count on all the data tables for a period of time. We should see that the 2 database have have very similar row count and maybe a slight delay of 1s?

I know that this is not perfect, but at least it's something to start with. The only tool out that that's doing this is Oracle VeriData but I'm not too sure how they are doing this.

Cheers.

Hello,there are situation you wish to do a failover and not just because it's unplanned. For example when migrating to a better hardware.

On 6 June 2015 at 12:43, hydra <hydrapolic@gmail.com> wrote:

On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman@perceptron.com> wrote:
>
> The problem I see with “checksum utility” is that for it to work both compared servers should be “static”: not transactions while it does its job.

Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?

One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)

But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)

Hello,
I wasn't talking about a static check, indeed I was referring to an online tool.

I haven't read the sources for the MySQL tool yet, however it computes the checksum on the master, writes the operation to the binary log and while using statement replication, the slave computes the checksum - then those can be compared. Not all data in the table are checksummed at once, but smaller chunks are used instead.

As Igor mentioned before, that tool can also fail and is not 100%. But I suppose if it would be misbehaving, chances are the checksums will differ and you will notice it. It will probably not by accident compute the same crc/md5 whatever.

I have done some tests with it, comparing data after setting up replication, all went fine, then I changed some integer on the slave, it really computed different checkum as expected. I also did a logical dump compare, it really seems to be doing its job fine.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice

--
Regards,
Ang Wei Shan

--
Regards,
Ang Wei Shan

Re: replication consistency checking

От

Bruce Momjian

Дата:

11 июня 2015 г., 15:39:10

On Thu, Jun 11, 2015 at 07:12:36AM +0200, hydra wrote:
>     I am entering this discussion a bit late, so maybe I am missing the point.
>     But SR is using xlog and there is a crc32 checksum on each xlog record. So
>     why would you need to compare the whole thing again when each record has
>     been approved during replication ?
>
>
>
> Hello Jan,
> you don't have it if you don't want to. However I'd like to have the
> possibility to do so and thus I was asking - is it possible? Are you guys doing
> it?
>
> The reasons are mentioned above, but still:
> - bugs can appear anywhere,
> - the bug report mentioned before also states to check the data, but does not
> give and hints how to do it, that's why I asked here

The answer is "no".

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Re: replication consistency checking

От

Bruce Momjian

Дата:

11 июня 2015 г., 15:53:20

On Thu, Jun 11, 2015 at 08:39:05AM -0400, Bruce Momjian wrote:
> On Thu, Jun 11, 2015 at 07:12:36AM +0200, hydra wrote:
> >     I am entering this discussion a bit late, so maybe I am missing the point.
> >     But SR is using xlog and there is a crc32 checksum on each xlog record. So
> >     why would you need to compare the whole thing again when each record has
> >     been approved during replication ?
> >
> >
> >
> > Hello Jan,
> > you don't have it if you don't want to. However I'd like to have the
> > possibility to do so and thus I was asking - is it possible? Are you guys doing
> > it?
> >
> > The reasons are mentioned above, but still:
> > - bugs can appear anywhere,
> > - the bug report mentioned before also states to check the data, but does not
> > give and hints how to do it, that's why I asked here
>
> The answer is "no".

You are falling into a trap I see often.  There is a flaw in another
database product, and you want Postgres to fix it or monitor for it.
Postgres just doesn't have that problem to the same level.  We have
different problems, and we allocate resources to add features based on
our own problems, not the problems of other database products.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Re: replication consistency checking

От

jaime soler

Дата:

14 июня 2015 г., 15:23:51

El jue, 11-06-2015 a las 07:14 +0200, hydra escribió:
>
>
> On Tue, Jun 9, 2015 at 9:47 PM, Jan Lentfer <Jan.Lentfer@web.de>
> wrote:
>
>
>
>
>         > Am 05.06.2015 um 16:56 schrieb Scott Ribe
>         <scott_ribe@elevated-dev.com>:
>         >
>         >> On Jun 5, 2015, at 8:42 AM, Igor Neyman
>         <ineyman@perceptron.com> wrote:
>         >>
>         >> The problem I see with “checksum utility” is that for it to
>         work both compared servers should be “static”:  not
>         transactions while it does its job.
>         >
>         > Indeed, and that was brought up before and OP seems to be
>         ignoring it. What magic does MySQL (supposedly) use to compare
>         databases without interfering with updates?
>         >
>         Also, if I remember the Postgres SR bug correctly, this kind
>         of check that Percona provides would not have helped with this
>         kind of bug. The corruption did not occur *during* replication
>         but only if you restarted the slave because transactions were
>         falsely marked as commited or non-commited when the slave came
>         up again. You might have noticed the corruption earlier,
>         though.
>
>
>
> Ok but we do restart our slaves from time to time (upgrades) so sooner
> or later you would discover if that would be the problem. But maybe it
> will discover bugs/problems that occur *during* replication.

Are you or your company considering to fund "this tool"?, if yes, maybe
you get some feedback at pgsql-hackers@postgresql.org

>
>
>
>         > One could imagine a built-in feature in PG which depends on
>         using MVCC and having both sides look at the same snapshot.
>         (Which would require repeatable reads.)
>         I actually think this would a need thing to have (for
>         pre-production) test environments, like alpha or beta testing.
>
>         Jan
>
>

Re: replication consistency checking

От

hydra

Дата:

14 июня 2015 г., 21:53:11

On Thu, Jun 11, 2015 at 2:53 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Jun 11, 2015 at 08:39:05AM -0400, Bruce Momjian wrote:
> On Thu, Jun 11, 2015 at 07:12:36AM +0200, hydra wrote:
> > I am entering this discussion a bit late, so maybe I am missing the point.
> > But SR is using xlog and there is a crc32 checksum on each xlog record. So
> > why would you need to compare the whole thing again when each record has
> > been approved during replication ?
> >
> >
> >
> > Hello Jan,
> > you don't have it if you don't want to. However I'd like to have the
> > possibility to do so and thus I was asking - is it possible? Are you guys doing
> > it?
> >
> > The reasons are mentioned above, but still:
> > - bugs can appear anywhere,
> > - the bug report mentioned before also states to check the data, but does not
> > give and hints how to do it, that's why I asked here
>
> The answer is "no".

You are falling into a trap I see often. There is a flaw in another
database product, and you want Postgres to fix it or monitor for it.
Postgres just doesn't have that problem to the same level. We have
different problems, and we allocate resources to add features based on
our own problems, not the problems of other database products.

I didn't bring up this topic because I would have problem with data replication consistency on MySQL. Indeed, I've been using different kinds of replication and after setting it up correctly (which took me some time), I've managed to live 3 years (while doing constant upgrades on the databases) without the data being different. It really helped me to know it's ok and many of the bugs that could have caused the data to differ could be spotted and monitored. Thankfully no data corruption occurred yet (to my best knowledge).

I've posted a real life experience of having a hard time doing a switchover and realizing something was wrong (data missing, old data present, etc.) with PostgreSQL. But it's general, not PostgreSQL specific, because:
- every software has bugs,
- replication involves software and hardware, both vulnerable to data corruption,
- data in the database is one of the most critical parts in IT.

Please don't blindly close your eyes saying this does not affect PostgreSQL. Besides, you are not checking, how can you know?

It's like and old joke:
A: "We never had a security incident"
B: "Are you monitoring it?"
A: "Err.. no."

:)

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Re: replication consistency checking

От

hydra

Дата:

14 июня 2015 г., 21:58:25

On Sun, Jun 14, 2015 at 2:23 PM, jaime soler <jaime.soler@gmail.com> wrote:

El jue, 11-06-2015 a las 07:14 +0200, hydra escribió:
>
>
> On Tue, Jun 9, 2015 at 9:47 PM, Jan Lentfer <Jan.Lentfer@web.de>
> wrote:
>
>
>
>
> > Am 05.06.2015 um 16:56 schrieb Scott Ribe
> <scott_ribe@elevated-dev.com>:
> >
> >> On Jun 5, 2015, at 8:42 AM, Igor Neyman
> <ineyman@perceptron.com> wrote:
> >>
> >> The problem I see with “checksum utility” is that for it to
> work both compared servers should be “static”: not
> transactions while it does its job.
> >
> > Indeed, and that was brought up before and OP seems to be
> ignoring it. What magic does MySQL (supposedly) use to compare
> databases without interfering with updates?
> >
> Also, if I remember the Postgres SR bug correctly, this kind
> of check that Percona provides would not have helped with this
> kind of bug. The corruption did not occur *during* replication
> but only if you restarted the slave because transactions were
> falsely marked as commited or non-commited when the slave came
> up again. You might have noticed the corruption earlier,
> though.
>
>
>
> Ok but we do restart our slaves from time to time (upgrades) so sooner
> or later you would discover if that would be the problem. But maybe it
> will discover bugs/problems that occur *during* replication.

Are you or your company considering to fund "this tool"?, if yes, maybe
you get some feedback at pgsql-hackers@postgresql.org

Hello,

at first I wanted to know whether something is done in this area. I've only find pg_comparator:
http://www.cri.ensmp.fr/~coelho/pg_comparator/pg_comparator.html

but haven't tested yet. If I don't find anything usable I will consider your advice. But I'm just an IT admin doing my job so it's not my decision. However thanks for the idea.

>
>
>
> > One could imagine a built-in feature in PG which depends on
> using MVCC and having both sides look at the same snapshot.
> (Which would require repeatable reads.)
> I actually think this would a need thing to have (for
> pre-production) test environments, like alpha or beta testing.
>
> Jan
>
>