Thread: Warm standby problems
Hi,

I have one PostgreSQL 8.3 machine feeding WAL files to another PostgreSQL 8.3
machine that's running in recovery mode. However, fairly often (every few
days), the standby machine breaks out of recovery mode with log messages like:

2009-10-23 21:47:40 EDT LOG: incorrect resource manager data checksum in record at 11/4E5AA18C
2009-10-23 21:47:40 EDT LOG: redo done at 11/4E5A9A40
2009-10-23 21:47:40 EDT LOG: last completed transaction was at log time 2009-10-23 21:28:36.531005-04
2009-10-23 21:47:40 EDT LOG: restored log file "00000001000000110000004E" from archive
cp: cannot stat `/var/backups/postgres/wal/00000002.history': No such file or directory
2009-10-23 21:47:46 EDT LOG: selected new timeline ID: 2
cp: cannot stat `/var/backups/postgres/wal/00000001.history': No such file or directory
2009-10-23 21:47:52 EDT LOG: archive recovery complete
2009-10-23 21:47:55 EDT LOG: database system is ready to accept connections

I'm pretty sure the WAL is shipped correctly (it's shipped via SCP). The only
thing I can think of is that the shipping version is 8.3.7 and the consuming
version is 8.3.8. Would this be enough to cause the "incorrect checksum"
message?

Regards,
David.
"David F. Skoll" <dfs@roaringpenguin.com> writes:
> I have one PostgreSQL 8.3 machine feeding WAL files to another PostgreSQL
> 8.3 machine that's running in recovery mode. However, fairly often (every
> few days), the standby machine breaks out of recovery mode with log
> messages like:
> 2009-10-23 21:47:40 EDT LOG: incorrect resource manager data checksum in record at 11/4E5AA18C

> I'm pretty sure the WAL is shipped correctly (it's shipped via SCP).
> The only thing I can think of is that the shipping version is 8.3.7 and
> the consuming version is 8.3.8. Would this be enough to cause the
> "incorrect checksum" message?

No; there's no WAL change between 8.3.7 and 8.3.8. What seems more likely
is that you're somehow shipping the WAL files before they're quite finished.

regards, tom lane
Tom Lane wrote:
> No; there's no WAL change between 8.3.7 and 8.3.8. What seems more
> likely is that you're somehow shipping the WAL files before they're
> quite finished.

I doubt it. Our archive_command works like this:

1) scp the file over to the backup server as root. It's stored in a file
   named <name_of_wal>.NEW
2) When the scp finishes, ssh to the standby server and chown the file
   to postgres.
3) Then, ssh to the standby server and rename the <name_of_wal>.NEW file
   to the real WAL filename.

Only after step (3) does our standby script even see the file. So unless
there's a possibility that the master server invokes our archive script
before the WAL has finished being written, I can't see how this could happen.

Regards,
David.
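The three steps above can be sketched as a self-contained shell script. Everything here is a local stand-in: mktemp paths replace the real WAL segment and archive directory, the segment name is hypothetical, and plain cp/mv replace the scp/ssh calls, so only the copy-to-.NEW-then-rename pattern itself is what the email describes.

```shell
# Local sketch of the three-step archive procedure described above.
# In the real setup, step 1 is an scp to the backup server and
# steps 2-3 run over ssh.
set -e

wal_src=$(mktemp)                  # stands in for the %p path on the master
printf 'wal-bytes' > "$wal_src"
archive_dir=$(mktemp -d)           # stands in for /var/backups/postgres/wal
wal_name=000000010000001100000042  # hypothetical WAL segment name

# Step 1: copy under a temporary .NEW name (scp in production)
cp "$wal_src" "$archive_dir/$wal_name.NEW"

# Step 2: fix ownership (ssh + chown in production; skipped locally)

# Step 3: atomic rename, so the standby never sees a partial file
mv "$archive_dir/$wal_name.NEW" "$archive_dir/$wal_name"

ls "$archive_dir"
```

The point of the rename is that mv within one filesystem is atomic: the standby's restore script either sees no file at all or the complete one, never a half-copied segment.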
"David F. Skoll" <dfs@roaringpenguin.com> writes:
> Tom Lane wrote:
>> No; there's no WAL change between 8.3.7 and 8.3.8. What seems more
>> likely is that you're somehow shipping the WAL files before they're
>> quite finished.

> So unless there's a possibility that the master server invokes our archive
> script before the WAL has finished being written, I can't see how this
> could happen.

What about the other direction: the script invoked by the archiver returns
"done" before the bits have all been shipped? PG is entirely capable of
overwriting the contents of a WAL file the moment the script returns "done".

regards, tom lane
Tom Lane wrote:
> What about the other direction: the script invoked by the archiver
> returns "done" before the bits have all been shipped?

Do you mean the wal_archive_command? It waits for scp to finish. It's
written in Perl; here is the relevant part.

Regards,
David.

[Stuff deleted...]

my $remote_user      = $failover->get_setting('backup_ssh_user');
my $remote_host      = $failover->get_setting('backup_db_host');
my $remote_directory = $failover->get_setting('backup_wal_dir');
my $postgres_user    = $failover->get_setting('pg_userid') || 'postgres';

my $pid = open(CHILD, '-|');

my $wal_filename_only;
(undef, undef, $wal_filename_only) = File::Spec->splitpath( $wal_file );

if( $pid ) {
    # Parent
    while(<CHILD>) {
        chomp;
        canit_log( "Output from scp: $_" );
    }

    # This should wait for scp to finish
    my $rc = close CHILD;
    my $errno = $!;
    my $child_error = $?;
    alarm(0);
    if( ! $rc ) {
        if( $errno ) {
            canit_warning("Error closing pipe from scp: $errno");
            exit EXIT_FAILURE;
        } elsif ($child_error & 127) {
            canit_warning("scp died with signal " . ($child_error & 127));
            exit EXIT_FAILURE;
        } else {
            canit_warning("scp exited with non-zero exit code " .
                          ($child_error >> 8));
            exit EXIT_FAILURE;
        }
    }

    my $exit_status = EXIT_OK;
    if (system('ssh', "$remote_user\@$remote_host",
               "chown $postgres_user $remote_directory/$wal_filename_only.NEW") != 0) {
        canit_warning("ssh command to chown remote file failed: $?");
        $exit_status = EXIT_FAILURE;
    }
    if (system('ssh', "$remote_user\@$remote_host",
               "mv -f $remote_directory/$wal_filename_only.NEW $remote_directory/$wal_filename_only") != 0) {
        canit_warning("ssh command to rename remote file failed: $?");
        $exit_status = EXIT_FAILURE;
    }
    if ($exit_status == EXIT_OK) {
        canit_log("Successfully archived WAL file $wal_filename_only on $remote_host");
    }
    exit $exit_status;
}

# In the child:
# Make sure we capture STDERR
open(STDERR, '>&STDOUT');

# In new scope to suppress warning about code after an exec - see
# 'perldoc exec' for details
{
    no warnings 'exec';
    exec 'scp', '-q', $wal_file,
         "$remote_user\@$remote_host:$remote_directory/$wal_filename_only.NEW";
}
Hi,

In an effort to track down the problem, I switched to using rsync rather
than scp to copy the files. I also take the SHA1 hash on each end, and have
my archiving script exit with a non-zero status if there's a mismatch.
Sure enough:

Oct 27 14:26:35 colo2vs1 canit-failover-wal-archive[29118]: Warning:
  rsync succeeded, but local_sha1 1fe9fc62b2a05d21530decac1c5442969adc5819
  != remote_sha1 4f9f8bcd151129db64acd05470f0f05954b56232 !!

This is a "can't happen" situation, so I have to investigate bugs in rsync,
ssh, the kernel, the network, the disk.... bleah. But I'm pretty sure it's
not a PostgreSQL problem.

(My script exits with non-zero status if the SHA1s mismatch, and PostgreSQL
re-archives the WAL a short time later, and that succeeds, so I'm happy
for now.)

Regards,
David.
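The verify-after-copy check described above can be sketched like this. A local cp stands in for rsync-over-ssh, and both hashes are computed locally; in the real script the "remote" hash would come from running sha1sum on the standby over ssh.

```shell
# Sketch of the verification step: copy the segment, hash both copies,
# and exit non-zero on a mismatch so PostgreSQL retries the archive
# later.  Paths and the local copy are stand-ins.
set -e

src=$(mktemp)
printf 'wal-bytes' > "$src"
dst=$(mktemp)

cp "$src" "$dst"   # rsync to the standby in production

local_sha1=$(sha1sum "$src" | cut -d' ' -f1)
remote_sha1=$(sha1sum "$dst" | cut -d' ' -f1)   # would run on the standby

if [ "$local_sha1" != "$remote_sha1" ]; then
    echo "Warning: copy succeeded, but local_sha1 $local_sha1 != remote_sha1 $remote_sha1" >&2
    exit 1   # non-zero status makes PostgreSQL re-run archive_command
fi
echo "archive verified"
```

The exit-1 path is what makes this safe: PostgreSQL keeps a WAL segment and retries archiving it for as long as archive_command returns a non-zero status.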
"David F. Skoll" <dfs@roaringpenguin.com> wrote:
> (My script exits with non-zero status if the SHA1s mismatch, and
> PostgreSQL re-archives the WAL a short time later, and that
> succeeds, so I'm happy for now.)

Just out of curiosity, could you show us the non-comment portions of
your postgresql.conf file?

-Kevin
Kevin Grittner wrote:
> Just out of curiosity, could you show us the non-comment portions of
> your postgresql.conf file?

Sure! Here they are.

Regards,
David.

=============================================================================
data_directory = '/var/lib/postgresql/8.3/main'
hba_file = '/etc/postgresql/8.3/main/pg_hba.conf'
ident_file = '/etc/postgresql/8.3/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/8.3-main.pid'
port = 5432
max_connections = 100
unix_socket_directory = '/var/run/postgresql'
ssl = true
shared_buffers = 24MB
max_fsm_pages = 500000
archive_mode = on
archive_command = '/usr/bin/wal_archive_command.pl %p'
archive_timeout = 240min
log_line_prefix = '%t '
autovacuum = off
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
"David F. Skoll" <dfs@roaringpenguin.com> writes:
> Sure enough:
> Oct 27 14:26:35 colo2vs1 canit-failover-wal-archive[29118]: Warning:
>   rsync succeeded, but local_sha1 1fe9fc62b2a05d21530decac1c5442969adc5819
>   != remote_sha1 4f9f8bcd151129db64acd05470f0f05954b56232 !!

> This is a "can't happen" situation, so I have to investigate bugs in rsync,
> ssh, the kernel, the network, the disk.... bleah.
> But I'm pretty sure it's not a PostgreSQL problem.

> (My script exits with non-zero status if the SHA1s mismatch, and PostgreSQL
> re-archives the WAL a short time later, and that succeeds, so I'm happy
> for now.)

So, when it archives successfully the second time, which if either of the
two mismatched sha1's proves to have been correct?

(I'm still wondering about the possibility that the WAL file is changing
underneath you ...)

regards, tom lane
Tom Lane wrote:
> So, when it archives successfully the second time, which if either of
> the two mismatched sha1's proves to have been correct?

The one on the master server (lines wrapped for readability). "local"
refers to the master server, and "remote" to the standby server.

Oct 27 14:26:35 colo2vs1 canit-failover-wal-archive[29118]: Warning:
  rsync succeeded, but local_sha1 1fe9fc62b2a05d21530decac1c5442969adc5819
  != remote_sha1 4f9f8bcd151129db64acd05470f0f05954b56232 !!
Oct 27 14:26:52 colo2vs1 canit-failover-wal-archive[29161]:
  local_sha1 == remote_sha1 (1fe9fc62b2a05d21530decac1c5442969adc5819)
Oct 27 14:26:53 colo2vs1 canit-failover-wal-archive[29161]:
  Successfully archived WAL file 000000010000001200000058 on colo3vs1.roaringpenguin.com

However, the sha1 is taken after rsync exits, so it's unknown what the
local sha1 actually was while the rsync was running. Maybe the sha1 really
was 4f9... on the master server until something changed the file.

> (I'm still wondering about the possibility that the WAL file is changing
> underneath you ...)

Well, can PostgreSQL change the WAL file while the archive_command is
running? I can't see anything on the backup server end changing that copy.

Regards,
David.
"David F. Skoll" <dfs@roaringpenguin.com> wrote:
> shared_buffers = 24MB

You should probably set that higher. Different workloads favor different
settings, but if you've got any reasonable amount of RAM for a modern
machine, somewhere between 256MB and one-third of total RAM is usually best.

> archive_command = '/usr/bin/wal_archive_command.pl %p'

It would probably be safer to pass in %f, too, and use it for the file name
rather than plucking off the last portion of the %p -- at a minimum it might
reduce the chances that you'd use that technique in restore_command, where
it flat out won't work.

> autovacuum = off

Be very careful if you choose to do this -- you've just made yourself
responsible for preventing bloat which can slowly strangle performance.

There are a few settings which should almost always be overridden for
production use. You might want to read this for a start:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

None of those, however, seems likely to cause the problem you describe.
Back to blaming the OS. On the other hand, it's possible that tuning these
up might change the timings such that you don't hit the OS bug so often.

-Kevin
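Kevin's %f suggestion would amount to a one-line change in postgresql.conf (a sketch only; the archive script would then take the bare file name as its second argument instead of splitting it out of %p with File::Spec):

```
archive_command = '/usr/bin/wal_archive_command.pl %p %f'
```

The %p placeholder expands to the path of the segment relative to the data directory, while %f is just the file name, which is the piece both the .NEW naming and restore_command actually need.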
Kevin Grittner wrote:
>> shared_buffers = 24MB
> You should probably set that higher.

Nah. This machine is totally bored; tweaking PostgreSQL would be pointless
since it's so under-utilized.

>> archive_command = '/usr/bin/wal_archive_command.pl %p'
> It would probably be safer to pass in %f, too, and use it for the file
> name rather than plucking off the last portion of the %p -- at a
> minimum it might reduce the chances that you'd use that technique in
> restore_command, where it flat out won't work.

Nah. Our Perl code is pretty robust; we use the File::Spec module to split
apart path names, and that's safe. The restore_command (wrapped):

restore_command = '/usr/share/canit/failover/libexec/canit-failover-pg-standby.pl \
    -s 2 -t /var/backups/postgres/initiate-failover /var/backups/postgres/wal/ %f %p %r'

is robust and well-tested.

>> autovacuum = off
> Be very careful if you choose to do this -- you've just made yourself
> responsible for preventing bloat which can slowly strangle
> performance.

As I said, the server is completely under-utilized. Nightly vacuum takes
about 7 minutes:

Oct 26 2009 21:56:53 DETAIL: A total of 29184 page slots are in use (including overhead).
Oct 26 2009 21:56:53 29184 page slots are required to track all free space.

> There are a few settings which should almost always be overridden for
> production use. You might want to read this for a start:

I doubt any of those settings is causing the problem. The one setting that
I *do* suspect is archive_timeout. We don't have that set on other systems,
and they do not exhibit the problem. I tried looking at the PostgreSQL
source to see if a log switch requested because of archive_timeout could
somehow result in the WAL file being changed during execution of
archive_command, but I'm completely unfamiliar with the Pg source code and
quickly got lost. :-(

I'm going to leave the system for a few days and see how many SHA1
mismatches I get. Then I'm going to disable archive_timeout and see if that
changes anything.

Regards,
David.
"David F. Skoll" <dfs@roaringpenguin.com> writes:
> Tom Lane wrote:
>> So, when it archives successfully the second time, which if either of
>> the two mismatched sha1's proves to have been correct?

> The one on the master server (lines wrapped for readability).

> However, the sha1 is taken after rsync exits, so it's unknown what
> the local sha1 actually was while the rsync was running.

It'd be worth taking the sha1 first to see if you get a different answer
about that, or maybe even doing it twice.

>> (I'm still wondering about the possibility that the WAL file is changing
>> underneath you ...)

> Well, can PostgreSQL change the WAL file while the archive_command is
> running?

It's not supposed to ...

regards, tom lane
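Tom's hash-it-twice suggestion can be sketched as follows: hash the segment before and after the copy, entirely on the master, so that a mismatch between the two local hashes would prove the file changed underneath the archive_command. Local stand-ins again; the cp represents the real rsync to the standby.

```shell
# Hash the WAL segment before and after shipping it; if the two local
# hashes differ, the file changed while the copy was in flight.
# Paths are temporary stand-ins for the real segment.
set -e

wal=$(mktemp)
printf 'wal-bytes' > "$wal"

before=$(sha1sum "$wal" | cut -d' ' -f1)
cp "$wal" "$wal.shipped"   # rsync to the standby in production
after=$(sha1sum "$wal" | cut -d' ' -f1)

if [ "$before" != "$after" ]; then
    echo "segment changed during shipping: $before -> $after" >&2
    exit 1
fi
echo "segment stable"
```

If before/after agree but the standby's hash still differs, the corruption happened in transit or on the standby's disk rather than in PostgreSQL, which is the distinction this thread is trying to draw.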