Обсуждение: Setting up a warm standby server - some questions
Hi, I'm trying to setup a warm standby server. But there are some things that I don't yet understand. I'm testing this on my windows box to understand the concepts and while waiting for the real hardware ;) I got the basic setup working, and a simulated failover was working, but some things are unclear to me: What is the recommended way to switch back to the primary, once that is up and running again? Especially assuming that data has been changed on the secondar? Can I do a base backup of the data directory (after running pg_start_backup()) from the secondary to the primary? What is the recommended way to get the secondary back into the "standby" mode? Currently I simply shut it down, rename recovery.done to recovery.conf and start it up again. Is this allright? In the logfile that is written by pg_standby I can see some "errors" and I'm not sure if that's OK. My recovery.conf looks like this: restore_command = 'c:\Projects\pgtest\standby\server\bin\pg_standby.exe -l -d -s 2 -k 50 -t c:\Projects\pgtest\standby\pgsql.triggerc:\Projects\pgtest\archive %f %p %r >>c:\Projects\pg\standby\standby.log 2>&1' Which is essentially copied from the manual and seems to be working. However in the standby.log I can see messages like this: Command for restore : copy "c:\Projects\pg\archive\00000003.history" "pg_xlog\RECOVERYHISTORY" Keep archive history : 000000000000000000000000 and later running restore :The system cannot find the file specified. The system cannot find the file specified. The system cannot find the file specified. The system cannot find the file specified. not restored history file not found The "cannot find the file" or "history file not found" do not look right to me. Although I also see "success" messages in there, like: running restore : 1 file(s) copied. OK So what should I do about the error messages? I have also problems shutting down the secondary server while it is in "standby" (i.e. recovery) mode If I use "pg_ctl stop" it never shuts down. If I use "pg_ctl stop -m immediate" most of the postgres processes terminate, but the one that spawned the pg_standby processwill linger around. What am I missing here? Thanks a lot in advance Thomas
Hello, I could solve one of the questions myself :) > I have also problems shutting down the secondary server while it is in > "standby" (i.e. recovery) mode This works fine when using "-m fast" instead of "-m immediate". I would still like an answer on these questions: > What is the recommended way to switch back to the primary, once that is > up and running again? > What is the recommended way to get the secondary back into the "standby" > mode? As we are trying to minimize the possible data loss, I'm looking at the archive_timeout setting (I'm currently evaluatingour possibilities with 8.4) The manual at http://www.postgresql.org/docs/current/static/continuous-archiving.html#RECOVERY-CONFIG-SETTINGS states: "It is therefore unwise to set a very short archive_timeout — it will bloat your archive storage. archive_timeout settings of a minute or so are usually reasonable" But on the other hand http://www.postgresql.org/docs/current/static/warm-standby.html states: "The length of the window of data loss can be limited by use of the archive_timeout parameter, which can be set as low as a few seconds if required" but kind of restrains itself right after that by stating: "However such low settings will substantially increase the bandwidth requirements for file shipping" My question is: is that the bandwidth between primary and standby? If the archive is stored on a differend harddisk (or storgae system) as the data directory, I'd reckon it wouldn't have muchimpact on the primary server. Or am I missing something? Regards Thomas
On Fri, 2010-07-16 at 15:06 +0200, Thomas Kellerer wrote: > Hello, > > > I could solve one of the questions myself :) > > I have also problems shutting down the secondary server while it is in > > "standby" (i.e. recovery) mode > > This works fine when using "-m fast" instead of "-m immediate". fast is the preferred way. immediate is the sledghammer. > > I would still like an answer on these questions: > > > What is the recommended way to switch back to the primary, once that is > > up and running again? You need to do a new base backup from secondary to primary and ship the wal files back from secondary to primary. Once you are ready to switch, switch back to it the same way that you switched to the the secondary. For the new base backup, you'll probably want to use something like rsync that does a differential backup so you don't have to move all the blocks that are still the same. But that depends on how big your DB is, how much churn there is, and how long the primary has been out. > > What is the recommended way to get the secondary back into the "standby" > > mode? Same thing as above. After switching back - new base backup from primary to secondary (which should involve a very small differential of base files), and continue streaming back to the secondary. Should all be very quick if it is automated. > > As we are trying to minimize the possible data loss, I'm looking at the archive_timeout setting (I'm currently evaluatingour possibilities with 8.4) > > The manual at > http://www.postgresql.org/docs/current/static/continuous-archiving.html#RECOVERY-CONFIG-SETTINGS > states: > > "It is therefore unwise to set a very short archive_timeout — it will bloat your archive storage. > archive_timeout settings of a minute or so are usually reasonable" > > But on the other hand > http://www.postgresql.org/docs/current/static/warm-standby.html > states: > > "The length of the window of data loss can be limited by use of the archive_timeout parameter, which can > be set as low as a few seconds if required" > > but kind of restrains itself right after that by stating: > > "However such low settings will substantially increase the bandwidth requirements for file shipping" > > My question is: is that the bandwidth between primary and standby? Yes. It means that you are going to be shipping a lot of files from the primary to secondary. If you set the timeout to 1 second, you will be shipping 960MB a minute (1 16MB wal segment per second). There is a utility to reduce the size to the wal files - pg_lesslog http://pglesslog.projects.postgresql.org/ I have no experience with it though. I'm always a bit hesitant when it comes to using 3rd party tools to manipulate something as critical as wal files. The folks that wrote it (NTT) are solid though. > If the archive is stored on a differend harddisk (or storgae system) as the data directory, I'd reckon it wouldn't havemuch impact on the primary server. Or am I missing something? It shouldn't have much impact in this case. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.