Discussion: postgres hot-standby questions.
Hello everyone,

Two questions, grateful for any feedback anyone can share. They relate to switchover between master and hot-standby.

1. What exactly is the behaviour of the master/primary during shutdown when it has a hot standby?

https://wiki.postgresql.org/wiki/Hot_Standby
http://www.postgresql.org/docs/current/static/hot-standby.html

If I shut down the master, and then, once it has finished, shut down the standby, will they contain identical logical databases & WAL records, e.g. assuming possible network failure?

i.e.
- Is the primary shutdown delayed until it has received notice from the standby that all WAL has been received? (my guess is: no)
- Is the primary shutdown delayed until it has received notice from the standby that all WAL has been applied? (my guess is: no)
- Can the primary generate any new WAL (e.g. a checkpoint) or logical data changes during the shutdown process that might not be sent to the standby? (my guess is: no)

For example, https://vibhorkumar.wordpress.com/2014/06/30/switchoverswitchback-in-postgresql-9-3/ says:
"With this patch, the walsender process tries to send all outstanding WAL records to the standby in replication when the user shuts down the master."
"tries"?

That page also makes it seem like you have to manually check the WAL status (section 2).
Is there any way to make the primary's completion of shutdown automatically synchronous with completion of WAL on the standby(s)?

2. Let's assume for the moment I have some crazy reason to prefer to avoid rsync where possible, such as its historical weird behaviour on HFS+ filesystems, its present unreliable heuristic for syncing hard links, or its default behaviour of not using checksums. Can a controlled switchover (not failover) be built on wal_keep_segments alone?

wal_keep_segments could be set to a fairly high number, maybe 10000 (160GB), to allow standby catch-up even after a day. Let's assume here that 99% of maintenance takes less than an hour, and that we're keeping rsync as a fallback for the worst case.

e.g. here's the whole switchover process...

shutdown A (Master)
shutdown B (Standby)
(A and B should be identical in terms of WAL and logical data at this point.)
swap M/S configurations around
start B (Master)

do some work on A for an hour
finish work on A

start A (Standby)
A catches up with B from wal_keep_segments.

shutdown B (Master)
shutdown A (Standby)
(A and B should be identical in terms of WAL and logical data at this point.)
swap M/S configurations around
start A (Master)
start B (Standby)

Graeme Bell
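For reference, a minimal sketch of the wal_keep_segments setting discussed above, assuming 9.4's ALTER SYSTEM is available (on older versions the same value goes into postgresql.conf followed by a reload), and assuming 10000 x 16MB segments fit comfortably on the master's pg_xlog filesystem:

    -- On the master: retain roughly 160GB of WAL so a standby can catch up later.
    ALTER SYSTEM SET wal_keep_segments = 10000;
    SELECT pg_reload_conf();
    -- Confirm the new value is in effect (this setting needs only a reload, not a restart).
    SHOW wal_keep_segments;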
On Mar 26, 2015, at 8:17 AM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>
> shutdown A (Master)
> shutdown B (Standby)
> (A and B should be identical in terms of WAL and logical data at this point).
> swap M/S configurations around
> start B (Master)

First, note that you don't actually have to shut down B and swap master/slave configs. You can use trigger_file in recovery.conf.

In general, with async replication, you don't know that every last transaction is replicated at any moment. (And of course with synch replication, you add points of failure.) I don't actually know the answer to your questions, because in my use, the connection between server & replica is high-bandwidth low-latency. My routine is to shut down all services that access the db, then shut down the db. The progression of service shutdown pretty much guarantees that if the connection is up, the replica is up to date well before the master is shut down. So all I have to do is, after access is shut down, use one of the many methods to check replication lag, just as a double-check that replication was working.

It does seem to me it would be a good idea to 1) document explicitly in 17.5 "Shutting Down the Server" what happens with async replication (I suspect all your guesses are correct) and 2) provide some option to wait for replication to finish.

The thing about 2 is that we probably don't want it to be automatic or default, because the nature of asynch replicas is that it's not uncommon to have ones far away, over slower less-reliable links, which may take an unknown amount of time to come up to date. It's not uncommon to want to bring the master down for a point update or some config change, and immediately bring it back up, in which case we don't want the shutdown delayed. It's also not uncommon to have multiple replicas, some close by that should probably be always up-to-date, and some far away for protection against big disasters, which may lag. So the "wait for replica" option to shut down would need a way to specify *which* replica(s) we wanted to wait for.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
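One such lag check, assuming streaming replication and the 9.2-9.4 era catalog names used elsewhere in this thread (a sketch, not a complete monitor):

    -- On the master, before shutting it down: per-standby replication progress.
    SELECT application_name, state,
           sent_location, write_location, flush_location, replay_location
    FROM pg_stat_replication;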
> Hello everyone,
>
> Two questions, grateful for any feedback anyone can share. They relate to
> switchover between master and hot-standby.
>
> 1. What exactly is the behaviour of the master/primary during shutdown
> when it has a hot standby?
>
> https://wiki.postgresql.org/wiki/Hot_Standby
> http://www.postgresql.org/docs/current/static/hot-standby.html
>
> If I shut down the master, then afterwards when it is finished, I shut
> down the standby, will they contain identical logical databases & WAL
> records, e.g. assuming possible network failure?
>
> i.e.
> - Is the primary shutdown delayed until it has received notice from the
> standby that all WAL has been received? (my guess is: no)
> - Is the primary shutdown delayed until it has received notice from the
> standby that all WAL has been applied? (my guess is: no)
> - Can the primary generate any new WAL (e.g. checkpoint) or logical data
> changes during the shutdown process that might not be sent to the standby?
> (my guess is: no)
>
> For example,
> https://vibhorkumar.wordpress.com/2014/06/30/switchoverswitchback-in-postgresql-9-3/
> "With this patch, the walsender process tries to send all outstanding WAL
> records to the standby in replication when the user shuts down the
> master."
> "tries"?
>
> That page also makes it seem like you have to manually check the WAL
> status. (section 2)
> Is there any way to make the primary's completion of shutdown
> automatically synchronous with completion of WAL on standby(s)?

What are you using for your replication? An application, or something else?

> 2. Let's assume for the moment I have some crazy reason to prefer to avoid
> rsync where possible, such as its historical weird behaviour on HFS+
> filesystems or its present unreliable heuristic for syncing hard-links or
> its default behaviour of not using checksums. Can a controlled switchover
> (not failover) be built on wal_keep_segments alone?
>
> wal_keep_segments could be set to a fairly high number, maybe 10000
> (160GB) to allow standby catchup even after a day. Let's assume here that
> 99% of maintenance takes less than an hour, and that we're keeping rsync
> as a fallback for the worst case.
>
> e.g. here's the whole switchover process...
>
> shutdown A (Master)
> shutdown B (Standby)
> (A and B should be identical in terms of WAL and logical data at this
> point).
> swap M/S configurations around
> start B (Master)
>
> do some work on A for an hour
> finish work on A
>
> start A (Standby)
> A catches up with B from wal_keep_segments.
>
> shutdown B (Master)
> shutdown A (Standby)
> (A and B should be identical in terms of WAL and logical data at this
> point).
> swap M/S configurations around
> start A (Master)
> start B (Standby)

Remember to initialize the slave again, the same as you did the first time.

> Graeme Bell

Regards,
Gilberto Castillo
La Habana, Cuba
On 3/26/2015 10:17 AM, Graeme B. Bell wrote:
> Hello everyone,
>
> Two questions, grateful for any feedback anyone can share. They relate to switchover between master and hot-standby.
>
> 1. What exactly is the behaviour of the master/primary during shutdown when it has a hot standby?
>
> https://wiki.postgresql.org/wiki/Hot_Standby
> http://www.postgresql.org/docs/current/static/hot-standby.html
>
> If I shut down the master, then afterwards when it is finished, I shut down the standby, will they contain identical logical databases & WAL records, e.g. assuming possible network failure?
>
> i.e.
> - Is the primary shutdown delayed until it has received notice from the standby that all WAL has been received? (my guess is: no)
> - Is the primary shutdown delayed until it has received notice from the standby that all WAL has been applied? (my guess is: no)
> - Can the primary generate any new WAL (e.g. checkpoint) or logical data changes during the shutdown process that might not be sent to the standby? (my guess is: no)
>
> For example, https://vibhorkumar.wordpress.com/2014/06/30/switchoverswitchback-in-postgresql-9-3/
> "With this patch, the walsender process tries to send all outstanding WAL records to the standby in replication when the user shuts down the master."
> "tries"?
>
> That page also makes it seem like you have to manually check the WAL status. (section 2)
> Is there any way to make the primary's completion of shutdown automatically synchronous with completion of WAL on standby(s)?

In synchronous streaming replication, the master will wait for all pending transactions to complete before stopping. It will not accept new connections while a shutdown is pending.

> 2. Let's assume for the moment I have some crazy reason to prefer to avoid rsync where possible, such as its historical weird behaviour on HFS+ filesystems or its present unreliable heuristic for syncing hard-links or its default behaviour of not using checksums. Can a controlled switchover (not failover) be built on wal_keep_segments alone?
>
> wal_keep_segments could be set to a fairly high number, maybe 10000 (160GB) to allow standby catchup even after a day. Let's assume here that 99% of maintenance takes less than an hour, and that we're keeping rsync as a fallback for the worst case.
>
> e.g. here's the whole switchover process...
>
> shutdown A (Master)
> shutdown B (Standby)
> (A and B should be identical in terms of WAL and logical data at this point).
> swap M/S configurations around
> start B (Master)
>
> do some work on A for an hour
> finish work on A
>
> start A (Standby)
> A catches up with B from wal_keep_segments.
>
> shutdown B (Master)
> shutdown A (Standby)
> (A and B should be identical in terms of WAL and logical data at this point).
> swap M/S configurations around
> start A (Master)
> start B (Standby)

If you then want to promote the standby to master, you need, as others have noted, a trigger file placed on the standby. At that point, your new primary will absolutely be up to date with all transactions.

That's the main reason I really like synchronous replication over asynchronous. The problems only arise when your standby fails, as the current primary then cannot commit a transaction and is thus somewhat unresponsive. A second hot standby will solve that problem, providing both hot standbys do not go offline simultaneously.
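A minimal sketch of what that synchronous setup might look like on the master, assuming the standby connects with application_name 'standby1' (the name is illustrative, not taken from this thread):

    -- Make commits wait for the named standby to confirm WAL flush.
    ALTER SYSTEM SET synchronous_standby_names = 'standby1';
    SELECT pg_reload_conf();
    -- The synchronous standby should then show sync_state = 'sync':
    SELECT application_name, sync_state FROM pg_stat_replication;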
On 26 Mar 2015, at 16:07, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> On Mar 26, 2015, at 8:17 AM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>>
>> shutdown A (Master)
>> shutdown B (Standby)
>> (A and B should be identical in terms of WAL and logical data at this point).
>> swap M/S configurations around
>> start B (Master)
>
> First, note that you don't actually have to shut down B and swap master/slave configs. You can use trigger_file in recovery.conf.

Thanks Simon, that's a good suggestion.

Do you know if using the trigger file makes any change to the DB or WAL which couldn't be replicated on the master when it comes back up?

Actually, it's probably worth mentioning a bit more information about my use case, for anyone reading.

I am concerned about being able to do a clean, safe, quick swap *and soon after, a quick/clean/safe swap back again* without having to checksum the entire DB (potentially TBs of data) to be certain I'm not breaking something or losing something on the return trip. Typically we have these big DBs but we want to make some little change which only takes 10-30 minutes.

This is something that pg_rewind may help to address in 9.5, but I'd like to be able to do it now, and we can't risk using pg_rewind in our production environment at present.

Actually, even with pg_rewind we have the tricky problem that a user could have a transaction accepted on the master which never reaches the slave if you fail over without a synchronous shutdown process. Such transactions could be rewound on the master DB with pg_rewind to allow it to become a slave, but we can't exactly email the users to let them know their completed transaction wasn't durable after all.

> In general, with async replication, you don't know that every last transaction is replicated at any moment. (And of course with synch replication, you add points of failure.)

With fully synchronous replication the performance issue is a real killer for us.

I suppose what I'm wondering is about async normal operation, but sync operation during shutdown.

It might seem a strange compromise, but in practice something like 95% of the time when the server goes down it's a controlled downtime, not a failure. So we would benefit from sync during controlled shutdown often, but almost never during ordinary operation. So the cost/benefit for fully sync operation is terrible for us, but the cost/benefit of sync-on-shutdown is huge.

I could try to script it using e.g. pg_last_xlog_receive_location and pg_last_xlog_replay_location (see the sketch after this message), but that doesn't help so much when the DB decides to complete shutdown while the packet is still on the wire, and the packet goes missing.

> I don't actually know the answer to your questions, because in my use, the connection between server & replica is high-bandwidth low-latency. My routine is to shut down all services that access the db, then shut down the db. The progression of service shutdown pretty much guarantees that if the connection is up, the replica is up to date well before the master is shut down. So all I have to do is, after access is shut down, use one of the many methods to check replication lag, just as a double-check that replication was working.
>
> It does seem to me it would be a good idea to 1) document explicitly in 17.5 "Shutting Down the Server" what happens with async replication (I suspect all your guesses are correct) and 2) provide some option to wait for replication to finish.
> The thing about 2 is that we probably don't want it to be automatic or default, because the nature of asynch replicas is that it's not uncommon to have ones far away, over slower less-reliable links, which may take an unknown amount of time to come up to date. It's not uncommon to want to bring the master down for a point update or some config change, and immediately bring it back up, in which case we don't want the shutdown delayed. It's also not uncommon to have multiple replicas, some close by that should probably be always up-to-date, and some far away for protection against big disasters, which may lag. So the "wait for replica" option to shut down would need a way to specify *which* replica(s) we wanted to wait for.

I agree, a non-default option to do sync-shutdown sounds like what I'm thinking of.

Good points about the multi-standby problem.

Graeme Bell
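The scripted check mentioned above might look something like this on the standby (a sketch only; it confirms the standby has applied what it received, but says nothing about WAL still in flight from the master, which is exactly the gap being discussed):

    -- On the standby: has everything received also been replayed?
    SELECT pg_last_xlog_receive_location() AS received,
           pg_last_xlog_replay_location()  AS replayed,
           pg_last_xlog_receive_location() = pg_last_xlog_replay_location() AS caught_up;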
On Mar 26, 2015, at 9:43 AM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>
> It might seem a strange compromise

Not at all. Sounds like a good feature to me. It's just that a simple "wait for replicas" option is not all that useful; there's more to think about.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
On Mar 26, 2015, at 9:43 AM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>
> Do you know if using the trigger file makes any change to the DB or WAL which couldn't be replicated on the master when it comes back up?

No. It just signals PG to come out of continuous recovery mode and start allowing normal (read/write) operations. It also moves recovery.conf out of the way (to recovery.done) so that if PG is restarted, it does NOT go back into recovery mode and start trying to get updates from the former master.

> I could try to script it using e.g. pg_last_xlog_receive_location, pg_last_xlog_replay_location, but that doesn't help so much when the DB decides to complete shutdown while the packet is still on the wire, and the packet goes missing.

Yeah, that mostly tells you if your network is faster than your disk. (OK, not really, but you obviously know what I mean...) But try comparing pg_current_xlog_location on the master to pg_last_xlog_receive_location on the replica. You could also look into using the pg_stat_replication view.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
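Concretely, that comparison might look like this (a sketch; the first query runs on the master, the second on the replica, and the last computes byte lag from the master alone, assuming 9.2+ for pg_xlog_location_diff):

    -- On the master:
    SELECT pg_current_xlog_location();
    -- On the replica:
    SELECT pg_last_xlog_receive_location();
    -- Or, per standby from the master, via pg_stat_replication:
    SELECT application_name,
           pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes
    FROM pg_stat_replication;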
> With fully synchronous replication the performance issue is a real killer for us.
> I suppose what I'm wondering is about async normal operation, but sync operation during shutdown.

Replying to myself - but a further thought -

Assuming that you have your server configured with 1 or more hot standbys:

Are there situations where it is smart & sane to allow a controlled (slow, not emergency) shutdown to complete asynchronously without knowing if any standby got the last bits of WAL?

Should a fully async shutdown with a connected standby ever happen? You've no guarantee the standbys are up to date. Since the server is shut down, you've no way to ask it and check without looking at the filesystem directly.

I guess the problem with "eventually consistent" is that during shutdown you know for sure that 'eventually' isn't ever going to happen unless you make it happen.

Graeme.
On Mar 26, 2015, at 10:08 AM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>
> Assuming that you have your server configured with 1 or more hot standbys.
> Are there situations where it is smart & sane to allow a controlled (slow, not emergency) shutdown to complete asynchronously without knowing if any standby got the last bits of wal?

Sure.

Point updates. Shut down 9.4.0. Immediately re-start 9.4.1 on the same cluster.

OS security update, reboot server, PG shuts down and re-starts on boot.

Etc.

I don't care in those cases what the state of the network and my remote replicas are. The replication is interrupted briefly, and then resumes.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
On Mar 26, 2015, at 10:16 AM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>
>> But try comparing pg_current_xlog_location on the master to pg_last_xlog_receive_location on the replica. You could also look into using the pg_stat_replication view.
>
> This becomes challenging after the master is shutdown...

Right. Thus the need for the sequence I mentioned: shut down all services accessing the database, check the status, shut down the master...

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
On Mar 26, 2015, at 10:40 AM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>
> I could be missing something obvious, but how does this stop e.g. checkpoints, autovacuum etc. from occurring and generating new WAL that the standby might like? It seems like stopping services accessing the DB is likely to cause a lengthy autovacuum to begin, for example.

OK, fair point. But do you care?

I care that all committed transactions are on the replica. I don't care if they're still in WAL or written to the backing files. Actually, think about that question, you don't either--WAL gets copied to the replica, then moved to the backing files. Checkpoints don't affect replication as far as I know. And autovacuum? Couldn't care less if its effects are replicated yet; when the replica becomes master, autovacuum will run on it.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
On 26 Mar 2015, at 17:18, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> On Mar 26, 2015, at 10:08 AM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>>
>> Assuming that you have your server configured with 1 or more hot standbys.
>> Are there situations where it is smart & sane to allow a controlled (slow, not emergency) shutdown to complete asynchronously without knowing if any standby got the last bits of wal?
>
> Sure.
>
> Point updates. Shut down 9.4.0. Immediately re-start 9.4.1 on the same cluster.
> OS security update, reboot server, PG shuts down and re-starts on boot.

Thanks for the examples, they should have been obvious to me.

You're right, because the master isn't going to throw away WAL, it'll be kept due to wal_keep_segments, or 9.4 replication slots or whatever, and since the master *does* come directly back up in this scenario, 'eventually' will indeed arrive.

Graeme Bell
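For reference, a minimal sketch of the 9.4 replication-slot approach mentioned above (the slot name is illustrative, not from the thread; note that a slot whose standby never returns retains WAL indefinitely, so it needs monitoring):

    -- On the master (9.4+): create a physical slot so WAL is kept until the standby consumes it.
    SELECT pg_create_physical_replication_slot('standby1_slot');
    -- The standby then sets primary_slot_name = 'standby1_slot' in its recovery.conf.
    -- Check retention/progress on the master:
    SELECT slot_name, active, restart_lsn FROM pg_replication_slots;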
>>> But try comparing pg_current_xlog_location on the master to pg_last_xlog_receive_location on the replica. You could also look into using the pg_stat_replication view.
>>
>> This becomes challenging after the master is shutdown...
>
> Right. Thus the need for the sequence I mentioned: shut down all services accessing the database, check the status, shut down the master...

I could be missing something obvious, but how does this stop e.g. checkpoints, autovacuum etc. from occurring and generating new WAL that the standby might like? It seems like stopping services accessing the DB is likely to cause a lengthy autovacuum to begin, for example.

Graeme Bell
On 26 Mar 2015, at 17:48, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> On Mar 26, 2015, at 10:40 AM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>>
>> I could be missing something obvious, but how does this stop e.g. checkpoints, autovacuum etc. from occurring and generating new WAL that the standby might like? It seems like stopping services accessing the DB is likely to cause a lengthy autovacuum to begin, for example.
>
> OK, fair point. But do you care?

Yes, because I think it will prevent me from doing the fast 'swap back' which I mentioned in my previous emails, which is the entire point of this. I don't want to have to do an rsync --checksum on terabytes of data after a 10 minute period of downtime to be certain I can safely bring my normal master back into play, if I can trade that away for e.g. 1 second of synchronous shutdown time.

I want to be very certain that postgres hasn't produced one last bit of WAL that the standby hasn't caught, because if it has, I won't be able to directly apply the (promoted) standby's new WAL entries over the top of it. I also don't want to start playing guessing games about whether I can delete a WAL segment or not, because that might lead to serious accidental brain damage of the DB. It won't be till 9.5 is well tested that I consider that kind of stuff.

> I care that all committed transactions are on the replica. I don't care if they're still in WAL or written to the backing files. Actually, think about that question, you don't either--WAL gets copied to the replica, then moved to the backing files. Checkpoints don't affect replication as far as I know.

Let's say the standby drops off the network some time before shutdown. A checkpoint or autovacuum might generate a small change/entry in WAL (I don't know this for sure regarding autovacuum; this is a worst-case assumption). That will prevent the master from being suitable for use when I come to switch back again. The same problem occurs with any normal transaction that occurs in that timeframe too, on servers where it's harder to take down all the services that might connect.

I would like my master server to be able to wait, or warn me that the standby isn't complete, during shutdown. Maybe the replication slots feature can help with that.

Anyway, this may seem a bit theoretical, so here's a practical example that I know will cause this effect:

Let's imagine someone follows your advice but is already running a PITR archive with archive_timeout. The recommended timeout is 1 minute. Every minute their server generates a new WAL segment.

They start turning off services. Once everything is turned off, they shut down the master, assuming that the standby is up to date since there has been no server activity. However, depending on the time during the minute that they do it at (and depending on e.g. the network speed - 16MB is a big chunk of data for a WAN), there's a pretty good chance that the server has a new WAL file, and perhaps also modifications to the raw files on disk, that make it unsuitable as a ready-to-go standby for the newly promoted master.

And that's in the best case, where there are no connections to the server ongoing or being attempted.

Graeme Bell

p.s. Here's a troubling situation (unrelated to my original question, but with some analogous problems):

1. Let's assume you run a standby and a PITR recovery backup.
2. Power interruption knocks out half the building. Network switches are offline; the master is still committing WAL to disk and to its PITR area. It cannot reach the standby.
3. Master goes down.
4. Standby is promoted, manually or automatically, since it is still reachable.
5. PITR and master come back online. The standby copies its version of events into the PITR area. However, this version of events runs parallel to the WAL the master managed to commit.
6. What happens when you try to recover from the WAL in the PITR?
- If we clean the PITR to make the standby able to start writing its new timeline there, we lose some transactions.
- If we don't clean the PITR, we have 2 branches of history in our PITR archive that both claim to follow the last WAL entry that was duplicated to the standby.
On Mar 26, 2015, at 12:17 PM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>
> ...I won't be able to directly apply the (promoted) standby's new WAL entries over the top of it.

I see--there's our difference. When I do this, I am willing to stay on the standby for a while if need be.

> A checkpoint or autovacuum might generate a small change/entry in WAL (I don't know this for sure regarding autovacuum; this is a worst-case assumption).

I would think autovacuum would have to, since it writes some changes to at least index pages.

> Let's imagine someone follows your advice but is already running a PITR archive with archive_timeout. The recommended timeout is 1 minute. Every minute their server generates a new WAL segment.

Yeah, I'm always assuming streaming replication. If you know you have a delay in replication, you'd better remember that ;-)

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
> On Mar 26, 2015, at 12:42 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
>
> On Mar 26, 2015, at 12:17 PM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>>
>> ...I won't be able to directly apply the (promoted) standby's new WAL entries over the top of it.
>
> I see--there's our difference. When I do this, I am willing to stay on the standby for a while if need be.
>
>> A checkpoint or autovacuum might generate a small change/entry in WAL (I don't know this for sure regarding autovacuum; this is a worst-case assumption).
>
> I would think autovacuum would have to, since it writes some changes to at least index pages.

But you can disable autovacuum. And you can manually checkpoint. So maybe you'd just add that after shutting down services that access the db. (For me, that's mostly: "sudo launchctl unload my.particular.prefix.*")

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
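A sketch of that quiescing step in SQL (assumes 9.4's ALTER SYSTEM; on older versions, set autovacuum = off in postgresql.conf and reload instead):

    -- Stop autovacuum from generating further WAL on its own schedule.
    ALTER SYSTEM SET autovacuum = off;
    SELECT pg_reload_conf();
    -- Emit the checkpoint now, so its WAL record is streamed before the lag check.
    CHECKPOINT;
    -- After the switchover, remember to re-enable it:
    --   ALTER SYSTEM RESET autovacuum;  SELECT pg_reload_conf();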
On 26 Mar 2015, at 19:48, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
>
>> On Mar 26, 2015, at 12:42 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
>>
>> On Mar 26, 2015, at 12:17 PM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>>>
>>> ...I won't be able to directly apply the (promoted) standby's new WAL entries over the top of it.
>>
>> I see--there's our difference. When I do this, I am willing to stay on the standby for a while if need be.
>>
>>> A checkpoint or autovacuum might generate a small change/entry in WAL (I don't know this for sure regarding autovacuum; this is a worst-case assumption).
>>
>> I would think autovacuum would have to, since it writes some changes to at least index pages.
>
> But you can disable autovacuum. And you can manually checkpoint. So maybe you'd just add that after shutting down services that access the db. (For me, that's mostly: "sudo launchctl unload my.particular.prefix.*")

That may work, but this is starting to look very hacky, and the problem with hacky approaches is that you usually miss something you don't know about yet, or get caught by something that changes later.

For example: in the processes that you were using prior to our conversation, which you mentioned before (shutting down services before the server), were you already doing a forced checkpoint to counter archive_timeout, and had you disabled autovacuum?

Graeme.
>> Let's imagine someone follows your advice but is already running a PITR archive with archive_timeout. The recommended timeout is 1 minute. Every minute their server generates a new WAL segment.
>
> Yeah, I'm always assuming streaming replication. If you know you have a delay in replication, you'd better remember that ;-)

If we agree on that assumption, then:

There is definitely a race condition between the generation of WAL by archive_timeout and any SQL-based check that you might do against the server to see its WAL state before shutting down, which can result in WAL that is not synchronised between master and standby (and which you do not know is not synchronised).

And there is a possible further race condition between the generation of WAL by archive_timeout and the shutdown process, depending on at what point in the shutdown process the server ceases to generate new WAL, which can also result in WAL that is not synchronised between master and standby (and which you do not know is not synchronised).

"Similar" or "maybe the same" isn't going to be good enough here for reliable behaviour.

Graeme Bell
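One partial mitigation might be to check the archiver's progress just before shutdown (a sketch, assuming 9.4's pg_stat_archiver view is available; it narrows the race described above but does not close it):

    -- Has the most recently completed segment actually been archived yet?
    SELECT pg_xlogfile_name(pg_current_xlog_location()) AS current_segment,
           last_archived_wal,
           last_archived_time
    FROM pg_stat_archiver;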
On Mar 27, 2015, at 3:46 AM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>
> For example: in the processes that you were using prior to our conversation, which you mentioned before (shutting down services before the server), were you already doing a forced checkpoint to counter archive_timeout, and had you disabled autovacuum?

No, because I run on the newly-promoted master for a while, then rsync back. So all I care about is getting all committed transactions replicated.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice