Discussion: Hot standby, recent changes
1. The XLogFlush() call you added to dbase_redo doesn't help where it is. You need to call XLogFlush() after the *commit* record of the DROP DATABASE. The idea is to minimize the window where the files have already been deleted but the entry in pg_database is still visible, if someone kills the standby and starts it up as a new master. This isn't really hot standby's fault, you have the same window with PITR, so I think it would be better to handle this as a separate patch. Also, please handle all the commands that use ForceSyncCommit().

2. "Allow RULEs ON INSERT, ON UPDATE and ON DELETE iff they generate only SELECT statements that act INSTEAD OF the actual event." This affects any read-only transaction, not just hot standby, so I think this should be discussed and changed as a separate patch.

3. The "Out of lock mem killer" in StandbyAcquireAccessExclusiveLock is quite harsh. It aborts all read-only transactions. It should be enough to kill just one random one, or maybe the one that's holding most locks. Also, if there still isn't enough shared memory after killing all backends, it goes into an infinite loop. I guess that shouldn't happen, but it seems a bit squishy anyway. It would be nice to differentiate between "out of shared mem" and "someone else is holding the lock" more accurately. Maybe add a new return value to LockAcquire() for "out of shared mem".

4. Need to handle the case where master is started up with wal_standby_info=true, shut down, and restarted with wal_standby_info=false, while the standby server runs continuously. And the code in StartupXLog() to initialize recovery snapshot from a shutdown checkpoint needs to check that too.

5. You removed this comment from KnownAssignedXidsAdd:

- /*
- * XXX: We should check that we don't exceed maxKnownAssignedXids.
- * Even though the hash table might hold a few more entries than that,
- * we use fixed-size arrays of that size elsewhere and expected all
- * entries in the hash table to fit.
- */

I think the issue still exists. The comment refers to KnownAssignedXidsGet, which takes as an argument an array that has to be large enough to hold all entries in the known-assigned hash table. If there's more entries than expected in the hash table, KnownAssignedXidsGet will overrun the array. Perhaps KnownAssignedXidsGet should return a palloc'd array instead or something, but I don't think it's fine as it is.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
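A minimal sketch of the palloc'd-array idea from point 5, under assumed names (KnownAssignedXidsCount, KnownAssignedXidsFill and the function below are illustrative stand-ins, not the real procarray.c routines): the getter sizes and returns its own array instead of filling a caller-supplied fixed-size buffer, so a hash table that has grown past maxKnownAssignedXids cannot overrun anything.

/* Hypothetical sketch only -- not the actual KnownAssignedXidsGet() API. */
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Stand-ins for the shared hash table accessors assumed by this sketch. */
extern int  KnownAssignedXidsCount(void);                        /* current entry count */
extern void KnownAssignedXidsFill(TransactionId *xids, int nxids);

/*
 * Return a freshly allocated array holding every known-assigned XID and
 * report its length through *nxids.  Because the array is sized from the
 * hash table's current entry count, no fixed-size caller buffer can be
 * overrun even if the table holds more than maxKnownAssignedXids entries.
 */
static TransactionId *
KnownAssignedXidsGetAlloc(int *nxids)
{
    int            count = KnownAssignedXidsCount();
    TransactionId *xids = malloc(count * sizeof(TransactionId)); /* palloc() in backend code */

    KnownAssignedXidsFill(xids, count);
    *nxids = count;
    return xids;
}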
On Sun, 2009-12-06 at 12:32 +0200, Heikki Linnakangas wrote:

> 2. "Allow RULEs ON INSERT, ON UPDATE and ON DELETE iff they generate
> only SELECT statements that act INSTEAD OF the actual event." This
> affects any read-only transaction, not just hot standby, so I think this
> should be discussed and changed as a separate patch.

OK

> 4. Need to handle the case where master is started up with
> wal_standby_info=true, shut down, and restarted with
> wal_standby_info=false, while the standby server runs continuously. And
> the code in StartupXLog() to initialize recovery snapshot from a
> shutdown checkpoint needs to check that too.

I don't really understand the use case for shutting down the server and then using it as a HS base backup. Why would anyone do that? Why would they have their server down for potentially hours, when they can take the backup while the server is up? If the server is idle, it can be up-and-idle just as easily as down-and-idle, in which case we wouldn't need to support this at all.

Adding yards of code for this capability isn't important to me. I'd rather strip the whole lot out than keep fiddling with a low priority area. Please justify this as a real world solution before we continue trying to support it.

> 5. You removed this comment from KnownAssignedXidsAdd:
>
> - /*
> - * XXX: We should check that we don't exceed maxKnownAssignedXids.
> - * Even though the hash table might hold a few more entries than that,
> - * we use fixed-size arrays of that size elsewhere and expected all
> - * entries in the hash table to fit.
> - */
>
> I think the issue still exists. The comment refers to
> KnownAssignedXidsGet, which takes as an argument an array that has to be
> large enough to hold all entries in the known-assigned hash table. If
> there's more entries than expected in the hash table,
> KnownAssignedXidsGet will overrun the array. Perhaps
> KnownAssignedXidsGet should return a palloc'd array instead or
> something, but I don't think it's fine as it is.

Same issue perhaps, different place.

--
Simon Riggs
www.2ndQuadrant.com
On Sun, 2009-12-06 at 12:32 +0200, Heikki Linnakangas wrote:

> 1. The XLogFlush() call you added to dbase_redo doesn't help where it
> is. You need to call XLogFlush() after the *commit* record of the DROP
> DATABASE. The idea is to minimize the window where the files have already
> been deleted but the entry in pg_database is still visible, if someone
> kills the standby and starts it up as a new master. This isn't really
> hot standby's fault, you have the same window with PITR, so I think it
> would be better to handle this as a separate patch. Also, please handle
> all the commands that use ForceSyncCommit().

I think I'll just add a flag to the commit record to do this. That way anybody that sets ForceSyncCommit will do as you propose.

--
Simon Riggs
www.2ndQuadrant.com
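A rough sketch of how such a commit-record flag might look on the redo side (the record layout, flag name, and function are illustrative assumptions, not the actual xact.c code; only XLogFlush() is a real routine): the commit record carries a bit set whenever the transaction used ForceSyncCommit(), and redo flushes WAL up to that record, closing the window between deleting the files and making the commit durable.

/* Hypothetical sketch only -- field, flag, and function names are illustrative. */
#include <stdint.h>

typedef uint64_t XLogRecPtr;                 /* WAL location, simplified for the sketch */

#define XACT_COMMIT_FORCE_SYNC  0x01         /* assumed flag: transaction used ForceSyncCommit() */

typedef struct xl_xact_commit_sketch
{
    uint8_t xinfo;                           /* flag bits, including the one above */
    /* ... subtransaction XIDs, dropped relfilenodes, etc. ... */
} xl_xact_commit_sketch;

extern void XLogFlush(XLogRecPtr record);    /* real routine, declared here only for the sketch */

static void
xact_redo_commit_sketch(const xl_xact_commit_sketch *rec, XLogRecPtr lsn)
{
    /* ... mark the transaction committed in clog, unlink dropped files, etc. ... */

    /*
     * If the original transaction forced a synchronous commit (DROP DATABASE
     * and friends), flush WAL up to this commit record before continuing, so
     * a standby killed and promoted right now cannot still show the
     * pg_database entry for files that are already gone.
     */
    if (rec->xinfo & XACT_COMMIT_FORCE_SYNC)
        XLogFlush(lsn);
}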
On Sun, 2009-12-06 at 12:32 +0200, Heikki Linnakangas wrote:

> 3. The "Out of lock mem killer" in StandbyAcquireAccessExclusiveLock is
> quite harsh. It aborts all read-only transactions. It should be enough
> to kill just one random one, or maybe the one that's holding most locks.
> Also, if there still isn't enough shared memory after killing all
> backends, it goes into an infinite loop. I guess that shouldn't happen,
> but it seems a bit squishy anyway. It would be nice to differentiate
> between "out of shared mem" and "someone else is holding the lock" more
> accurately. Maybe add a new return value to LockAcquire() for "out of
> shared mem".

OK, will abort infinite loop by adding new return value.

If people don't like having everything killed, they can add more lock memory. It isn't worth adding more code because it's hard to test and unlikely to be an issue in real usage, and it has a clear workaround that is already mentioned in a hint.

--
Simon Riggs
www.2ndQuadrant.com
On Sun, 2009-12-06 at 11:20 +0000, Simon Riggs wrote:
> On Sun, 2009-12-06 at 12:32 +0200, Heikki Linnakangas wrote:
>
> > 3. The "Out of lock mem killer" in StandbyAcquireAccessExclusiveLock is
> > quite harsh. It aborts all read-only transactions. It should be enough
> > to kill just one random one, or maybe the one that's holding most locks.
> > Also, if there still isn't enough shared memory after killing all
> > backends, it goes into an infinite loop. I guess that shouldn't happen,
> > but it seems a bit squishy anyway. It would be nice to differentiate
> > between "out of shared mem" and "someone else is holding the lock" more
> > accurately. Maybe add a new return value to LockAcquire() for "out of
> > shared mem".
>
> OK, will abort infinite loop by adding new return value.

You had me for a minute, but there is no infinite loop. Once we start killing people it reverts to throwing an ERROR on an out of memory condition.

--
Simon Riggs
www.2ndQuadrant.com
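To make the control flow concrete, here is a small sketch of the behaviour described above, with the extra return value Heikki suggested (all names here are illustrative stand-ins, not the committed lock-manager code): a lock conflict cancels the backends in the way and retries, while an out-of-shared-memory result stops immediately with an error and the usual hint, so there is no loop to get stuck in.

/* Hypothetical sketch only -- names follow the discussion, not the real code. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed result codes distinguishing "lock table full" from "lock conflict". */
typedef enum
{
    SKETCH_LOCK_OK,
    SKETCH_LOCK_NOT_AVAIL,       /* conflicting lock held by a read-only backend */
    SKETCH_LOCK_OUT_OF_MEM       /* shared-memory lock table is full */
} SketchLockResult;

extern SketchLockResult SketchLockAcquire(int dbid, int relid, bool dontWait);
extern void             SketchCancelConflictingBackends(int dbid, int relid);

static void
StandbyAcquireAccessExclusiveLockSketch(int dbid, int relid)
{
    for (;;)
    {
        SketchLockResult res = SketchLockAcquire(dbid, relid, true);

        if (res == SKETCH_LOCK_OK)
            return;

        if (res == SKETCH_LOCK_OUT_OF_MEM)
        {
            /* No retry loop here: report the condition and the workaround. */
            fprintf(stderr, "ERROR: out of shared memory\n"
                    "HINT: You might need to increase max_locks_per_transaction.\n");
            exit(1);             /* ereport(ERROR, ...) in backend code */
        }

        /* Lock conflict: cancel whoever is in the way and try again. */
        SketchCancelConflictingBackends(dbid, relid);
    }
}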
On Sun, 2009-12-06 at 10:51 +0000, Simon Riggs wrote:
> > 5. You removed this comment from KnownAssignedXidsAdd:
> >
> > - /*
> > - * XXX: We should check that we don't exceed maxKnownAssignedXids.
> > - * Even though the hash table might hold a few more entries than that,
> > - * we use fixed-size arrays of that size elsewhere and expected all
> > - * entries in the hash table to fit.
> > - */
> >
> > I think the issue still exists. The comment refers to
> > KnownAssignedXidsGet, which takes as an argument an array that has to be
> > large enough to hold all entries in the known-assigned hash table. If
> > there's more entries than expected in the hash table,
> > KnownAssignedXidsGet will overrun the array. Perhaps
> > KnownAssignedXidsGet should return a palloc'd array instead or
> > something, but I don't think it's fine as it is.
>
> Same issue perhaps, different place.

Well, it's not the same issue. One was about entering things into a shared memory hash table, the other is about reading values into a locally malloc'd array.

The array is sized as the max size of KnownAssignedXids, so there is no danger that there would ever be an overrun. One caller does not set the max size explicitly, so I will add a comment/code changes to make that clearer.

--
Simon Riggs
www.2ndQuadrant.com
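The sizing contract Simon describes can be pictured like this (constant and function names are illustrative, not the actual procarray.c code); in contrast to the palloc'd-array sketch earlier in the thread, the caller allocates the array at the same maximum the hash table was created with, and the comment spells out the requirement that the problematic caller previously left implicit.

/* Hypothetical sketch only -- names and the sizing constant are illustrative. */
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Assumed upper bound on entries in the known-assigned hash table. */
#define SKETCH_MAX_KNOWN_ASSIGNED_XIDS 1024

/* Stand-in for KnownAssignedXidsGet(): fills xids, returns how many it wrote. */
extern int SketchKnownAssignedXidsGet(TransactionId *xids, int maxxids);

static void
SketchTakeSnapshot(void)
{
    /*
     * The output array must be able to hold every entry the hash table can
     * contain, i.e. the same maximum used when the table was created;
     * otherwise the getter could overrun it.
     */
    TransactionId *xids =
        malloc(SKETCH_MAX_KNOWN_ASSIGNED_XIDS * sizeof(TransactionId));
    int            nxids = SketchKnownAssignedXidsGet(xids,
                                                      SKETCH_MAX_KNOWN_ASSIGNED_XIDS);

    /* ... build the snapshot from xids[0 .. nxids-1] ... */
    (void) nxids;
    free(xids);
}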
Simon Riggs wrote:
> On Sun, 2009-12-06 at 12:32 +0200, Heikki Linnakangas wrote:
>> 4. Need to handle the case where master is started up with
>> wal_standby_info=true, shut down, and restarted with
>> wal_standby_info=false, while the standby server runs continuously. And
>> the code in StartupXLog() to initialize recovery snapshot from a
>> shutdown checkpoint needs to check that too.
>
> I don't really understand the use case for shutting down the server and
> then using it as a HS base backup. Why would anyone do that? Why would
> they have their server down for potentially hours, when they can take
> the backup while the server is up? If the server is idle, it can be
> up-and-idle just as easily as down-and-idle, in which case we wouldn't
> need to support this at all. Adding yards of code for this capability
> isn't important to me. I'd rather strip the whole lot out than keep
> fiddling with a low priority area. Please justify this as a real world
> solution before we continue trying to support it.

This affects using a shutdown checkpoint as a recovery start point (restore point) in general, not just a fresh backup taken from a shut down database.

Consider this scenario:

0. You have a master and a standby configured properly, and up and running.
1. You shut down master for some reason.
2. You restart standby. For some reason. Maybe by accident, or you want to upgrade minor version or whatever.
3. Standby won't accept connections until the master is started too. Admin says "WTF?"

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Sun, 2009-12-06 at 20:32 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Sun, 2009-12-06 at 12:32 +0200, Heikki Linnakangas wrote:
> >> 4. Need to handle the case where master is started up with
> >> wal_standby_info=true, shut down, and restarted with
> >> wal_standby_info=false, while the standby server runs continuously. And
> >> the code in StartupXLog() to initialize recovery snapshot from a
> >> shutdown checkpoint needs to check that too.
> >
> > I don't really understand the use case for shutting down the server and
> > then using it as a HS base backup. Why would anyone do that? Why would
> > they have their server down for potentially hours, when they can take
> > the backup while the server is up? If the server is idle, it can be
> > up-and-idle just as easily as down-and-idle, in which case we wouldn't
> > need to support this at all. Adding yards of code for this capability
> > isn't important to me. I'd rather strip the whole lot out than keep
> > fiddling with a low priority area. Please justify this as a real world
> > solution before we continue trying to support it.
>
> This affects using a shutdown checkpoint as a recovery start point
> (restore point) in general, not just a fresh backup taken from a shut
> down database.
>
> Consider this scenario:
>
> 0. You have a master and a standby configured properly, and up and running.
> 1. You shut down master for some reason.
> 2. You restart standby. For some reason. Maybe by accident, or you want
> to upgrade minor version or whatever.
> 3. Standby won't accept connections until the master is started too.
> Admin says "WTF?"

OK, I accept that as a possible scenario.

I'm concerned that the number of possible scenarios we are trying to support in the first version is too high, increasing the likelihood that some of them do not work correctly because the test scenarios didn't re-create them exactly.

In the above scenario, if it is of serious concern the admin can fail over to the standby and then access the standby for queries. If the master was down for any length of time that's exactly what we'd expect to happen anyway.

So I don't see the scenario as very likely; I'm sure it will happen to somebody. In any case, it is in the opposite direction to the main feature, which is a High Availability server, so any admin that argued that he wanted to have his HA standby stay as a standby in this case would likely be in a minority.

I would rather document it as a known caveat and be done.

--
Simon Riggs
www.2ndQuadrant.com
On Sun, Dec 6, 2009 at 3:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sun, 2009-12-06 at 20:32 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>> > On Sun, 2009-12-06 at 12:32 +0200, Heikki Linnakangas wrote:
>> >> 4. Need to handle the case where master is started up with
>> >> wal_standby_info=true, shut down, and restarted with
>> >> wal_standby_info=false, while the standby server runs continuously. And
>> >> the code in StartupXLog() to initialize recovery snapshot from a
>> >> shutdown checkpoint needs to check that too.
>> >
>> > I don't really understand the use case for shutting down the server and
>> > then using it as a HS base backup. Why would anyone do that? Why would
>> > they have their server down for potentially hours, when they can take
>> > the backup while the server is up? If the server is idle, it can be
>> > up-and-idle just as easily as down-and-idle, in which case we wouldn't
>> > need to support this at all. Adding yards of code for this capability
>> > isn't important to me. I'd rather strip the whole lot out than keep
>> > fiddling with a low priority area. Please justify this as a real world
>> > solution before we continue trying to support it.
>>
>> This affects using a shutdown checkpoint as a recovery start point
>> (restore point) in general, not just a fresh backup taken from a shut
>> down database.
>>
>> Consider this scenario:
>>
>> 0. You have a master and a standby configured properly, and up and running.
>> 1. You shut down master for some reason.
>> 2. You restart standby. For some reason. Maybe by accident, or you want
>> to upgrade minor version or whatever.
>> 3. Standby won't accept connections until the master is started too.
>> Admin says "WTF?"
>
> OK, I accept that as a possible scenario.
>
> I'm concerned that the number of possible scenarios we are trying to
> support in the first version is too high, increasing the likelihood that
> some of them do not work correctly because the test scenarios didn't
> re-create them exactly.
>
> In the above scenario, if it is of serious concern the admin can
> fail over to the standby and then access the standby for queries. If the
> master was down for any length of time that's exactly what we'd expect
> to happen anyway.
>
> So I don't see the scenario as very likely; I'm sure it will happen to
> somebody. In any case, it is in the opposite direction to the main
> feature, which is a High Availability server, so any admin that argued
> that he wanted to have his HA standby stay as a standby in this case
> would likely be in a minority.
>
> I would rather document it as a known caveat and be done.

For what it's worth, this doesn't seem particularly unlikely or unusual to me. But on the other hand, I do really want to get this committed soon, and I don't have time to help at the moment, so maybe I should keep my mouth shut.

...Robert
On 6 Dec 2009, at 23:26, Robert Haas wrote:
>>> Consider this scenario:
>>>
>>> 0. You have a master and a standby configured properly, and up and running.
>>> 1. You shut down master for some reason.
>>> 2. You restart standby. For some reason. Maybe by accident, or you want
>>> to upgrade minor version or whatever.
>>> 3. Standby won't accept connections until the master is started too.
>>> Admin says "WTF?"
>>
>> I would rather document it as a known caveat and be done.

+1

> For what it's worth, this doesn't seem particularly unlikely or
> unusual to me.

I'm sorry to have to disagree here. Shutting down the master in my book means you're upgrading it, minor or major, or the box under it. It's not a frequent event. More frequent than a crash, but still.

Now the master is offline, you have a standby, and you're restarting it too, but you don't mean that as a switchover. I'm with Simon here, the WTF is "are you in search of trouble?" more than anything else. I don't think shutting down the standby while the master is offline is considered good practice if your goal is HA...

Regards,
--
dim
On Sun, 2009-12-06 at 17:26 -0500, Robert Haas wrote:

> For what it's worth, this doesn't seem particularly unlikely or
> unusual to me.

I don't know many people who shut down both nodes of a highly available application at the same time. If they did, I wouldn't expect them to complain they couldn't run queries on the standby when two obvious and simple workarounds exist to allow them to access their data: start the master again, or make the standby switchover, both of which are part of standard operating procedures.

It doesn't seem very high up the list of additional features, at least.

--
Simon Riggs
www.2ndQuadrant.com
Simon Riggs wrote:
> On Sun, 2009-12-06 at 17:26 -0500, Robert Haas wrote:
>
>> For what it's worth, this doesn't seem particularly unlikely or
>> unusual to me.
>
> I don't know many people who shut down both nodes of a highly available
> application at the same time. If they did, I wouldn't expect them to
> complain they couldn't run queries on the standby when two obvious
> and simple workarounds exist to allow them to access their data: start
> the master again, or make the standby switchover, both of which are part
> of standard operating procedures.

It might not be common or expected in a typical HA installation, but it would be a very strange limitation in my mind. It might well happen e.g. in a standby used for reporting, or when you do PITR.

> It doesn't seem very high up the list of additional features, at least.

Well, it's in the patch now. I'm just asking you to not break it.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Mon, 2009-12-07 at 10:02 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Sun, 2009-12-06 at 17:26 -0500, Robert Haas wrote:
> >
> >> For what it's worth, this doesn't seem particularly unlikely or
> >> unusual to me.
> >
> > I don't know many people who shut down both nodes of a highly available
> > application at the same time. If they did, I wouldn't expect them to
> > complain they couldn't run queries on the standby when two obvious
> > and simple workarounds exist to allow them to access their data: start
> > the master again, or make the standby switchover, both of which are part
> > of standard operating procedures.
>
> It might not be common or expected in a typical HA installation, but it
> would be a very strange limitation in my mind. It might well happen e.g.
> in a standby used for reporting, or when you do PITR.

Yes, it's possible that the standby can be shut down while disconnected from a master. If that occurs, all we are saying is that we cannot use shutdown checkpoints as starting points.

If the primary was set up in default mode, then wal_standby_info = on and so there will be running_xact WAL records immediately after each checkpoint to use as starting points. We don't need shutdown checkpoints as well. So adding shutdown checkpoints as a valid starting place does not help in this case, it just makes the code harder to test.

> > It doesn't seem very high up the list of additional features, at least.
>
> Well, it's in the patch now. I'm just asking you to not break it.

There have been many things in the patch that have been removed. This is by far the least important of the things removed in the name of getting this done as soon as possible, with good quality. I see no reason for your eagerness to include this feature.

I have removed it; as you say, it's in the repo if someone wants to add it in later.

--
Simon Riggs
www.2ndQuadrant.com
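A tiny sketch of the starting-point rule Simon is arguing for (record and function names are illustrative, not the actual StartupXLog() code): with the shutdown-checkpoint path removed, only a running-xacts record, written after each checkpoint when wal_standby_info = on, lets the standby build its initial snapshot and start accepting read-only connections.

/* Hypothetical sketch only -- record and function names are illustrative. */
#include <stdbool.h>

typedef enum
{
    SKETCH_REC_SHUTDOWN_CHECKPOINT,
    SKETCH_REC_ONLINE_CHECKPOINT,
    SKETCH_REC_RUNNING_XACTS,
    SKETCH_REC_OTHER
} SketchRecordType;

static bool snapshot_valid = false;          /* may we accept read-only connections? */

static void
SketchHandleRecoveryRecord(SketchRecordType rtype)
{
    switch (rtype)
    {
        case SKETCH_REC_RUNNING_XACTS:
            /* Enough information to build the initial recovery snapshot. */
            snapshot_valid = true;
            break;

        case SKETCH_REC_SHUTDOWN_CHECKPOINT:
            /*
             * With the shutdown-checkpoint starting point removed, this record
             * no longer initializes the snapshot; the standby keeps waiting
             * for the next running-xacts record from the restarted master.
             */
            break;

        default:
            break;
    }
}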