Discussion: Improve handling of parameter differences in physical replication
When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error.

The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have max_prepared_transactions set appropriately or it won't be able to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. An extreme example is raising max_prepared_transactions on the primary but never actually using prepared transactions.

Where this becomes a serious problem is if you have many standbys and you do a failover. If the newly promoted standby happens to have a higher setting for one of the relevant parameters, all the other standbys that have followed it then shut down immediately and won't be able to continue until you change all their settings.

If we didn't do the hard shutdown and just let the standby roll on with recovery, nothing bad would happen, and it would eventually produce an appropriate error when those resources are required (e.g., "maximum number of prepared transactions reached").

So I think there are better ways to handle this. It might be reasonable to provide options. The attached patch doesn't do that, but it would be pretty easy. What the attached patch does is:

Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. Additionally, at that point we pause recovery instead of shutting down, so a hot standby remains usable. (That could certainly be configurable.)

Btw., I think the current setup is slightly buggy. The MaxBackends value that is used to size shared memory is computed as MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workers in WAL.

(This patch was developed together with Simon Riggs.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments
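A rough sketch of the two-stage behavior described above. The helper and flag names are illustrative only (they are not taken from the attached patch), but the message wording follows the warnings quoted later in this thread; the point is that the parameter-change check merely warns and records the discrepancy, and recovery is paused only when the related resource is actually exhausted:

/* Illustrative sketch; names are made up, not from the actual patch. */
static bool insufficient_parameter_seen = false;

/* Called while replaying an XLOG_PARAMETER_CHANGE record. */
static void
CheckStandbyParameter(const char *name, int local_value, int primary_value)
{
    if (local_value < primary_value)
    {
        ereport(WARNING,
                (errmsg("insufficient setting for parameter %s", name),
                 errdetail("%s = %d is a lower setting than on the primary server (where its value was %d).",
                           name, local_value, primary_value),
                 errhint("Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.")));
        insufficient_parameter_seen = true;
    }
}

/* Called when replay actually runs out of the related resource. */
static void
HandleRecoveryResourceShortage(void)
{
    if (insufficient_parameter_seen)
    {
        ereport(WARNING,
                (errmsg("recovery paused because of insufficient parameter settings"),
                 errhint("Recovery cannot continue unless the parameter is changed and the server restarted.")));
        SetRecoveryPause(true);   /* hot standby stays usable instead of a FATAL shutdown */
    }
}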
Hello

Thank you for working on this!

> Where this becomes a serious problem is if you have many standbys and you do a failover.

+1

Several times my team would have liked to pause recovery instead of panicking after changing settings on the primary. (Same thing for create_tablespace_directories replay errors too...)

Do we document anywhere (other than in the code) that the standby shuts down immediately upon receipt of the parameter change? doc/src/sgml/high-availability.sgml only mentions "refuse to start".

regards, Sergei
On 2020/02/27 17:23, Peter Eisentraut wrote:
> When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error.
>
> The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have max_prepared_transactions set appropriately or it won't be able to process those WAL records.
>
> However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. An extreme example is raising max_prepared_transactions on the primary but never actually using prepared transactions.
>
> Where this becomes a serious problem is if you have many standbys and you do a failover. If the newly promoted standby happens to have a higher setting for one of the relevant parameters, all the other standbys that have followed it then shut down immediately and won't be able to continue until you change all their settings.
>
> If we didn't do the hard shutdown and we just let the standby roll on with recovery, nothing bad will happen and it will eventually produce an appropriate error when those resources are required (e.g., "maximum number of prepared transactions reached").
>
> So I think there are better ways to handle this. It might be reasonable to provide options. The attached patch doesn't do that but it would be pretty easy. What the attached patch does is:
>
> Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. Additionally, at that point we pause recovery instead of shutting down, so a hot standby remains usable. (That could certainly be configurable.)

+1

> Btw., I think the current setup is slightly buggy. The MaxBackends value that is used to size shared memory is computed as MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workers in WAL.

Maybe this is because autovacuum doesn't work during recovery?

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
On 2020-02-27 11:13, Fujii Masao wrote: >> Btw., I think the current setup is slightly buggy. The MaxBackends value that is used to size shared memory is computedas MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workersin WAL. > Maybe this is because autovacuum doesn't work during recovery? Autovacuum on the primary can use locks or xids, and so it's possible that the standby when processing WAL encounters more of those than it has locally allocated shared memory to handle. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 27, 2020 at 02:37:24PM +0100, Peter Eisentraut wrote: > On 2020-02-27 11:13, Fujii Masao wrote: >>> Btw., I think the current setup is slightly buggy. The > MaxBackends value that is used to size shared memory is computed as > MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + > max_wal_senders, but we don't track autovacuum_max_workers in WAL. >> Maybe this is because autovacuum doesn't work during recovery? > > Autovacuum on the primary can use locks or xids, and so it's possible that > the standby when processing WAL encounters more of those than it has locally > allocated shared memory to handle. Putting aside your patch because that sounds like a separate issue.. Doesn't this mean that autovacuum_max_workers should be added to the control file, that we need to record in WAL any updates done to it and that CheckRequiredParameterValues() is wrong? -- Michael
Attachments
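For context, a simplified illustration of the mismatch being pointed out here: MaxBackends is derived from four GUCs plus one slot (the autovacuum launcher), but the XLOG_PARAMETER_CHANGE record (see the CONTEXT line quoted later in this thread) only carries max_connections, max_worker_processes, max_wal_senders, max_prepared_xacts and max_locks_per_xact, plus wal_level and a few booleans. So an autovacuum_max_workers increase on the primary is invisible to the standby's check. The function below is a stand-alone sketch of the formula quoted above, not the actual PostgreSQL source:

/*
 * Sketch only: how MaxBackends (which drives several shared-memory sizings)
 * is computed from GUCs.  Of these inputs, autovacuum_max_workers is the one
 * that is not tracked in WAL, which is the gap discussed above.
 */
static int
compute_max_backends(int max_connections, int autovacuum_max_workers,
                     int max_worker_processes, int max_wal_senders)
{
    return max_connections + autovacuum_max_workers + 1 +
           max_worker_processes + max_wal_senders;
}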
On 2020-02-28 08:45, Michael Paquier wrote: > On Thu, Feb 27, 2020 at 02:37:24PM +0100, Peter Eisentraut wrote: >> On 2020-02-27 11:13, Fujii Masao wrote: >>>> Btw., I think the current setup is slightly buggy. The >> MaxBackends value that is used to size shared memory is computed as >> MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + >> max_wal_senders, but we don't track autovacuum_max_workers in WAL. >>> Maybe this is because autovacuum doesn't work during recovery? >> >> Autovacuum on the primary can use locks or xids, and so it's possible that >> the standby when processing WAL encounters more of those than it has locally >> allocated shared memory to handle. > > Putting aside your patch because that sounds like a separate issue.. > Doesn't this mean that autovacuum_max_workers should be added to the > control file, that we need to record in WAL any updates done to it and > that CheckRequiredParameterValues() is wrong? That would be a direct fix, yes. Perhaps it might be better to track the combined MaxBackends instead, however. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Feb 28, 2020 at 08:49:08AM +0100, Peter Eisentraut wrote: > Perhaps it might be better to track the combined MaxBackends instead, > however. Not sure about that. I think that we should keep them separated, as that's more useful for debugging and more verbose for error reporting. (Worth noting that max_prepared_xacts is separate because of its dummy PGPROC entries created by PREPARE TRANSACTION, so it cannot be included in the set). -- Michael
Attachments
On 2020-Feb-27, Peter Eisentraut wrote: > So this patch relaxes this a bit. Upon receipt of > XLOG_PARAMETER_CHANGE, we still check the settings but only issue a > warning and set a global flag if there is a problem. Then when we > actually hit the resource issue and the flag was set, we issue another > warning message with relevant information. Additionally, at that > point we pause recovery, so a hot standby remains usable. Hmm, so what is the actual end-user behavior? As I read the code, we first send the WARNING, then pause recovery until the user resumes replication; at that point we raise the original error. Presumably, at that point the startup process terminates and is relaunched, and replay continues normally. Is that it? I think if the startup process terminates because of the original error, after it is unpaused, postmaster will get that as a signal to do a crash-recovery cycle, closing all existing connections. Is that right? If so, it would be worth improving that (possibly by adding a sigsetjmp() block) to avoid the disruption. Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 29 Feb 2020 at 06:39, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Feb-27, Peter Eisentraut wrote:
>
> > So this patch relaxes this a bit. Upon receipt of
> > XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
> > warning and set a global flag if there is a problem. Then when we
> > actually hit the resource issue and the flag was set, we issue another
> > warning message with relevant information. Additionally, at that
> > point we pause recovery, so a hot standby remains usable.
>
> Hmm, so what is the actual end-user behavior? As I read the code, we
> first send the WARNING, then pause recovery until the user resumes
> replication; at that point we raise the original error.

I think that after recovery is paused, it will be better for users to restart the server rather than resume recovery. I agree with this idea, but I'm slightly concerned that users might not realize that recovery is paused until they look at that line in the server log or at pg_stat_replication, because the standby server is still functional. So I think we can periodically send a WARNING to inform users that we're still waiting for the parameter change and restart.

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-02-28 16:33, Alvaro Herrera wrote: > Hmm, so what is the actual end-user behavior? As I read the code, we > first send the WARNING, then pause recovery until the user resumes > replication; at that point we raise the original error. Presumably, at > that point the startup process terminates and is relaunched, and replay > continues normally. Is that it? No, at that point you get the original, current behavior that the server instance shuts down with a fatal error. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-03-09 09:11, Masahiko Sawada wrote: > I think after recovery is paused users will be better to restart the > server rather than resume the recovery. I agree with this idea but I'm > slightly concerned that users might not realize that recovery is > paused until they look at that line in server log or at > pg_stat_replication because the standby server is still functional. So > I think we can periodically send WARNING to inform user that we're > still waiting for parameter change and restart. I think that would be annoying, unless you create a system for configuring those periodic warnings. I imagine in a case like having set max_prepared_transactions but never actually using prepared transactions, people will just ignore the warning until they have their next restart, so it could be months of periodic warnings. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 9 Mar 2020 at 18:45, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-03-09 09:11, Masahiko Sawada wrote:
> > I think after recovery is paused users will be better to restart the
> > server rather than resume the recovery. I agree with this idea but I'm
> > slightly concerned that users might not realize that recovery is
> > paused until they look at that line in server log or at
> > pg_stat_replication because the standby server is still functional. So
> > I think we can periodically send WARNING to inform user that we're
> > still waiting for parameter change and restart.
>
> I think that would be annoying, unless you create a system for
> configuring those periodic warnings.
>
> I imagine in a case like having set max_prepared_transactions but never
> actually using prepared transactions, people will just ignore the
> warning until they have their next restart, so it could be months of
> periodic warnings.

Well, I meant to periodically send warning messages while waiting for the parameter change, that is, after exhausting resources and stopping recovery. In this situation users need to notice that as soon as possible.

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
At Mon, 9 Mar 2020 21:13:38 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in
> On Mon, 9 Mar 2020 at 18:45, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> >
> > On 2020-03-09 09:11, Masahiko Sawada wrote:
> > > I think after recovery is paused users will be better to restart the
> > > server rather than resume the recovery. I agree with this idea but I'm
> > > slightly concerned that users might not realize that recovery is
> > > paused until they look at that line in server log or at
> > > pg_stat_replication because the standby server is still functional. So
> > > I think we can periodically send WARNING to inform user that we're
> > > still waiting for parameter change and restart.
> >
> > I think that would be annoying, unless you create a system for
> > configuring those periodic warnings.
> >
> > I imagine in a case like having set max_prepared_transactions but never
> > actually using prepared transactions, people will just ignore the
> > warning until they have their next restart, so it could be months of
> > periodic warnings.
>
> Well I meant to periodically send warning messages while waiting for
> parameter change, that is after exhausting resources and stopping
> recovery. In this situation user need to notice that as soon as
> possible.

If we lose the connection, the standby continues to complain about the lost connection every 5 seconds. This is a situation of that kind.

By the way, when I reduced max_connections only on the master and then took exclusive locks until the standby complained about lock exhaustion, I saw a WARNING that mentions max_locks_per_transaction instead of max_connections.

WARNING: insufficient setting for parameter max_connections
DETAIL: max_connections = 2 is a lower setting than on the master server (where its value was 3).
HINT: Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.
CONTEXT: WAL redo at 0/60000A0 for XLOG/PARAMETER_CHANGE: max_connections=3 max_worker_processes=8 max_wal_senders=2 max_prepared_xacts=0 max_locks_per_xact=10 wal_level=replica wal_log_hints=off track_commit_timestamp=off
WARNING: recovery paused because of insufficient setting of parameter max_locks_per_transaction (currently 10)
DETAIL: The value must be at least as high as on the primary server.
HINT: Recovery cannot continue unless the parameter is changed and the server restarted.
CONTEXT: WAL redo at 0/6004A80 for Standb

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020-03-10 09:57, Kyotaro Horiguchi wrote:
>> Well I meant to periodically send warning messages while waiting for
>> parameter change, that is after exhausting resources and stopping
>> recovery. In this situation user need to notice that as soon as
>> possible.
>
> If we lose connection, standby continues to complain about lost
> connection every 5 seconds. This is a situation of that kind.

My argument is that it's not really the same. If a standby is disconnected for more than a few minutes, it's really not going to be a good standby anymore after a while. In this case, however, having certain parameter discrepancies is really harmless and you can run with it for a long time. I'm not strictly opposed to a periodic warning, but it's unclear to me how we would find a good interval.

> By the way, when I reduced max_connection only on master then take
> exclusive locks until standby complains on lock exchaustion, I see a
> WARNING that is saying max_locks_per_transaction instead of
> max_connection.
>
> WARNING: insufficient setting for parameter max_connections
> DETAIL: max_connections = 2 is a lower setting than on the master server (where its value was 3).
> HINT: Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.
> CONTEXT: WAL redo at 0/60000A0 for XLOG/PARAMETER_CHANGE: max_connections=3 max_worker_processes=8 max_wal_senders=2 max_prepared_xacts=0 max_locks_per_xact=10 wal_level=replica wal_log_hints=off track_commit_timestamp=off
> WARNING: recovery paused because of insufficient setting of parameter max_locks_per_transaction (currently 10)
> DETAIL: The value must be at least as high as on the primary server.
> HINT: Recovery cannot continue unless the parameter is changed and the server restarted.
> CONTEXT: WAL redo at 0/6004A80 for Standb

This is all a web of half-truths. The lock tables are sized based on max_locks_per_xact * (MaxBackends + max_prepared_xacts). So if you run out of lock space, we currently recommend (in the single-server case) that you raise max_locks_per_xact, but you could also raise max_prepared_xacts or something else. So this is now the opposite case where the lock table on the master was bigger because of max_connections.

We could make the advice less specific and just say, in essence, you need to make some parameter changes; see earlier for some hints.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
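To make the arithmetic concrete, here is a tiny stand-alone program illustrating why a max_connections mismatch surfaces as a max_locks_per_transaction complaint. It is purely illustrative; aux_slots below is an assumed stand-in for the other slots (autovacuum workers, background workers, walsenders, and so on) that also feed into MaxBackends, and the counts follow Kyotaro's test above:

#include <stdio.h>

/*
 * Illustrative only: lock table slots ~ max_locks_per_xact * (MaxBackends +
 * max_prepared_xacts), and MaxBackends itself includes max_connections.
 */
int
main(void)
{
    int max_locks_per_xact = 10;
    int max_prepared_xacts = 0;
    int aux_slots = 11;     /* assumed: autovacuum workers + bgworkers + walsenders + 1 */

    int primary_slots = max_locks_per_xact * ((3 + aux_slots) + max_prepared_xacts);
    int standby_slots = max_locks_per_xact * ((2 + aux_slots) + max_prepared_xacts);

    /*
     * The standby's lock table is 10 entries smaller than the primary's, so
     * WAL that fit comfortably on the primary can exhaust it, and the error
     * the user sees talks about max_locks_per_transaction even though the
     * setting that actually differs is max_connections.
     */
    printf("primary lock slots: %d, standby lock slots: %d\n",
           primary_slots, standby_slots);
    return 0;
}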
At Tue, 10 Mar 2020 14:47:47 +0100, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote in
> On 2020-03-10 09:57, Kyotaro Horiguchi wrote:
> >> Well I meant to periodically send warning messages while waiting for
> >> parameter change, that is after exhausting resources and stopping
> >> recovery. In this situation user need to notice that as soon as
> >> possible.
> > If we lose connection, standby continues to complain about lost
> > connection every 5 seconds. This is a situation of that kind.
>
> My argument is that it's not really the same. If a standby is
> disconnected for more than a few minutes, it's really not going to be
> a good standby anymore after a while. In this case, however, having
> certain parameter discrepancies is really harmless and you can run
> with it for a long time. I'm not strictly opposed to a periodic
> warning, but it's unclear to me how we would find a good interval.

I meant the behavior after streaming is paused. That situation leads to loss of WAL or running out of WAL storage on the master. Actually, 5 seconds as an interval would be too frequent, but maybe we need at least one message per WAL segment-size?

> > By the way, when I reduced max_connection only on master then take
> > exclusive locks until standby complains on lock exchaustion, I see a
> > WARNING that is saying max_locks_per_transaction instead of
> > max_connection.
...
> > WARNING: recovery paused because of insufficient setting of parameter
> > max_locks_per_transaction (currently 10)
> > DETAIL: The value must be at least as high as on the primary server.
> > HINT: Recovery cannot continue unless the parameter is changed and the
> > server restarted.
> > CONTEXT: WAL redo at 0/6004A80 for Standb
>
> This is all a web of half-truths. The lock tables are sized based on
> max_locks_per_xact * (MaxBackends + max_prepared_xacts). So if you
> run out of lock space, we currently recommend (in the single-server
> case), that you raise max_locks_per_xact, but you could also raise
> max_prepared_xacts or something else. So this is now the opposite
> case where the lock table on the master was bigger because of
> max_connections.

Yeah, I know. So I'm not sure whether the checks on individual GUC variables (other than wal_level) make sense. We might even not need the WARNING on parameter change.

> We could make the advice less specific and just say, in essence, you
> need to make some parameter changes; see earlier for some hints.

In that sense the direction mentioned above seems sensible.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Here is an updated patch that incorporates some of the suggestions. In particular, some of the warning messages have been rephrased to be more accurate (but also less specific), the warning message at recovery pause now repeats every 1 minute, and the documentation has been updated.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments
On Thu, 12 Mar 2020 at 04:34, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>
> Here is an updated patch that incorporates some of the suggestions. In
> particular, some of the warning messages have been rephrased to more
> accurate (but also less specific), the warning message at recovery pause
> repeats every 1 minute, and the documentation has been updated.
>

Thank you for updating the patch. I have one comment on the latest version patch:

+		do
+		{
+			TimestampTz now = GetCurrentTimestamp();
+
+			if (TimestampDifferenceExceeds(last_warning, now, 60000))
+			{
+				ereport(WARNING,
+						(errmsg("recovery paused because of insufficient parameter settings"),
+						 errdetail("See earlier in the log about which settings are insufficient."),
+						 errhint("Recovery cannot continue unless the configuration is changed and the server restarted.")));
+				last_warning = now;
+			}
+
+			pg_usleep(1000000L);	/* 1000 ms */
+			HandleStartupProcInterrupts();
+		}

I think we can set wait event WAIT_EVENT_RECOVERY_PAUSE here.

The others look good to me.

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
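One way this suggestion could be applied to the quoted hunk (a sketch against that hunk, not the committed code): wrap the sleep with a wait-event report so that the startup process shows up in pg_stat_activity as waiting on RecoveryPause rather than with no identifiable wait state.

			/* Sketch: report the wait event around the sleep in the loop above. */
			pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
			pg_usleep(1000000L);	/* 1000 ms */
			pgstat_report_wait_end();
			HandleStartupProcInterrupts();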
Hello

> I think we can set wait event WAIT_EVENT_RECOVERY_PAUSE here.

+1, since we added this in recoveryPausesHere.

PS: do we need to add a prototype for the RecoveryRequiredIntParameter function at the top of xlog.c?

regards, Sergei
On 2020-03-27 20:15, Sergei Kornilov wrote: >> I think we can set wait event WAIT_EVENT_RECOVERY_PAUSE here. > > +1, since we added this in recoveryPausesHere. committed with that addition > PS: do we need to add a prototype for the RecoveryRequiredIntParameter function in top of xlog.c? There is no consistent style, I think, but I usually only add prototypes for static functions if they are required because of the ordering in the file. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Here is another stab at this subject.

This is a much simplified variant: When encountering a parameter change in the WAL that is higher than the standby's current setting, we log a warning (instead of an error until now) and pause recovery. If you resume (unpause) recovery, the instance shuts down as before.

This allows you to keep your standbys running for a bit (depending on lag requirements) and schedule the required restart more deliberately.

I had previously suggested making this new behavior configurable, but there didn't seem to be much interest in that, so I have not included that here.

The documentation changes are mostly carried over from previous patch versions (but adjusted for the actual behavior of the patch).

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments
Here is a minimally updated new patch version to resolve a merge conflict. On 2020-06-24 10:00, Peter Eisentraut wrote: > Here is another stab at this subject. > > This is a much simplified variant: When encountering a parameter change > in the WAL that is higher than the standby's current setting, we log a > warning (instead of an error until now) and pause recovery. If you > resume (unpause) recovery, the instance shuts down as before. > > This allows you to keep your standbys running for a bit (depending on > lag requirements) and schedule the required restart more deliberately. > > I had previously suggested making this new behavior configurable, but > there didn't seem to be much interest in that, so I have not included > that there. > > The documentation changes are mostly carried over from previous patch > versions (but adjusted for the actual behavior of the patch). -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments
Hello

Thank you! I'm on vacation, so I was finally able to review the patch.

It seems the WAIT_EVENT_RECOVERY_PAUSE addition was lost during the patch simplification.

> ereport(FATAL,
>         (errmsg("recovery aborted because of insufficient parameter settings"),
>          errhint("You can restart the server after making the necessary configuration changes.")));

I think we should repeat the conflicting param_name and minValue here. pg_wal_replay_resume can be called days after recovery was paused. The initial message can be difficult to find.

> errmsg("recovery will be paused")

Maybe use the same "recovery has paused" as in recoveryPausesHere? It doesn't seem to make any difference since we set pause right after that, but it will be a little less work for translators.

Not sure about "If recovery is unpaused". The word "resumed" seems to be what has usually been used in the docs.

regards, Sergei
On 2020-11-19 20:17, Sergei Kornilov wrote:
> Seems WAIT_EVENT_RECOVERY_PAUSE addition was lost during patch simplification.

added

>> ereport(FATAL,
>>         (errmsg("recovery aborted because of insufficient parameter settings"),
>>          errhint("You can restart the server after making the necessary configuration changes.")));
>
> I think we should repeat here conflicted param_name and minValue. pg_wal_replay_resume can be called days after recovery being paused. The initial message can be difficult to find.

done

>> errmsg("recovery will be paused")
>
> May be use the same "recovery has paused" as in recoveryPausesHere? It doesn't seem to make any difference since we set pause right after that, but there will be a little less work translators.

done

> Not sure about "If recovery is unpaused". The word "resumed" seems to have been usually used in docs.

I think I like "unpaused" better here, because "resumed" would seem to imply that recovery can actually continue.

One thing that has not been added to my patch is the equivalent of 496ee647ecd2917369ffcf1eaa0b2cdca07c8730, which allows promotion while recovery is paused. I'm not sure that would be necessary, and it doesn't look easy to add either.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
Attachments
Hello

> I think I like "unpaused" better here, because "resumed" would seem to
> imply that recovery can actually continue.

Good, I agree.

> One thing that has not been added to my patch is the equivalent of
> 496ee647ecd2917369ffcf1eaa0b2cdca07c8730, which allows promotion while
> recovery is paused. I'm not sure that would be necessary, and it
> doesn't look easy to add either.

Hmm... Good question. How about putting CheckForStandbyTrigger() in a wait loop, but reporting FATAL with an appropriate message, such as "promotion is not possible because of insufficient parameter settings"?

It also suits me if we only document that promotion is ignored here. I don't think this is an important case. And yes, it's not easy to allow promotion, since we have already updated the control file.

Probably we need to pause only after we have reached consistency?

2020-11-20 18:10:23.617 MSK 19722 @ from [vxid: txid:0] [] LOG: entering standby mode
2020-11-20 18:10:23.632 MSK 19722 @ from [vxid: txid:0] [] WARNING: hot standby is not possible because of insufficient parameter settings
2020-11-20 18:10:23.632 MSK 19722 @ from [vxid: txid:0] [] DETAIL: max_connections = 100 is a lower setting than on the primary server, where its value was 150.
2020-11-20 18:10:23.632 MSK 19722 @ from [vxid: txid:0] [] LOG: recovery has paused
2020-11-20 18:10:23.632 MSK 19722 @ from [vxid: txid:0] [] DETAIL: If recovery is unpaused, the server will shut down.
2020-11-20 18:10:23.632 MSK 19722 @ from [vxid: txid:0] [] HINT: You can then restart the server after making the necessary configuration changes.
2020-11-20 18:13:09.767 MSK 19755 melkij@postgres from [local] [vxid: txid:0] [] FATAL: the database system is starting up

regards, Sergei
On 2020-11-20 16:47, Sergei Kornilov wrote:
> Hmm... Good question. How about putting CheckForStandbyTrigger() in a wait loop, but reporting FATAL with an appropriate message, such as "promotion is not possible because of insufficient parameter settings"?
> Also it suits me if we only document that we ignore promote here. I don't think this is an important case. And yes, it's not easy to allow promotion, since we have already updated control file.
>
> Probably we need pause only after we reached consistency?

Here is an updated patch that implements both of these points.

I have used "hot standby active" instead of "reached consistency". I guess arguments could be made either way, but the original use case really cared about hot standby.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
Attachments
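To summarize where the patch ends up, here is a condensed sketch of the check as it is described in this thread. It is reconstructed from the messages above rather than copied from the attached patch, so the exact function name (taken from Sergei's earlier review mail), message wording, and use of helpers such as HotStandbyActiveInReplay(), SetRecoveryPause(), RecoveryIsPaused() and CheckForStandbyTrigger() may differ from the committed code:

static void
RecoveryRequiredIntParameter(const char *param_name, int currValue, int minValue)
{
	if (currValue < minValue)
	{
		if (HotStandbyActiveInReplay())
		{
			/* First, warn about the specific setting that is too low. */
			ereport(WARNING,
					(errmsg("insufficient setting for parameter %s", param_name),
					 errdetail("%s = %d is a lower setting than on the primary server, where its value was %d.",
							   param_name, currValue, minValue)));

			/* Then pause recovery so the hot standby remains usable. */
			SetRecoveryPause(true);
			ereport(LOG,
					(errmsg("recovery has paused"),
					 errdetail("If recovery is unpaused, the server will shut down."),
					 errhint("You can then restart the server after making the necessary configuration changes.")));

			while (RecoveryIsPaused())
			{
				HandleStartupProcInterrupts();

				/* Promotion is not possible in this state. */
				if (CheckForStandbyTrigger())
					ereport(FATAL,
							(errmsg("promotion is not possible because of insufficient parameter settings")));

				pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
				pg_usleep(1000000L);	/* 1000 ms */
				pgstat_report_wait_end();
			}
		}

		/* If recovery is unpaused, or hot standby is not active, shut down. */
		ereport(FATAL,
				(errmsg("recovery aborted because of insufficient parameter settings"),
				 errdetail("%s = %d is a lower setting than on the primary server, where its value was %d.",
						   param_name, currValue, minValue),
				 errhint("You can restart the server after making the necessary configuration changes.")));
	}
}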
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

Hello

Looks good to me. I think the patch is ready for committer.

The new status of this patch is: Ready for Committer
On 2021-01-15 12:28, Sergei Kornilov wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           not tested
> Documentation:            tested, passed
>
> Hello
> Looks good to me. I think the patch is ready for committer.
>
> The new status of this patch is: Ready for Committer

committed

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/