Обсуждение: pgsql: pg_ctl: Detect current standby state from pg_control
pg_ctl: Detect current standby state from pg_control pg_ctl used to determine whether a server was in standby mode by looking for a recovery.conf file. With this change, it instead looks into pg_control, which is potentially more accurate. There are also occasional discussions about removing recovery.conf, so this removes one dependency. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Branch ------ master Details ------- http://git.postgresql.org/pg/commitdiff/c1dc51d4844e2a37412b034c07c1c5a439ba0b9d Modified Files -------------- src/backend/utils/misc/pg_controldata.c | 12 +++++++++ src/bin/pg_controldata/pg_controldata.c | 4 +++ src/bin/pg_ctl/pg_ctl.c | 47 ++++++++++++++++++++++++++------- src/common/controldata_utils.c | 15 +++++------ src/include/common/controldata_utils.h | 2 +- 5 files changed, 62 insertions(+), 18 deletions(-)
Peter Eisentraut <peter_e@gmx.net> writes: > pg_ctl: Detect current standby state from pg_control Coverity thinks that this patch introduced a bunch of null-pointer-dereference hazards, and AFAICS it is right. The change in get_controlfile()'s API is completely broken and needs to be undone. regards, tom lane
On Mon, Sep 26, 2016 at 12:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> pg_ctl: Detect current standby state from pg_control > > Coverity thinks that this patch introduced a bunch of > null-pointer-dereference hazards, and AFAICS it is right. > The change in get_controlfile()'s API is completely broken > and needs to be undone. So you'd rather have a sleep(1) and complicate this code to replace it with a WaitLatch() when the code is used by the backend? I don't agree with that as the new API is cleaner as presented, though I agree that it is a change that caller does not get any result in case of a CRC mismatch, but who's going to use results that cannot be trusted anyway? It seems to me that the correct fix here is that pg_controldata.c should just exit like in the attached patch. Somewhat I missed that during my lookup of c1dc51d4. If we would still want callers of get_controlfile() to be allowed to read untrustworthy results, we could just add a boolean status flag.. -- Michael
Вложения
Michael Paquier <michael.paquier@gmail.com> writes: > On Mon, Sep 26, 2016 at 12:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Coverity thinks that this patch introduced a bunch of >> null-pointer-dereference hazards, and AFAICS it is right. >> The change in get_controlfile()'s API is completely broken >> and needs to be undone. > So you'd rather have a sleep(1) and complicate this code to replace it > with a WaitLatch() when the code is used by the backend? I don't agree > with that as the new API is cleaner as presented, though I agree that > it is a change that caller does not get any result in case of a CRC > mismatch, but who's going to use results that cannot be trusted > anyway? Well, they certainly won't be using any untrustworthy reports if pg_controldata dumps core instead of printing the data, which is what will happen in HEAD. Although I don't understand your reference to needing a sleep. I think that it's 100% pointless for get_control_dbstate to be worried about transient CRC failures. If writes to pg_control aren't atomic then we have problems enormously larger than whether "pg_ctl promote" throws an error or not. > It seems to me that the correct fix here is that > pg_controldata.c should just exit like in the attached patch. No. Removing pg_controldata's designed-in ability to print information from corrupted pg_control files was not part of the advertised impact of this patch, and I will absolutely not hold still for it happening as an accidental consequence of ill-advised refactoring. It may well be that you need two different APIs for get_controlfile(), maybe with a flag, or two different wrappers around some common code. Or just go back to using #ifdef FRONTEND, and forget the silliness of expecting pg_ctl not to throw an error for bad pg_control content. BTW, now that get_controlfile has become a global symbol rather than something private to pg_controldata.c, I suggest that it needs a less generic name. "control file" could apply to a lot of things. Possibly "get_pg_control_contents" would be better. regards, tom lane
On 9/26/16 8:53 AM, Tom Lane wrote: > I think that it's 100% pointless for get_control_dbstate > to be worried about transient CRC failures. If writes to pg_control > aren't atomic then we have problems enormously larger than whether > "pg_ctl promote" throws an error or not. The new code was designed to handle that, but if we think it's not an issue, then we can simplify it. I'm on it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services