Обсуждение: Return value of pg_promote()

Поиск
Список
Период
Сортировка

Return value of pg_promote()

От
Ashutosh Sharma
Дата:
Hi All,

At present, pg_promote() returns true to the caller on successful
promotion of standby, however it returns false in multiple scenarios
which includes:

1) The SIGUSR1 signal could not be sent to the postmaster process.
2) The postmaster died during standby promotion.
3) Standby couldn't be promoted within the specified wait time.

For an application calling this function, if pg_promote returns false,
it is hard to interpret the reason behind it. So I think we should
*only* allow pg_promote to return false when the server could not be
promoted in the given wait time and in other scenarios it should just
throw an error (FATAL, ERROR ... depending on the type of failure that
occurred). Please let me know your thoughts on this change. thanks.!

--
With Regards,
Ashutosh Sharma.



Re: Return value of pg_promote()

От
Laurenz Albe
Дата:
On Tue, 2023-06-06 at 16:35 +0530, Ashutosh Sharma wrote:
> At present, pg_promote() returns true to the caller on successful
> promotion of standby, however it returns false in multiple scenarios
> which includes:
>
> 1) The SIGUSR1 signal could not be sent to the postmaster process.
> 2) The postmaster died during standby promotion.
> 3) Standby couldn't be promoted within the specified wait time.
>
> For an application calling this function, if pg_promote returns false,
> it is hard to interpret the reason behind it. So I think we should
> *only* allow pg_promote to return false when the server could not be
> promoted in the given wait time and in other scenarios it should just
> throw an error (FATAL, ERROR ... depending on the type of failure that
> occurred). Please let me know your thoughts on this change. thanks.!

As the original author, I'd say that that sounds reasonable, particularly
in case #1.  If the postmaster dies, we are going to die too, so it
probably doesn't matter much.  But I think an error is certainly also
correct in that case.

Yours,
Laurenz Albe



Re: Return value of pg_promote()

От
Fujii Masao
Дата:

On 2023/06/07 2:00, Laurenz Albe wrote:
> On Tue, 2023-06-06 at 16:35 +0530, Ashutosh Sharma wrote:
>> At present, pg_promote() returns true to the caller on successful
>> promotion of standby, however it returns false in multiple scenarios
>> which includes:
>>
>> 1) The SIGUSR1 signal could not be sent to the postmaster process.
>> 2) The postmaster died during standby promotion.
>> 3) Standby couldn't be promoted within the specified wait time.
>>
>> For an application calling this function, if pg_promote returns false,
>> it is hard to interpret the reason behind it. So I think we should
>> *only* allow pg_promote to return false when the server could not be
>> promoted in the given wait time and in other scenarios it should just
>> throw an error (FATAL, ERROR ... depending on the type of failure that
>> occurred). Please let me know your thoughts on this change. thanks.!
> 
> As the original author, I'd say that that sounds reasonable, particularly
> in case #1.  If the postmaster dies, we are going to die too, so it
> probably doesn't matter much.  But I think an error is certainly also
> correct in that case.

+1

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Return value of pg_promote()

От
Ashutosh Sharma
Дата:
On Wed, Jun 7, 2023 at 9:55 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2023/06/07 2:00, Laurenz Albe wrote:
> > On Tue, 2023-06-06 at 16:35 +0530, Ashutosh Sharma wrote:
> >> At present, pg_promote() returns true to the caller on successful
> >> promotion of standby, however it returns false in multiple scenarios
> >> which includes:
> >>
> >> 1) The SIGUSR1 signal could not be sent to the postmaster process.
> >> 2) The postmaster died during standby promotion.
> >> 3) Standby couldn't be promoted within the specified wait time.
> >>
> >> For an application calling this function, if pg_promote returns false,
> >> it is hard to interpret the reason behind it. So I think we should
> >> *only* allow pg_promote to return false when the server could not be
> >> promoted in the given wait time and in other scenarios it should just
> >> throw an error (FATAL, ERROR ... depending on the type of failure that
> >> occurred). Please let me know your thoughts on this change. thanks.!
> >
> > As the original author, I'd say that that sounds reasonable, particularly
> > in case #1.  If the postmaster dies, we are going to die too, so it
> > probably doesn't matter much.  But I think an error is certainly also
> > correct in that case.
>
> +1
>

Thanks for sharing your thoughts, Laurenz and Fujii-san. I've prepared
a patch that makes pg_promote error out if it couldn't send SIGUSR1 to
the postmaster or if the postmaster died in the middle of standby
promotion. PFA. Please note that now (with this patch) pg_promote only
returns false if the standby could not be promoted within the given
wait time. In case of any kind of failure, it just reports an error
based on the type of failure that occurred.

--
With Regards,
Ashutosh Sharma.

Вложения

Re: Return value of pg_promote()

От
Michael Paquier
Дата:
On Thu, Jun 08, 2023 at 04:53:50PM +0530, Ashutosh Sharma wrote:
> Thanks for sharing your thoughts, Laurenz and Fujii-san. I've prepared
> a patch that makes pg_promote error out if it couldn't send SIGUSR1 to
> the postmaster or if the postmaster died in the middle of standby
> promotion. PFA. Please note that now (with this patch) pg_promote only
> returns false if the standby could not be promoted within the given
> wait time. In case of any kind of failure, it just reports an error
> based on the type of failure that occurred.

     if (kill(PostmasterPid, SIGUSR1) != 0)
     {
-        ereport(WARNING,
-                (errmsg("failed to send signal to postmaster: %m")));
         (void) unlink(PROMOTE_SIGNAL_FILE);
-        PG_RETURN_BOOL(false);
+        ereport(ERROR,
+                (errmsg("failed to send signal to postmaster: %m")));
     }

Shouldn't you assign an error code to this one rather than the
default one for internal errors, like ERRCODE_SYSTEM_ERROR?

     /* return immediately if waiting was not requested */
@@ -744,7 +743,9 @@ pg_promote(PG_FUNCTION_ARGS)
          * necessity for manual cleanup of all postmaster children.
          */
         if (rc & WL_POSTMASTER_DEATH)
-            PG_RETURN_BOOL(false);
+            ereport(FATAL,
+                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
+                     errmsg("terminating connection due to unexpected postmaster exit")));

I would add an errcontext here, to let somebody know that the
connection died while waiting for the promotion to be processed, say
"while waiting on promotion".
--
Michael

Вложения

Re: Return value of pg_promote()

От
Michael Paquier
Дата:
On Wed, Aug 16, 2023 at 05:02:09PM +0900, Michael Paquier wrote:
>      if (kill(PostmasterPid, SIGUSR1) != 0)
>      {
> -        ereport(WARNING,
> -                (errmsg("failed to send signal to postmaster: %m")));
>          (void) unlink(PROMOTE_SIGNAL_FILE);
> -        PG_RETURN_BOOL(false);
> +        ereport(ERROR,
> +                (errmsg("failed to send signal to postmaster: %m")));
>      }
>
> Shouldn't you assign an error code to this one rather than the
> default one for internal errors, like ERRCODE_SYSTEM_ERROR?
>
>      /* return immediately if waiting was not requested */
> @@ -744,7 +743,9 @@ pg_promote(PG_FUNCTION_ARGS)
>           * necessity for manual cleanup of all postmaster children.
>           */
>          if (rc & WL_POSTMASTER_DEATH)
> -            PG_RETURN_BOOL(false);
> +            ereport(FATAL,
> +                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
> +                     errmsg("terminating connection due to unexpected postmaster exit")));
>
> I would add an errcontext here, to let somebody know that the
> connection died while waiting for the promotion to be processed, say
> "while waiting on promotion".

I have just noticed that we do not have a CF entry for this proposal,
so I have added one with Laurenz as author:
https://commitfest.postgresql.org/44/4504/

For now the patch is waiting on author.  Could you address my
last review?
--
Michael

Вложения

Re: Return value of pg_promote()

От
Ashutosh Sharma
Дата:
Hi Michael,

On Thu, Aug 17, 2023 at 6:07 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Aug 16, 2023 at 05:02:09PM +0900, Michael Paquier wrote:
> >      if (kill(PostmasterPid, SIGUSR1) != 0)
> >      {
> > -        ereport(WARNING,
> > -                (errmsg("failed to send signal to postmaster: %m")));
> >          (void) unlink(PROMOTE_SIGNAL_FILE);
> > -        PG_RETURN_BOOL(false);
> > +        ereport(ERROR,
> > +                (errmsg("failed to send signal to postmaster: %m")));
> >      }
> >
> > Shouldn't you assign an error code to this one rather than the
> > default one for internal errors, like ERRCODE_SYSTEM_ERROR?
> >
> >      /* return immediately if waiting was not requested */
> > @@ -744,7 +743,9 @@ pg_promote(PG_FUNCTION_ARGS)
> >           * necessity for manual cleanup of all postmaster children.
> >           */
> >          if (rc & WL_POSTMASTER_DEATH)
> > -            PG_RETURN_BOOL(false);
> > +            ereport(FATAL,
> > +                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
> > +                     errmsg("terminating connection due to unexpected postmaster exit")));
> >
> > I would add an errcontext here, to let somebody know that the
> > connection died while waiting for the promotion to be processed, say
> > "while waiting on promotion".
>
> I have just noticed that we do not have a CF entry for this proposal,
> so I have added one with Laurenz as author:
> https://commitfest.postgresql.org/44/4504/
>
> For now the patch is waiting on author.  Could you address my
> last review?

Thanks for reviewing the patch and adding a CF entry for it. PFA patch
that addresses your review comments.

And... Sorry for the delayed response. I totally missed it.

--
With Regards,
Ashutosh Sharma.

Вложения

Re: Return value of pg_promote()

От
Michael Paquier
Дата:
On Mon, Aug 28, 2023 at 11:50:45AM +0530, Ashutosh Sharma wrote:
> Thanks for reviewing the patch and adding a CF entry for it. PFA patch
> that addresses your review comments.

That looks OK seen from here.  Perhaps others have more comments?

> And... Sorry for the delayed response. I totally missed it.

No problem.
--
Michael

Вложения

Re: Return value of pg_promote()

От
Laurenz Albe
Дата:
On Thu, 2023-08-17 at 09:37 +0900, Michael Paquier wrote:
> I have just noticed that we do not have a CF entry for this proposal,
> so I have added one with Laurenz as author:
> https://commitfest.postgresql.org/44/4504/

I have changed the author to Fujii Masao.

Yours,
Laurenz Albe



Re: Return value of pg_promote()

От
Michael Paquier
Дата:
On Mon, Aug 28, 2023 at 02:09:37PM +0200, Laurenz Albe wrote:
> On Thu, 2023-08-17 at 09:37 +0900, Michael Paquier wrote:
> > I have just noticed that we do not have a CF entry for this proposal,
> > so I have added one with Laurenz as author:
> > https://commitfest.postgresql.org/44/4504/
>
> I have changed the author to Fujii Masao.

Still incorrect, as the author is Ashutosh Sharma.  Fujii-san has
provided some feedback, though.
--
Michael

Вложения