Discussion: recovery_min_delay casting problems lead to busy looping

recovery_min_delay casting problems lead to busy looping

From: Andres Freund
Date:
Hi,

recoveryApplyDelay() does:

    TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
                        &secs, &microsecs);

    if (secs <= 0 && microsecs <= 0)
        break;

    elog(DEBUG2, "recovery apply delay %ld seconds, %d milliseconds",
         secs, microsecs / 1000);

    WaitLatch(&XLogCtl->recoveryWakeupLatch,
              WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
              secs * 1000L + microsecs / 1000);

The problem is that the 'microsecs <= 0' comparison is done while in
microsecs, but the sleeping converts to milliseconds. Which will often
be 0. I've seen this cause ~15-20 iterations per loop. Annoying, but not
terrible.

I think we should simply make the abort condition '&& microsecs / 1000
<= 0'.
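
As a rough illustration of the mismatch described above, the following standalone C sketch isolates the two expressions involved. It is not the xlog.c code itself: the helper names are hypothetical, and only the arithmetic is taken from the snippet quoted in this message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: the timeout expression handed to WaitLatch() in the
 * quoted snippet. Integer division drops any sub-millisecond remainder.
 */
long
wait_timeout_ms(long secs, int microsecs)
{
    /* Assumption: the remaining time is never negative at this point. */
    assert(secs >= 0 && microsecs >= 0);
    return secs * 1000L + microsecs / 1000;
}

/* Exit test as quoted: compares at microsecond resolution. */
bool
delay_done_current(long secs, int microsecs)
{
    return secs <= 0 && microsecs <= 0;
}

/* Exit test as proposed: compares at the resolution actually slept. */
bool
delay_done_proposed(long secs, int microsecs)
{
    return secs <= 0 && microsecs / 1000 <= 0;
}
```

With 500 microseconds remaining, wait_timeout_ms(0, 500) is 0 while delay_done_current(0, 500) is false, so the loop would wait with a zero timeout and come straight back. delay_done_proposed(0, 500) is true, so the loop would exit instead.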

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: recovery_min_delay casting problems lead to busy looping

From: Robert Haas
Date:
On Mon, Mar 23, 2015 at 10:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> recoveryApplyDelay() does:
>     TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
>                         &secs, &microsecs);
>
>     if (secs <= 0 && microsecs <= 0)
>         break;
>
>     elog(DEBUG2, "recovery apply delay %ld seconds, %d milliseconds",
>          secs, microsecs / 1000);
>
>     WaitLatch(&XLogCtl->recoveryWakeupLatch,
>               WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
>               secs * 1000L + microsecs / 1000);
>
> The problem is that the 'microsecs <= 0' comparison is done while in
> microsecs, but the sleeping converts to milliseconds. Which will often
> be 0. I've seen this cause ~15-20 iterations per loop. Annoying, but not
> terrible.
>
> I think we should simply make the abort condition '&& microsecs / 1000
> <= 0'.

That's a subtle violation of the documented behavior, although there's
a good chance nobody would ever care.  What about just changing the
WaitLatch call to say Max(secs * 1000L + microsecs / 1000, 1)?
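
A minimal sketch of that alternative, written as standalone C. The function name is hypothetical, and Max is defined locally only so that the example compiles on its own (in the PostgreSQL tree the macro comes from c.h).

```c
#include <assert.h>

/* Local stand-in for the Max() macro from PostgreSQL's c.h. */
#define Max(x, y) ((x) > (y) ? (x) : (y))

/*
 * Hypothetical helper: the WaitLatch() timeout with a floor of 1 ms, so a
 * sub-millisecond remainder still produces a real sleep rather than a
 * zero timeout.
 */
long
clamped_wait_timeout_ms(long secs, int microsecs)
{
    /* Assumption: the remaining time is never negative at this point. */
    assert(secs >= 0 && microsecs >= 0);
    return Max(secs * 1000L + microsecs / 1000, 1);
}
```

Here the exit test stays at microsecond resolution, so the delay is never cut short. The cost is that the final wait may overshoot the target by up to a millisecond.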

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: recovery_min_delay casting problems lead to busy looping

From: Andres Freund
Date:
On 2015-03-23 10:25:48 -0400, Robert Haas wrote:
> On Mon, Mar 23, 2015 at 10:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > recoveryApplyDelay() does:
> >     TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
> >                         &secs, &microsecs);
> >
> >     if (secs <= 0 && microsecs <= 0)
> >         break;
> >
> >     elog(DEBUG2, "recovery apply delay %ld seconds, %d milliseconds",
> >          secs, microsecs / 1000);
> >
> >     WaitLatch(&XLogCtl->recoveryWakeupLatch,
> >               WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
> >               secs * 1000L + microsecs / 1000);
> >
> > The problem is that the 'microsecs <= 0' comparison is done while in
> > microsecs, but the sleeping converts to milliseconds. Which will often
> > be 0. I've seen this cause ~15-20 iterations per loop. Annoying, but not
> > terrible.
> >
> > I think we should simply make the abort condition '&& microsecs / 1000
> > <= 0'.
> 
> That's a subtle violation of the documented behavior

Would it be? The delay is specified on a millisecond resolution, so not
waiting if below one ms doesn't seem unreasonable to me.

>, although there's
> a good chance nobody would ever care.  What about just changing the
> WaitLatch call to say Max(secs * 1000L + microsecs / 1000, 1)?

I could live with that as well. Although we at least should convert the
elog(DEBUG) to log milliseconds in floating point in that case.
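
One possible shape for such a message, sketched as standalone C with snprintf standing in for elog. The helper names and the exact wording of the message are assumptions made for illustration, not taken from any committed patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: remaining delay in milliseconds, keeping the fraction. */
double
remaining_delay_ms(long secs, int microsecs)
{
    /* Assumption: the remaining time is never negative at this point. */
    assert(secs >= 0 && microsecs >= 0);
    return secs * 1000.0 + microsecs / 1000.0;
}

/*
 * Hypothetical helper: formats the debug message into buf and returns the
 * value from snprintf(). A real patch would pass the same format string
 * and argument to elog(DEBUG2, ...) instead.
 */
int
format_delay_message(char *buf, size_t buflen, long secs, int microsecs)
{
    return snprintf(buf, buflen, "recovery apply delay %.3f milliseconds",
                    remaining_delay_ms(secs, microsecs));
}
```

With 500 microseconds remaining this reports 0.500 milliseconds, where the current message would print "0 seconds, 0 milliseconds" even though the loop is still waiting.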

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services