Discussion: recovery_min_delay casting problems lead to busy looping
Hi,

recoveryApplyDelay() does:

    TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
                        &secs, &microsecs);

    if (secs <= 0 && microsecs <= 0)
        break;

    elog(DEBUG2, "recovery apply delay %ld seconds, %d milliseconds",
         secs, microsecs / 1000);

    WaitLatch(&XLogCtl->recoveryWakeupLatch,
              WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
              secs * 1000L + microsecs / 1000);

The problem is that the 'microsecs <= 0' comparison is done while in
microsecs, but the sleeping converts to milliseconds. Which will often
be 0. I've seen this cause ~15-20 iterations per loop. Annoying, but not
terrible.

I think we should simply make the abort condition '&& microsecs / 1000
<= 0'.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
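A minimal sketch of the mismatch described above, isolated from the server code. The helper names (delay_elapsed_usec, delay_elapsed_msec, latch_timeout_ms) are hypothetical and exist only for illustration; they mirror the expressions quoted in the message, not actual PostgreSQL functions.

```c
#include <stdbool.h>

/* Exit test as quoted: compares the remaining time at microsecond resolution. */
static bool delay_elapsed_usec(long secs, int microsecs)
{
    return secs <= 0 && microsecs <= 0;
}

/* Proposed exit test: compares at millisecond resolution, like the sleep does. */
static bool delay_elapsed_msec(long secs, int microsecs)
{
    return secs <= 0 && microsecs / 1000 <= 0;
}

/* Timeout expression handed to WaitLatch() in the quoted code (milliseconds). */
static long latch_timeout_ms(long secs, int microsecs)
{
    return secs * 1000L + microsecs / 1000;
}
```

With 500 microseconds remaining, delay_elapsed_usec(0, 500) is false while latch_timeout_ms(0, 500) is 0, so the loop keeps going with a zero-length wait. The proposed test, delay_elapsed_msec(0, 500), is true and would end the loop instead.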
On Mon, Mar 23, 2015 at 10:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> recoveryApplyDelay() does:
>     TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
>                         &secs, &microsecs);
>
>     if (secs <= 0 && microsecs <= 0)
>         break;
>
>     elog(DEBUG2, "recovery apply delay %ld seconds, %d milliseconds",
>          secs, microsecs / 1000);
>
>     WaitLatch(&XLogCtl->recoveryWakeupLatch,
>               WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
>               secs * 1000L + microsecs / 1000);
>
> The problem is that the 'microsecs <= 0' comparison is done while in
> microsecs, but the sleeping converts to milliseconds. Which will often
> be 0. I've seen this cause ~15-20 iterations per loop. Annoying, but not
> terrible.
>
> I think we should simply make the abort condition '&& microsecs / 1000
> <= 0'.

That's a subtle violation of the documented behavior, although there's
a good chance nobody would ever care. What about just changing the
WaitLatch call to say Max(secs * 1000L + microsecs / 1000, 1)?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
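The clamping alternative suggested above can be sketched the same way. The Max macro below is a local stand-in for the one PostgreSQL provides, and clamped_latch_timeout_ms is a hypothetical helper name used only for this illustration.

```c
/* Local stand-in for PostgreSQL's Max() macro. */
#define Max(x, y) ((x) > (y) ? (x) : (y))

/*
 * Timeout in milliseconds, never less than 1, so a sub-millisecond
 * remainder still results in a real sleep rather than a zero timeout.
 */
static long clamped_latch_timeout_ms(long secs, int microsecs)
{
    return Max(secs * 1000L + microsecs / 1000, 1);
}
```

Under this variant the exit test stays at microsecond resolution, so the delay is never cut short; the cost is that the last wait may overshoot the target by up to about a millisecond.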
On 2015-03-23 10:25:48 -0400, Robert Haas wrote:
> On Mon, Mar 23, 2015 at 10:18 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > recoveryApplyDelay() does:
> >     TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
> >                         &secs, &microsecs);
> >
> >     if (secs <= 0 && microsecs <= 0)
> >         break;
> >
> >     elog(DEBUG2, "recovery apply delay %ld seconds, %d milliseconds",
> >          secs, microsecs / 1000);
> >
> >     WaitLatch(&XLogCtl->recoveryWakeupLatch,
> >               WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
> >               secs * 1000L + microsecs / 1000);
> >
> > The problem is that the 'microsecs <= 0' comparison is done while in
> > microsecs, but the sleeping converts to milliseconds. Which will often
> > be 0. I've seen this cause ~15-20 iterations per loop. Annoying, but not
> > terrible.
> >
> > I think we should simply make the abort condition '&& microsecs / 1000
> > <= 0'.
>
> That's a subtle violation of the documented behavior

Would it be? The delay is specified on a millisecond resolution, so not
waiting if below one ms doesn't seem unreasonable to me.

> , although there's
> a good chance nobody would ever care. What about just changing the
> WaitLatch call to say Max(secs * 1000L + microsecs / 1000, 1)?

I could live with that as well. Although we at least should convert the
elog(DEBUG) to log milliseconds in floating point in that case.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
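One possible shape for the floating-point log message mentioned above, as a sketch only. It uses snprintf in place of elog so that it builds outside the server, and both helper names and the message wording are assumptions, not what was eventually committed.

```c
#include <stdio.h>

/* Remaining delay in milliseconds, keeping the sub-millisecond fraction. */
static double remaining_delay_ms(long secs, int microsecs)
{
    return secs * 1000.0 + microsecs / 1000.0;
}

/* Format the debug message into buf; returns snprintf's result. */
static int format_remaining_delay(char *buf, size_t buflen,
                                  long secs, int microsecs)
{
    return snprintf(buf, buflen, "recovery apply delay %.3f milliseconds",
                    remaining_delay_ms(secs, microsecs));
}
```

Logging the fractional value means a sub-millisecond remainder no longer prints as a misleading "0 milliseconds" when the wait is clamped to 1 ms.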