Обсуждение: Avoiding roundoff error in pg_sleep()
I chanced to notice that if you ask pg_sleep for 1ms delay, what you actually get is 2ms, for example regression=# \timing on Timing is on. regression=# do $$ begin for i in 1..1000 loop perform pg_sleep(0.001); end loop; end $$; DO Time: 2081.175 ms (00:02.081) regression=# do $$ begin for i in 1..1000 loop perform pg_sleep(0.002); end loop; end $$; DO Time: 2177.407 ms (00:02.177) A bit of gdb-ing confirms that the delay passed to WaitLatch() is 2ms, so the problem is fundamentally one of floating-point roundoff error in pg_sleep's calculation of delay_ms. I didn't try to figure out exactly why that's happening. It may well vary depending on the absolute magnitude of current values of GetCurrentTimestamp(), because I don't recall having noticed any such behavior back when this code was written. Anyway, I propose trying to get rid of this misbehavior by avoiding floating point in the delay computation, as attached. With this patch I get less surprising behavior: regression=# \timing on Timing is on. regression=# do $$ begin for i in 1..1000 loop perform pg_sleep(0.001); end loop; end $$; DO Time: 1063.997 ms (00:01.064) regression=# do $$ begin for i in 1..1000 loop perform pg_sleep(0.002); end loop; end $$; DO Time: 2172.849 ms (00:02.173) The code is a little more tied to TimestampTz being measured in microseconds than it was before, but it wouldn't really be much harder to fix if we ever change that. regards, tom lane diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 6c5e3438447..bc01209cf20 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -369,7 +369,20 @@ Datum pg_sleep(PG_FUNCTION_ARGS) { float8 secs = PG_GETARG_FLOAT8(0); - float8 endtime; + int64 usecs; + TimestampTz endtime; + + /* + * Convert the delay to int64 microseconds, rounding up any fraction, and + * silently limiting it to PG_INT64_MAX/2 microseconds (about 150K years) + * to ensure the computation of endtime won't overflow. Historically + * we've treated NaN as "no wait", not an error, so keep that behavior. + */ + if (isnan(secs) || secs <= 0.0) + PG_RETURN_VOID(); + secs *= 1e6; /* we assume overflow will produce +Inf */ + secs = ceil(secs); /* round up */ + usecs = (int64) Min(secs, (float8) (PG_INT64_MAX / 2)); /* * We sleep using WaitLatch, to ensure that we'll wake up promptly if an @@ -383,22 +396,20 @@ pg_sleep(PG_FUNCTION_ARGS) * less than the specified time when WaitLatch is terminated early by a * non-query-canceling signal such as SIGHUP. */ -#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0) - - endtime = GetNowFloat() + secs; + endtime = GetCurrentTimestamp() + usecs; for (;;) { - float8 delay; + TimestampTz delay; long delay_ms; CHECK_FOR_INTERRUPTS(); - delay = endtime - GetNowFloat(); - if (delay >= 600.0) + delay = endtime - GetCurrentTimestamp(); + if (delay >= 600 * USECS_PER_SEC) delay_ms = 600000; - else if (delay > 0.0) - delay_ms = (long) ceil(delay * 1000.0); + else if (delay > 0) + delay_ms = (long) ((delay + 999) / 1000); else break;
Hi
I suspect, that any user, that run something like pg_sleep(1000000000),
start transaction, that stops autovacuum and creates other negative effects up to server crash,
and only this user can stop it by command interrupt (all signals only restart
this sleep or kill whole server).
If I am correct, why Postgres needs this command?
Best regard,
Vladlen Popolitov.
On Thursday, September 25, 2025 21:42 MSK, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I chanced to notice that if you ask pg_sleep for 1ms delay,
what you actually get is 2ms, for example
regression=# \timing on
Timing is on.
regression=# do $$ begin
for i in 1..1000 loop
perform pg_sleep(0.001);
end loop; end $$;
DO
Time: 2081.175 ms (00:02.081)
regression=# do $$ begin
for i in 1..1000 loop
perform pg_sleep(0.002);
end loop; end $$;
DO
Time: 2177.407 ms (00:02.177)
A bit of gdb-ing confirms that the delay passed to WaitLatch() is 2ms,
so the problem is fundamentally one of floating-point roundoff error
in pg_sleep's calculation of delay_ms. I didn't try to figure out
exactly why that's happening. It may well vary depending on the
absolute magnitude of current values of GetCurrentTimestamp(),
because I don't recall having noticed any such behavior back when
this code was written.
Anyway, I propose trying to get rid of this misbehavior by avoiding
floating point in the delay computation, as attached. With this
patch I get less surprising behavior:
regression=# \timing on
Timing is on.
regression=# do $$ begin
for i in 1..1000 loop
perform pg_sleep(0.001);
end loop; end $$;
DO
Time: 1063.997 ms (00:01.064)
regression=# do $$ begin
for i in 1..1000 loop
perform pg_sleep(0.002);
end loop; end $$;
DO
Time: 2172.849 ms (00:02.173)
The code is a little more tied to TimestampTz being measured in
microseconds than it was before, but it wouldn't really be much
harder to fix if we ever change that.
regards, tom lane
--
Vladlen Popolitov.
=?utf-8?q?=D0=9F=D0=BE=D0=BF=D0=BE=D0=BB=D0=B8=D1=82=D0=BE=D0=B2_=D0=92=D0=BB=D0=B0=D0=B4=D0=BB=D0=B5=D0=BD?= <v.popolitov@postgrespro.ru>writes: > I suspect, that any user, that run something like pg_sleep(1000000000), > start transaction, that stops autovacuum and creates other negative effects up to server crash, > and only this user can stop it by command interrupt (all signals only restart > this sleep or kill whole server). How is this different from any other long-running query? If you'd bothered to test, you'd have seen that pg_sleep() is interruptible by query cancel, just like anything else, so "kill whole server" is not required. We're not in the business of trying to limit users' resource consumption. If we wanted to do that, it'd be a very major undertaking. regards, tom lane
On Thu, Sep 25, 2025 at 2:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I chanced to notice that if you ask pg_sleep for 1ms delay, > what you actually get is 2ms, for example Oh, wow. I tested and I get the same behavior. > Anyway, I propose trying to get rid of this misbehavior by avoiding > floating point in the delay computation, as attached. Score one for my blind yet fiery hatred of floating-point arithmetic. The patch looks good except that (places tongue firmly in cheek) it will cause problems for users who want to sleep for more than 150,000 years. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > The patch looks good except that (places tongue firmly in cheek) it > will cause problems for users who want to sleep for more than 150,000 > years. We will all be safely dead before the first such bug report ;-) regards, tom lane
On Thu, Sep 25, 2025 at 02:42:32PM -0400, Tom Lane wrote: > Anyway, I propose trying to get rid of this misbehavior by avoiding > floating point in the delay computation, as attached. With this > patch I get less surprising behavior: > > [...] > > The code is a little more tied to TimestampTz being measured in > microseconds than it was before, but it wouldn't really be much > harder to fix if we ever change that. LGTM. I considered suggesting initializing the delay before the loop and then updating it at the end of the loop, but that moves the CHECK_FOR_INTERRUPTS between the delay calculation and the WaitLatch(), which seems like it might extend the sleeps a bit (although that might be negligible in practice). Otherwise, the code seems to match float8_timestamptz() somewhat closely, although I doubt it's worth trying to unify the implementations. -- nathan
Nathan Bossart <nathandbossart@gmail.com> writes: > On Thu, Sep 25, 2025 at 02:42:32PM -0400, Tom Lane wrote: >> Anyway, I propose trying to get rid of this misbehavior by avoiding >> floating point in the delay computation, as attached. > LGTM. I considered suggesting initializing the delay before the loop and > then updating it at the end of the loop, but that moves the > CHECK_FOR_INTERRUPTS between the delay calculation and the WaitLatch(), > which seems like it might extend the sleeps a bit (although that might be > negligible in practice). Yeah, I did not consider changing the fundamental logic of the loop. It's true that this implementation will use a minimum of three GetCurrentTimestamp() calls where in principle you should need only two when WaitLatch() doesn't exit early. I'm having a hard time getting excited about that though. We know that on just about all current platforms, reading the clock is a handful-of-nanoseconds operation. (cf. nearby discussion of GNU/Hurd where we found that that OS is far behind the curve; but I imagine they'll have fixed that by the time anyone would consider Hurd ready for production.) So it seems pretty negligible in a function that currently has 1ms delay resolution and is unlikely to have better than 1us resolution in the future. > Otherwise, the code seems to match > float8_timestamptz() somewhat closely, although I doubt it's worth trying > to unify the implementations. Comparing that, I guess I should write USECS_PER_SEC not 1e6. Not that it'd make a difference now, but in some hypothetical future where somebody is grepping for dependencies on 1us TimestampTz resolution, it'd help them find this one. Thanks for looking at it! regards, tom lane
On Thu, Sep 25, 2025 at 04:11:14PM -0400, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> LGTM. I considered suggesting initializing the delay before the loop and >> then updating it at the end of the loop, but that moves the >> CHECK_FOR_INTERRUPTS between the delay calculation and the WaitLatch(), >> which seems like it might extend the sleeps a bit (although that might be >> negligible in practice). > > Yeah, I did not consider changing the fundamental logic of the loop. > It's true that this implementation will use a minimum of three > GetCurrentTimestamp() calls where in principle you should need only > two when WaitLatch() doesn't exit early. I'm having a hard time > getting excited about that though. We know that on just about all > current platforms, reading the clock is a handful-of-nanoseconds > operation. (cf. nearby discussion of GNU/Hurd where we found that > that OS is far behind the curve; but I imagine they'll have fixed > that by the time anyone would consider Hurd ready for production.) > So it seems pretty negligible in a function that currently has 1ms > delay resolution and is unlikely to have better than 1us resolution > in the future. Agreed, I'm not too worried about the system calls in this case. I think I was more interested in seeing whether we could avoid the complicated float handling. Something else that seems to work is moving the initial endtime calculation to within the loop, like so: - delay = endtime - GetNowFloat(); + if (first) + { + endtime = GetNowFloat() + secs; + delay = secs; + first = false; + } + else + delay = endtime - GetNowFloat(); But, in any case, your patch still LGTM. -- nathan
Nathan Bossart <nathandbossart@gmail.com> writes: > Agreed, I'm not too worried about the system calls in this case. I think I > was more interested in seeing whether we could avoid the complicated float > handling. Something else that seems to work is moving the initial endtime > calculation to within the loop, like so: > - delay = endtime - GetNowFloat(); > + if (first) > + { > + endtime = GetNowFloat() + secs; > + delay = secs; > + first = false; > + } > + else > + delay = endtime - GetNowFloat(); Hmm ... I'm not sure I'd trust that to give the right answer if WaitLatch is interrupted so that we have to recalculate the delay. > But, in any case, your patch still LGTM. Pushed it. Again, thanks for reviewing! regards, tom lane
Hi Tom, Tom Lane писал(а) 2025-09-25 22:36: > =?utf-8?q?=D0=9F=D0=BE=D0=BF=D0=BE=D0=BB=D0=B8=D1=82=D0=BE=D0=B2_=D0=92=D0=BB=D0=B0=D0=B4=D0=BB=D0=B5=D0=BD?= > <v.popolitov@postgrespro.ru> writes: >> I suspect, that any user, that run something like >> pg_sleep(1000000000), >> start transaction, that stops autovacuum and creates other negative >> effects up to server crash, >> and only this user can stop it by command interrupt (all signals only >> restart >> this sleep or kill whole server). > > How is this different from any other long-running query? > If you'd bothered to test, you'd have seen that pg_sleep() > is interruptible by query cancel, just like anything else, > so "kill whole server" is not required. I will try to explain my worries about this. Long running queries could run long due to higher input-output, and they are limited at least by work_mem parameter. pg_sleep() is not limited by anything. > We're not in the business of trying to limit users' resource > consumption. If we wanted to do that, it'd be a very major > undertaking. For example, mssql does not have function like this, it has command WAIT FOR interval . This interval cannot be more than 24H. And this separate command does not start transaction. It is possible to include it in transaction block anyway, but if user needs pause, he get it without disturbing of other users. In our case the user gets a pause and stops autovacuum, create index concurrently and other commands relaying to transaction id for all other users - without choice to avoid this behaviour. I heard about customer, who had real trouble with this. I can understand them, because in yearly 1990s I stopped university mainframe by simple command "DISPLAY MEMORY" asking for all memory, and later explained to administrator - "I just wanted to see, what will happen". It looks like sleep has the wrong place in a function. It should be an utility command, that does not start a transaction. Best regards, Vladlen Popolitov.
It looks like sleep has the wrong place in a function. It should be
an utility command, that does not start a transaction.
Vladlen Popolitov <v.popolitov@postgrespro.ru> writes: > Tom Lane писал(а) 2025-09-25 22:36: >> How is this different from any other long-running query? > I will try to explain my worries about this. Long running queries could > run long > due to higher input-output, and they are limited at least by work_mem > parameter. > pg_sleep() is not limited by anything. So I assume for example, that you would also argue for removing all looping constructs from pl/pgsql so that users could not write infinite loops? DO $$ BEGIN LOOP END LOOP; END $$; is just as effective a blocking transaction as pg_sleep, plus it consumes a CPU. I think the right answer to concerns like this is transaction_timeout and similar features, not arbitrary restrictions. regards, tom lane
Hi David, David G. Johnston писал(а) 2025-09-26 18:08: > On Friday, September 26, 2025, Vladlen Popolitov > <v.popolitov@postgrespro.ru> wrote: > >> It looks like sleep has the wrong place in a function. It should be >> an utility command, that does not start a transaction. > > Then write that command. We’re not going to change pg_sleep. > It’s works just as advertised. It _does_ not work as it advertised. Help page states: 1) "delay execution of the server process" 2) "pg_sleep makes the current session's process sleep" In reality it stops other backends too. > True, it probably should never be > used in production. yes, it should not be used, if you know, but even advanced users do not know the effect of this function. And it can be used by any user of the server. I just pointed attention, that this function could have bigger problem with high values of parameter, not only very small, if you consider problem with small value is really problem. -- Best regards, Vladlen Popolitov.
Hi David,
David G. Johnston писал(а) 2025-09-26 18:08:On Friday, September 26, 2025, Vladlen Popolitov
<v.popolitov@postgrespro.ru> wrote:It looks like sleep has the wrong place in a function. It should be
an utility command, that does not start a transaction.
Then write that command. We’re not going to change pg_sleep.
It’s works just as advertised.
It _does_ not work as it advertised. Help page states:
1) "delay execution of the server process"
2) "pg_sleep makes the current session's process sleep"
In reality it stops other backends too.
On Fri, Sep 26, 2025 at 11:25 AM Vladlen Popolitov <v.popolitov@postgrespro.ru> wrote: > It _does_ not work as it advertised. Help page states: > 1) "delay execution of the server process" > 2) "pg_sleep makes the current session's process sleep" > In reality it stops other backends too. This thread is about fixing a bug in pg_sleep. I understand that you don't like the way that pg_sleep works and think it should work differently. I think you're wrong, and I think Tom and David are right, and I think they've been more than generous in taking time to respond to your concerns. But even if they are wrong and you are right, it's not right to try to hijack a thread about fixing a bug in pg_sleep to demand that it be redesigned. Please stop doing that. If you want to, you can start a separate thread to discuss the problem you're concerned about, but leave this one for discussion of the bug fix. -- Robert Haas EDB: http://www.enterprisedb.com