Обсуждение: GNU/Hurd portability patches

Поиск
Список
Период
Сортировка

GNU/Hurd portability patches

От
Michael Banck
Дата:
Hi,

please find attached the current patches required to get master built
and the testsuites run on Debian's hurd-i386 port. I have not had the
time to test the hurd-amd64 port in the same fashion, but will do so
next.

As mentioned in this thread[1], one needs a fairly recent kernel for the
high-resolution timers in order to avoid regression test failures, and
the buildfarm client run does not like debug_parallel_query.


Michael

[1] https://www.postgresql.org/message-id/685a3ccd.170a0220.2d27e6.2232%40mx.google.com

Вложения

Re: GNU/Hurd portability patches

От
Tom Lane
Дата:
Michael Banck <mbanck@gmx.net> writes:
> please find attached the current patches required to get master built
> and the testsuites run on Debian's hurd-i386 port. I have not had the
> time to test the hurd-amd64 port in the same fashion, but will do so
> next.

Pushed, after some fooling with the comments and commit messages.

Please go ahead and set up a Hurd buildfarm member, so that it
stays fixed.

            regards, tom lane



Re: GNU/Hurd portability patches

От
Michael Banck
Дата:
Hi,

On Tue, Jul 01, 2025 at 12:41:50PM -0400, Tom Lane wrote:
> Michael Banck <mbanck@gmx.net> writes:
> > please find attached the current patches required to get master built
> > and the testsuites run on Debian's hurd-i386 port. I have not had the
> > time to test the hurd-amd64 port in the same fashion, but will do so
> > next.
> 
> Pushed, after some fooling with the comments and commit messages.

Thanks! Also for back-patching them.

Regarding the comment,

| * If <limits.h> didn't define IOV_MAX, define our own.  X/Open requires at
| * least 16.  (GNU Hurd apparently feel that they're not bound by X/Open,
| * because they don't define this symbol at all.)

I personally don't care much about those missing limits on the Hurd, but
Thomas mentioned in
CA+hUKG+tqFVY7Fi=WBvZ6-UsATjcPNBDtphDm7YLjevm2kxSvw@mail.gmail.com (and
Samuel Thibault cited the same sentence to me now when I discussed the
commit with him) that POSIX said "A definition of one of the symbolic
constants in the following list shall be omitted from <limits.h> on
specific implementations where the corresponding value is equal to or
greater than the stated minimum, but is unspecified". So "requires at
least 16" might be a bit too strong here, AIUI.

> Please go ahead and set up a Hurd buildfarm member, so that it
> stays fixed.

Right, will look into this next.


Michael



Re: GNU/Hurd portability patches

От
Tom Lane
Дата:
Michael Banck <mbanck@gmx.net> writes:
> On Tue, Jul 01, 2025 at 12:41:50PM -0400, Tom Lane wrote:
> | * If <limits.h> didn't define IOV_MAX, define our own.  X/Open requires at
> | * least 16.  (GNU Hurd apparently feel that they're not bound by X/Open,
> | * because they don't define this symbol at all.)

> I personally don't care much about those missing limits on the Hurd, but
> Thomas mentioned in
> CA+hUKG+tqFVY7Fi=WBvZ6-UsATjcPNBDtphDm7YLjevm2kxSvw@mail.gmail.com (and
> Samuel Thibault cited the same sentence to me now when I discussed the
> commit with him) that POSIX said "A definition of one of the symbolic
> constants in the following list shall be omitted from <limits.h> on
> specific implementations where the corresponding value is equal to or
> greater than the stated minimum, but is unspecified". So "requires at
> least 16" might be a bit too strong here, AIUI.

Oh, I missed that bit of the spec.  I think "requires at least 16"
is correct anyway, but the parenthetical remark isn't really right.
Not sure if it's worth changing --- the end result is the same in
any case.

            regards, tom lane



Re: GNU/Hurd portability patches

От
Alexander Lakhin
Дата:
Hello hackers,

02.07.2025 02:53, Michael Paquier wrote:
Catching up on this thread after-the-fact, specifically looking at
29213636e6cd as I did the original check tweaked here for O_RDONLY.
Agreed that a backpatch should be OK as done.  The buildfarm looks OK
currently.

Three months later we can see a number of failures produced by that
animal on several branches, e.g. [1]:
timed out after 3600 secs
---

# +++ regress install-check in src/test/modules/test_shm_mq +++
# using postmaster on /home/demo/build-farm-19.1/buildroot/tmp/buildfarm-GBEDDQ, port 5678
---- hang ----

I've spun up Debian Hurd locally, using [2], and reproduced this, just by
running: `make -s check -C src/test/modules/test_shm_mq/` in a loop.

It's not that easy to see a backtrace of the running processes on that OS,
but with some debug logging (attached), I can see that the test backend
process or shm_mq background worker just gets stuck shortly after poll().
One example:
echo "
log_min_messages = DEBUG5
log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_statement = 'all'
autovacuum = off
" > /tmp/extra.config

for i in {1..100}; do echo "ITERATION $i"; TEMP_CONFIG=/tmp/extra.config \
timeout 60 make -s check -C src/test/modules/test_shm_mq/ || break; done; \
ps -A | grep postgres

...

ITERATION 9
ok 1         - test_shm_mq                             14498 ms

ITERATION 10
ok 1         - test_shm_mq                             34759 ms

ITERATION 11
# +++ regress check in src/test/modules/test_shm_mq +++
# initializing database system by copying initdb template
# using temp instance on port 58928 with PID 28848
make: *** [../../../../src/makefiles/pgxs.mk:451: check] Terminated
demo     28848 p0 So    0:00.03 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-McfSYJ
demo     28849  - Ssfo  0:00.06 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-McfSYJ
demo     28850  - Ssfo  0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-McfSYJ
demo     28851  - Ssfo  0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-McfSYJ
demo     28852  - Ssfo  0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-McfSYJ
demo     28853  - Ssfo  0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-McfSYJ
demo     28855  - Ssfo  0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-McfSYJ
demo     28856  - Ssfo  0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-McfSYJ
demo     28862  - Ssfo  0:01.13 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-McfSYJ
demo     28866  - Ssfo  0:00.87 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-McfSYJ

The process 28866 is stuck, it doesn't response to SIGTERM.

src/test/modules/test_shm_mq/log/postmaster.log contains:
2025-09-21 05:48:00.749 BST [28862:23] pg_regress/test_shm_mq LOG:  statement: SELECT test_shm_mq(100, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,(100+200*random())::int)), 10000, 1);

2025-09-21 05:48:00.750 BST [28848:59] DEBUG:  postmaster received pmsignal signal
2025-09-21 05:48:00.750 BST [28848:60] DEBUG:  registering background worker ""
2025-09-21 05:48:00.750 BST [28848:61] DEBUG:  assigned pm child slot 240 for bgworker
2025-09-21 05:48:00.750 BST [28848:62] DEBUG:  starting background worker process ""
2025-09-21 05:48:00.751 BST [28866:1] DEBUG:  find_in_path: trying "/home/demo/postgresql/tmp_install/usr/local/pgsql/lib/test_shm_mq"
2025-09-21 05:48:00.751 BST [28866:2] DEBUG:  find_in_path: trying "/home/demo/postgresql/tmp_install/usr/local/pgsql/lib/test_shm_mq.so"
...
!!!shm_mq_receive_bytes[28866]| before WaitLatch
!!!WaitEventSetWaitBlock POLL[28866]| rc: 1
^^^ the last message from that process ^^^

Other run:
2025-09-21 06:03:21.041 BST [29427:23] pg_regress/test_shm_mq LOG:  statement: SELECT test_shm_mq(100, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,(100+200*random())::int)), 10000, 1);

2025-09-21 06:03:21.042 BST [29412:64] DEBUG:  postmaster received pmsignal signal
2025-09-21 06:03:21.042 BST [29412:65] DEBUG:  registering background worker ""
2025-09-21 06:03:21.042 BST [29412:66] DEBUG:  assigned pm child slot 239 for bgworker
2025-09-21 06:03:21.042 BST [29412:67] DEBUG:  starting background worker process ""
2025-09-21 06:03:21.043 BST [29431:1] DEBUG:  find_in_path: trying "/home/demo/postgresql/tmp_install/usr/local/pgsql/lib/test_shm_mq"
2025-09-21 06:03:21.043 BST [29431:2] DEBUG:  find_in_path: trying "/home/demo/postgresql/tmp_install/usr/local/pgsql/lib/test_shm_mq.so"

!!!WaitEventSetWaitBlock POLL[29431]| rc: 1
!!!WaitEventSetWaitBlock[29431] 1
!!!WaitEventSetWaitBlock[29431] 2
!!!WaitEventSetWaitBlock[29431] 3
!!!WaitEventSetWaitBlock[29431] 4
!!!WaitEventSetWaitBlock[29431] 5
^^^ the last message from the process 29431 ^^^

So it seems to me that Hurd is not mature enough yet to test Postgres.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-19%2007%3A29%3A06
[2] https://cdimage.debian.org/cdimage/ports/latest/hurd-amd64/debian-hurd-amd64-20250807.img.tar.xz

Best regards,
Alexander
Вложения

Re: GNU/Hurd portability patches

От
Michael Banck
Дата:
Hi,

On Sun, Sep 21, 2025 at 09:00:00AM +0300, Alexander Lakhin wrote:
> 02.07.2025 02:53, Michael Paquier wrote:
> > Catching up on this thread after-the-fact, specifically looking at
> > 29213636e6cd as I did the original check tweaked here for O_RDONLY.
> > Agreed that a backpatch should be OK as done.  The buildfarm looks OK
> > currently.
> 
> Three months later we can see a number of failures produced by that
> animal on several branches, e.g. [1]:
> timed out after 3600 secs

Right, I've noticed them as well of course, but did not have time to
take a closer look yet. This timeout in test_shm_mq happens on 32bit
hurd-i386 as well, btw.

> It's not that easy to see a backtrace of the running processes on that OS,
> but with some debug logging (attached), I can see that the test backend
> process or shm_mq background worker just gets stuck shortly after poll().

Thanks for taking a deeper look.

> So it seems to me that Hurd is not mature enough yet to test Postgres.

That is a bit harsh; this issue should be looked into, but I would not
say it is not mature enough to test Postgres.


Michael



Re: GNU/Hurd portability patches

От
Michael Paquier
Дата:
On Sun, Sep 21, 2025 at 09:00:00AM +0300, Alexander Lakhin wrote:
> So it seems to me that Hurd is not mature enough yet to test Postgres.

I'd say that it is likely not mature enough for production.  In terms
of testing, that seems kind of OK.  However, failures like the one you
are reporting here bring noise in the buildfarm, meaning that we would
perhaps tend to ignore reports that are in fact legit because we don't
really know what would be Hurd-related or Postgres-related.  Or we
could get mixes of both.
--
Michael

Вложения

Re: GNU/Hurd portability patches

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> On Sun, Sep 21, 2025 at 09:00:00AM +0300, Alexander Lakhin wrote:
>> So it seems to me that Hurd is not mature enough yet to test Postgres.

> I'd say that it is likely not mature enough for production.  In terms
> of testing, that seems kind of OK.  However, failures like the one you
> are reporting here bring noise in the buildfarm, meaning that we would
> perhaps tend to ignore reports that are in fact legit because we don't
> really know what would be Hurd-related or Postgres-related.  Or we
> could get mixes of both.

Yeah, I think the tendency would be to write off any failures as
"Hurd teething pains" unless there are similar reports from other
animals.  As long as the failure rate is pretty low, I'm okay
with that.

            regards, tom lane



Re: GNU/Hurd portability patches

От
Michael Banck
Дата:
On Mon, Sep 22, 2025 at 07:02:25AM +0900, Michael Paquier wrote:
> On Sun, Sep 21, 2025 at 09:00:00AM +0300, Alexander Lakhin wrote:
> > So it seems to me that Hurd is not mature enough yet to test Postgres.
> 
> I'd say that it is likely not mature enough for production.  In terms
> of testing, that seems kind of OK. 

Ack.

> However, failures like the one you are reporting here bring noise in
> the buildfarm, meaning that we would perhaps tend to ignore reports
> that are in fact legit because we don't really know what would be
> Hurd-related or Postgres-related. 

I will keep an eye on it.

There have been two (infrequent) failures in the isoloation tests as
well, which I haven't had time to investigate further:

In PG15:

|test multiple-row-versions        ... FAILED (test process exited with exit code 1)   145260 ms

This one is a segfault, it happened twice so far and only on REL_15_STABLE:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-18%2007%3A57%3A40
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-03%2007%3A36%3A45

|setup failed: server closed the connection unexpectedly
|    This probably means the server terminated abnormally
|    before or while processing the request.

|/hurd/crash: [...]/buildroot/REL_15_STABLE/inst/bin/postgres -D data-C(25142) crashed, signal {no:11, code:2,
error:2},exception {1, code:2, subcode:0}, PCs: {
 
| 0x1019b4d34 0x1028ee2a0 0x10249e1ac 0x1025da71f 0x102596611 0x10049fe63,
| 0x102497aec 0x1028e674e 0x1024ada52 0x1024aeb2a 0x1024a785b 0x10291f5b7 0x10291f
| 625 0x1024c7242 0x1024986a2 0x1024c72c0,
| 0x102497aec 0x102572ace
| }, killing task

In PG17:

|not ok 98    - stats                                    2100 ms

|diff -U3 buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out
buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out
|--- buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out    2025-09-15 22:06:24.000000000
+0100
|+++ buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out    2025-09-15
22:23:05.000000000+0100
 
|@@ -1445,7 +1445,7 @@
| 
| name          |pg_stat_get_function_calls|total_above_zero|self_above_zero
| --------------+--------------------------+----------------+---------------
|-test_stat_func|                         1|t               |t              
|+test_stat_func|                         1|f               |f              
| (1 row)

This one happened twice as well, and so far only on REL_17_STABLE:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-15%2021%3A06%3A17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-13%2008%3A04%3A05

This might be due to the HPET timer not always being monotonic - there
has been an independent report via the Debian autobuilder and a GNU Mach
fix was committed last night, I'll check whether this can be
reproduced/confirmed-fixed with this change:

https://lists.gnu.org/archive/html/bug-hurd/2025-09/msg00020.html
https://salsa.debian.org/hurd-team/gnumach/-/commit/06079a8d212817ee0365f318bd90b67bf56bfb06


Michael



Re: GNU/Hurd portability patches

От
Alexander Lakhin
Дата:
Hello Michael,

Thank you for paying attention to this!

Maybe I was wrong and we can at least categorize these failures -- I hope
their number is finite, but my point was that it's hardly possible to use
the information, that fruitcrow gives us, to improve Postgres.

22.09.2025 10:22, Michael Banck wrote:
On Mon, Sep 22, 2025 at 07:02:25AM +0900, Michael Paquier wrote:

However, failures like the one you are reporting here bring noise in
the buildfarm, meaning that we would perhaps tend to ignore reports
that are in fact legit because we don't really know what would be
Hurd-related or Postgres-related. 
I will keep an eye on it.

There have been two (infrequent) failures in the isoloation tests as
well, which I haven't had time to investigate further:


In PG17:

|not ok 98    - stats                                    2100 ms

|diff -U3 buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out
|--- buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out	2025-09-15 22:06:24.000000000 +0100
|+++ buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out	2025-09-15 22:23:05.000000000 +0100
|@@ -1445,7 +1445,7 @@
| 
| name          |pg_stat_get_function_calls|total_above_zero|self_above_zero
| --------------+--------------------------+----------------+---------------
|-test_stat_func|                         1|t               |t              
|+test_stat_func|                         1|f               |f              
| (1 row)

This one happened twice as well, and so far only on REL_17_STABLE:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-15%2021%3A06%3A17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-13%2008%3A04%3A05

This might be due to the HPET timer not always being monotonic - there
has been an independent report via the Debian autobuilder and a GNU Mach
fix was committed last night, I'll check whether this can be
reproduced/confirmed-fixed with this change:

https://lists.gnu.org/archive/html/bug-hurd/2025-09/msg00020.html
https://salsa.debian.org/hurd-team/gnumach/-/commit/06079a8d212817ee0365f318bd90b67bf56bfb06

I reproduced the issue locally and found that
        /* total elapsed time in this function call */
        INSTR_TIME_SET_CURRENT(total);
        INSTR_TIME_SUBTRACT(total, fcu->start);
sometimes gives total.ticks = 0.

I tried the test program from [2] and got on my VM:
went backwards 0 out of 10000000 times
(three times)

But I've created my own test program (see attached), which shows:
for i in {1..1000}; do printf "ITERATION $i "; ./tt 100 || break; done
 ITERATION 1 t1: 55873639081080, t2: 55873639084090, t2 - t1: 3010 (r: 4950)
 ITERATION 2 t1: 55873641019440, t2: 55873641025700, t2 - t1: 6260 (r: 4950)
 ITERATION 3 t1: 55873642794200, t2: 55873642797130, t2 - t1: 2930 (r: 4950)
...
 ITERATION 23 t1: 55873675001590, t2: 55873675001590, t2 - t1: 0 (r: 4950)

I don't know how to test the patch committed, but if you can, that would
be nice.

Best regards,
Alexander
Вложения

Re: GNU/Hurd portability patches

От
Michael Paquier
Дата:
On Mon, Sep 22, 2025 at 11:30:00PM +0300, Alexander Lakhin wrote:
> I reproduced the issue locally and found that
>         /* total elapsed time in this function call */
>         INSTR_TIME_SET_CURRENT(total);
>         INSTR_TIME_SUBTRACT(total, fcu->start);
> sometimes gives total.ticks = 0.
>
> I tried the test program from [2] and got on my VM:
> went backwards 0 out of 10000000 times
> (three times)
>
> But I've created my own test program (see attached), which shows:
> for i in {1..1000}; do printf "ITERATION $i "; ./tt 100 || break; done
>  ITERATION 1 t1: 55873639081080, t2: 55873639084090, t2 - t1: 3010 (r: 4950)
>  ITERATION 2 t1: 55873641019440, t2: 55873641025700, t2 - t1: 6260 (r: 4950)
>  ITERATION 3 t1: 55873642794200, t2: 55873642797130, t2 - t1: 2930 (r: 4950)
> ...
>  ITERATION 23 t1: 55873675001590, t2: 55873675001590, t2 - t1: 0 (r: 4950)
>
> I don't know how to test the patch committed, but if you can, that would
> be nice.

We've had this exact same issue of a clock going backwards with one of
the netbsd animals on an older version not supported anymore by
upstream and that has been kicked out of the buildfarm, as far as I
recall.  This has created some disturbance in the regression tests
causing EXPLAIN plan outputs we did not expect, in terms of extra
negative signs and the animal showing red periodically.  So yes, this
random factor would be annoying in the buildfarm.
--
Michael

Вложения

Re: GNU/Hurd portability patches

От
Michael Banck
Дата:
Hi,

On Mon, Sep 22, 2025 at 11:30:00PM +0300, Alexander Lakhin wrote:
> Maybe I was wrong and we can at least categorize these failures -- I hope
> their number is finite, but my point was that it's hardly possible to use
> the information, that fruitcrow gives us, to improve Postgres.

Or, for that matter, to improve GNU Mach/Hurd...
 
> 22.09.2025 10:22, Michael Banck wrote:
> > There have been two (infrequent) failures in the isoloation tests as
> > well, which I haven't had time to investigate further:
> > 
> > In PG17:
> > 
> > |not ok 98    - stats                                    2100 ms
> > 
> > |diff -U3 buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out
buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out
> > |--- buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out    2025-09-15 22:06:24.000000000
+0100
> > |+++ buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out    2025-09-15
22:23:05.000000000+0100
 
> > |@@ -1445,7 +1445,7 @@
> > |
> > | name          |pg_stat_get_function_calls|total_above_zero|self_above_zero
> > | --------------+--------------------------+----------------+---------------
> > |-test_stat_func|                         1|t               |t
> > |+test_stat_func|                         1|f               |f
> > | (1 row)
> > 
> > This one happened twice as well, and so far only on REL_17_STABLE:
> > 
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-15%2021%3A06%3A17
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-13%2008%3A04%3A05
> > 
> > This might be due to the HPET timer not always being monotonic - there
> > has been an independent report via the Debian autobuilder and a GNU Mach
> > fix was committed last night, I'll check whether this can be
> > reproduced/confirmed-fixed with this change:
> > 
> > https://lists.gnu.org/archive/html/bug-hurd/2025-09/msg00020.html
> > https://salsa.debian.org/hurd-team/gnumach/-/commit/06079a8d212817ee0365f318bd90b67bf56bfb06
> 
> I reproduced the issue locally and found that
>         /* total elapsed time in this function call */
>         INSTR_TIME_SET_CURRENT(total);
>         INSTR_TIME_SUBTRACT(total, fcu->start);
> sometimes gives total.ticks = 0.
> 
> I tried the test program from [2] and got on my VM:
> went backwards 0 out of 10000000 times
> (three times)
> 
> But I've created my own test program (see attached), which shows:
> for i in {1..1000}; do printf "ITERATION $i "; ./tt 100 || break; done
>  ITERATION 1 t1: 55873639081080, t2: 55873639084090, t2 - t1: 3010 (r: 4950)
>  ITERATION 2 t1: 55873641019440, t2: 55873641025700, t2 - t1: 6260 (r: 4950)
>  ITERATION 3 t1: 55873642794200, t2: 55873642797130, t2 - t1: 2930 (r: 4950)
> ...
>  ITERATION 23 t1: 55873675001590, t2: 55873675001590, t2 - t1: 0 (r: 4950)
> 
> I don't know how to test the patch committed, but if you can, that would
> be nice.

Thanks for the test. I ran the stats test with the GNU Mach patch and
while it seemed to help, it did error out eventually. However, your test
case is much better/faster and I also see 0 deltas after a few hundred
to a few thousand iterations.

I'll report that on their development list, looks like they have not
plugged all the holes yet..


Michael



Re: GNU/Hurd portability patches

От
Michael Banck
Дата:
Hi,

On Wed, Sep 24, 2025 at 08:31:27AM +0900, Michael Paquier wrote:
> We've had this exact same issue of a clock going backwards with one of
> the netbsd animals on an older version not supported anymore by
> upstream and that has been kicked out of the buildfarm, as far as I
> recall.  This has created some disturbance in the regression tests
> causing EXPLAIN plan outputs we did not expect, in terms of extra
> negative signs and the animal showing red periodically.  

This was the case initially on 32bit Hurd until I configured it to use
APIC (which is a requirement for HPET timers). So the clock is no longer
going backwards in an obvious way; and apart from the stats isolation
test, I have not seen issues in this area.

> So yes, this random factor would be annoying in the buildfarm.

How much timer resolution do we require from the system? GNU Mach seems
to (at least try to) guarantee that the timer won't go backwards, but it
does not guarantee (currently) that two consecutive clock_gettime()
calls will return something different in all cases.


Michael



Re: GNU/Hurd portability patches

От
Michael Banck
Дата:
On Wed, Sep 24, 2025 at 09:41:19AM +0200, Michael Banck wrote:
> On Wed, Sep 24, 2025 at 08:31:27AM +0900, Michael Paquier wrote:
> > So yes, this random factor would be annoying in the buildfarm.
> 
> How much timer resolution do we require from the system? GNU Mach seems
> to (at least try to) guarantee that the timer won't go backwards, but it
> does not guarantee (currently) that two consecutive clock_gettime()
> calls will return something different in all cases.

This is the pg_test_timing output on my hurd-i386 VM with
pg_test_timing from HEAD:

Average loop time including overhead: 13866,64 ns
Histogram of timing durations:
   <= ns   % of total  running %      count
       0       0,0510     0,0510        122
       1       0,0000     0,0510          0
       3       0,0000     0,0510          0
       7       0,0000     0,0510          0
      15       0,0000     0,0510          0
      31       0,0000     0,0510          0
      63       0,0000     0,0510          0
     127       0,0000     0,0510          0
     255       0,0000     0,0510          0
     511       0,0000     0,0510          0
    1023       0,0004     0,0514          1
    2047       0,0000     0,0514          0
    4095      98,9320    98,9834     236681
    8191       0,8845    99,8679       2116
   16383       0,0393    99,9072         94
   32767       0,0343    99,9415         82
[...]

Observed timing durations up to 99,9900%:
      ns   % of total  running %      count
       0       0,0510     0,0510        122
     729       0,0004     0,0514          1
    3519       0,0004     0,0518          1
    3630       0,0130     0,0648         31
    3640       0,1651     0,2299        395
    3650       0,7449     0,9748       1782
    3660       2,3395     3,3143       5597

Clearly those aren't very precise (running Debian 13 GNU/Linux on the
same host in the same qemu/kvm fashion, I get an average loop time
including overhead of around 30ns), but I assumed that the 122 0ns
entries would be the problem; however Hannu reported back in 2024 that
he saw something similar on his Macbook Air M1:
https://www.postgresql.org/message-id/CAMT0RQSbzeJN+nPo_QXib-P62rgez=dJxoaTURcN1FYPoLpQPg@mail.gmail.com

|Per loop time including overhead: 21.54 ns
|Histogram of timing durations:
|   <= ns   % of total  running %      count
|       0      49.1655    49.1655   68481688
|       1       0.0000    49.1655          0
|       3       0.0000    49.1655          0
|       7       0.0000    49.1655          0
|      15       0.0000    49.1655          0
|      31       0.0000    49.1655          0
|      63      50.6890    99.8545   70603742
|     127       0.1432    99.9976     199411
|     255       0.0015    99.9991       2065

I wonder what is going on here, was that a fluke or is that not related
to the stats isolation test failure after all? Anybody else tried the
updated pg_test_timing on Apple hardware and could possibly run the tt.c
test case from Alexander?

btw, the stats test failed in a similar way on hamerkop (Windows Server
2016) once, 35 days ago:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&dt=2025-08-19%2013%3A56%3A17


Michael



Re: GNU/Hurd portability patches

От
Tom Lane
Дата:
Michael Banck <mbanck@gmx.net> writes:
> How much timer resolution do we require from the system? GNU Mach seems
> to (at least try to) guarantee that the timer won't go backwards, but it
> does not guarantee (currently) that two consecutive clock_gettime()
> calls will return something different in all cases.

I think it is reasonable to require the clock to not go backwards
during a test run, but it's not at all reasonable to require the
clock to advance by more than zero between two successive readings.

We used to encounter the no-advance case all the time, back when
machines had clock resolutions measured in milliseconds.  It's
relatively rare now though, so it's possible that some test case
has crept in that expects that.  But I'd call it a bug in the
test case if so.

It'd be interesting to see the output of a pg_test_timing run
from your Hurd machine.

            regards, tom lane



Re: GNU/Hurd portability patches

От
Tom Lane
Дата:
Michael Banck <mbanck@gmx.net> writes:
> This is the pg_test_timing output on my hurd-i386 VM with
> pg_test_timing from HEAD:

Ask and ye shall receive ... sorry for not reading the whole thread.

> I wonder what is going on here, was that a fluke or is that not related
> to the stats isolation test failure after all? Anybody else tried the
> updated pg_test_timing on Apple hardware and could possibly run the tt.c
> test case from Alexander?

Yeah, I see zero-ns outputs on a couple different Apple M-series
machines.  This is the output on the M4 Mini that runs the sifaka
and indri BF animals:

$ pg_test_timing
Testing timing overhead for 3 seconds.
Average loop time including overhead: 17.22 ns
Histogram of timing durations:
   <= ns   % of total  running %      count
       0      58.8235    58.8235  102495613
       1       0.0000    58.8235          0
       3       0.0000    58.8235          0
       7       0.0000    58.8235          0
      15       0.0000    58.8235          0
      31       0.0000    58.8235          0
      63      41.1229    99.9464   71653499
     127       0.0502    99.9966      87421
     255       0.0026    99.9992       4522
     511       0.0000    99.9992         56
    1023       0.0001    99.9993        117
    2047       0.0001    99.9994        164
    4095       0.0003    99.9997        558
    8191       0.0003   100.0000        501
   16383       0.0000   100.0000         50
   32767       0.0000   100.0000         17
   65535       0.0000   100.0000          0
  131071       0.0000   100.0000          1

Observed timing durations up to 99.9900%:
      ns   % of total  running %      count
       0      58.8235    58.8235  102495613
      41      13.7077    72.5313   23884717
      42      27.4151    99.9464   47768782
      83       0.0277    99.9741      48304
      84       0.0140    99.9881      24425
     125       0.0084    99.9966      14692
...
  107083       0.0000   100.0000          1

Those animals are not showing failures, so we can't blame
"clock didn't advance" as a problem in itself.  However,
the thing that jumps out at me from your results is that
the clock resolution seems to be only 3 to 4 us on Hurd:

> Histogram of timing durations:
>    <= ns   % of total  running %      count
>        0       0,0510     0,0510        122
>        1       0,0000     0,0510          0
>        3       0,0000     0,0510          0
>        7       0,0000     0,0510          0
>       15       0,0000     0,0510          0
>       31       0,0000     0,0510          0
>       63       0,0000     0,0510          0
>      127       0,0000     0,0510          0
>      255       0,0000     0,0510          0
>      511       0,0000     0,0510          0
>     1023       0,0004     0,0514          1
>     2047       0,0000     0,0514          0
>     4095      98,9320    98,9834     236681
>     8191       0,8845    99,8679       2116

It seems plausible that the execution time of the stats
test's function-under-test is so short that it sometimes
doesn't register as more than zero on a machine with poor
clock resolution.  It looks like that test only calls the
test function once or twice before checking that it's
accumulated some runtime, and the test function is nothing
more than

    CREATE FUNCTION test_stat_func() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN END;$$;

I'd call this a bug in that test TBH.  It'd be saner to
make the function do something like pg_sleep for 1ms.

            regards, tom lane



Re: GNU/Hurd portability patches

От
Alexander Lakhin
Дата:
Hello Michael and Tom,

Thank you for spending time on this!

24.09.2025 13:45, Michael Banck wrote:
btw, the stats test failed in a similar way on hamerkop (Windows Server
2016) once, 35 days ago:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&dt=2025-08-19%2013%3A56%3A17

Yes, and all of such failures are counted at [1].
And we had discussed that (Windows-specific) anomaly too: [2]. Probably
something has changed in that environment, so that we see no such failures
for the last month.

How much timer resolution do we require from the system? GNU Mach seems
to (at least try to) guarantee that the timer won't go backwards, but it
does not guarantee (currently) that two consecutive clock_gettime()
calls will return something different in all cases.

Regarding the lowest timer resolution, as I mentioned at [3], 32k_counter
gives only 0.030517 sec...

[1] https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#subscription.sql_sporadically_fails_on_hamerkop_due_to_zero_time_difference
[2] https://www.postgresql.org/message-id/CANhcyEX4hH9POyTM3vh%3D58newEF0%3DqgK46xF5i-RDir2zAZ4og%40mail.gmail.com
[3] https://www.postgresql.org/message-id/af1d252c-738f-46ab-99c6-a00e0d65aa04%40gmail.com

Best regards,
Alexander

Re: GNU/Hurd portability patches

От
Tom Lane
Дата:
Alexander Lakhin <exclusion@gmail.com> writes:
> 24.09.2025 13:45, Michael Banck wrote:
>> How much timer resolution do we require from the system? GNU Mach seems
>> to (at least try to) guarantee that the timer won't go backwards, but it
>> does not guarantee (currently) that two consecutive clock_gettime()
>> calls will return something different in all cases.

> Regarding the lowest timer resolution, as I mentioned at [3], 32k_counter
> gives only 0.030517 sec...

We are currently doing a short pg_test_timing run in every BF run,
but with only a cursory regex-based sanity check on the output.
Since it's a TAP test, we could easily report the full output in
the TAP log without causing problems.  I was already thinking about
doing that, and if there's some question about the minimum expected
timer resolution then it's really silly to not be capturing that
data.

I will go do that, and in a few day's time we should have enough
reports to see what we can realistically expect.

            regards, tom lane



Re: GNU/Hurd portability patches

От
Michael Banck
Дата:
Hi,

On Wed, Sep 24, 2025 at 10:28:46AM -0400, Tom Lane wrote:
> It seems plausible that the execution time of the stats
> test's function-under-test is so short that it sometimes
> doesn't register as more than zero on a machine with poor
> clock resolution.  It looks like that test only calls the
> test function once or twice before checking that it's
> accumulated some runtime, and the test function is nothing
> more than
> 
>     CREATE FUNCTION test_stat_func() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN END;$$;
> 
> I'd call this a bug in that test TBH.  It'd be saner to
> make the function do something like pg_sleep for 1ms.

I did that in the attached, so far my Hurd VM ran the stats test more
than 1000 times without a failure with it. I have the loop running till
10000, I'll report back tomorrow.


Michael

Вложения

Re: GNU/Hurd portability patches

От
Alexander Lakhin
Дата:
24.09.2025 18:52, Michael Banck wrote:
On Wed, Sep 24, 2025 at 10:28:46AM -0400, Tom Lane wrote:
It seems plausible that the execution time of the stats
test's function-under-test is so short that it sometimes
doesn't register as more than zero on a machine with poor
clock resolution.  It looks like that test only calls the
test function once or twice before checking that it's
accumulated some runtime, and the test function is nothing
more than
    CREATE FUNCTION test_stat_func() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN END;$$;

I'd call this a bug in that test TBH.  It'd be saner to
make the function do something like pg_sleep for 1ms.
I did that in the attached, so far my Hurd VM ran the stats test more
than 1000 times without a failure with it. I have the loop running till
10000, I'll report back tomorrow.

If the stats test could be fixed this way, I wonder how to deal with
regress/subscription.sql. When running:
TESTS="$(printf "subscription %.0s" `seq 1000`)" make -s check-tests

on the same Hurd VM, I'm observing:
...
ok 986       - subscription                               53 ms
not ok 987   - subscription                               53 ms
ok 988       - subscription                               53 ms
...
# 4 of 1000 tests failed.
# The differences that caused some tests to fail can be viewed in the file "/home/demo/postgresql/src/test/regress/regression.diffs".

$ cat "/home/demo/postgresql/src/test/regress/regression.diffs"

--- /home/demo/postgresql/src/test/regress/expected/subscription.out    2025-09-24 19:49:53.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/subscription.out     2025-09-24 20:06:48.000000000 +0100
@@ -70,7 +70,7 @@
 SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
  ?column?
 ----------
- t
+ f
 (1 row)
 
 -- fail - name already exists
diff -U3 /home/demo/postgresql/src/test/regress/expected/subscription.out /home/demo/postgresql/src/test/regress/results/subscription.out
--- /home/demo/postgresql/src/test/regress/expected/subscription.out    2025-09-24 19:49:53.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/subscription.out     2025-09-24 20:07:13.000000000 +0100
@@ -70,7 +70,7 @@
 SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
  ?column?
 ----------
- t
+ f
 (1 row)
 
 -- fail - name already exists
diff -U3 /home/demo/postgresql/src/test/regress/expected/subscription.out /home/demo/postgresql/src/test/regress/results/subscription.out
--- /home/demo/postgresql/src/test/regress/expected/subscription.out    2025-09-24 19:49:53.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/subscription.out     2025-09-24 20:07:28.000000000 +0100
@@ -70,7 +70,7 @@
 SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
  ?column?
 ----------
- t
+ f
 (1 row)
 
 -- fail - name already exists
diff -U3 /home/demo/postgresql/src/test/regress/expected/subscription.out /home/demo/postgresql/src/test/regress/results/subscription.out
--- /home/demo/postgresql/src/test/regress/expected/subscription.out    2025-09-24 19:49:53.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/subscription.out     2025-09-24 20:07:33.000000000 +0100
@@ -70,7 +70,7 @@
 SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
  ?column?
 ----------
- t
+ f
 (1 row)
 
 -- fail - name already exists


Best regards,
Alexander

Re: GNU/Hurd portability patches

От
Michael Banck
Дата:
Hi,

On Wed, Sep 24, 2025 at 11:00:00PM +0300, Alexander Lakhin wrote:
> 24.09.2025 18:52, Michael Banck wrote:
> > I did that in the attached, so far my Hurd VM ran the stats test more
> > than 1000 times without a failure with it. I have the loop running till
> > 10000, I'll report back tomorrow.

(it ran for 7000 iterations without fault so far)
 
> If the stats test could be fixed this way, I wonder how to deal with
> regress/subscription.sql. When running:
> TESTS="$(printf "subscription %.0s" `seq 1000`)" make -s check-tests
> 
> on the same Hurd VM, I'm observing:
> ...
> ok 986       - subscription                               53 ms
> not ok 987   - subscription                               53 ms
> ok 988       - subscription                               53 ms
> ...
> # 4 of 1000 tests failed.

I ran that five times now without a problem, both with and without the
Mach patch I mentioned earlier, and on 32 and 64 bit. Not sure what is
going on here.

> # The differences that caused some tests to fail can be viewed in the file
> "/home/demo/postgresql/src/test/regress/regression.diffs".
> 
> $ cat "/home/demo/postgresql/src/test/regress/regression.diffs"
> 
> --- /home/demo/postgresql/src/test/regress/expected/subscription.out 2025-09-24 19:49:53.000000000 +0100
> +++ /home/demo/postgresql/src/test/regress/results/subscription.out 2025-09-24 20:06:48.000000000 +0100
> @@ -70,7 +70,7 @@
>  SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
>   ?column?
>  ----------
> - t
> + f
>  (1 row)

I saw those issues frequently on the initial 32bit Hurd VM I started to
run the buildfarm code on, before I switched it to HPET timers. Since
then, I don't think I saw that particular error again, but 4 out 1000 is
not a lot of course.

 
Michael



Re: GNU/Hurd portability patches

От
Alexander Lakhin
Дата:
Hello Michael,

25.09.2025 00:22, Michael Banck wrote:
I ran that five times now without a problem, both with and without the
Mach patch I mentioned earlier, and on 32 and 64 bit. Not sure what is
going on here.

Maybe you're running it against REL_15_STABLE, where this test case is
absent... (I tested that on REL_18_STABLE.) I don't know what can prevent
the test case from failing if the underlying defect is still here.

I saw those issues frequently on the initial 32bit Hurd VM I started to
run the buildfarm code on, before I switched it to HPET timers. Since
then, I don't think I saw that particular error again, but 4 out 1000 is
not a lot of course.

There is also contrib/pg_stat_statements/entry_timestamp, which fails for
me when running in a loop:
for i in `seq 100`; do echo "ITERATION $i"; NO_TEMP_INSTALL=1 make -s check -C contrib/pg_stat_statements || break; done

on iterations 42, 60, 12, 5, 28:
ITERATION 28
...
ok 8         - wal                                        14 ms
not ok 9     - entry_timestamp                            14 ms
ok 10        - privileges                                 16 ms
...
1..15
# 1 of 15 tests failed.
# The differences that caused some tests to fail can be viewed in the file "/tst/postgresql/contrib/pg_stat_statements/regression.diffs".

$ cat contrib/pg_stat_statements/regression.diffs
diff -U3 /tst/postgresql/contrib/pg_stat_statements/expected/entry_timestamp.out /tst/postgresql/contrib/pg_stat_statements/results/entry_timestamp.out
--- /tst/postgresql/contrib/pg_stat_statements/expected/entry_timestamp.out     2025-09-25 04:26:23.000000000 +0100
+++ /tst/postgresql/contrib/pg_stat_statements/results/entry_timestamp.out      2025-09-25 04:50:43.000000000 +0100
@@ -147,7 +147,7 @@
 WHERE query LIKE '%STMTTS%';
  total | minmax_exec_zero | minmax_ts_after_ref | stats_since_after_ref
 -------+------------------+---------------------+-----------------------
-     2 |                1 |                   2 |                     0
+     2 |                2 |                   2 |                     0
 (1 row)
 
 -- Cleanup

Best regards,
Alexander

Re: GNU/Hurd portability patches

От
Michael Banck
Дата:
Hi,

On Thu, Sep 25, 2025 at 08:00:00AM +0300, Alexander Lakhin wrote:
> 25.09.2025 00:22, Michael Banck wrote:
> > I ran that five times now without a problem, both with and without the
> > Mach patch I mentioned earlier, and on 32 and 64 bit. Not sure what is
> > going on here.
> 
> Maybe you're running it against REL_15_STABLE, where this test case is
> absent... (I tested that on REL_18_STABLE.) I don't know what can prevent
> the test case from failing if the underlying defect is still here.

No, I ran it on HEAD an REL_18_STABLE looks similar.

> There is also contrib/pg_stat_statements/entry_timestamp, which fails for
> me when running in a loop:
> for i in `seq 100`; do echo "ITERATION $i"; NO_TEMP_INSTALL=1 make -s check -C contrib/pg_stat_statements || break;
done

The two test cases above failed frequently for me before I switched to
HPET timers, but on amd64 (which you are running as I understand), those
are activated by default, so this should not be a problem.

What does pg_test_timing from HEAD report as "Average loop time
including overhead" for your VM?


Michael



Re: GNU/Hurd portability patches

От
Michael Banck
Дата:
Hi,

On Wed, Sep 24, 2025 at 05:52:02PM +0200, Michael Banck wrote:
> I did that in the attached, so far my Hurd VM ran the stats test more
> than 1000 times without a failure with it. I have the loop running till
> 10000, I'll report back tomorrow.

For the record, the stats test ran 10000 times without a failure with
that patch applied.


Michael



Re: GNU/Hurd portability patches

От
Tom Lane
Дата:
Michael Banck <mbanck@gmx.net> writes:
> On Wed, Sep 24, 2025 at 05:52:02PM +0200, Michael Banck wrote:
>> I did that in the attached, so far my Hurd VM ran the stats test more
>> than 1000 times without a failure with it. I have the loop running till
>> 10000, I'll report back tomorrow.

> For the record, the stats test ran 10000 times without a failure with
> that patch applied.

Okay.  Elsewhere on the playing field, 32 buildfarm animals have now
reported in with full pg_test_timing output.  (I'd thought we might
get more, but apparently there's still only a minority of animals
configured with --enable-tap-tests.)  Here's the URLs for those runs,
along with my notes about what the results look like:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=alligator&dt=2025-09-25%2014%3A02%3A34
Ubuntu/x86_64: timing seems to be ns-precise

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=basilisk&dt=2025-09-25%2011%3A01%3A29
Alpine Linux/x86_64: looks to have 10ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=batta&dt=2025-09-25%2014%3A05%3A01
Debian/aarch64: looks to have 10ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=billbug&dt=2025-09-25%2003%3A00%3A03
Solaris/sparc: looks to have 15ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bumblebee&dt=2025-09-25%2015%3A00%3A02
CentOS/x86_64: looks to have 10ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bushmaster&dt=2025-09-25%2013%3A40%3A21
Debian/x86_64: looks to have 10ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=canebrake&dt=2025-09-24%2019%3A14%3A48
Debian/x86_64: looks to have 10ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=copperhead&dt=2025-09-24%2018%3A49%3A47
Debian/riscv64: looks to have 1000ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2025-09-24%2018%3A05%3A01
FreeBSD/arm64: possibly 20ns ticks, or very noisy 10ns

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gokiburi&dt=2025-09-25%2000%3A05%3A05
Debian/aarch64: looks to have 10ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2025-09-24%2022%3A39%3A25
Raspbian/armv7: looks to have 50ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&dt=2025-09-24%2021%3A05%3A05
Debian/aarch64: looks to have 10ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hippopotamus&dt=2025-09-25%2013%3A35%3A48
openSUSE/x86_64: timing seems to be ns-precise

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=indri&dt=2025-09-25%2014%3A34%3A22
macOS/arm64: looks to have 40ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jay&dt=2025-09-24%2016%3A47%3A05
openSUSE/x86_64: timing seems to be ns-precise

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=loach&dt=2025-09-25%2014%3A53%3A05
FreeBSD/x86_64: odd behavior, but I bet it's really 10ns resolution

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2025-09-25%2013%3A35%3A50
macOS/x86_64: timing seems to be ns-precise

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2025-09-25%2003%3A39%3A01
NetBSD/macppc: looks to have 50ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2025-09-25%2004%3A00%3A02
Solaris/sparc: looks to have 15ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2025-09-24%2017%3A04%3A38
OpenBSD/x86_64: horrible loop time (> 4us), but perhaps 15ns resolution underneath?

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&dt=2025-09-24%2019%3A30%3A01
Debian/x86_64: timing seems to be ns-precise

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pollock&dt=2025-09-25%2005%3A35%3A13
illumos/x86_64: timing seems to be ns-precise

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2025-09-25%2012%3A43%3A07
Amazon Linux/86_64: timing seems to be ns-precise

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=schnauzer&dt=2025-09-24%2016%3A49%3A23
OpenBSD/x86_64: horrible loop time (> 4us), but perhaps 15ns resolution underneath?

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2025-09-24%2018%3A35%3A01
NetBSD/x86_64: horrible loop time (> 4us), but perhaps 15ns resolution underneath?

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sifaka&dt=2025-09-25%2013%3A56%3A16
macOS/arm64: looks to have 40ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skimmer&dt=2025-09-25%2014%3A20%3A54
CentOS/x86_64: looks to have 10ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snakefly&dt=2025-09-25%2013%3A32%3A08
Amazon Linux/aarch64: looks to have 10ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=taipan&dt=2025-09-25%2013%3A51%3A19
Debian/x86_64: looks to have 10ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=turaco&dt=2025-09-24%2018%3A15%3A03
Debian/armv7: 500ns loop time, but looks to have 15ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=urutu&dt=2025-09-25%2012%3A55%3A44
Debian/x86_64: looks to have 10ns ticks

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=widowbird&dt=2025-09-24%2016%3A54%3A03
Debian/aarch64: looks to have 20ns ticks?

So your Hurd setup seems to be in the same camp as some of the BSDen:
very slow to read the clock, but the underlying hardware resolution
is not bad, perhaps 10ns.  Given that, I don't quite understand how
it's sometimes able to report the same reading twice, but nonetheless
that's what it's doing.

In any case, given these results, it's really hard to credit that
there are any platforms out there today that would fail to
distinguish clock readings more than a couple microseconds apart.
(This squares with conclusions we arrived at previously when
messing about with our clock code.)  So I'm inclined to set the
delay in stats.spec to 10us not 1ms, just to not slow the test
more than we have to.  Now, pg_sleep relies on WaitLatch which has a
sleep resolution of 1ms, so you might think there's little point in
asking for less.  But experimentation shows that if you ask for 1ms
you actually get a 2ms delay --- something odd about the float
roundoff behavior in pg_sleep, seems like.  I think we can set the
delay to less so that doesn't happen, and to be prepared in case
someday the sleep resolution gets better.

In short, your patch looks good and I'll go apply it with a slightly
smaller delay parameter.

            regards, tom lane



Re: GNU/Hurd portability patches

От
Michael Banck
Дата:
Hi,

On Thu, Sep 25, 2025 at 12:52:33PM -0400, Tom Lane wrote:
> In short, your patch looks good and I'll go apply it with a slightly
> smaller delay parameter.

Great, thanks! I let my VM run for a few thousand iterations with that
timeout and so far there were no failures.


Michael