Обсуждение: Re: Improving and extending int128.h to more of numeric.c

Поиск
Список
Период
Сортировка

Re: Improving and extending int128.h to more of numeric.c

От
John Naylor
Дата:
On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> 0001 is a trivial bug fix for the test code in src/tools/testint128.c
> -- it was using "union" instead of "struct" for test128.hl, which
> meant that it was only ever setting and checking half of each 128-bit
> integer in the tests.

Hi Dean, I went to take a look at this and got stuck at building the
test file. The usual pointing gcc to the src and build include
directories didn't cut it. How did you get it to work?

--
John Naylor
Amazon Web Services



Re: Improving and extending int128.h to more of numeric.c

От
Dean Rasheed
Дата:
On Wed, 9 Jul 2025 at 07:41, John Naylor <johncnaylorls@gmail.com> wrote:
>
> Hi Dean, I went to take a look at this and got stuck at building the
> test file. The usual pointing gcc to the src and build include
> directories didn't cut it. How did you get it to work?
>

Yes, it wasn't immediately obvious how to do it. I first built
postgres as normal, including the  pg_config tool, and then used that
to compile the test as follows:

gcc -O3 -g \
    src/tools/testint128.c \
    -I$(pg_config --includedir-server) \
    -o src/tools/testint128 \
    $(pg_config --libs)

It actually only needs -lpgcommon -lpgport -lm, but it seemed easier
just to include all of the pg_config --libs.

Regards,
Dean



Re: Improving and extending int128.h to more of numeric.c

От
Andres Freund
Дата:
Hi,

On 2025-07-09 10:38:31 +0100, Dean Rasheed wrote:
> On Wed, 9 Jul 2025 at 07:41, John Naylor <johncnaylorls@gmail.com> wrote:
> >
> > Hi Dean, I went to take a look at this and got stuck at building the
> > test file. The usual pointing gcc to the src and build include
> > directories didn't cut it. How did you get it to work?
> >
>
> Yes, it wasn't immediately obvious how to do it. I first built
> postgres as normal, including the  pg_config tool, and then used that
> to compile the test as follows:
>
> gcc -O3 -g \
>     src/tools/testint128.c \
>     -I$(pg_config --includedir-server) \
>     -o src/tools/testint128 \
>     $(pg_config --libs)
>
> It actually only needs -lpgcommon -lpgport -lm, but it seemed easier
> just to include all of the pg_config --libs.

I think we should wire this up to the buildsystem and our testsuite...  Having
testcode that is not run automatically may be helpful while originally
developing something, but it doesn't do anything to detect portability issues
or regressions.

Greetings,

Andres Freund



Re: Improving and extending int128.h to more of numeric.c

От
Dean Rasheed
Дата:
On Wed, 9 Jul 2025 at 18:27, Andres Freund <andres@anarazel.de> wrote:
>
> I think we should wire this up to the buildsystem and our testsuite...  Having
> testcode that is not run automatically may be helpful while originally
> developing something, but it doesn't do anything to detect portability issues
> or regressions.

Yes, perhaps we should convert src/tools/testint128.c into a new test
extension, src/test/modules/test_int128

Regards,
Dean



Re: Improving and extending int128.h to more of numeric.c

От
Dean Rasheed
Дата:
On Wed, 9 Jul 2025 at 22:31, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Wed, 9 Jul 2025 at 18:27, Andres Freund <andres@anarazel.de> wrote:
> >
> > I think we should wire this up to the buildsystem and our testsuite...  Having
> > testcode that is not run automatically may be helpful while originally
> > developing something, but it doesn't do anything to detect portability issues
> > or regressions.
>
> Yes, perhaps we should convert src/tools/testint128.c into a new test
> extension, src/test/modules/test_int128

Here's an update doing that (in 0001). 0002-0005 are unchanged.

Regards,
Dean

Вложения

Re: Improving and extending int128.h to more of numeric.c

От
Dean Rasheed
Дата:
On Thu, 10 Jul 2025 at 15:06, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> > Yes, perhaps we should convert src/tools/testint128.c into a new test
> > extension, src/test/modules/test_int128
>
> Here's an update doing that (in 0001). 0002-0005 are unchanged.

v3 attached, fixing a couple of issues revealed by the cfbot:

1. The tests, as currently written, require a native int128 type to
run. To fix that, for now at least, skip the tests if the platform
lacks a native int128 type. We could perhaps improve on that by using
numerics to compute the expected results.

2. Fix Visual Studio compiler warning about applying a unary minus
operator to an unsigned type.

Regards,
Dean

Вложения

Re: Improving and extending int128.h to more of numeric.c

От
John Naylor
Дата:
On Thu, Jul 10, 2025 at 9:06 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Wed, 9 Jul 2025 at 22:31, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >
> > On Wed, 9 Jul 2025 at 18:27, Andres Freund <andres@anarazel.de> wrote:
> > >
> > > I think we should wire this up to the buildsystem and our testsuite...  Having
> > > testcode that is not run automatically may be helpful while originally
> > > developing something, but it doesn't do anything to detect portability issues
> > > or regressions.
> >
> > Yes, perhaps we should convert src/tools/testint128.c into a new test
> > extension, src/test/modules/test_int128
>
> Here's an update doing that (in 0001). 0002-0005 are unchanged.

(Looking at v3) The new test module runs 10 million rather than a
billion iterations. That still takes 1.2s (after 0005), which seems
excessive for regular buildfarm testing. It seems like we could get by
with fewer than that, by using the time of day for the PRNG seed
(which would also need to be logged on error).

On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> 0002 is a bit of preparatory refactoring of int128.h -- instead of
> having all the native implementations at the top of the file, and the
> non-native implementations at the bottom, this brings them together
> (more like include/common/int.h).

+1

> 0003 optimises the non-native addition code. Specifically, the test
> for whether it needs to propagate a carry to the high part can be made
> much simpler by noting that the low-part addition is unsigned integer
> arithmetic, which is just modular arithmetic, so all it needs to do is
> check for modular wrap-around, which can be done with a single "new <
> old" test. In addition, it's possible to code this in a way that is
> typically branchless, and produces the same machine code as the native
> int128 code (e.g., an ADD and an ADC instruction). For me, this
> significantly reduces the runtime of testint128 (from 31s to 16s).

I see 1/3 less time with the new module, but still noticeably better.

> 0004 simplifies the non-native multiplication code a bit by using
> signed integer multiplication for the first three product terms, which
> simplifies the code needed to add the products to the result. Looking
> on godbolt.org, this typically leads to significantly smaller output,
> with less branching, though I found it only gave around a 3%
> improvement to the runtime of testint128. Nonetheless, I still think
> it's worth doing, to make the code simpler and more readable.

+1

> 0005 is the main patch. It adds a few more functions to int128.h and
> uses them in numeric.c to allow various functions (mainly aggregate
> functions) to use 128-bit integers unconditionally on all platforms.
> This applies to the following aggregates:
>
> - sum(int8)
> - avg(int8)
> - stddev_pop(int4)
> - stddev_samp(int4)
> - var_pop(int4)
> - var_samp(int4)
>
> Excluding the new test code, 0005 gives a slight net reduction in the
> total line count, and eliminates nearly all "#ifdef HAVE_INT128"
> conditional code from numeric.c, making it significantly simpler and
> easier to follow.

I haven't looked too closely, but wanted to point out:

+ /* check 128/32-bit division */
+ t3.hl.hi = x;
+ t3.hl.lo = y;
+ t1.i128 = t3.i128 / z32;
+ r1 = (int32) (t3.i128 % z32);
+ t2 = t3;
+ int128_div_mod_int32(&t2.I128, z32, &r2);
+
+ if (t1.hl.hi != t2.hl.hi || t1.hl.lo != t2.hl.lo)
+ {
+ printf("%016lX%016lX / signed %lX\n", t3.hl.hi, t3.hl.lo, z32);

On gcc 14.3 -Og this gives

warning: format ‘%lX’ expects argument of type ‘long unsigned int’,
but argument 4 has type ‘int32’ {aka ‘int’} [-Wformat=]

...and printing r1 and r2 has the same warnings.

+ if (r1 != r2)
+ {
+ printf("%016lX%016lX % signed %lX\n", t3.hl.hi, t3.hl.lo, z32);

And this gives the above plus

warning: ' ' flag used with ‘%s’ gnu_printf format [-Wformat=]
warning: format ‘%s’ expects argument of type ‘char *’, but argument 4
has type ‘int32’ {aka ‘int’} [-Wformat=]

> Testing on a 32-bit system without native int128 support, I see
> something like a 1.3-1.5x speedup in a couple of simple queries using
> those aggregates.

Nice!

--
John Naylor
Amazon Web Services



Re: Improving and extending int128.h to more of numeric.c

От
Dean Rasheed
Дата:
On Mon, 14 Jul 2025 at 11:22, John Naylor <johncnaylorls@gmail.com> wrote:
>
> (Looking at v3) The new test module runs 10 million rather than a
> billion iterations. That still takes 1.2s (after 0005), which seems
> excessive for regular buildfarm testing. It seems like we could get by
> with fewer than that, by using the time of day for the PRNG seed
> (which would also need to be logged on error).

Thanks for looking!

I have reduced the number of iterations and changed it to use the
current time for the PRNG seed. I don't see much value in logging the
seed though, since we already log the inputs that cause any failure.

> > 0005 is the main patch.
> >
> I haven't looked too closely, but wanted to point out:
>
> warning: format ‘%lX’ expects argument of type ‘long unsigned int’,
> but argument 4 has type ‘int32’ {aka ‘int’} [-Wformat=]
>
> ...

Ah yes, thanks for pointing that out.

(The cfbot reports the same warnings, but you have to scroll through a
lot of output to see them. It would be nice if the commitfest app had
an indicator to show if there were any compiler warnings.)

v4 attached.

Regards,
Dean

Вложения

Re: Improving and extending int128.h to more of numeric.c

От
Dean Rasheed
Дата:
On Mon, 14 Jul 2025 at 22:07, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> > warning: format ‘%lX’ expects argument of type ‘long unsigned int’,
> > but argument 4 has type ‘int32’ {aka ‘int’} [-Wformat=]
> >
> v4 attached.

v5 attached, fixing some more printf-related compiler warnings, this
time from the original test code.

Regards,
Dean

Вложения

Re: Improving and extending int128.h to more of numeric.c

От
Andres Freund
Дата:
Hi,

On 2025-07-14 22:07:38 +0100, Dean Rasheed wrote:
> (The cfbot reports the same warnings, but you have to scroll through a
> lot of output to see them. It would be nice if the commitfest app had
> an indicator to show if there were any compiler warnings.)

FWIW, for many warnings the CompilerWarnings task will fail. It's "just" the
32bit build and msvc windows builds that currently don't...

There was a patch adding it for the msvc build at some point, but ...

Greetings,

Andres Freund



Re: Improving and extending int128.h to more of numeric.c

От
John Naylor
Дата:
On Tue, Jul 15, 2025 at 4:07 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> I have reduced the number of iterations and changed it to use the
> current time for the PRNG seed. I don't see much value in logging the
> seed though, since we already log the inputs that cause any failure.

Ah, right.

On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> 0005 is the main patch. It adds a few more functions to int128.h and
> uses them in numeric.c to allow various functions (mainly aggregate
> functions) to use 128-bit integers unconditionally on all platforms.
> This applies to the following aggregates:
>
> - sum(int8)
> - avg(int8)
> - stddev_pop(int4)
> - stddev_samp(int4)
> - var_pop(int4)
> - var_samp(int4)

> Testing on a 32-bit system without native int128 support, I see
> something like a 1.3-1.5x speedup in a couple of simple queries using
> those aggregates.

With v5, I don't see any difference from master when building on x86
with -m32 for these queries:

select sum(i) from generate_series(1e10, 1e10+1e6, 1) i;
select var_pop(i) from generate_series(1e9, 1e9+1e6, 1) i;

Which queries were you testing?

(Also, unrelated to the patch set, but I was surprised to find
replacing the numeric expressions above with bigint ones
(10_000_000_000 etc) makes the queries at least 5 times slower, and
that's true with a normal 64-bit build as well.)

--
John Naylor
Amazon Web Services



Re: Improving and extending int128.h to more of numeric.c

От
Dean Rasheed
Дата:
On Wed, 16 Jul 2025 at 10:02, John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> > 0005 is the main patch. It adds a few more functions to int128.h and
> > uses them in numeric.c to allow various functions (mainly aggregate
> > functions) to use 128-bit integers unconditionally on all platforms.
> > This applies to the following aggregates:
> >
> > - sum(int8)
> > - avg(int8)
> > - stddev_pop(int4)
> > - stddev_samp(int4)
> > - var_pop(int4)
> > - var_samp(int4)
>
> > Testing on a 32-bit system without native int128 support, I see
> > something like a 1.3-1.5x speedup in a couple of simple queries using
> > those aggregates.
>
> With v5, I don't see any difference from master when building on x86
> with -m32 for these queries:

Thanks for testing!

> select sum(i) from generate_series(1e10, 1e10+1e6, 1) i;
> select var_pop(i) from generate_series(1e9, 1e9+1e6, 1) i;

The patch won't make any difference to those because "i" is numeric in
those queries, and the patch doesn't touch sum(numeric) or
var_pop(numeric).

> Which queries were you testing?

I used the following 2 queries:

SELECT count(*), sum(x), avg(x)
  FROM generate_series(1::bigint, 10000000::bigint) g(x);

SELECT count(*), var_pop(x), var_samp(x), stddev_pop(x), stddev_samp(x)
  FROM generate_series(1::int, 10000000::int) g(x);

On 64-bit Linux with gcc 14.2 and native int128 support disabled I got
the following results:

Query 1:
  HEAD:  1404.096 ms
  Patch:  992.818 ms

Query 2:
  HEAD:  1498.949 ms
  Patch:  935.654 ms

And on a 32-bit Linux VM I got:

Query 1:
  HEAD:  2465.202 ms
  Patch: 1874.590 ms
Query 2:
  HEAD:  2491.991 ms
  Patch: 1682.992 ms

I didn't originally try "-m32" on 64-bit Linux because I wasn't sure
how realistic a test that would be, but doing that now I get:

Query 1:
  HEAD:  1830.652 ms
  Patch: 1411.438 ms
Query 2:
  HEAD:  1882.299 ms
  Patch: 1299.546 ms

> (Also, unrelated to the patch set, but I was surprised to find
> replacing the numeric expressions above with bigint ones
> (10_000_000_000 etc) makes the queries at least 5 times slower, and
> that's true with a normal 64-bit build as well.)

Hmm, are you sure? I don't see that. With -m32, I see:

select sum(i) from generate_series(1e10, 1e10+1e6, 1) i;
  HEAD:  204.774 ms
  Patch: 204.206 ms
(not expecting any difference)

select sum(i) from generate_series(10_000_000_000, 10_001_000_000, 1) i;
  HEAD:  187.426 ms
  Patch: 140.741 ms
(as expected, faster than the previous query in HEAD because bigint
generate_series should be faster than numeric generate_series, and
faster still with the sum(bigint) optimisations made by this patch)

select var_pop(i) from generate_series(1e9, 1e9+1e6, 1) i;
  HEAD:  228.386 ms
  Patch: 226.712 ms
(not expecting any difference)

select var_pop(i) from generate_series(10_000_000_000, 10_001_000_000, 1) i;
  HEAD:  211.749 ms
  Patch: 210.870 ms
(as expected, faster than previous query because of bigint
generate_series, but the patch makes no difference because it doesn't
touch var_pop(bigint))

And another query:

select sum(i::bigint) from generate_series(1e10, 1e10+1e6, 1) i;
  HEAD:  271.888 ms
  Patch: 227.898 ms
(as expected, slower than the pure numeric version in HEAD because of
the cast, while still using numeric in the aggregate, but with a
decent speedup from the patch, using INT128 in the aggregate)

Regards,
Dean



Re: Improving and extending int128.h to more of numeric.c

От
Dean Rasheed
Дата:
On Wed, 16 Jul 2025 at 19:23, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On 64-bit Linux with gcc 14.2 and native int128 support disabled I got
> the following results:
>
> Query 1:
>   HEAD:  1404.096 ms
>   Patch:  992.818 ms
>
> Query 2:
>   HEAD:  1498.949 ms
>   Patch:  935.654 ms
>

BTW, my other motivation for doing this was to simplify the numeric
code. Even if this had zero performance benefit, as long as it didn't
make things any slower, I would argue that it's worth doing.

The other 2 places in numeric.c that have conditional 128-bit integer
code would require more complex hand-written code to replace, such as
128-bit-by-128-bit division. That's obviously doable, but perhaps not
worth the effort as long as it's only those 2 numeric functions that
need it. OTOH, if there's a wider demand for 128-bit integers, that
might change.

Regards,
Dean



Re: Improving and extending int128.h to more of numeric.c

От
John Naylor
Дата:
On Thu, Jul 17, 2025 at 1:24 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Wed, 16 Jul 2025 at 10:02, John Naylor <johncnaylorls@gmail.com> wrote:
>
> > Which queries were you testing?
>
> I used the following 2 queries:
>
> SELECT count(*), sum(x), avg(x)
>   FROM generate_series(1::bigint, 10000000::bigint) g(x);
>
> SELECT count(*), var_pop(x), var_samp(x), stddev_pop(x), stddev_samp(x)
>   FROM generate_series(1::int, 10000000::int) g(x);
>
> On 64-bit Linux with gcc 14.2 and native int128 support disabled I got
> the following results:
>
> Query 1:
>   HEAD:  1404.096 ms
>   Patch:  992.818 ms
>
> Query 2:
>   HEAD:  1498.949 ms
>   Patch:  935.654 ms

While testing something else on s390x, I noticed that __int128 support
is broken on that platform at least for some versions of clang [1],
and I see improvement there with this patch:

Query 1:
HEAD:  3015ms
Patch: 2206ms

Query 2:
HEAD:  3394ms
Patch: 2514ms

> > (Also, unrelated to the patch set, but I was surprised to find
> > replacing the numeric expressions above with bigint ones
> > (10_000_000_000 etc) makes the queries at least 5 times slower, and
> > that's true with a normal 64-bit build as well.)
>
> Hmm, are you sure? I don't see that. With -m32, I see:
>
> select sum(i) from generate_series(1e10, 1e10+1e6, 1) i;
>   HEAD:  204.774 ms
>   Patch: 204.206 ms
> (not expecting any difference)
>
> select sum(i) from generate_series(10_000_000_000, 10_001_000_000, 1) i;
>   HEAD:  187.426 ms
>   Patch: 140.741 ms
> (as expected, faster than the previous query in HEAD because bigint
> generate_series should be faster than numeric generate_series, and
> faster still with the sum(bigint) optimisations made by this patch)

Hmm, at the time I was surprised too, and ran multiple times but today
I can't reproduce my earlier results, so not sure what happened. :/

[1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=treehopper&dt=2025-07-17%2019%3A26%3A04&stg=configure

--
John Naylor
Amazon Web Services



Re: Improving and extending int128.h to more of numeric.c

От
John Naylor
Дата:
On Thu, Jul 17, 2025 at 2:30 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> BTW, my other motivation for doing this was to simplify the numeric
> code. Even if this had zero performance benefit, as long as it didn't
> make things any slower, I would argue that it's worth doing.

I gathered that was the main motivation, and I agree. I looked over
0005 and don't see any issues.

--
John Naylor
Amazon Web Services



Re: Improving and extending int128.h to more of numeric.c

От
Dean Rasheed
Дата:
On Fri, 18 Jul 2025 at 07:42, John Naylor <johncnaylorls@gmail.com> wrote:
>
> While testing something else on s390x, I noticed that __int128 support
> is broken on that platform at least for some versions of clang [1],
> and I see improvement there with this patch:
>
> Query 1:
> HEAD:  3015ms
> Patch: 2206ms
>
> Query 2:
> HEAD:  3394ms
> Patch: 2514ms

Thanks for testing.

On Fri, 18 Jul 2025 at 07:47, John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Thu, Jul 17, 2025 at 2:30 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >
> > BTW, my other motivation for doing this was to simplify the numeric
> > code. Even if this had zero performance benefit, as long as it didn't
> > make things any slower, I would argue that it's worth doing.
>
> I gathered that was the main motivation, and I agree. I looked over
> 0005 and don't see any issues.

Thanks for reviewing. If there are no objections, I'll push this
shortly (though I'll change INT64_HEX_FORMAT to PRIx64, since it looks
like the former is about to go away).

Regards,
Dean