Обсуждение: Re: Improving and extending int128.h to more of numeric.c
On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > 0001 is a trivial bug fix for the test code in src/tools/testint128.c > -- it was using "union" instead of "struct" for test128.hl, which > meant that it was only ever setting and checking half of each 128-bit > integer in the tests. Hi Dean, I went to take a look at this and got stuck at building the test file. The usual pointing gcc to the src and build include directories didn't cut it. How did you get it to work? -- John Naylor Amazon Web Services
On Wed, 9 Jul 2025 at 07:41, John Naylor <johncnaylorls@gmail.com> wrote: > > Hi Dean, I went to take a look at this and got stuck at building the > test file. The usual pointing gcc to the src and build include > directories didn't cut it. How did you get it to work? > Yes, it wasn't immediately obvious how to do it. I first built postgres as normal, including the pg_config tool, and then used that to compile the test as follows: gcc -O3 -g \ src/tools/testint128.c \ -I$(pg_config --includedir-server) \ -o src/tools/testint128 \ $(pg_config --libs) It actually only needs -lpgcommon -lpgport -lm, but it seemed easier just to include all of the pg_config --libs. Regards, Dean
Hi, On 2025-07-09 10:38:31 +0100, Dean Rasheed wrote: > On Wed, 9 Jul 2025 at 07:41, John Naylor <johncnaylorls@gmail.com> wrote: > > > > Hi Dean, I went to take a look at this and got stuck at building the > > test file. The usual pointing gcc to the src and build include > > directories didn't cut it. How did you get it to work? > > > > Yes, it wasn't immediately obvious how to do it. I first built > postgres as normal, including the pg_config tool, and then used that > to compile the test as follows: > > gcc -O3 -g \ > src/tools/testint128.c \ > -I$(pg_config --includedir-server) \ > -o src/tools/testint128 \ > $(pg_config --libs) > > It actually only needs -lpgcommon -lpgport -lm, but it seemed easier > just to include all of the pg_config --libs. I think we should wire this up to the buildsystem and our testsuite... Having testcode that is not run automatically may be helpful while originally developing something, but it doesn't do anything to detect portability issues or regressions. Greetings, Andres Freund
On Wed, 9 Jul 2025 at 18:27, Andres Freund <andres@anarazel.de> wrote: > > I think we should wire this up to the buildsystem and our testsuite... Having > testcode that is not run automatically may be helpful while originally > developing something, but it doesn't do anything to detect portability issues > or regressions. Yes, perhaps we should convert src/tools/testint128.c into a new test extension, src/test/modules/test_int128 Regards, Dean
On Wed, 9 Jul 2025 at 22:31, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Wed, 9 Jul 2025 at 18:27, Andres Freund <andres@anarazel.de> wrote: > > > > I think we should wire this up to the buildsystem and our testsuite... Having > > testcode that is not run automatically may be helpful while originally > > developing something, but it doesn't do anything to detect portability issues > > or regressions. > > Yes, perhaps we should convert src/tools/testint128.c into a new test > extension, src/test/modules/test_int128 Here's an update doing that (in 0001). 0002-0005 are unchanged. Regards, Dean
Вложения
- v2-0005-Extend-int128.h-to-support-more-numeric-code.patch
- v2-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patch
- v2-0001-Convert-src-tools-testint128.c-into-a-test-module.patch
- v2-0003-Optimise-non-native-128-bit-addition-in-int128.h.patch
- v2-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patch
On Thu, 10 Jul 2025 at 15:06, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > Yes, perhaps we should convert src/tools/testint128.c into a new test > > extension, src/test/modules/test_int128 > > Here's an update doing that (in 0001). 0002-0005 are unchanged. v3 attached, fixing a couple of issues revealed by the cfbot: 1. The tests, as currently written, require a native int128 type to run. To fix that, for now at least, skip the tests if the platform lacks a native int128 type. We could perhaps improve on that by using numerics to compute the expected results. 2. Fix Visual Studio compiler warning about applying a unary minus operator to an unsigned type. Regards, Dean
Вложения
- v3-0001-Convert-src-tools-testint128.c-into-a-test-module.patch
- v3-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patch
- v3-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patch
- v3-0005-Extend-int128.h-to-support-more-numeric-code.patch
- v3-0003-Optimise-non-native-128-bit-addition-in-int128.h.patch
On Thu, Jul 10, 2025 at 9:06 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Wed, 9 Jul 2025 at 22:31, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > > On Wed, 9 Jul 2025 at 18:27, Andres Freund <andres@anarazel.de> wrote: > > > > > > I think we should wire this up to the buildsystem and our testsuite... Having > > > testcode that is not run automatically may be helpful while originally > > > developing something, but it doesn't do anything to detect portability issues > > > or regressions. > > > > Yes, perhaps we should convert src/tools/testint128.c into a new test > > extension, src/test/modules/test_int128 > > Here's an update doing that (in 0001). 0002-0005 are unchanged. (Looking at v3) The new test module runs 10 million rather than a billion iterations. That still takes 1.2s (after 0005), which seems excessive for regular buildfarm testing. It seems like we could get by with fewer than that, by using the time of day for the PRNG seed (which would also need to be logged on error). On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > 0002 is a bit of preparatory refactoring of int128.h -- instead of > having all the native implementations at the top of the file, and the > non-native implementations at the bottom, this brings them together > (more like include/common/int.h). +1 > 0003 optimises the non-native addition code. Specifically, the test > for whether it needs to propagate a carry to the high part can be made > much simpler by noting that the low-part addition is unsigned integer > arithmetic, which is just modular arithmetic, so all it needs to do is > check for modular wrap-around, which can be done with a single "new < > old" test. In addition, it's possible to code this in a way that is > typically branchless, and produces the same machine code as the native > int128 code (e.g., an ADD and an ADC instruction). For me, this > significantly reduces the runtime of testint128 (from 31s to 16s). I see 1/3 less time with the new module, but still noticeably better. > 0004 simplifies the non-native multiplication code a bit by using > signed integer multiplication for the first three product terms, which > simplifies the code needed to add the products to the result. Looking > on godbolt.org, this typically leads to significantly smaller output, > with less branching, though I found it only gave around a 3% > improvement to the runtime of testint128. Nonetheless, I still think > it's worth doing, to make the code simpler and more readable. +1 > 0005 is the main patch. It adds a few more functions to int128.h and > uses them in numeric.c to allow various functions (mainly aggregate > functions) to use 128-bit integers unconditionally on all platforms. > This applies to the following aggregates: > > - sum(int8) > - avg(int8) > - stddev_pop(int4) > - stddev_samp(int4) > - var_pop(int4) > - var_samp(int4) > > Excluding the new test code, 0005 gives a slight net reduction in the > total line count, and eliminates nearly all "#ifdef HAVE_INT128" > conditional code from numeric.c, making it significantly simpler and > easier to follow. I haven't looked too closely, but wanted to point out: + /* check 128/32-bit division */ + t3.hl.hi = x; + t3.hl.lo = y; + t1.i128 = t3.i128 / z32; + r1 = (int32) (t3.i128 % z32); + t2 = t3; + int128_div_mod_int32(&t2.I128, z32, &r2); + + if (t1.hl.hi != t2.hl.hi || t1.hl.lo != t2.hl.lo) + { + printf("%016lX%016lX / signed %lX\n", t3.hl.hi, t3.hl.lo, z32); On gcc 14.3 -Og this gives warning: format ‘%lX’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘int32’ {aka ‘int’} [-Wformat=] ...and printing r1 and r2 has the same warnings. + if (r1 != r2) + { + printf("%016lX%016lX % signed %lX\n", t3.hl.hi, t3.hl.lo, z32); And this gives the above plus warning: ' ' flag used with ‘%s’ gnu_printf format [-Wformat=] warning: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int32’ {aka ‘int’} [-Wformat=] > Testing on a 32-bit system without native int128 support, I see > something like a 1.3-1.5x speedup in a couple of simple queries using > those aggregates. Nice! -- John Naylor Amazon Web Services
On Mon, 14 Jul 2025 at 11:22, John Naylor <johncnaylorls@gmail.com> wrote: > > (Looking at v3) The new test module runs 10 million rather than a > billion iterations. That still takes 1.2s (after 0005), which seems > excessive for regular buildfarm testing. It seems like we could get by > with fewer than that, by using the time of day for the PRNG seed > (which would also need to be logged on error). Thanks for looking! I have reduced the number of iterations and changed it to use the current time for the PRNG seed. I don't see much value in logging the seed though, since we already log the inputs that cause any failure. > > 0005 is the main patch. > > > I haven't looked too closely, but wanted to point out: > > warning: format ‘%lX’ expects argument of type ‘long unsigned int’, > but argument 4 has type ‘int32’ {aka ‘int’} [-Wformat=] > > ... Ah yes, thanks for pointing that out. (The cfbot reports the same warnings, but you have to scroll through a lot of output to see them. It would be nice if the commitfest app had an indicator to show if there were any compiler warnings.) v4 attached. Regards, Dean
Вложения
- v4-0001-Convert-src-tools-testint128.c-into-a-test-module.patch
- v4-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patch
- v4-0005-Extend-int128.h-to-support-more-numeric-code.patch
- v4-0003-Optimise-non-native-128-bit-addition-in-int128.h.patch
- v4-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patch
On Mon, 14 Jul 2025 at 22:07, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > warning: format ‘%lX’ expects argument of type ‘long unsigned int’, > > but argument 4 has type ‘int32’ {aka ‘int’} [-Wformat=] > > > v4 attached. v5 attached, fixing some more printf-related compiler warnings, this time from the original test code. Regards, Dean
Вложения
- v5-0001-Convert-src-tools-testint128.c-into-a-test-module.patch
- v5-0005-Extend-int128.h-to-support-more-numeric-code.patch
- v5-0002-Refactor-int128.h-bringing-the-native-and-non-nat.patch
- v5-0004-Simplify-non-native-64x64-bit-multiplication-in-i.patch
- v5-0003-Optimise-non-native-128-bit-addition-in-int128.h.patch
Hi, On 2025-07-14 22:07:38 +0100, Dean Rasheed wrote: > (The cfbot reports the same warnings, but you have to scroll through a > lot of output to see them. It would be nice if the commitfest app had > an indicator to show if there were any compiler warnings.) FWIW, for many warnings the CompilerWarnings task will fail. It's "just" the 32bit build and msvc windows builds that currently don't... There was a patch adding it for the msvc build at some point, but ... Greetings, Andres Freund
On Tue, Jul 15, 2025 at 4:07 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > I have reduced the number of iterations and changed it to use the > current time for the PRNG seed. I don't see much value in logging the > seed though, since we already log the inputs that cause any failure. Ah, right. On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > 0005 is the main patch. It adds a few more functions to int128.h and > uses them in numeric.c to allow various functions (mainly aggregate > functions) to use 128-bit integers unconditionally on all platforms. > This applies to the following aggregates: > > - sum(int8) > - avg(int8) > - stddev_pop(int4) > - stddev_samp(int4) > - var_pop(int4) > - var_samp(int4) > Testing on a 32-bit system without native int128 support, I see > something like a 1.3-1.5x speedup in a couple of simple queries using > those aggregates. With v5, I don't see any difference from master when building on x86 with -m32 for these queries: select sum(i) from generate_series(1e10, 1e10+1e6, 1) i; select var_pop(i) from generate_series(1e9, 1e9+1e6, 1) i; Which queries were you testing? (Also, unrelated to the patch set, but I was surprised to find replacing the numeric expressions above with bigint ones (10_000_000_000 etc) makes the queries at least 5 times slower, and that's true with a normal 64-bit build as well.) -- John Naylor Amazon Web Services
On Wed, 16 Jul 2025 at 10:02, John Naylor <johncnaylorls@gmail.com> wrote: > > On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > 0005 is the main patch. It adds a few more functions to int128.h and > > uses them in numeric.c to allow various functions (mainly aggregate > > functions) to use 128-bit integers unconditionally on all platforms. > > This applies to the following aggregates: > > > > - sum(int8) > > - avg(int8) > > - stddev_pop(int4) > > - stddev_samp(int4) > > - var_pop(int4) > > - var_samp(int4) > > > Testing on a 32-bit system without native int128 support, I see > > something like a 1.3-1.5x speedup in a couple of simple queries using > > those aggregates. > > With v5, I don't see any difference from master when building on x86 > with -m32 for these queries: Thanks for testing! > select sum(i) from generate_series(1e10, 1e10+1e6, 1) i; > select var_pop(i) from generate_series(1e9, 1e9+1e6, 1) i; The patch won't make any difference to those because "i" is numeric in those queries, and the patch doesn't touch sum(numeric) or var_pop(numeric). > Which queries were you testing? I used the following 2 queries: SELECT count(*), sum(x), avg(x) FROM generate_series(1::bigint, 10000000::bigint) g(x); SELECT count(*), var_pop(x), var_samp(x), stddev_pop(x), stddev_samp(x) FROM generate_series(1::int, 10000000::int) g(x); On 64-bit Linux with gcc 14.2 and native int128 support disabled I got the following results: Query 1: HEAD: 1404.096 ms Patch: 992.818 ms Query 2: HEAD: 1498.949 ms Patch: 935.654 ms And on a 32-bit Linux VM I got: Query 1: HEAD: 2465.202 ms Patch: 1874.590 ms Query 2: HEAD: 2491.991 ms Patch: 1682.992 ms I didn't originally try "-m32" on 64-bit Linux because I wasn't sure how realistic a test that would be, but doing that now I get: Query 1: HEAD: 1830.652 ms Patch: 1411.438 ms Query 2: HEAD: 1882.299 ms Patch: 1299.546 ms > (Also, unrelated to the patch set, but I was surprised to find > replacing the numeric expressions above with bigint ones > (10_000_000_000 etc) makes the queries at least 5 times slower, and > that's true with a normal 64-bit build as well.) Hmm, are you sure? I don't see that. With -m32, I see: select sum(i) from generate_series(1e10, 1e10+1e6, 1) i; HEAD: 204.774 ms Patch: 204.206 ms (not expecting any difference) select sum(i) from generate_series(10_000_000_000, 10_001_000_000, 1) i; HEAD: 187.426 ms Patch: 140.741 ms (as expected, faster than the previous query in HEAD because bigint generate_series should be faster than numeric generate_series, and faster still with the sum(bigint) optimisations made by this patch) select var_pop(i) from generate_series(1e9, 1e9+1e6, 1) i; HEAD: 228.386 ms Patch: 226.712 ms (not expecting any difference) select var_pop(i) from generate_series(10_000_000_000, 10_001_000_000, 1) i; HEAD: 211.749 ms Patch: 210.870 ms (as expected, faster than previous query because of bigint generate_series, but the patch makes no difference because it doesn't touch var_pop(bigint)) And another query: select sum(i::bigint) from generate_series(1e10, 1e10+1e6, 1) i; HEAD: 271.888 ms Patch: 227.898 ms (as expected, slower than the pure numeric version in HEAD because of the cast, while still using numeric in the aggregate, but with a decent speedup from the patch, using INT128 in the aggregate) Regards, Dean
On Wed, 16 Jul 2025 at 19:23, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On 64-bit Linux with gcc 14.2 and native int128 support disabled I got > the following results: > > Query 1: > HEAD: 1404.096 ms > Patch: 992.818 ms > > Query 2: > HEAD: 1498.949 ms > Patch: 935.654 ms > BTW, my other motivation for doing this was to simplify the numeric code. Even if this had zero performance benefit, as long as it didn't make things any slower, I would argue that it's worth doing. The other 2 places in numeric.c that have conditional 128-bit integer code would require more complex hand-written code to replace, such as 128-bit-by-128-bit division. That's obviously doable, but perhaps not worth the effort as long as it's only those 2 numeric functions that need it. OTOH, if there's a wider demand for 128-bit integers, that might change. Regards, Dean
On Thu, Jul 17, 2025 at 1:24 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Wed, 16 Jul 2025 at 10:02, John Naylor <johncnaylorls@gmail.com> wrote: > > > Which queries were you testing? > > I used the following 2 queries: > > SELECT count(*), sum(x), avg(x) > FROM generate_series(1::bigint, 10000000::bigint) g(x); > > SELECT count(*), var_pop(x), var_samp(x), stddev_pop(x), stddev_samp(x) > FROM generate_series(1::int, 10000000::int) g(x); > > On 64-bit Linux with gcc 14.2 and native int128 support disabled I got > the following results: > > Query 1: > HEAD: 1404.096 ms > Patch: 992.818 ms > > Query 2: > HEAD: 1498.949 ms > Patch: 935.654 ms While testing something else on s390x, I noticed that __int128 support is broken on that platform at least for some versions of clang [1], and I see improvement there with this patch: Query 1: HEAD: 3015ms Patch: 2206ms Query 2: HEAD: 3394ms Patch: 2514ms > > (Also, unrelated to the patch set, but I was surprised to find > > replacing the numeric expressions above with bigint ones > > (10_000_000_000 etc) makes the queries at least 5 times slower, and > > that's true with a normal 64-bit build as well.) > > Hmm, are you sure? I don't see that. With -m32, I see: > > select sum(i) from generate_series(1e10, 1e10+1e6, 1) i; > HEAD: 204.774 ms > Patch: 204.206 ms > (not expecting any difference) > > select sum(i) from generate_series(10_000_000_000, 10_001_000_000, 1) i; > HEAD: 187.426 ms > Patch: 140.741 ms > (as expected, faster than the previous query in HEAD because bigint > generate_series should be faster than numeric generate_series, and > faster still with the sum(bigint) optimisations made by this patch) Hmm, at the time I was surprised too, and ran multiple times but today I can't reproduce my earlier results, so not sure what happened. :/ [1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=treehopper&dt=2025-07-17%2019%3A26%3A04&stg=configure -- John Naylor Amazon Web Services
On Thu, Jul 17, 2025 at 2:30 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > BTW, my other motivation for doing this was to simplify the numeric > code. Even if this had zero performance benefit, as long as it didn't > make things any slower, I would argue that it's worth doing. I gathered that was the main motivation, and I agree. I looked over 0005 and don't see any issues. -- John Naylor Amazon Web Services
On Fri, 18 Jul 2025 at 07:42, John Naylor <johncnaylorls@gmail.com> wrote: > > While testing something else on s390x, I noticed that __int128 support > is broken on that platform at least for some versions of clang [1], > and I see improvement there with this patch: > > Query 1: > HEAD: 3015ms > Patch: 2206ms > > Query 2: > HEAD: 3394ms > Patch: 2514ms Thanks for testing. On Fri, 18 Jul 2025 at 07:47, John Naylor <johncnaylorls@gmail.com> wrote: > > On Thu, Jul 17, 2025 at 2:30 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > > BTW, my other motivation for doing this was to simplify the numeric > > code. Even if this had zero performance benefit, as long as it didn't > > make things any slower, I would argue that it's worth doing. > > I gathered that was the main motivation, and I agree. I looked over > 0005 and don't see any issues. Thanks for reviewing. If there are no objections, I'll push this shortly (though I'll change INT64_HEX_FORMAT to PRIx64, since it looks like the former is about to go away). Regards, Dean