Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.
От | Dean Rasheed |
---|---|
Тема | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. |
Дата | |
Msg-id | CAEZATCXVB15_RfzROFiO_7Fh3voOuOe7FotbvON-6a+ManuGJg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. ("Joel Jacobson" <joel@compiler.org>) |
Ответы |
Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.
|
Список | pgsql-hackers |
On Wed, 3 Jul 2024 at 21:45, Joel Jacobson <joel@compiler.org> wrote: > > > On Wed, Jul 3, 2024, at 20:57, Dean Rasheed wrote: > >> I wouldn't expect it to ever be off by more than 1 > > > > OK, so then the cases I found where it was off by 2 for the mul_var_int() patch > > are unexpected? > > Sorry, I meant off by 2 for the mul_var_small() patch, these cases that I found: > Yeah, so that was another bug in mul_var_small(). If rscale is made small enough, the result index for the digits computed before the main loop overlaps the ones after, so it would overwrite digits already computed. Of course, that's fairly easy to fix, but at this point I think the better solution is to only use mul_var_small() when an exact product is requested. We would have to do that for mul_var_int() anyway, because of its accuracy issues discussed earlier. I think this is a reasonable thing to do because only functions like ln_var() and exp_var() will ask mul_var() for a reduced-rscale result, and those functions are likely to be dominated by computations involving larger numbers, for which this patch wouldn't help anyway. Also those functions are probably less widely used. If we make that decision, a lot of the complexity in mul_var_small() goes away, including all the conditional array accesses, making it simpler and more efficient. v6 patch attached. I also updated the mul_var_int() patch so that it is also only invoked when an exact product is requested, and I noticed a couple of other minor optimisations that could be made. Then I decided to try implementing mul_var_int64(). This gives a pretty decent speedup for 3-digit inputs, but unfortunately it is much slower for 4-digit inputs (for which most values will go through the 128-bit code path). I'm attaching that too, just for information, but it's clearly not going to be acceptable as-is. Running your benchmark queries, I got these results: SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_1; Time: 4520.874 ms (00:04.521) -- HEAD Time: 3937.536 ms (00:03.938) -- v5-mul_var_int.patch Time: 3919.495 ms (00:03.919) -- v5-mul_var_small.patch Time: 3916.964 ms (00:03.917) -- v6-mul_var_int64.patch Time: 3811.118 ms (00:03.811) -- v6-mul_var_small.patch SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_2; Time: 4762.528 ms (00:04.763) -- HEAD Time: 4075.546 ms (00:04.076) -- v5-mul_var_int.patch Time: 4055.180 ms (00:04.055) -- v5-mul_var_small.patch Time: 4037.866 ms (00:04.038) -- v6-mul_var_int64.patch Time: 4018.488 ms (00:04.018) -- v6-mul_var_small.patch SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_3; Time: 5387.514 ms (00:05.388) -- HEAD Time: 5350.736 ms (00:05.351) -- v5-mul_var_int.patch Time: 4648.449 ms (00:04.648) -- v5-mul_var_small.patch Time: 4655.204 ms (00:04.655) -- v6-mul_var_int64.patch Time: 4645.962 ms (00:04.646) -- v6-mul_var_small.patch SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4; Time: 5617.150 ms (00:05.617) -- HEAD Time: 5505.913 ms (00:05.506) -- v5-mul_var_int.patch Time: 5486.441 ms (00:05.486) -- v5-mul_var_small.patch Time: 8203.081 ms (00:08.203) -- v6-mul_var_int64.patch Time: 5598.909 ms (00:05.599) -- v6-mul_var_small.patch So v6-mul_var_int64 improves on v5-mul_var_int in the 3-digit case, but is terrible in the 4-digit case. None of the other patches touch the 4-digit case, but it might be interesting to try mul_var_small() with 4 digits. Regards, Dean
Вложения
В списке pgsql-hackers по дате отправления: