Some optimisations for numeric division
От | Dean Rasheed |
---|---|
Тема | Some optimisations for numeric division |
Дата | |
Msg-id | CAEZATCVwsBi-ND-t82Cuuh1=8ee6jdOpzsmGN+CUZB6yjLg9jw@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Some optimisations for numeric division
|
Список | pgsql-hackers |
Attached are 3 small patches that improve the performance of numeric division. Patch 0002 seems to have the biggest impact, but I think they're all worth including, since they're quite simple changes, with noticeable performance gains. Patch 0001 vectorises the inner loop of div_var_fast(). This loop is almost identical to the inner loop of mul_var(), which was vectorised by commit 8870917623. The thing preventing the div_var_fast() loop from being vectorised is the assignment to div[qi + i], and simply replacing that with div_qi[i] where div_qi = &div[qi], in the same style as mul_var(), fixes that. One difference between this and the mul_var() code is that it is also necessary to add an explicit cast to get the compiler to generate matching output, and give the best results. This is because in mul_var() the compiler recognises that var1digit is actually 16-bit, rather than 32-bit, and so it doesn't need to multiply the high part, but in div_var_fast() it's not obvious to the compiler that qdigit also fits in 16 bits, hence the cast. The actual performance gain is typically quite modest, since div_var_fast() is always only a part of some more complex numeric computation, but it can be seen in cases like raising numbers to negative integer powers, e.g.: CREATE TEMP TABLE num_test(x numeric); SELECT setseed(0); INSERT INTO num_test SELECT (SELECT ('1.'||string_agg((random()*9)::int::text, '')||x)::numeric FROM generate_series(1,100)) FROM generate_series(1,100000) g(x); SELECT sum(x^(-2)) FROM num_test; Time: 112.949 ms (HEAD) Time: 98.537 ms (with patch) Patch 0002 simplifies the inner loop of div_var() (the guts of the public-facing numeric division operator) by more closely combining the multiplication and subtraction operations and folding the separate "carry" and "borrow" variables into a single "borrow", as suggested by the old code comment. IMO, this makes the code easier to understand, as well as giving more significant performance gains: CREATE TEMP TABLE div_test(x numeric); SELECT setseed(0); INSERT INTO div_test SELECT (SELECT ('1.'||string_agg((random()*9)::int::text, ''))::numeric + x FROM generate_series(1,50)) FROM generate_series(1,1000) g(x); SELECT sum(x/y) FROM div_test t1(x), div_test t2(y); Time: 1477.340 ms (HEAD) Time: 826.748 ms (with patch) Patch 0003 replaces some uses of div_var_fast() with div_var(). Specifically, when the divisor has just one or two base-NBASE digits, div_var() is faster. This is especially true for 1-digit divisors, due to the "fast path" code in div_var() to handle that. It's also just about true for 2-digit divisors, and it occurs to me that that case could potentially be optimised further with similar fast path code in div_var(), but I haven't tried that yet. One-digit divisors are quite common. For example, in the Taylor series expansions in exp_var() and ln_var(), since the number of Taylor series terms never exceeds a few hundred in practice. Also, exp_var()'s argument reduction step divides by 2^ndiv2, which is roughly 100 times the input, rounded up to a power of two, and valid inputs are less than 6000, so this will always be one or two digits. Testing this with a bunch of random exp() and ln() computations I saw a speed-up of 15-20%, and it reduced the run time of the numeric-big regression test by around 10%, which seems worth having. Regards, Dean
Вложения
В списке pgsql-hackers по дате отправления: