Re: BUG #19340: Wrong result from CORR() function
| От | Dean Rasheed |
|---|---|
| Тема | Re: BUG #19340: Wrong result from CORR() function |
| Дата | |
| Msg-id | CAEZATCUyPKDb5z7yasp5v5HN9dTzPKqqa4u20goJK2fC15LpcQ@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: BUG #19340: Wrong result from CORR() function (Tom Lane <tgl@sss.pgh.pa.us>) |
| Список | pgsql-bugs |
On Wed, 3 Dec 2025 at 22:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Poking at this, I soon found a test case where even with the separate > sqrt() calls we'd produce a result slightly outside [-1, 1] (running > this test over more values of x is sufficient). So now I think we > should do both the separate sqrt and the clamp. > I'm starting to have doubts about having 2 sqrt() calls. The problem is that it seems to produce a noticeable reduction in accuracy in quite a few cases. This is especially noticeable with fully-correlated data. For example: SELECT n, (SELECT corr(x, x) FROM generate_series(1, n) x) FROM generate_series(1, 10) g(n); n | corr ----+-------------------- 1 | 2 | 0.9999999999999998 3 | 0.9999999999999998 4 | 0.9999999999999998 5 | 0.9999999999999998 6 | 1 7 | 0.9999999999999999 8 | 1 9 | 0.9999999999999999 10 | 1 (10 rows) Now I'm not sure that the current code can be expected to get cases like this exactly right 100% of the time, but it's pretty close. For example, if I do this: WITH t1 AS ( SELECT n, random() * 1000 AS r FROM generate_series(1, 1000000) n ), t2 AS ( SELECT corr(r, r) FROM t1 GROUP BY n % 10000 ) SELECT count(*), count(*) FILTER (WHERE corr != 1) FROM t2; on HEAD it produced corr = 1 every time I ran it, whereas the patch gives rounding errors roughly 25% of the time, which seems likely to be noticed. Perhaps we should only use 2 sqrt()'s if the product Sxx * Syy overflows. Regards, Dean
В списке pgsql-bugs по дате отправления: