Re: BUG #19340: Wrong result from CORR() function

Поиск
Список
Период
Сортировка
От Dean Rasheed
Тема Re: BUG #19340: Wrong result from CORR() function
Дата
Msg-id CAEZATCU8=jjP1+dSSos6oXVzToGu-Rofyhv4BBpU31LN9Bo17Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #19340: Wrong result from CORR() function  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: BUG #19340: Wrong result from CORR() function
Список pgsql-bugs
On Wed, 3 Dec 2025 at 22:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Attached is a fleshed-out patch proposal that fixes the related
> aggregates and adds test cases.
>

Looking at float8_regr_accum(), I think it would be preferable to
arrange for it to leave Sxx, Syy, and Sxy zero until distinct X and Y
values are seen. I.e., something like this:

    if (newvalX != commonX || isnan(newvalX))
        commonX = get_float8_nan();
    if (newvalY != commonY || isnan(newvalY))
        commonY = get_float8_nan();

    if (isnan(commonX) || isnan(commonY))
    {
        tmpX = newvalX * N - Sx;
        tmpY = newvalY * N - Sy;
        scale = 1.0 / (N * transvalues[0]);
        if (isnan(commonX))
            Sxx += tmpX * tmpX * scale;
        if (isnan(commonY))
            Syy += tmpY * tmpY * scale;
        if (isnan(commonX) && isnan(commonY))
            Sxy += tmpX * tmpY * scale;

        ... Overflow check ...
    }

This would mean that float8_corr(), float8_regr_r2(),
float8_regr_slope(), and float8_regr_intercept() would not need to
look at commonX or commonY, and could simply rely on Sxx == 0 or Syy
== 0 to detect horizontal and vertical lines.

Aside from making the code simpler, this would guarantee that the
aggregate functions regr_sxx() and regr_syy() would return exactly
zero for all-constant X and Y inputs respectively, and that
regr_sxy(), covar_pop(), and covar_samp() would return exactly zero if
either the X or the Y inputs were all constant.

Something else that occurred to me was that float8_regr_avgx() and
float8_regr_avgy() might as well make use of commonX and commonY,
since we're calculating them, so they would return exact averages if
all the X or Y values were the same, rather than results with possible
rounding errors.

I also wonder if it would be worth doing something similar for the
single-variable aggregates so that var_pop(), var_samp(),
stddev_pop(), and stddev_samp() would all return exactly zero, and
avg() would return the exact common value, if all the inputs were
constant.

Regards,
Dean



В списке pgsql-bugs по дате отправления: