Thread
-
Re: BUG #19340: Wrong result from CORR() function
Dean Rasheed <dean.a.rasheed@gmail.com> — 2025-12-05T09:23:59Z
On Wed, 3 Dec 2025 at 22:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Attached is a fleshed-out patch proposal that fixes the related > aggregates and adds test cases. > Looking at float8_regr_accum(), I think it would be preferable to arrange for it to leave Sxx, Syy, and Sxy zero until distinct X and Y values are seen. I.e., something like this: if (newvalX != commonX || isnan(newvalX)) commonX = get_float8_nan(); if (newvalY != commonY || isnan(newvalY)) commonY = get_float8_nan(); if (isnan(commonX) || isnan(commonY)) { tmpX = newvalX * N - Sx; tmpY = newvalY * N - Sy; scale = 1.0 / (N * transvalues[0]); if (isnan(commonX)) Sxx += tmpX * tmpX * scale; if (isnan(commonY)) Syy += tmpY * tmpY * scale; if (isnan(commonX) && isnan(commonY)) Sxy += tmpX * tmpY * scale; ... Overflow check ... } This would mean that float8_corr(), float8_regr_r2(), float8_regr_slope(), and float8_regr_intercept() would not need to look at commonX or commonY, and could simply rely on Sxx == 0 or Syy == 0 to detect horizontal and vertical lines. Aside from making the code simpler, this would guarantee that the aggregate functions regr_sxx() and regr_syy() would return exactly zero for all-constant X and Y inputs respectively, and that regr_sxy(), covar_pop(), and covar_samp() would return exactly zero if either the X or the Y inputs were all constant. Something else that occurred to me was that float8_regr_avgx() and float8_regr_avgy() might as well make use of commonX and commonY, since we're calculating them, so they would return exact averages if all the X or Y values were the same, rather than results with possible rounding errors. I also wonder if it would be worth doing something similar for the single-variable aggregates so that var_pop(), var_samp(), stddev_pop(), and stddev_samp() would all return exactly zero, and avg() would return the exact common value, if all the inputs were constant. Regards, Dean