Thread
-
Re: BUG #19340: Wrong result from CORR() function
Tom Lane <tgl@sss.pgh.pa.us> — 2025-12-03T22:52:31Z
Attached is a fleshed-out patch proposal that fixes the related aggregates and adds test cases. Dean Rasheed <dean.a.rasheed@gmail.com> writes: > Also, given that any NaN input results in NaN output, regardless of > whether or not the inputs are all the same, the values for commonX and > commonY don't matter if any input is NaN. I think that allows the > accumulator function's tests to be simplified (no need to test if new > values are NaN). I'm not convinced about that. I have it as if (newvalX != commonX || isnan(newvalX)) commonX = get_float8_nan(); and I think you're saying we could just write if (newvalX != commonX) commonX = get_float8_nan(); My concern is that if newvalX is NaN but commonX isn't, then I believe that the non-NaN-aware "!=" test is supposed to return false, which'd cause us to not update commonX to NaN as required. Maybe we could make it work by writing if (!(newvalX == commonX)) commonX = get_float8_nan(); but I don't have a lot of faith in C compilers getting that right. > Then there's this: > SELECT corr(1e-100 + x*1e-105, 1e-100 + x*1e-105) FROM generate_series(1, 3) x; > The correct answer in this case is also 1, which could be achieved by > taking the square roots of Sxx and Syy separately, before multiplying, > but it might also be sensible to clamp the result to the range [-1,1]. Poking at this, I soon found a test case where even with the separate sqrt() calls we'd produce a result slightly outside [-1, 1] (running this test over more values of x is sufficient). So now I think we should do both the separate sqrt and the clamp. regards, tom lane