Re: BUG #19340: Wrong result from CORR() function
Tom Lane <tgl@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Dean Rasheed <dean.a.rasheed@gmail.com>
Cc: Oleg Ivanov <o15611@gmail.com>, Laurenz Albe <laurenz.albe@cybertec.at>,
pgsql-bugs@lists.postgresql.org
Date: 2025-12-03T22:52:31Z
Lists: pgsql-bugs
Attachments
- v1-detect-constant-input-exactly.patch (text/x-diff)
Attached is a fleshed-out patch proposal that fixes the related
aggregates and adds test cases.
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> Also, given that any NaN input results in NaN output, regardless of
> whether or not the inputs are all the same, the values for commonX and
> commonY don't matter if any input is NaN. I think that allows the
> accumulator function's tests to be simplified (no need to test if new
> values are NaN).
I'm not convinced about that. I have it as
if (newvalX != commonX || isnan(newvalX))
commonX = get_float8_nan();
and I think you're saying we could just write
if (newvalX != commonX)
commonX = get_float8_nan();
My concern is that if newvalX is NaN but commonX isn't, then
I believe that the non-NaN-aware "!=" test is supposed to return
false, which'd cause us to not update commonX to NaN as required.
Maybe we could make it work by writing
if (!(newvalX == commonX))
commonX = get_float8_nan();
but I don't have a lot of faith in C compilers getting that right.
> Then there's this:
> SELECT corr(1e-100 + x*1e-105, 1e-100 + x*1e-105) FROM generate_series(1, 3) x;
> The correct answer in this case is also 1, which could be achieved by
> taking the square roots of Sxx and Syy separately, before multiplying,
> but it might also be sensible to clamp the result to the range [-1,1].
Poking at this, I soon found a test case where even with the separate
sqrt() calls we'd produce a result slightly outside [-1, 1] (running
this test over more values of x is sufficient). So now I think we
should do both the separate sqrt and the clamp.
regards, tom lane