Re: BUG #19340: Wrong result from CORR() function

Tom Lane <tgl@sss.pgh.pa.us>

From: Tom Lane <tgl@sss.pgh.pa.us>
To: Dean Rasheed <dean.a.rasheed@gmail.com>
Cc: Oleg Ivanov <o15611@gmail.com>, Laurenz Albe <laurenz.albe@cybertec.at>, pgsql-bugs@lists.postgresql.org
Date: 2025-12-03T22:52:31Z
Lists: pgsql-bugs

Attachments

Attached is a fleshed-out patch proposal that fixes the related
aggregates and adds test cases.

Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> Also, given that any NaN input results in NaN output, regardless of
> whether or not the inputs are all the same, the values for commonX and
> commonY don't matter if any input is NaN. I think that allows the
> accumulator function's tests to be simplified (no need to test if new
> values are NaN).

I'm not convinced about that.  I have it as

        if (newvalX != commonX || isnan(newvalX))
            commonX = get_float8_nan();

and I think you're saying we could just write

        if (newvalX != commonX)
            commonX = get_float8_nan();

My concern is that if newvalX is NaN but commonX isn't, then
I believe that the non-NaN-aware "!=" test is supposed to return
false, which'd cause us to not update commonX to NaN as required.
Maybe we could make it work by writing

        if (!(newvalX == commonX))
            commonX = get_float8_nan();

but I don't have a lot of faith in C compilers getting that right.

> Then there's this:
> SELECT corr(1e-100 + x*1e-105, 1e-100 + x*1e-105) FROM generate_series(1, 3) x;
> The correct answer in this case is also 1, which could be achieved by
> taking the square roots of Sxx and Syy separately, before multiplying,
> but it might also be sensible to clamp the result to the range [-1,1].

Poking at this, I soon found a test case where even with the separate
sqrt() calls we'd produce a result slightly outside [-1, 1] (running
this test over more values of x is sufficient).  So now I think we
should do both the separate sqrt and the clamp.

			regards, tom lane