Thread

  1. Re: BUG #19340: Wrong result from CORR() function

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-12-03T22:52:31Z

    Attached is a fleshed-out patch proposal that fixes the related
    aggregates and adds test cases.
    
    Dean Rasheed <dean.a.rasheed@gmail.com> writes:
    > Also, given that any NaN input results in NaN output, regardless of
    > whether or not the inputs are all the same, the values for commonX and
    > commonY don't matter if any input is NaN. I think that allows the
    > accumulator function's tests to be simplified (no need to test if new
    > values are NaN).
    
    I'm not convinced about that.  I have it as
    
            if (newvalX != commonX || isnan(newvalX))
                commonX = get_float8_nan();
    
    and I think you're saying we could just write
    
            if (newvalX != commonX)
                commonX = get_float8_nan();
    
    My concern is that if newvalX is NaN but commonX isn't, then
    I believe that the non-NaN-aware "!=" test is supposed to return
    false, which'd cause us to not update commonX to NaN as required.
    Maybe we could make it work by writing
    
            if (!(newvalX == commonX))
                commonX = get_float8_nan();
    
    but I don't have a lot of faith in C compilers getting that right.
    
    > Then there's this:
    > SELECT corr(1e-100 + x*1e-105, 1e-100 + x*1e-105) FROM generate_series(1, 3) x;
    > The correct answer in this case is also 1, which could be achieved by
    > taking the square roots of Sxx and Syy separately, before multiplying,
    > but it might also be sensible to clamp the result to the range [-1,1].
    
    Poking at this, I soon found a test case where even with the separate
    sqrt() calls we'd produce a result slightly outside [-1, 1] (running
    this test over more values of x is sufficient).  So now I think we
    should do both the separate sqrt and the clamp.
    
    			regards, tom lane