Thread

  1. Re: BUG #19340: Wrong result from CORR() function

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-12-02T20:26:43Z

    Dean Rasheed <dean.a.rasheed@gmail.com> writes:
    > On Tue, 2 Dec 2025 at 17:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> I wonder whether it'd be worth carrying additional state to
    >> check that explicitly (instead of assuming that "if (Sxx == 0 ||
    >> Syy == 0)" will catch it).
    
    > I wondered the same thing. It's not nice to have to do that, but
    > clearly the existing test for constant inputs is no good. The question
    > is, do we really want to spend extra cycles on every query just to
    > catch this odd corner case?
    
    I experimented with the attached patch, which is very incomplete;
    I just carried it far enough to be able to run performance checks on
    the modified code, and so all the binary statistics aggregates except
    corr() are broken.  I observe about 2% slowdown on this test case:
    
    SELECT corr( 0.09 , 0.09000001 ) FROM generate_series(1,100000000);
    
    I think that any real-world usage is going to expend more effort
    obtaining the input data than this test does, so 2% should be a
    conservative upper bound on the cost.  Seems to me that getting
    NULL-or-not right is probably worth a percent or so.
    
    If anyone feels differently, another idea could be to use a
    separate state transition function for corr() that skips the
    accumulation steps that corr() doesn't use.  But I agree with
    the pre-existing decision to use just one transition function
    for all the binary aggregates.
    
    If this seems like a reasonable approach, I'll see about finishing
    out the patch.
    
    			regards, tom lane