Re: BUG #19340: Wrong result from CORR() function
Tom Lane <tgl@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Dean Rasheed <dean.a.rasheed@gmail.com>
Cc: Oleg Ivanov <o15611@gmail.com>, Laurenz Albe <laurenz.albe@cybertec.at>,
pgsql-bugs@lists.postgresql.org
Date: 2025-12-02T20:26:43Z
Lists: pgsql-bugs
Attachments
- wip-detect-constant-input-exactly.patch (text/x-diff)
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On Tue, 2 Dec 2025 at 17:22, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I wonder whether it'd be worth carrying additional state to >> check that explicitly (instead of assuming that "if (Sxx == 0 || >> Syy == 0)" will catch it). > I wondered the same thing. It's not nice to have to do that, but > clearly the existing test for constant inputs is no good. The question > is, do we really want to spend extra cycles on every query just to > catch this odd corner case? I experimented with the attached patch, which is very incomplete; I just carried it far enough to be able to run performance checks on the modified code, and so all the binary statistics aggregates except corr() are broken. I observe about 2% slowdown on this test case: SELECT corr( 0.09 , 0.09000001 ) FROM generate_series(1,100000000); I think that any real-world usage is going to expend more effort obtaining the input data than this test does, so 2% should be a conservative upper bound on the cost. Seems to me that getting NULL-or-not right is probably worth a percent or so. If anyone feels differently, another idea could be to use a separate state transition function for corr() that skips the accumulation steps that corr() doesn't use. But I agree with the pre-existing decision to use just one transition function for all the binary aggregates. If this seems like a reasonable approach, I'll see about finishing out the patch. regards, tom lane