Re: BUG #19340: Wrong result from CORR() function

Tom Lane <tgl@sss.pgh.pa.us>

From: Tom Lane <tgl@sss.pgh.pa.us>
To: Dean Rasheed <dean.a.rasheed@gmail.com>
Cc: Oleg Ivanov <o15611@gmail.com>, Laurenz Albe <laurenz.albe@cybertec.at>, pgsql-bugs@lists.postgresql.org
Date: 2025-12-03T01:27:56Z
Lists: pgsql-bugs
I wrote:
> I'm coming around to the conclusion that your way is better,
> though.  It seems good that "any NaN in the input results in
> NaN output", which your way does and mine doesn't.

Poking further at this, I found that my v2 patch fails that principle
in one case:

regression=# SELECT corr( 0.1 , 'nan' ) FROM generate_series(1,1000) g;
 corr 
------
     
(1 row)

We see that Y is constant and therefore return NULL, despite the
other NaN input.

I think we can fix that along these lines:

@@ -3776,8 +3776,12 @@ float8_corr(PG_FUNCTION_ARGS)
 	if (N < 1.0)
 		PG_RETURN_NULL();
 
-	/* per spec, return NULL for horizontal and vertical lines */
-	if (!isnan(commonX) || !isnan(commonY))
+	/*
+	 * per spec, return NULL for horizontal and vertical lines; but not if the
+	 * result would otherwise be NaN
+	 */
+	if ((!isnan(commonX) || !isnan(commonY)) &&
+		(!isnan(Sxx) && !isnan(Syy)))
 		PG_RETURN_NULL();
 
 	/* at this point, Sxx and Syy cannot be zero or negative */

(don't think it should be necessary to also check Sxy)


BTW, HEAD is inconsistent: it will return NaN for this example, but
only because it's confused by roundoff error into thinking that Y
isn't constant.  With few enough inputs, it produces NULL too:

regression=# SELECT corr( 0.1 , 'nan' ) FROM generate_series(1,3) g;
 corr 
------
     
(1 row)

			regards, tom lane