Thread

  1. Re: BUG #19340: Wrong result from CORR() function

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-12-03T01:27:56Z

    I wrote:
    > I'm coming around to the conclusion that your way is better,
    > though.  It seems good that "any NaN in the input results in
    > NaN output", which your way does and mine doesn't.
    
    Poking further at this, I found that my v2 patch fails that principle
    in one case:
    
    regression=# SELECT corr( 0.1 , 'nan' ) FROM generate_series(1,1000) g;
     corr 
    ------
         
    (1 row)
    
    We see that Y is constant and therefore return NULL, despite the
    other NaN input.
    
    I think we can fix that along these lines:
    
    @@ -3776,8 +3776,12 @@ float8_corr(PG_FUNCTION_ARGS)
     	if (N < 1.0)
     		PG_RETURN_NULL();
     
    -	/* per spec, return NULL for horizontal and vertical lines */
    -	if (!isnan(commonX) || !isnan(commonY))
    +	/*
    +	 * per spec, return NULL for horizontal and vertical lines; but not if the
    +	 * result would otherwise be NaN
    +	 */
    +	if ((!isnan(commonX) || !isnan(commonY)) &&
    +		(!isnan(Sxx) && !isnan(Syy)))
     		PG_RETURN_NULL();
     
     	/* at this point, Sxx and Syy cannot be zero or negative */
    
    (don't think it should be necessary to also check Sxy)
    
    
    BTW, HEAD is inconsistent: it will return NaN for this example, but
    only because it's confused by roundoff error into thinking that Y
    isn't constant.  With few enough inputs, it produces NULL too:
    
    regression=# SELECT corr( 0.1 , 'nan' ) FROM generate_series(1,3) g;
     corr 
    ------
         
    (1 row)
    
    			regards, tom lane