Thread
-
Re: [PATCH] Fix overflow and underflow in regr_r2()
Dean Rasheed <dean.a.rasheed@gmail.com> — 2026-05-16T18:03:45Z
On Sat, 16 May 2026 at 17:45, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > BTW, on the principle of "where else did we make the same mistake", > I looked through the other aggregates using float8_regr_accum. > Most seem okay, but float8_regr_intercept does this: > > PG_RETURN_FLOAT8((Sy - Sx * Sxy / Sxx) / N); > > Seems to me that expression is also prone to internal > overflow/underflow. Underflow probably isn't a huge issue, > since the result will reduce to Sy/N which is likely to be good > enough. But can we do anything about overflow? > > One simple change that might make things better is to compute > > PG_RETURN_FLOAT8((Sy - Sx * (Sxy / Sxx)) / N); > > on the theory that the sums of products are likely to both be large. Hmm, that isn't necessarily better. For example, with this data: WITH t(x,y) AS ( SELECT 1e-155 + g*1e-160, 1e155 + g*1e150 FROM generate_series(1,10) g ) SELECT sum(x::float8) sx, sum(y::float8) sy, regr_sxx(y,x), regr_syy(y,x), regr_sxy(y,x), regr_intercept(y,x) FROM t; sx | sy | regr_sxx | regr_syy | regr_sxy | regr_intercept -------------------------+---------------+--------------+------------------------+-----------------------+------------------------- 1.0000550000000001e-154 | 1.000055e+156 | 8.24996e-319 | 8.249999999970085e+301 | 8.249999999965278e-09 | -5.144448004587567e+149 (1 row) The current regr_intercept() code works fine, but if you were to attempt to calculate Sxy / Sxx first, it would overflow. I think probably the least likely to overflow computation would be Sxy * (Sx / Sxx), because Sxx is likely to be very large/small whenever Sx is, so Sx / Sxx seems unlikely to overflow. There may well be examples disproving that theory too though, so maybe it needs to try multiple orderings. Regards, Dean