Thread

  1. Re: [PATCH] Fix overflow and underflow in regr_r2()

    Dean Rasheed <dean.a.rasheed@gmail.com> — 2026-05-16T18:03:45Z

    On Sat, 16 May 2026 at 17:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >
    > BTW, on the principle of "where else did we make the same mistake",
    > I looked through the other aggregates using float8_regr_accum.
    > Most seem okay, but float8_regr_intercept does this:
    >
    >         PG_RETURN_FLOAT8((Sy - Sx * Sxy / Sxx) / N);
    >
    > Seems to me that expression is also prone to internal
    > overflow/underflow.  Underflow probably isn't a huge issue,
    > since the result will reduce to Sy/N which is likely to be good
    > enough.  But can we do anything about overflow?
    >
    > One simple change that might make things better is to compute
    >
    >         PG_RETURN_FLOAT8((Sy - Sx * (Sxy / Sxx)) / N);
    >
    > on the theory that the sums of products are likely to both be large.
    
    Hmm, that isn't necessarily better. For example, with this data:
    
    WITH t(x,y) AS (
      SELECT 1e-155 + g*1e-160, 1e155 + g*1e150
        FROM generate_series(1,10) g
    )
    SELECT sum(x::float8) sx, sum(y::float8) sy,
           regr_sxx(y,x), regr_syy(y,x), regr_sxy(y,x),
           regr_intercept(y,x)
      FROM t;
    
               sx            |      sy       |   regr_sxx   |        regr_syy        |       regr_sxy        |     regr_intercept
    -------------------------+---------------+--------------+------------------------+-----------------------+-------------------------
     1.0000550000000001e-154 | 1.000055e+156 | 8.24996e-319 | 8.249999999970085e+301 | 8.249999999965278e-09 | -5.144448004587567e+149
    (1 row)
    
    The current regr_intercept() code works fine here, but if you were to
    calculate Sxy / Sxx first, it would overflow: roughly 8.25e-09 /
    8.25e-319 is about 1e310, which is beyond the float8 maximum of
    about 1.8e308.
    
    I think the ordering least likely to overflow is probably Sxy *
    (Sx / Sxx), because Sxx is likely to be very large (or very small)
    whenever Sx is, so Sx / Sxx seems unlikely to overflow. There may
    well be examples disproving that theory too, though, so maybe the
    code needs to try multiple orderings.
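
    As a rough illustration of the three orderings discussed above, here
    is a minimal standalone C sketch. It is not taken from the PostgreSQL
    source; all function names are hypothetical, and it assumes IEEE 754
    double arithmetic without extended-precision intermediates. The last
    helper shows one possible reading of "try multiple orderings": return
    the first ordering that gives a finite result. It only guards against
    overflow, not underflow.

```c
#include <math.h>

/*
 * Hypothetical helpers (not PostgreSQL code).  Each computes the term
 * Sx * Sxy / Sxx that float8_regr_intercept subtracts from Sy, using a
 * different evaluation order.
 */

/* Ordering in the current code: (Sx * Sxy) / Sxx */
static double
slope_term_current(double Sx, double Sxx, double Sxy)
{
    return Sx * Sxy / Sxx;
}

/* Ordering proposed upthread: Sx * (Sxy / Sxx) */
static double
slope_term_ratio_first(double Sx, double Sxx, double Sxy)
{
    return Sx * (Sxy / Sxx);
}

/* Alternative ordering: Sxy * (Sx / Sxx) */
static double
slope_term_sx_first(double Sx, double Sxx, double Sxy)
{
    return Sxy * (Sx / Sxx);
}

/*
 * Assumed fallback strategy: try each ordering in turn and return the
 * first finite result.  If every ordering overflows, the last result
 * (Inf or NaN) is returned unchanged.
 */
static double
slope_term_first_finite(double Sx, double Sxx, double Sxy)
{
    double      t;

    t = slope_term_sx_first(Sx, Sxx, Sxy);
    if (isfinite(t))
        return t;

    t = slope_term_current(Sx, Sxx, Sxy);
    if (isfinite(t))
        return t;

    return slope_term_ratio_first(Sx, Sxx, Sxy);
}
```

    With the values printed above (Sx = 1.0000550000000001e-154, Sxx =
    8.24996e-319, Sxy = 8.249999999965278e-09), the first and third
    orderings stay finite, while Sx * (Sxy / Sxx) overflows to Infinity.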
    
    Regards,
    Dean