Re: explain analyze rows=%.0f

Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>

From: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
To: Robert Haas <robertmhaas@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>
Cc: Matthias van de Meent <boekewurm+postgres@gmail.com>, Andrei Lepikhov <lepihov@gmail.com>, pgsql-hackers@lists.postgresql.org, Guillaume Lelarge <guillaume@lelarge.info>, Matheus Alcantara <matheusssilv97@gmail.com>, Alena Rybakina <a.rybakina@postgrespro.ru>
Date: 2025-04-09T19:40:42Z
Lists: pgsql-hackers

Attachments

On 31.03.2025 23:59, Ilia Evdokimov wrote:
>
> We definitely shouldn’t remove the row counts < 1 check, since there 
> are many places in the planner where we divide by rows. This mechanism 
> was added specifically to prevent division by zero. Also, allowing 
> rows estimates below 1 can sometimes make the planner overly 
> optimistic, leading it to prefer cheaper-looking plans that may not 
> perform well in practice. For example, choosing a Nested Loop instead 
> of a more appropriate Hash Join.
>
> Allowing fractional rows > 1 might help improve planner accuracy in 
> some cases, but this needs further study to fully understand the impact.
>

I've been investigating whether it's worth removing rounding in row 
estimates - and I believe it is.

[ v1-0001-Always-use-two-fractional-digits-for-estimated-rows_SRC.patch ]

Currently, we round most row estimates using rint() inside 
clamp_row_est(). However, this function is also used for rounding tuples 
and page counts. These should remain integral, but row estimates can and 
should remain fractional for better precision. To address this, I 
introduced a new function clamp_tuple_est() which retains the existing 
rounding behavior (via rint()), while clamp_tuple_est() no longer 
rounds. I use clamp_tuple_est() only for row estimates and 
clamp_tuple_est() for tuples and pages elsewhere.

After removing rounding, two small issues needed fixing. First, there 
was a check rows > 1 in the cost estimation for Nested Loop joins, which 
is no longer reliable for values like 1.3. I updated it to rows >= 2 to 
retain the original behavior. This can be refined further, but, in my 
opinion, it's a practical compromise. Second, there is still a call to 
rint() in cost of mergejoin which likely should be removed too - though 
I haven’t included that here yet.

Also, if we're no longer rounding estimates, EXPLAIN should display them 
with two decimal digits, just like it already does for actual rows.

[ v1-0002-Always-use-two-fractional-digits-for-estimated-rows_TESTS.patch ]

So, what does this change improve? Here are some of the observed plan 
improvements:

  * Previously, a Parallel Aggregate was chosen. With slightly more
    accurate estimation, the planner switches to FinalizeAggregate,
    which can be more efficient in distributed plans.
  * In certain nested joins with constant subqueries, the planner
    previously inserted an unnecessary Materialize. With improved
    estimates, it now skips that step, reducing memory usage and latency.
  * When the estimated number of iterations becomes non-integer but
    still justifies caching, the planner adds Memoize instead of
    re-running a function like generate_series(). This can speed up
    execution significantly.
  * In one case involving partitioned tables and filter conditions like
    t1.b = 0, the planner now chooses an index-based nested loop join
    instead of a hash join. This results in a more efficient plan with
    fewer memory and CPU costs.

I know this patch still needs documentation updates to describe the new 
estimation display and behaviors. But before that, I’d like to gather 
feedback: does community agree that more precise estimates and 
fractional values are better than always rounding?

If anyone would like to see the EXPLAIN ANALYZE VERBOSE output of 
changes in regression tests, I’d be happy to share them.

All feedback and suggestions welcome!

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.