Thread

Re: Is there value in having optimizer stats for joins/foreignkeys?

Tomas Vondra <tomas@vondra.me> — 2026-05-27T19:08:34Z

On 5/27/26 19:49, Alexandra Wang wrote:
> Hi Tom and Tomas,
> 
> Thank you so much for the feedback!
> 
> On Mon, May 25, 2026 at 8:04 AM Tom Lane <tgl@sss.pgh.pa.us
> <mailto:tgl@sss.pgh.pa.us>> wrote:
>> Tomas Vondra <tomas@vondra.me <mailto:tomas@vondra.me>> writes:
>> > On 5/21/26 22:25, Tom Lane wrote:
>> >> I don't love stxkeyrefs[].  I wonder if it's time to throw away
>> >> stxkeys[], represent all the target columns as regular expression
>> >> trees in stxexprs, and then special-case columns that are simple
>> >> Vars where appropriate at execution.
>> >> (In the same vein, I dislike the grammar's separation of plain
>> >> columns from expressions; I'd like to replace stats_params
>> >> with expr_list and sort it all out later.  But perhaps that's
>> >> material for a separate patch.)
>>
>> > FWIW the extended stats copied this from pg_index, which also stores
>> > keys and expressions separately. I suppose there was a reason for that,
>> > most likely performance - is cheaper to compare attnums than
>> > expressions, and plain keys are much more common.
>>
>> I think I might be to blame for the separate storage of indexprs.
>> If so, the motivation was to avoid breakage of older code that only
>> knew about indkey[].  (Of course, such code would necessarily fail
>> on indexes with expressions, but we wanted to avoid breakage for the
>> common case of no-expressions.)  I don't think that consideration is
>> nearly as pressing for extended stats.  There's probably a lot less
>> client-side code that knows about extended stats at all, and what
>> there is seems more likely to rely on the server-side display
>> functions than to dig into the catalog details for itself.  Also,
>> if there is anything that's looking at pg_statistic_ext details,
>> it will need work anyway after this patch; there's no way around that.
> 
> I'm working on removing stxkeys[] as a prerequisite commit before the
> main join
> stats patch, representing all target columns as Var nodes in stxexprs,
> as you
> both suggested.
> 
> One question about the pg_stats_ext view: currently it has two complementary
> columns:
> 
> - attnames (name[]) — Names of the columns included in the statistics object
> - exprs (text[]) — Expressions included in the statistics object
> 
> With stxkeys gone from the catalog, should the view:
> 
> (a) Stay as-is: keep attnames and exprs as separate columns with the same
> semantics. Implemented via a helper function that extracts plain column
> names
> from the unified stxexprs.
> 
> or
> 
> (b) Mirror the catalog: remove attnames, make exprs show all entries (both
> column names and expressions together in one text[] array).
> 
> Any preference?
> 

My 2c: AFAIR there's no fundamental reason to keep those two lists
separate, other than that expressions were "bolted on" later, after we
already had stats on plain attributes. In hindsight, it might have been
better to just unify the view back then, probably.

I personally would be OK with just unifying adjusting the view, and
showing a single list (with both attributes and expressions). IIUC the
plan is to just store a list of expressions anyway, with attributes
represented as Vars. So the view would have to do more work just to
produce the "old" output, with little benefit.

The one argument against this that I can think of is possibly breaking
tools that use this view. IIRC pg_dump is reading the view when
exporting/importing the statistics. That might need some adjustments
(and there's also pg_stats_ext_exprs), but maybe it's easier to keep the
view consistent.

Also, maybe this is one more argument against the "optimizer view" idea
Corey mentioned in Vancouver last week? Because surely we'll want to
include the join statistics in export/import, and for the view approach
we'd need to invent a fair amount of code to do that. Maybe?


regards

-- 
Tomas Vondra