Thread

  1. Re: Is there value in having optimizer stats for joins/foreignkeys?

    Tom Lane <tgl@sss.pgh.pa.us> — 2026-05-21T20:25:13Z

    Alexandra Wang <alexandra.wang.oss@gmail.com> writes:
    > Here's v7, another attempt to fix the unstable tests.
    
    Hi Alexandra,
    
    I signed up for an in-person review of this at PGConf.dev, but
    the schedule doesn't seem to be working in favor of making that
    happen.  If you see this and happen to run into me in the
    hallway, I'm happy to chat, but in any case here are my
    rather-hasty review notes.
    
    I think it's okay if v1 only handles 2-way joins, as long as the
    catalog representation is prepared for more.  Restricting to
    cases where we can do index-based sampling seems fine too.
    Those things could be relaxed later if it seems worthwhile,
    but we'd have a creditable feature even without.
    
    I didn't read the sampling code in any detail.  I think you will
    need to put more thought into what is user-friendly behavior
    in case the required index doesn't exist or doesn't have the
    right properties.  (I think the tests for that might not be
    strong enough, either.)
    
    I think you could simplify some code noticeably if you included the
    anchor rel's OID as the first element of stxjoinrels[].  Yeah,
    it'd be redundant with stxrelid, but so what?  It's not like 
    pg_statistic_ext rows are narrow enough that anyone would notice
    the extra 4 bytes.  I think this would simplify some of the
    relationships within the data structures, too; e.g., all varnos in
    the expressions could be considered to reference stxjoinrels[].
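    To make the stxjoinrels[] suggestion concrete, here is a minimal C sketch under stated assumptions: the JoinStatsEntry struct, its fixed-size array, MAX_JOIN_RELS, and both helper functions are hypothetical and not taken from the patch, and varnos are assumed to be 1-based, as PostgreSQL range-table indexes are.  With the anchor OID stored redundantly at index 0, mapping any varno to a relation is a plain array lookup with no "is this the anchor?" branch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;            /* stand-in for PostgreSQL's Oid */

#define MAX_JOIN_RELS 8          /* hypothetical limit, sketch only */

/* Hypothetical, simplified view of a join-statistics catalog entry. */
typedef struct JoinStatsEntry
{
    Oid stxrelid;                    /* anchor relation */
    int njoinrels;                   /* entries used in stxjoinrels[] */
    Oid stxjoinrels[MAX_JOIN_RELS];  /* [0] duplicates stxrelid */
} JoinStatsEntry;

/* Build an entry; the anchor is stored twice: in stxrelid and at [0]. */
static void
join_stats_init(JoinStatsEntry *e, Oid anchor, const Oid *others, int nothers)
{
    assert(nothers >= 0 && nothers + 1 <= MAX_JOIN_RELS);
    e->stxrelid = anchor;
    e->stxjoinrels[0] = anchor;
    for (int i = 0; i < nothers; i++)
        e->stxjoinrels[i + 1] = others[i];
    e->njoinrels = nothers + 1;
}

/*
 * Map a 1-based varno from a stored expression to a relation OID.
 * Because the anchor sits at index 0, every varno is simply an index
 * into stxjoinrels[]; the anchor needs no special case.
 */
static Oid
join_stats_rel_for_varno(const JoinStatsEntry *e, int varno)
{
    assert(varno >= 1 && varno <= e->njoinrels);
    return e->stxjoinrels[varno - 1];
}
```

    The cost is one duplicated OID per row; the benefit is that code walking the expressions never has to treat the anchor differently from the other joined relations.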
    
    I don't love stxkeyrefs[].  I wonder if it's time to throw away
    stxkeys[], represent all the target columns as regular expression
    trees in stxexprs, and then special-case columns that are simple
    Vars where appropriate at execution.
    
    (In the same vein, I dislike the grammar's separation of plain
    columns from expressions; I'd like to replace stats_params
    with expr_list and sort it all out later.  But perhaps that's
    material for a separate patch.)
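    A similarly hedged sketch of the stxexprs-only representation suggested above: every statistics target is an expression tree, and plain columns are recognized at execution time instead of being kept in a separate stxkeys[] array.  The Expr and JoinRow types and all function names are hypothetical simplifications, not PostgreSQL's Node/Var machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified expression nodes. */
typedef enum { EXPR_VAR, EXPR_CONST, EXPR_ADD } ExprKind;

typedef struct Expr
{
    ExprKind kind;
    int varno;                 /* EXPR_VAR: 1-based index into stxjoinrels[] */
    int attno;                 /* EXPR_VAR: 1-based column number */
    long value;                /* EXPR_CONST */
    const struct Expr *left;   /* EXPR_ADD */
    const struct Expr *right;  /* EXPR_ADD */
} Expr;

/* A joined sample row: rows[varno - 1][attno - 1] holds a column value. */
typedef struct JoinRow
{
    const long *const *rows;
} JoinRow;

static bool
expr_is_simple_var(const Expr *e)
{
    return e->kind == EXPR_VAR;
}

/* General path: recursive evaluation of an arbitrary expression tree. */
static long
eval_expr(const Expr *e, const JoinRow *r)
{
    switch (e->kind)
    {
        case EXPR_VAR:
            return r->rows[e->varno - 1][e->attno - 1];
        case EXPR_CONST:
            return e->value;
        case EXPR_ADD:
            return eval_expr(e->left, r) + eval_expr(e->right, r);
    }
    assert(false);              /* unreachable for well-formed trees */
    return 0;
}

/*
 * Compute one stats target.  Simple Vars are special-cased and read the
 * column directly, counting how often that path is taken; all other
 * targets go through the general evaluator.
 */
static long
compute_target(const Expr *target, const JoinRow *r, int *nfastpath)
{
    if (expr_is_simple_var(target))
    {
        (*nfastpath)++;
        return r->rows[target->varno - 1][target->attno - 1];
    }
    return eval_expr(target, r);
}
```

    In this toy both paths produce the same value; the point is where the dispatch happens, which is where a real implementation could take a cheaper route for plain columns.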
    
    We will need to put more thought into permissions: I don't think
    requiring all the tables to have the same owner is workable.
    (What happens if someone tries to ALTER OWNER later?)  However,
    if they don't all have the same owner, there are potential security
    problems, so the right restriction is not obvious.  This is not
    necessary to solve now; there are bigger questions to worry about.
    But we'll need an answer before it's committable.
    
    It's not too soon to write some user-facing documentation.
    The CREATE STATISTICS man page obviously needs attention, but
    so does the discussion of extended stats in perform.sgml,
    and so does catalogs.sgml.  I find that writing that sort of stuff
    helps to clarify where one's design is weak.
    
    			regards, tom lane