Thread

  1. Re: pg_plan_advice

    Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-10-31T09:58:59Z

    On Thu, Oct 30, 2025 at 3:00 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    [..over 400kB of attachments, yay]
    
    Thank You for working on this!
    
    My gcc-13 was nitpicking a little bit (see
    compilation_warnings_v1.txt), so attached is just a tiny diff to fix
    some of those issues. After that, clang-20 run was clean too.
    
    > First, any form of user control over the planner tends to be a lightning rod for criticism around here.
    
    I do not know where this is coming from, but everybody I've talked to
    was saying this is needed to handle real enterprise databases and
    applications. I just really love it, how one could precisely adjust
    the plan with this even with the presence of heavy aliasing:
    
    postgres=# explain (plan_advice, costs off) SELECT * FROM (select *
    from t1 a join t2 b using (id)) a, t2 b, t3 c WHERE a.id = b.id and
    b.id = c.id;
                         QUERY PLAN
    -----------------------------------------------------
     Merge Join
       Merge Cond: (a.id = c.id)
       ->  Merge Join
             Merge Cond: (a.id = b.id)
             ->  Index Scan using t1_pkey on t1 a
             ->  Index Scan using t2_pkey on t2 b
       ->  Sort
             Sort Key: c.id
             ->  Seq Scan on t3 c
     Supplied Plan Advice:
       SEQ_SCAN(ble5) /* not matched */
     Generated Plan Advice:
       JOIN_ORDER(a#2 b#2 c)
       MERGE_JOIN_PLAIN(b#2 c)
       SEQ_SCAN(c)
       INDEX_SCAN(a#2 public.t1_pkey )
       NO_GATHER(c a#2 b#2)
    (17 rows)
    
    postgres=# set pg_plan_advice.advice = 'SEQ_SCAN(b#2)';
    SET
    postgres=# explain (plan_advice, costs off) SELECT * FROM (select *
    from t1 a join t2 b using (id)) a, t2 b, t3 c WHERE a.id = b.id and
    b.id = c.id;
                         QUERY PLAN
    ----------------------------------------------------
     Hash Join
       Hash Cond: (b.id = a.id)
       ->  Seq Scan on t2 b
       ->  Hash
             ->  Merge Join
                   Merge Cond: (a.id = c.id)
                   ->  Index Scan using t1_pkey on t1 a
                   ->  Sort
                         Sort Key: c.id
                         ->  Seq Scan on t3 c
     Supplied Plan Advice:
       SEQ_SCAN(b#2) /* matched */
     Generated Plan Advice:
       JOIN_ORDER(b#2 (a#2 c))
       MERGE_JOIN_PLAIN(c)
       HASH_JOIN(c)
       SEQ_SCAN(b#2 c)
       INDEX_SCAN(a#2 public.t1_pkey)
       NO_GATHER(c a#2 b#2)
    
    To attract a little attention to the $thread, the only bigger design
    (usability) question that keeps ringing in my head is how we are going
    to bind it to specific queries without even issuing any SETs(or ALTER
    USER) in the far future in the grand scheme of things. The discussed
    query id (hash), full query text comparison, maybe even strstr(query ,
    "partial hit") or regex all seem to be kind too limited in terms of
    what crazy ORMs can come up with (each query will be potentially
    slightly different, but if optimizer reference points are stable that
    should nail it good enough, but just enabling it for the very specific
    set of queries and not the others [with same aliases] is some major
    challenge).
    
    Due to this, at some point I was even thinking about some hashes for
    every plan node (including hashes of subplans), e.g.:
     Merge Join // hash(MERGE_JOIN_PLAIN(b#2) + ';' somehashval1 + ';'+
    somehahsval2 ) => somehashval3
       Merge Cond: (a.id = c.id)
       ->  Merge Join
             Merge Cond: (a.id = b.id)
             ->  Index Scan using t1_pkey on t1 a // hash(INDEX_SCAN(a#2
    public.t1_pkey)) => somehashval1
             ->  Index Scan using t2_pkey on t2 b // hash(INDEX_SCAN(b#2
    public.t2_pkey)) => somehashval2
    
    and then having a way to use `somehashval3` (let's say it's SHA1) as a
    way to activate the necessary advice. Something like having a way to
    express it using plan_advice.on_subplanhashes_plan_advice =
    'somehashval3: SEQ_SCAN(b#2)'. This would have the benefit of being
    able to override multiple similiar SQL queries in one go rather than
    collecting all possible query_ids, but probably it's stupid,
    heavyweight, but that would be my dream ;)
    
    -J.