Thread

  1. Re: pg_plan_advice

    Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-12-18T12:27:27Z

    On Wed, Dec 17, 2025 at 2:44 PM Jakub Wartak
    <jakub.wartak@enterprisedb.com> wrote:
    >
    > On Wed, Dec 17, 2025 at 11:12 AM Jakub Wartak
    > <jakub.wartak@enterprisedb.com> wrote:
    > >
    > > On Mon, Dec 15, 2025 at 9:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > > >
    > > > Here's v7.
    > > [..]
    > >[..q20..]
    >
    > OK, now for the q10:
    
    Hi, this is a follow-up just to the q10.
    
    > So to me it looks like in Generated Plan Advice we:
    > - have proper HASH_JOIN(customer nation)
    > - but it somehow forgot to include "HASH_JOIN(orders)" to cover for
    > that Parallel Hash Join on (orders.o_custkey = customer.c_custkey)
    > with input from NL. After adding that manually, it achieves the same
    > input plan properly.
    [..]
    
    Well, it's quite a ride with the Q10 and I partially wrong with above:
    
    0. The reported earlier wrong missing "HASH_JOIN(orders customer)" -
    that part was okay
    1. The Incremental Sort is being used in the original plan, but is
    still IS not reflected in the generated advice.
    2a. I've noticed Memoize/Index Scan was not being respected for "nation"
    2b. Seq scan for nation was being done for "nation"
    
    So total modification list, I've ended up doing (+ for adding , - for removing):
    
    + HASH_JOIN(orders customer) -- from earlier reply
    + NESTED_LOOP_MEMOIZE(nation)
    + INDEX_SCAN(nation public.pk_nation)
    - HASH_JOIN(customer nation) -- as it was we were having NL() in org plan
    SEQ_SCAN(orders customer nation) ==> SEQ_SCAN(orders customer)
    
    In full the best shape seems to be Q10 with pg_plan_advice.advice =
    'HASH_JOIN(orders customer) JOIN_ORDER(orders lineitem customer
    nation)    NESTED_LOOP_PLAIN(lineitem)    SEQ_SCAN(orders customer)
    INDEX_SCAN(lineitem public.lineitem_l_orderkey_idx_l_returnflag)
    GATHER_MERGE((customer orders lineitem nation))
    NESTED_LOOP_MEMOIZE(nation)';
    
    which yields:
     Sort
       Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric -
    lineitem.l_discount)))) DESC
       ->  GroupAggregate
             Group Key: customer.c_custkey, nation.n_name
             ->  Gather Merge
                   Workers Planned: 2
                   ->  Sort
                         Sort Key: customer.c_custkey, nation.n_name
                         ->  Nested Loop
                               ->  Parallel Hash Join
                                     Hash Cond: (orders.o_custkey =
    customer.c_custkey)
                                     ->  Nested Loop
                                           ->  Parallel Seq Scan on orders
                                                 Filter: ((o_orderdate >=
    '1993-10-01'::date) AND (o_orderdate < '1994-01-01
    00:00:00'::timestamp without time zone))
                                           ->  Index Scan using
    lineitem_l_orderkey_idx_l_returnflag on lineitem
                                                 Index Cond: (l_orderkey =
    orders.o_orderkey)
                                     ->  Parallel Hash
                                           ->  Parallel Seq Scan on customer
                               ->  Memoize
                                     Cache Key: customer.c_nationkey
                                     Cache Mode: logical
                                     ->  Index Scan using pk_nation on nation
                                           Index Cond: (n_nationkey =
    customer.c_nationkey)
    
    but that Incremental Sort *is* still missing. In original plan we are doing
       Incremental Sort (Sort Key: customer.c_custkey, nation.n_name,
    Presorted Key: customer.c_custkey)
       <-- .... Sort(Sort Key: customer.c_custkey)
    
    However, even with my overrides I haven't found an immediately obvious
    way to force it to use Incremental Sort on a specific field, so it
    just sorts on two at once. Maybe it's something that should be
    expressed through GATHER_MERGE()?, but that's not obvious how and
    where. In terms of raw performance , it seems to be very similiar
    (98ms +/- 8ms even between those two).
    
    -J.