Thread

  1. Re: pg_plan_advice

    Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-12-17T10:12:40Z

    On Mon, Dec 15, 2025 at 9:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > Here's v7.
    [..]
    
    OK, so I've tested today from Your's branch directly, so I hope that
    was also v7. Given the following q20 query:
    
    SELECT s_name, s_address
    FROM supplier, nation
    WHERE s_suppkey in
        (SELECT ps_suppkey
         FROM partsupp
         WHERE ps_partkey in
             (SELECT p_partkey
              FROM part
              WHERE p_name LIKE 'forest%' )
           AND ps_availqty >
             (SELECT 0.5 * sum(l_quantity)
              FROM lineitem
              WHERE l_partkey = ps_partkey
                AND l_suppkey = ps_suppkey
                AND l_shipdate >= DATE '1994-01-01'
                AND l_shipdate < DATE '1994-01-01' + INTERVAL '1' year ) )
      AND s_nationkey = n_nationkey
      AND n_name = 'CANADA'
    ORDER BY s_name;
    
    in normal conditions (w/o advice) the above query generates:
    
     Sort  (cost=1010985030.44..1010985030.59 rows=61 width=51)
       Sort Key: supplier.s_name
       ->  Nested Loop  (cost=0.42..1010985028.63 rows=61 width=51)
             Join Filter: (nation.n_nationkey = supplier.s_nationkey)
             ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4)
                   Filter: (n_name = 'CANADA'::bpchar)
             ->  Nested Loop Semi Join  (cost=0.42..1010985008.29
    rows=1522 width=55)
                   Join Filter: (partsupp.ps_suppkey = supplier.s_suppkey)
                   ->  Seq Scan on supplier  (cost=0.00..249.30 rows=7730 width=59)
                   ->  Materialize  (cost=0.42..1010755994.57 rows=1973 width=4)
                         ->  Nested Loop  (cost=0.42..1010755984.71
    rows=1973 width=4)
                               ->  Seq Scan on part  (cost=0.00..4842.25
    rows=1469 width=4)
                                     Filter: ((p_name)::text ~~ 'forest%'::text)
                               ->  Index Scan using pk_partsupp on
    partsupp  (cost=0.42..688053.87 rows=1 width=8)
                                     Index Cond: (ps_partkey = part.p_partkey)
                                     Filter: ((ps_availqty)::numeric >
    (SubPlan expr_1))
                                     SubPlan expr_1
                                       ->  Aggregate
    (cost=172009.42..172009.44 rows=1 width=32)
                                             ->  Seq Scan on lineitem
    (cost=0.00..172009.42 rows=1 width=5)
                                                   Filter: ((l_shipdate >=
    '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp
    without time zone) AND (l_partkey = partsupp.ps_partkey) AND
    (l_suppkey = partsupp.ps_suppkey))
    
    
     Generated Plan Advice:
       JOIN_ORDER(nation (supplier (part partsupp)))
       NESTED_LOOP_PLAIN(partsupp partsupp) <--- [X]
       NESTED_LOOP_MATERIALIZE(partsupp)
       SEQ_SCAN(nation supplier part lineitem@expr_1)
       INDEX_SCAN(partsupp public.pk_partsupp)
       SEMIJOIN_NON_UNIQUE((partsupp part))
       NO_GATHER(supplier nation partsupp part lineitem@expr_1)
    
    Please see the - I think it's confusing? -
    NESTED_LOOP_MATERIALIZE(partsupp partsupp) - that's 2x the same
    string? This causes it to turn into below plan -- I've marked the
    problem with [X]
    
     Sort  (cost=50035755.50..50035755.66 rows=61 width=51)
       Sort Key: supplier.s_name
       ->  Nested Loop  (cost=12562154.32..50035753.70 rows=61 width=51)
             Join Filter: (nation.n_nationkey = supplier.s_nationkey)
             ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4)
                   Filter: (n_name = 'CANADA'::bpchar)
             ->  Nested Loop Semi Join  (cost=12562154.32..50035733.36
    rows=1522 width=55)
                 [X] -- missing Join Filter here
                   ->  Seq Scan on supplier  (cost=0.00..249.30 rows=7730 width=59)
                   [X] -- HJ instead of Materialize+Nested Loop below:
                   ->  Hash Join  (cost=12562154.32..12567002.09 rows=1 width=4)
                         Hash Cond: (part.p_partkey = partsupp.ps_partkey)
                         ->  Seq Scan on part  (cost=0.00..4842.25
    rows=1469 width=4)
                               Filter: ((p_name)::text ~~ 'forest%'::text)
                         ->  Hash  (cost=12562154.02..12562154.02 rows=24 width=8)
                               ->  Index Scan using pk_partsupp on
    partsupp  (cost=0.42..12562154.02 rows=24 width=8)
                                     [X] -- wrong Index Cond below
    (suppkey instead of partkey)
                                     Index Cond: (ps_suppkey = supplier.s_suppkey)
                                     Filter: ((ps_availqty)::numeric >
    (SubPlan expr_1))
                                     SubPlan expr_1
                                       ->  Aggregate
    (cost=172009.42..172009.44 rows=1 width=32)
                                             ->  Seq Scan on lineitem
    (cost=0.00..172009.42 rows=1 width=5)
                                                   Filter: ((l_shipdate >=
    '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp
    without time zone) AND (l_partkey = partsupp.ps_partkey) AND
    (l_suppkey = partsupp.ps_suppkey))
    
    Supplied Plan Advice:
       SEQ_SCAN(nation) /* matched */
       SEQ_SCAN(supplier) /* matched */
       SEQ_SCAN(part) /* matched */
       SEQ_SCAN(lineitem@expr_1) /* matched */
       INDEX_SCAN(partsupp public.pk_partsupp) /* matched */
       JOIN_ORDER(nation (supplier (part partsupp))) /* matched, conflicting */
       NESTED_LOOP_PLAIN(partsupp) /* matched, conflicting */
       NESTED_LOOP_PLAIN(partsupp) /* matched, conflicting */
       NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting, failed */
       SEMIJOIN_NON_UNIQUE((partsupp part)) /* matched, conflicting */
       NO_GATHER(supplier) /* matched */
       NO_GATHER(nation) /* matched */
       NO_GATHER(partsupp) /* matched */
       NO_GATHER(part) /* matched */
       NO_GATHER(lineitem@expr_1) /* matched */
    
    So the difference is basically between:
        set pg_plan_advice.advice = '[..] NESTED_LOOP_PLAIN(partsupp
    partsupp) NESTED_LOOP_MATERIALIZE(partsupp) [..]';
    which causes wrong plan and outcome:
        NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting, failed */
    
    and apparently proper advice like below which has better yield:
        set pg_plan_advice.advice = '[..] NESTED_LOOP_PLAIN(part partsupp)
    NESTED_LOOP_MATERIALIZE(partsupp) [..]';
    which is not generated , but caused good plan, however it also prints:
       NESTED_LOOP_PLAIN(part) /* matched, conflicting, failed */
       NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting */
    but that seems "failed" there, seems to be untrue?
    
    Another idea is perhaps, we could have some elog(WARNING) - but not
    Asserts() - in assert-only enabled build that could alert us in case
    of duplicated entries being detected for the same ops in
    pg_plan_advice_explain_feedback()?
    
    -J.