Thread
-
Re: pg_plan_advice
Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-12-17T10:12:40Z
On Mon, Dec 15, 2025 at 9:06 PM Robert Haas <robertmhaas@gmail.com> wrote: > > Here's v7. [..] OK, so I've tested today from Your's branch directly, so I hope that was also v7. Given the following q20 query: SELECT s_name, s_address FROM supplier, nation WHERE s_suppkey in (SELECT ps_suppkey FROM partsupp WHERE ps_partkey in (SELECT p_partkey FROM part WHERE p_name LIKE 'forest%' ) AND ps_availqty > (SELECT 0.5 * sum(l_quantity) FROM lineitem WHERE l_partkey = ps_partkey AND l_suppkey = ps_suppkey AND l_shipdate >= DATE '1994-01-01' AND l_shipdate < DATE '1994-01-01' + INTERVAL '1' year ) ) AND s_nationkey = n_nationkey AND n_name = 'CANADA' ORDER BY s_name; in normal conditions (w/o advice) the above query generates: Sort (cost=1010985030.44..1010985030.59 rows=61 width=51) Sort Key: supplier.s_name -> Nested Loop (cost=0.42..1010985028.63 rows=61 width=51) Join Filter: (nation.n_nationkey = supplier.s_nationkey) -> Seq Scan on nation (cost=0.00..1.31 rows=1 width=4) Filter: (n_name = 'CANADA'::bpchar) -> Nested Loop Semi Join (cost=0.42..1010985008.29 rows=1522 width=55) Join Filter: (partsupp.ps_suppkey = supplier.s_suppkey) -> Seq Scan on supplier (cost=0.00..249.30 rows=7730 width=59) -> Materialize (cost=0.42..1010755994.57 rows=1973 width=4) -> Nested Loop (cost=0.42..1010755984.71 rows=1973 width=4) -> Seq Scan on part (cost=0.00..4842.25 rows=1469 width=4) Filter: ((p_name)::text ~~ 'forest%'::text) -> Index Scan using pk_partsupp on partsupp (cost=0.42..688053.87 rows=1 width=8) Index Cond: (ps_partkey = part.p_partkey) Filter: ((ps_availqty)::numeric > (SubPlan expr_1)) SubPlan expr_1 -> Aggregate (cost=172009.42..172009.44 rows=1 width=32) -> Seq Scan on lineitem (cost=0.00..172009.42 rows=1 width=5) Filter: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone) AND (l_partkey = partsupp.ps_partkey) AND (l_suppkey = partsupp.ps_suppkey)) Generated Plan Advice: JOIN_ORDER(nation (supplier (part partsupp))) NESTED_LOOP_PLAIN(partsupp partsupp) <--- [X] NESTED_LOOP_MATERIALIZE(partsupp) SEQ_SCAN(nation supplier part lineitem@expr_1) INDEX_SCAN(partsupp public.pk_partsupp) SEMIJOIN_NON_UNIQUE((partsupp part)) NO_GATHER(supplier nation partsupp part lineitem@expr_1) Please see the - I think it's confusing? - NESTED_LOOP_MATERIALIZE(partsupp partsupp) - that's 2x the same string? This causes it to turn into below plan -- I've marked the problem with [X] Sort (cost=50035755.50..50035755.66 rows=61 width=51) Sort Key: supplier.s_name -> Nested Loop (cost=12562154.32..50035753.70 rows=61 width=51) Join Filter: (nation.n_nationkey = supplier.s_nationkey) -> Seq Scan on nation (cost=0.00..1.31 rows=1 width=4) Filter: (n_name = 'CANADA'::bpchar) -> Nested Loop Semi Join (cost=12562154.32..50035733.36 rows=1522 width=55) [X] -- missing Join Filter here -> Seq Scan on supplier (cost=0.00..249.30 rows=7730 width=59) [X] -- HJ instead of Materialize+Nested Loop below: -> Hash Join (cost=12562154.32..12567002.09 rows=1 width=4) Hash Cond: (part.p_partkey = partsupp.ps_partkey) -> Seq Scan on part (cost=0.00..4842.25 rows=1469 width=4) Filter: ((p_name)::text ~~ 'forest%'::text) -> Hash (cost=12562154.02..12562154.02 rows=24 width=8) -> Index Scan using pk_partsupp on partsupp (cost=0.42..12562154.02 rows=24 width=8) [X] -- wrong Index Cond below (suppkey instead of partkey) Index Cond: (ps_suppkey = supplier.s_suppkey) Filter: ((ps_availqty)::numeric > (SubPlan expr_1)) SubPlan expr_1 -> Aggregate (cost=172009.42..172009.44 rows=1 width=32) -> Seq Scan on lineitem (cost=0.00..172009.42 rows=1 width=5) Filter: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone) AND (l_partkey = partsupp.ps_partkey) AND (l_suppkey = partsupp.ps_suppkey)) Supplied Plan Advice: SEQ_SCAN(nation) /* matched */ SEQ_SCAN(supplier) /* matched */ SEQ_SCAN(part) /* matched */ SEQ_SCAN(lineitem@expr_1) /* matched */ INDEX_SCAN(partsupp public.pk_partsupp) /* matched */ JOIN_ORDER(nation (supplier (part partsupp))) /* matched, conflicting */ NESTED_LOOP_PLAIN(partsupp) /* matched, conflicting */ NESTED_LOOP_PLAIN(partsupp) /* matched, conflicting */ NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting, failed */ SEMIJOIN_NON_UNIQUE((partsupp part)) /* matched, conflicting */ NO_GATHER(supplier) /* matched */ NO_GATHER(nation) /* matched */ NO_GATHER(partsupp) /* matched */ NO_GATHER(part) /* matched */ NO_GATHER(lineitem@expr_1) /* matched */ So the difference is basically between: set pg_plan_advice.advice = '[..] NESTED_LOOP_PLAIN(partsupp partsupp) NESTED_LOOP_MATERIALIZE(partsupp) [..]'; which causes wrong plan and outcome: NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting, failed */ and apparently proper advice like below which has better yield: set pg_plan_advice.advice = '[..] NESTED_LOOP_PLAIN(part partsupp) NESTED_LOOP_MATERIALIZE(partsupp) [..]'; which is not generated , but caused good plan, however it also prints: NESTED_LOOP_PLAIN(part) /* matched, conflicting, failed */ NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting */ but that seems "failed" there, seems to be untrue? Another idea is perhaps, we could have some elog(WARNING) - but not Asserts() - in assert-only enabled build that could alert us in case of duplicated entries being detected for the same ops in pg_plan_advice_explain_feedback()? -J.