Thread
-
Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts
Dmytro Astapov <dastapov@gmail.com> — 2023-12-08T01:19:09Z
Hi! To quickly address the valid point about non-self-contained bug reports, here is the export of the schema from the DB Fiddle: /* Huge inheritance-partitioned table */ create table huge(id bigint, filter_out bool); create table huge_partition1(id bigint, filter_out bool); create table huge_partition2(id bigint, filter_out bool); alter table huge_partition1 inherit huge; alter table huge_partition2 inherit huge; insert into huge_partition1(id, filter_out) select id, mod(id,7) = 0 from generate_series(1,100000) id; insert into huge_partition2(id, filter_out) select id, mod(id,7) = 0 from generate_series(1,100000) id; create index on huge_partition1(id); create index on huge_partition2(id); analyze huge_partition1; analyze huge_partition2; /* Medium inheritance-partitioned table (same structure, but 100x smaller) */ create table medium(id bigint, filter_out bool); create table medium_partition1(id bigint, filter_out bool); create table medium_partition2(id bigint, filter_out bool); alter table medium_partition1 inherit medium; alter table medium_partition2 inherit medium; insert into medium_partition1(id, filter_out) select id, mod(id,7) = 0 from generate_series(1,1000) id; insert into medium_partition2(id, filter_out) select id, mod(id,7) = 0 from generate_series(1,1000) id; create index on medium_partition1(id); create index on medium_partition2(id); analyze medium_partition1; analyze medium_partition2; /* Tiny table of just 5 values */ create table tiny(id bigint); insert into tiny(id) values (100),(200),(300),(400),(500); analyze tiny; /* Views that UNION ALL all non-filtered rows of HUGE and MEDIUM */ create view vw_broken as select id from huge where filter_out union all select id from medium where filter_out; create view vw_not_broken as select id,filter_out from ( select id,filter_out from huge union all select id,filter_out from medium ) q where filter_out; /* This query does NOT use nested loops unexpectedly */ explain select * from tiny join vw_broken on tiny.id = vw_broken.id; QUERY PLAN --------------------------------------------------------------------------------------------- Hash Join (cost=1.11..3798.30 rows=712 width=16) Hash Cond: (huge.id = tiny.id) -> Append (cost=0.00..3683.32 rows=28466 width=8) -> Append (cost=0.00..3222.91 rows=28181 width=8) -> Seq Scan on huge huge_1 (cost=0.00..0.00 rows=1 width=8) Filter: filter_out -> Seq Scan on huge_partition1 huge_2 (cost=0.00..1541.00 rows=13987 width=8) Filter: filter_out -> Seq Scan on huge_partition2 huge_3 (cost=0.00..1541.00 rows=14193 width=8) Filter: filter_out -> Append (cost=0.00..33.42 rows=285 width=8) -> Seq Scan on medium medium_1 (cost=0.00..0.00 rows=1 width=8) Filter: filter_out -> Seq Scan on medium_partition1 medium_2 (cost=0.00..16.00 rows=142 width=8) Filter: filter_out -> Seq Scan on medium_partition2 medium_3 (cost=0.00..16.00 rows=142 width=8) Filter: filter_out -> Hash (cost=1.05..1.05 rows=5 width=8) -> Seq Scan on tiny (cost=0.00..1.05 rows=5 width=8) /* This query DOES use nested loops as expected */ explain select * from tiny join vw_not_broken on tiny.id = vw_not_broken.id; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..151.53 rows=712 width=17) -> Seq Scan on tiny (cost=0.00..1.05 rows=5 width=8) -> Append (cost=0.00..30.04 rows=6 width=9) -> Seq Scan on huge (cost=0.00..0.00 rows=1 width=9) Filter: (filter_out AND (tiny.id = id)) -> Index Scan using huge_partition1_id_idx on huge_partition1 huge_1 (cost=0.29..8.31 rows=1 width=9) Index Cond: (id = tiny.id) Filter: filter_out -> Index Scan using huge_partition2_id_idx on huge_partition2 huge_2 (cost=0.29..8.31 rows=1 width=9) Index Cond: (id = tiny.id) Filter: filter_out -> Seq Scan on medium (cost=0.00..0.00 rows=1 width=9) Filter: (filter_out AND (tiny.id = id)) -> Index Scan using medium_partition1_id_idx on medium_partition1 medium_1 (cost=0.28..6.69 rows=1 width=9) Index Cond: (id = tiny.id) Filter: filter_out -> Index Scan using medium_partition2_id_idx on medium_partition2 medium_2 (cost=0.28..6.69 rows=1 width=9) Index Cond: (id = tiny.id) Filter: filter_out -- Best regards, Dmytro On Thu, Dec 7, 2023 at 9:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > PG Bug reporting form <noreply@postgresql.org> writes: > > Summary of the issue: for a (5-row recordset) JOIN (massive partitioned > > recordset indexed by id) USING (id), the (Nested Loop over 5 values) > > strategy is completely ignored, and Hash Join or Merge Join is done > instead, > > which does SeqScan over the "massive recordset". > > > Reproduction in DB Fiddle: > > https://www.db-fiddle.com/f/sJUUWNgW7pqPWcJwihVoj5/1 (this demonstrates > both > > the bad behaviour and a way to work around it) > > We are generally not too happy with non-self-contained bug reports. > Once that DB Fiddle entry disappears, this bug report will be useless. > However ... > > > 1)The massive recordset on the right side of the JOIN must come from the > > UNION ALL of two parts, both of which have a filter, like this view in my > > reproduction: > > > create view vw_broken as > > select id from huge where filter_out > > union all > > select id from medium where filter_out; > > I suspect the WHERE clauses trigger the problem because the resulting > sub-selects can't be pulled up to become an "appendrel", per > is_safe_append_member: > > * It's only safe to pull up the child if its jointree contains exactly > * one RTE, else the AppendRelInfo data structure breaks. The one base > RTE > * could be buried in several levels of FromExpr, however. Also, if > the > * child's jointree is completely empty, we can pull up because > * pull_up_simple_subquery will insert a single RTE_RESULT RTE instead. > * > * Also, the child can't have any WHERE quals because there's no place > to > * put them in an appendrel. (This is a bit annoying...) > > That means the sub-selects will be planned independently and there's > no chance to consider the nestloop-with-inner-indexscan plan you are > hoping for. > > This is a longstanding wart, but improving matters would require some > fairly painstaking work. The "appendrel" mechanism is core to both > traditional inheritance and partitioning; I don't recommend trying > to blow it up and start over. I vaguely recall previous discussions > that identified some semantic issues with trying to just attach > WHERE clauses to appendrel members, but it was a long time ago and > the details escape me. > > regards, tom lane >