Thread

  1. Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts

    Dmytro Astapov <dastapov@gmail.com> — 2023-12-08T01:19:09Z

    Hi!
    
    To quickly address the valid point about non-self-contained bug reports,
    here is the export of the schema from the DB Fiddle:
    
    /* Huge inheritance-partitioned table */
    create table huge(id bigint, filter_out bool);
    create table huge_partition1(id bigint, filter_out bool);
    create table huge_partition2(id bigint, filter_out bool);
    alter table huge_partition1 inherit huge;
    alter table huge_partition2 inherit huge;
    
    insert into huge_partition1(id, filter_out)
      select id, mod(id,7) = 0
      from generate_series(1,100000) id;
    
    insert into huge_partition2(id, filter_out)
      select id, mod(id,7) = 0
      from generate_series(1,100000) id;
    
    create index on huge_partition1(id);
    create index on huge_partition2(id);
    analyze huge_partition1;
    analyze huge_partition2;
    
    /* Medium inheritance-partitioned table (same structure, but 100x smaller)
    */
    create table medium(id bigint, filter_out bool);
    create table medium_partition1(id bigint, filter_out bool);
    create table medium_partition2(id bigint, filter_out bool);
    alter table medium_partition1 inherit medium;
    alter table medium_partition2 inherit medium;
    
    insert into medium_partition1(id, filter_out)
      select id, mod(id,7) = 0
      from generate_series(1,1000) id;
    
    insert into medium_partition2(id, filter_out)
      select id, mod(id,7) = 0
      from generate_series(1,1000) id;
    
    create index on medium_partition1(id);
    create index on medium_partition2(id);
    analyze medium_partition1;
    analyze medium_partition2;
    
    /* Tiny table of just 5 values */
    create table tiny(id bigint);
    insert into tiny(id) values (100),(200),(300),(400),(500);
    analyze tiny;
    
    /* Views that UNION ALL all non-filtered rows of HUGE and MEDIUM */
    create view vw_broken as
      select id from huge where filter_out
      union all
      select id from medium where filter_out;
    
    create view vw_not_broken as
      select id,filter_out from (
        select id,filter_out from huge
        union all
        select id,filter_out from medium
      ) q
      where filter_out;
    
    /* This query does NOT use nested loops unexpectedly */
    explain select * from tiny join vw_broken on tiny.id = vw_broken.id;
    
    QUERY PLAN
    ---------------------------------------------------------------------------------------------
    Hash Join  (cost=1.11..3798.30 rows=712 width=16)
      Hash Cond: (huge.id = tiny.id)
      ->  Append  (cost=0.00..3683.32 rows=28466 width=8)
            ->  Append  (cost=0.00..3222.91 rows=28181 width=8)
                  ->  Seq Scan on huge huge_1  (cost=0.00..0.00 rows=1 width=8)
                        Filter: filter_out
                  ->  Seq Scan on huge_partition1 huge_2  (cost=0.00..1541.00
    rows=13987 width=8)
                        Filter: filter_out
                  ->  Seq Scan on huge_partition2 huge_3  (cost=0.00..1541.00
    rows=14193 width=8)
                        Filter: filter_out
            ->  Append  (cost=0.00..33.42 rows=285 width=8)
                  ->  Seq Scan on medium medium_1  (cost=0.00..0.00 rows=1
    width=8)
                        Filter: filter_out
                  ->  Seq Scan on medium_partition1 medium_2  (cost=0.00..16.00
    rows=142 width=8)
                        Filter: filter_out
                  ->  Seq Scan on medium_partition2 medium_3  (cost=0.00..16.00
    rows=142 width=8)
                        Filter: filter_out
      ->  Hash  (cost=1.05..1.05 rows=5 width=8)
            ->  Seq Scan on tiny  (cost=0.00..1.05 rows=5 width=8)
    
    
    /* This query DOES use nested loops as expected */
    explain select * from tiny join vw_not_broken on tiny.id = vw_not_broken.id;
    
    QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------
    Nested Loop  (cost=0.00..151.53 rows=712 width=17)
      ->  Seq Scan on tiny  (cost=0.00..1.05 rows=5 width=8)
      ->  Append  (cost=0.00..30.04 rows=6 width=9)
            ->  Seq Scan on huge  (cost=0.00..0.00 rows=1 width=9)
                  Filter: (filter_out AND (tiny.id = id))
            ->  Index Scan using huge_partition1_id_idx on huge_partition1
    huge_1  (cost=0.29..8.31 rows=1 width=9)
                  Index Cond: (id = tiny.id)
                  Filter: filter_out
            ->  Index Scan using huge_partition2_id_idx on huge_partition2
    huge_2  (cost=0.29..8.31 rows=1 width=9)
                  Index Cond: (id = tiny.id)
                  Filter: filter_out
            ->  Seq Scan on medium  (cost=0.00..0.00 rows=1 width=9)
                  Filter: (filter_out AND (tiny.id = id))
            ->  Index Scan using medium_partition1_id_idx on medium_partition1
    medium_1  (cost=0.28..6.69 rows=1 width=9)
                  Index Cond: (id = tiny.id)
                  Filter: filter_out
            ->  Index Scan using medium_partition2_id_idx on medium_partition2
    medium_2  (cost=0.28..6.69 rows=1 width=9)
                  Index Cond: (id = tiny.id)
                  Filter: filter_out
    
    --
    Best regards, Dmytro
    
    
    On Thu, Dec 7, 2023 at 9:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    > PG Bug reporting form <noreply@postgresql.org> writes:
    > > Summary of the issue: for a (5-row recordset) JOIN (massive partitioned
    > > recordset indexed by id) USING (id), the (Nested Loop over 5 values)
    > > strategy is completely ignored, and Hash Join or Merge Join is done
    > instead,
    > > which does SeqScan over the "massive recordset".
    >
    > > Reproduction in DB Fiddle:
    > > https://www.db-fiddle.com/f/sJUUWNgW7pqPWcJwihVoj5/1 (this demonstrates
    > both
    > > the bad behaviour and a way to work around it)
    >
    > We are generally not too happy with non-self-contained bug reports.
    > Once that DB Fiddle entry disappears, this bug report will be useless.
    > However ...
    >
    > > 1)The massive recordset on the right side of the JOIN must come from the
    > > UNION ALL of two parts, both of which have a filter, like this view in my
    > > reproduction:
    >
    > > create view vw_broken as
    > >   select id from huge where filter_out
    > >   union all
    > >   select id from medium where filter_out;
    >
    > I suspect the WHERE clauses trigger the problem because the resulting
    > sub-selects can't be pulled up to become an "appendrel", per
    > is_safe_append_member:
    >
    >      * It's only safe to pull up the child if its jointree contains exactly
    >      * one RTE, else the AppendRelInfo data structure breaks. The one base
    > RTE
    >      * could be buried in several levels of FromExpr, however.  Also, if
    > the
    >      * child's jointree is completely empty, we can pull up because
    >      * pull_up_simple_subquery will insert a single RTE_RESULT RTE instead.
    >      *
    >      * Also, the child can't have any WHERE quals because there's no place
    > to
    >      * put them in an appendrel.  (This is a bit annoying...)
    >
    > That means the sub-selects will be planned independently and there's
    > no chance to consider the nestloop-with-inner-indexscan plan you are
    > hoping for.
    >
    > This is a longstanding wart, but improving matters would require some
    > fairly painstaking work.  The "appendrel" mechanism is core to both
    > traditional inheritance and partitioning; I don't recommend trying
    > to blow it up and start over.  I vaguely recall previous discussions
    > that identified some semantic issues with trying to just attach
    > WHERE clauses to appendrel members, but it was a long time ago and
    > the details escape me.
    >
    >                         regards, tom lane
    >