Thread

Re: Surprising SeqScan of appendRel that can't contribute any rows to the result

Dmytro Astapov <dastapov@gmail.com> — 2024-12-18T00:36:32Z
Thank you for the very detailed answer, much appreciated!

For the benefit of people who might find this in the future via search:  so
far the best workaround seems to be something along the lines of:

alter table partB add column always_null bigint;
create index on partB(always_null);

And then I change vw so that instead of constant NULL I expose this column
instead. Thenat instead of seqscan I get index scan on always_null which
(relatively) quickly yields zero rows.



On Wed, 18 Dec 2024, 00:08 David Rowley, <dgrowleyml@gmail.com> wrote:

> On Wed, 18 Dec 2024 at 12:17, Dmytro Astapov <dastapov@gmail.com> wrote:
> > Surprisingly, this does SeqScan on partB in NestedLoops over some_ids
> with a filter `some_ids.id = NULL::bigint`:
> >
> > Nested Loop (cost=0.29..359.16 rows=200 width=24)
> >   -> Seq Scan on some_ids i (cost=0.00..1.02 rows=2 width=8)
> >   -> Append (cost=0.29..178.56 rows=51 width=16)
> >         -> Index Scan using parta_payload_idx on parta (cost=0.29..8.30
> rows=1 width=16)
> >               Index Cond: (payload = i.id)
> >         -> Seq Scan on partb (cost=0.00..170.00 rows=50 width=16)
> >               Filter: (i.id = NULL::bigint)
> >
> > At the same time `explain select * from vw where payload = 1` correctly
> skips over partB entirely (the node is eliminated from execution plan), and
> so does:
> > explain select * from vw where payload in (1,2);
> >
> > However, any query that does not use explicit literal values still leads
> to SeqScan access on partB, such as:
> > explain select * from vw where payload in (select id from some_ids);
> > explain select * from vw where payload = ANY(ARRAY(select id from
> some_ids));
> > or various forms of joins
> >
> > Do you know if this is expected/documented, or is this a bug?
>
> TL;DR is it's not a bug and expected behaviour.
>
> We tend not to do much in terms of documentation about which
> optimisations the query planner does, so it's probably not documented
> anywhere aside from perhaps the source code.  It might be possible for
> us to eliminate the scan to "partb" for the first of the plans shown
> above. However, the code that applies in your example case where the
> planner does manage to eliminate the scan does so using "base" quals,
> i.e. quals that are pushed down into the scan level.  See
> apply_child_basequals().  For the Nested Loop example, the i.id =
> NULL::bigint isn't a base qual, so it does not work for that case.
> When we're building paramerised paths, as per what's used in your
> Nested Loop example above, we've already done the work to eliminate
> non-matching union children. We don't really have any concept of "this
> union child does not match for this specific parameterisation", so
> we'd need to invent something to do that (which perhaps is just
> removing or not adding the particular unneeded subpath from the Append
> pathlist.)
>
> For the other cases that depend on the results from subqueries, it's
> more tricky and in many cases not possible to eliminate the scans
> during query planner for those cases as the planner does not have
> information to know what will be returned by the subqueries. There
> might be something very limited we can do in terms of looking to see
> if the operator is strict or not so that we at least know that NULLs
> will never match, but that might be quite a corner case that it might
> not be worth the complexity to make that work.  Someone might need to
> write it and see how complex it is to implement before we'd know if it
> was a worthwhile optimisation or not.
>
> David
>