Re: Surprising SeqScan of appendRel that can't contribute any rows to the result
Dmytro Astapov <dastapov@gmail.com>
From: Dmytro Astapov <dastapov@gmail.com>
To: David Rowley <dgrowleyml@gmail.com>
Cc: pgsql-bugs@lists.postgresql.org
Date: 2024-12-18T00:36:32Z
Lists: pgsql-bugs
Thank you for the very detailed answer, much appreciated! For the benefit of people who might find this in the future via search: so far the best workaround seems to be something along the lines of: alter table partB add column always_null bigint; create index on partB(always_null); And then I change vw so that instead of constant NULL I expose this column instead. Thenat instead of seqscan I get index scan on always_null which (relatively) quickly yields zero rows. On Wed, 18 Dec 2024, 00:08 David Rowley, <dgrowleyml@gmail.com> wrote: > On Wed, 18 Dec 2024 at 12:17, Dmytro Astapov <dastapov@gmail.com> wrote: > > Surprisingly, this does SeqScan on partB in NestedLoops over some_ids > with a filter `some_ids.id = NULL::bigint`: > > > > Nested Loop (cost=0.29..359.16 rows=200 width=24) > > -> Seq Scan on some_ids i (cost=0.00..1.02 rows=2 width=8) > > -> Append (cost=0.29..178.56 rows=51 width=16) > > -> Index Scan using parta_payload_idx on parta (cost=0.29..8.30 > rows=1 width=16) > > Index Cond: (payload = i.id) > > -> Seq Scan on partb (cost=0.00..170.00 rows=50 width=16) > > Filter: (i.id = NULL::bigint) > > > > At the same time `explain select * from vw where payload = 1` correctly > skips over partB entirely (the node is eliminated from execution plan), and > so does: > > explain select * from vw where payload in (1,2); > > > > However, any query that does not use explicit literal values still leads > to SeqScan access on partB, such as: > > explain select * from vw where payload in (select id from some_ids); > > explain select * from vw where payload = ANY(ARRAY(select id from > some_ids)); > > or various forms of joins > > > > Do you know if this is expected/documented, or is this a bug? > > TL;DR is it's not a bug and expected behaviour. > > We tend not to do much in terms of documentation about which > optimisations the query planner does, so it's probably not documented > anywhere aside from perhaps the source code. It might be possible for > us to eliminate the scan to "partb" for the first of the plans shown > above. However, the code that applies in your example case where the > planner does manage to eliminate the scan does so using "base" quals, > i.e. quals that are pushed down into the scan level. See > apply_child_basequals(). For the Nested Loop example, the i.id = > NULL::bigint isn't a base qual, so it does not work for that case. > When we're building paramerised paths, as per what's used in your > Nested Loop example above, we've already done the work to eliminate > non-matching union children. We don't really have any concept of "this > union child does not match for this specific parameterisation", so > we'd need to invent something to do that (which perhaps is just > removing or not adding the particular unneeded subpath from the Append > pathlist.) > > For the other cases that depend on the results from subqueries, it's > more tricky and in many cases not possible to eliminate the scans > during query planner for those cases as the planner does not have > information to know what will be returned by the subqueries. There > might be something very limited we can do in terms of looking to see > if the operator is strict or not so that we at least know that NULLs > will never match, but that might be quite a corner case that it might > not be worth the complexity to make that work. Someone might need to > write it and see how complex it is to implement before we'd know if it > was a worthwhile optimisation or not. > > David >