Re: Reduce "Var IS [NOT] NULL" quals during constant folding

Chengpeng Yan <chengpeng_yan@outlook.com>

From: Chengpeng Yan <chengpeng_yan@Outlook.com>
To: Richard Guo <guofenglinux@gmail.com>
Cc: Robert Haas <robertmhaas@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>, David Rowley <dgrowleyml@gmail.com>, Tender Wang <tndrwang@gmail.com>, "pgsql-hackers@lists.postgresql.org" <pgsql-hackers@lists.postgresql.org>
Date: 2025-05-03T10:48:44Z
Lists: pgsql-hackers

> On May 1, 2025, at 16:33, Richard Guo <guofenglinux@gmail.com> wrote:
> 
> Here is the patchset that implements this optimization.  0001 moves
> the expansion of virtual generated columns to occur before sublink
> pull-up.  0002 introduces a new function, preprocess_relation_rtes,
> which scans the rangetable for relation RTEs and performs inh flag
> updates and virtual generated column expansion in a single loop, so
> that only one table_open/table_close call is required for each
> relation.  0003 collects NOT NULL attribute information for each
> relation within the same loop, stores it in a relation OID based hash
> table, and uses this information to reduce NullTest quals during
> constant folding.
> 
> I think the code now more closely resembles the phase 1 and phase 2
> described earlier: it collects all required early-stage catalog
> information within a single loop over the rangetable, allowing each
> relation to be opened and closed only once.  It also avoids the
> has_subclass() call along the way.
> 
> Thanks
> Richard
> <v4-0001-Expand-virtual-generated-columns-before-sublink-p.patch><v4-0002-Centralize-collection-of-catalog-info-needed-earl.patch><v4-0003-Reduce-Var-IS-NOT-NULL-quals-during-constant-fold.patch>

Hi,

I've been following the V4 patches (focusing on 1 and 2 for now): Patch 2's preprocess_relation_rtes is a nice improvement for efficiently gathering early catalog info like inh and attgenerated definitions in one pass.

However, Patch 1 needs to add expansion calls inside specific pull-up functions (like convert_EXISTS_sublink_to_join) because the main expansion work was moved before pull_up_sublinks.

Could we perhaps simplify this? What if preprocess_relation_rtes only collected the attgenerated definitions (storing them, maybe in a hashtable like planned for attnotnull in Patch 3), but didn't perform the actual expansion (Var replacement)?

Then, we could perform the actual expansion (Var replacement) in a separate, single, global step later on. Perhaps after pull_up_sublinks (closer to the original timing), or maybe even later still, for instance after flatten_simple_union_all, once the main query structure including pulled-up subqueries/links has stabilized? A unified expansion after the major structural changes seems cleaner. I'm not sure where is the better position now.

This might avoid the need for the extra expansion calls within convert_EXISTS_sublink_to_join, etc., keeping the information gathering separate from the expression transformation and potentially making the overall flow a bit cleaner.

Any thoughts?

Thanks,

Chengpeng Yan