Re: Reduce "Var IS [NOT] NULL" quals during constant folding

Andrei Lepikhov <lepihov@gmail.com>

From: Andrei Lepikhov <lepihov@gmail.com>
To: Richard Guo <guofenglinux@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, Robert Haas <robertmhaas@gmail.com>, Peter Eisentraut <peter@eisentraut.org>, David Rowley <dgrowleyml@gmail.com>, Tender Wang <tndrwang@gmail.com>, Pg Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-07-03T09:08:54Z
Lists: pgsql-hackers

Commits

  1. Fix misuse of Relids for storing attribute numbers

  2. Reduce "Var IS [NOT] NULL" quals during constant folding

  3. Centralize collection of catalog info needed early in the planner

  4. Expand virtual generated columns before sublink pull-up

  5. Expand virtual generated columns in the planner

On 3/7/2025 02:30, Richard Guo wrote:
> On Wed, Jul 2, 2025 at 6:44 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> I apologise for the confusion in my previous message. I am not
>> suggesting that we postpone this. Instead, I would like an explanation
>> of why you believe that accessing the table statistics earlier could
>> negatively impact planner performance. As I mentioned before, I have
>> only envisioned rare instances where join eliminations may reduce the
>> number of relations and clause evaluations resulting in a constant.
> 
> I wonder how you arrived at the conclusion that these cases are rare.
> If they truly are, then why have we invested so much effort in
> optimizing for them?
There is no direct connection between effort and frequency; it primarily 
depends on personal desire. As you might find, much of the effort goes 
into convincing the community.
These specific cases should be rare from the Postgres perspective: the 
planner's code remains simple on the assumption that crafting an 
appropriate query is the user's responsibility.

> 
> I also wonder why you think we should collect all catalog information
> at the very early stage of the planner, given that most of it is only
> used much later -- after RelOptInfos have been created.  If the goal
> is to avoid redundant catalog retrieval for the same relation in
> get_relation_info(), perhaps adding a caching mechanism within that
> function would be a more targeted solution.  I don't see a strong
> reason for moving get_relation_info() to the very beginning of the
> planner.
This indicates that there is still room for further exploration and 
discussion. For starters, the 'Redundant NullTest' issue is not the only 
concern. Postgres also applies pull-up transformations blindly, without 
considering the cost model. Yet each pull-up has its corner cases, and 
in practice we often see new complaints arise after a new pull-up 
technique is committed. One possible solution I envision is to examine 
indexes and/or make rough initial estimates to avoid problematic 
pull-up cases.

-- 
regards, Andrei Lepikhov