Thread

Re: [HACKERS] subselect

Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-05T19:55:57Z
Bruce Momjian wrote:
> 
> > always inner one in this case. The same for handling ALL modifier.
> > Note, that we generaly can't use aggregates here: we can't add MAX to
> > subquery in the case of > ALL (subquery), because of > ALL should return FALSE
> > if subquery returns NULL(s) but aggregates don't take NULLs into account.
> 
> OK, here are my ideas.  First, I think you have to handle subselects in
> the outer node because a subquery could have its own subquery.  Also, we

I hope that this is no matter: if results of subquery (with/without sub-subqueries)
will go into temp table then this table will be re-scanned for each outer tuple.

> now have a field in Aggreg to all us to 'usenulls'.
                                           ^^^^^^^^
 This can't help:

vac=> select * from x;
y
-
1
2
3
 <<< this is NULL
(4 rows)

vac=> select max(y) from x;
max
---
  3

==> we can't replace 

select * from A where A.a > ALL (select y from x);
                                 ^^^^^^^^^^^^^^^
           (NULL will be returned and so A.a > ALL is FALSE - this is what 
            Sybase does, is it right ?)
with

select * from A where A.a > (select max(y) from x);
                             ^^^^^^^^^^^^^^^^^^^^
just because of we lose knowledge about NULLs here.

Also, I would like to handle ANY and ALL modifiers for all bool
operators, either built-in or user-defined, for all data types -
isn't PostgreSQL OO-like RDBMS -:)

> OK, here it is.  I recommend we pass the outer and subquery through
> the parser and optimizer separately.

I don't like this. I would like to get parse-tree from parser for
entire query and let optimizer (on upper level) decide how to rewrite
parse-tree and what plans to produce and how these plans should be
merged. Note, that I don't object your methods below, but only where
to place handling of this. I don't understand why should we add
new part to the system which will do optimizer' work (parse-tree --> 
execution plan) and deal with optimizer nodes. Imho, upper optimizer
level is nice place to do this.

> 
> We parse the subquery first.  If the subquery is not correlated, it
> should parse fine.  If it is correlated, any columns we find in the
> subquery that are not already in the FROM list, we add the table to the
> subquery FROM list, and add the referenced column to the target list of
> the subquery.
> 
> When we are finished parsing the subquery, we create a catalog cache
> entry for it called 'sub1' and make its fields match the target
> list of the subquery.
> 
> In the outer query, we add 'sub1' to its target list, and change
> the subquery reference to point to the new range table.  We also add
> WHERE clauses to do any correlated joins.
...
> Here is a more complex correlated subquery:
> 
>         select *
>         from taba
>         where col1 = (select col2
>                       from tabb
>                       where taba.col3 = tabb.col4)
> 
> Here we must add 'taba' to the subquery's FROM list, and add col3 to the
> target list of the subquery.  After we parse the subquery, add 'sub1' to
> the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
> sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
> THe optimizer will do the correlation for us.
> 
> In the optimizer, we can parse the subquery first, then the outer query,
> and then replace all 'sub1' references in the outer query to use the
> subquery plan.
> 
> I realize making merging the two plans and doing IN and NOT IN is the
                   ^^^^^^^^^^^^^^^^^^^^^
This is very easy to do! As I already said we have just change sub1
access path (SeqScan of sub1) with SeqScan of Material node with 
subquery plan.

> real challenge, but I hoped this would give us a start.

Decision about how to record subquery stuff in to parse-tree
would be very good start -:)

BTW, note that for _expression_ subqueries (which are introduced without
IN, EXISTS, ALL, ANY - this follows Sybase' naming) - as in your examples - 
we have to check that subquery returns single tuple...

Vadim