Thread

  1. Re: [HACKERS] subselect

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-05T19:55:57Z

    Bruce Momjian wrote:
    > 
    > > always inner one in this case. The same for handling ALL modifier.
    > > Note, that we generaly can't use aggregates here: we can't add MAX to
    > > subquery in the case of > ALL (subquery), because of > ALL should return FALSE
    > > if subquery returns NULL(s) but aggregates don't take NULLs into account.
    > 
    > OK, here are my ideas.  First, I think you have to handle subselects in
    > the outer node because a subquery could have its own subquery.  Also, we
    
    I hope that this is no matter: if results of subquery (with/without sub-subqueries)
    will go into temp table then this table will be re-scanned for each outer tuple.
    
    > now have a field in Aggreg to all us to 'usenulls'.
                                               ^^^^^^^^
     This can't help:
    
    vac=> select * from x;
    y
    -
    1
    2
    3
     <<< this is NULL
    (4 rows)
    
    vac=> select max(y) from x;
    max
    ---
      3
    
    ==> we can't replace 
    
    select * from A where A.a > ALL (select y from x);
                                     ^^^^^^^^^^^^^^^
               (NULL will be returned and so A.a > ALL is FALSE - this is what 
                Sybase does, is it right ?)
    with
    
    select * from A where A.a > (select max(y) from x);
                                 ^^^^^^^^^^^^^^^^^^^^
    just because of we lose knowledge about NULLs here.
    
    Also, I would like to handle ANY and ALL modifiers for all bool
    operators, either built-in or user-defined, for all data types -
    isn't PostgreSQL OO-like RDBMS -:)
    
    > OK, here it is.  I recommend we pass the outer and subquery through
    > the parser and optimizer separately.
    
    I don't like this. I would like to get parse-tree from parser for
    entire query and let optimizer (on upper level) decide how to rewrite
    parse-tree and what plans to produce and how these plans should be
    merged. Note, that I don't object your methods below, but only where
    to place handling of this. I don't understand why should we add
    new part to the system which will do optimizer' work (parse-tree --> 
    execution plan) and deal with optimizer nodes. Imho, upper optimizer
    level is nice place to do this.
    
    > 
    > We parse the subquery first.  If the subquery is not correlated, it
    > should parse fine.  If it is correlated, any columns we find in the
    > subquery that are not already in the FROM list, we add the table to the
    > subquery FROM list, and add the referenced column to the target list of
    > the subquery.
    > 
    > When we are finished parsing the subquery, we create a catalog cache
    > entry for it called 'sub1' and make its fields match the target
    > list of the subquery.
    > 
    > In the outer query, we add 'sub1' to its target list, and change
    > the subquery reference to point to the new range table.  We also add
    > WHERE clauses to do any correlated joins.
    ...
    > Here is a more complex correlated subquery:
    > 
    >         select *
    >         from taba
    >         where col1 = (select col2
    >                       from tabb
    >                       where taba.col3 = tabb.col4)
    > 
    > Here we must add 'taba' to the subquery's FROM list, and add col3 to the
    > target list of the subquery.  After we parse the subquery, add 'sub1' to
    > the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
    > sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
    > THe optimizer will do the correlation for us.
    > 
    > In the optimizer, we can parse the subquery first, then the outer query,
    > and then replace all 'sub1' references in the outer query to use the
    > subquery plan.
    > 
    > I realize making merging the two plans and doing IN and NOT IN is the
                       ^^^^^^^^^^^^^^^^^^^^^
    This is very easy to do! As I already said we have just change sub1
    access path (SeqScan of sub1) with SeqScan of Material node with 
    subquery plan.
    
    > real challenge, but I hoped this would give us a start.
    
    Decision about how to record subquery stuff in to parse-tree
    would be very good start -:)
    
    BTW, note that for _expression_ subqueries (which are introduced without
    IN, EXISTS, ALL, ANY - this follows Sybase' naming) - as in your examples - 
    we have to check that subquery returns single tuple...
    
    Vadim