Thread

  1. Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-09T15:10:06Z

    Bruce Momjian wrote:
    > 
    > Vadim, I know you are still thinking about subselects, but I have some
    > more clarification that may help.
    > 
    > We have to add phantom range table entries to correlated subselects so
    > they will pass the parser.  We might as well add those fields to the
    > target list of the subquery at the same time:
    > 
    >         select *
    >         from taba
    >         where col1 = (select col2
    >                       from tabb
    >                       where taba.col3 = tabb.col4)
    > 
    > becomes:
    > 
    >         select *
    >         from taba
    >         where col1 = (select col2, tabb.col4 <---
    >                       from tabb, taba  <---
    >                       where taba.col3 = tabb.col4)
    > 
    > We add a field to TargetEntry and RangeTblEntry to mark the fact that it
    > was entered as a correlation entry:
    > 
    >         bool    isCorrelated;
    
    No, I don't like to add anything in parser. Example:
    
            select *
            from tabA
            where col1 = (select col2
                          from tabB
                          where tabA.col3 = tabB.col4
                          and exists (select * 
                                      from tabC 
                                      where tabB.colX = tabC.colX and
                                            tabC.colY = tabA.col2)
                         )
    
    : a column of tabA is referenced in sub-subselect 
    (is it allowable by standards ?) - in this case it's better 
    to don't add tabA to 1st subselect but add tabA to second one
    and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
    this gives us 2-tables join in 1st subquery instead of 3-tables join.
    (And I'm still not sure that using temp tables is best of what can be 
    done in all cases...)
    
    Instead of using isCorrelated in TE & RTE we can add 
    
    Index varlevel;
    
    to Var node to reflect (sub)query from where this Var is come
    (where is range table to find var's relation using varno). Upmost query
    will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on.
                            ^^^                          ^^^^^^^^^^^^
    (I don't see problems with distinguishing Vars of different children
    on the same level...)
    
    > 
    > Second, we need to hook the subselect to the main query.  I recommend we
    > add two fields to Query for this:
    > 
    >         Query *parentQuery;
    >         List *subqueries;
    
    Agreed. And maybe Index queryLevel.
    
    > In the parent query, to parse the WHERE clause, we create a new operator
    > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
                                                   ^^^^^^^^^^^^^^^^^^
    No. We have to handle (a,b,c) OP (select x, y, z ...) and 
    '_a_constant_' OP (select ...) - I don't know is last in standards,
    Sybase has this.
    
    Well,
    
    typedef enum OpType
    {
        OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR
    
    + OP_EXISTS, OP_ALL, OP_ANY
    
    } OpType;
    
    typedef struct Expr
    {
        NodeTag     type;
        Oid         typeOid;        /* oid of the type of this expr */
        OpType      opType;         /* type of the op */
        Node       *oper;           /* could be Oper or Func */
        List       *args;           /* list of argument nodes */
    } Expr;
    
    OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
               List, following your suggestion)
    
    OP_ALL, OP_ANY:
    
    oper is List of Oper nodes. We need in list because of data types of
    a, b, c (above) can be different and so Oper nodes will be different too.
    
    lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
    left side of subquery' operator.
    lsecond(args) is SubSelect.
    
    Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
    IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
    by parser into corresponding ANY and ALL. At the moment we can do:
    
    IN --> = ANY, NOT IN --> <> ALL
    
    but this will be "known bug": this breaks OO-nature of Postgres, because of
    operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
    Example: box data type. For boxes, = means equality of _areas_ and =~
    means that boxes are the same ==> =~ ANY should be used for IN.
    
    > right side is an index to a slot in the subqueries List.
    
    Vadim