Thread

Re: subselects

Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-09T15:10:06Z
Bruce Momjian wrote:
> 
> Vadim, I know you are still thinking about subselects, but I have some
> more clarification that may help.
> 
> We have to add phantom range table entries to correlated subselects so
> they will pass the parser.  We might as well add those fields to the
> target list of the subquery at the same time:
> 
>         select *
>         from taba
>         where col1 = (select col2
>                       from tabb
>                       where taba.col3 = tabb.col4)
> 
> becomes:
> 
>         select *
>         from taba
>         where col1 = (select col2, tabb.col4 <---
>                       from tabb, taba  <---
>                       where taba.col3 = tabb.col4)
> 
> We add a field to TargetEntry and RangeTblEntry to mark the fact that it
> was entered as a correlation entry:
> 
>         bool    isCorrelated;

No, I don't like to add anything in parser. Example:

        select *
        from tabA
        where col1 = (select col2
                      from tabB
                      where tabA.col3 = tabB.col4
                      and exists (select * 
                                  from tabC 
                                  where tabB.colX = tabC.colX and
                                        tabC.colY = tabA.col2)
                     )

: a column of tabA is referenced in sub-subselect 
(is it allowable by standards ?) - in this case it's better 
to don't add tabA to 1st subselect but add tabA to second one
and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
this gives us 2-tables join in 1st subquery instead of 3-tables join.
(And I'm still not sure that using temp tables is best of what can be 
done in all cases...)

Instead of using isCorrelated in TE & RTE we can add 

Index varlevel;

to Var node to reflect (sub)query from where this Var is come
(where is range table to find var's relation using varno). Upmost query
will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on.
                        ^^^                          ^^^^^^^^^^^^
(I don't see problems with distinguishing Vars of different children
on the same level...)

> 
> Second, we need to hook the subselect to the main query.  I recommend we
> add two fields to Query for this:
> 
>         Query *parentQuery;
>         List *subqueries;

Agreed. And maybe Index queryLevel.

> In the parent query, to parse the WHERE clause, we create a new operator
> type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
                                               ^^^^^^^^^^^^^^^^^^
No. We have to handle (a,b,c) OP (select x, y, z ...) and 
'_a_constant_' OP (select ...) - I don't know is last in standards,
Sybase has this.

Well,

typedef enum OpType
{
    OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR

+ OP_EXISTS, OP_ALL, OP_ANY

} OpType;

typedef struct Expr
{
    NodeTag     type;
    Oid         typeOid;        /* oid of the type of this expr */
    OpType      opType;         /* type of the op */
    Node       *oper;           /* could be Oper or Func */
    List       *args;           /* list of argument nodes */
} Expr;

OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
           List, following your suggestion)

OP_ALL, OP_ANY:

oper is List of Oper nodes. We need in list because of data types of
a, b, c (above) can be different and so Oper nodes will be different too.

lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
left side of subquery' operator.
lsecond(args) is SubSelect.

Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
by parser into corresponding ANY and ALL. At the moment we can do:

IN --> = ANY, NOT IN --> <> ALL

but this will be "known bug": this breaks OO-nature of Postgres, because of
operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
Example: box data type. For boxes, = means equality of _areas_ and =~
means that boxes are the same ==> =~ ANY should be used for IN.

> right side is an index to a slot in the subqueries List.

Vadim