Thread

  1. subselects

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-01-09T03:55:03Z

    Vadim, I know you are still thinking about subselects, but I have some
    more clarification that may help.
    
    We have to add phantom range table entries to correlated subselects so
    they will pass the parser.  We might as well add those fields to the
    target list of the subquery at the same time:
    
    	select *
    	from taba
    	where col1 = (select col2
    		      from tabb
    		      where taba.col3 = tabb.col4)
    
    becomes:
    
    	select *
    	from taba
    	where col1 = (select col2, tabb.col4 <---
    		      from tabb, taba  <---
    		      where taba.col3 = tabb.col4)
    
    We add a field to TargetEntry and RangeTblEntry to mark the fact that it
    was entered as a correlation entry:
    
    	bool	isCorrelated;
    
    Second, we need to hook the subselect to the main query.  I recommend we
    add two fields to Query for this:
    
    	Query *parentQuery;
    	List *subqueries;
    
    The parentQuery pointer is used to resolve field names in the correlated
    subquery.
    
    	select *
    	from taba
    	where col1 = (select col2, tabb.col4 <---
    		      from tabb, taba  <---
    		      where taba.col3 = tabb.col4)
    
    In the query above, the subquery can be easily parsed, and we add the
    subquery to the parsent's parentQuery list.
    
    In the parent query, to parse the WHERE clause, we create a new operator
    type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
    right side is an index to a slot in the subqueries List.
    
    We can then do the rest in the upper optimizer.
    
    -- 
    Bruce Momjian
    maillist@candle.pha.pa.us
    
    
  2. Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-09T15:10:06Z

    Bruce Momjian wrote:
    > 
    > Vadim, I know you are still thinking about subselects, but I have some
    > more clarification that may help.
    > 
    > We have to add phantom range table entries to correlated subselects so
    > they will pass the parser.  We might as well add those fields to the
    > target list of the subquery at the same time:
    > 
    >         select *
    >         from taba
    >         where col1 = (select col2
    >                       from tabb
    >                       where taba.col3 = tabb.col4)
    > 
    > becomes:
    > 
    >         select *
    >         from taba
    >         where col1 = (select col2, tabb.col4 <---
    >                       from tabb, taba  <---
    >                       where taba.col3 = tabb.col4)
    > 
    > We add a field to TargetEntry and RangeTblEntry to mark the fact that it
    > was entered as a correlation entry:
    > 
    >         bool    isCorrelated;
    
    No, I don't like to add anything in parser. Example:
    
            select *
            from tabA
            where col1 = (select col2
                          from tabB
                          where tabA.col3 = tabB.col4
                          and exists (select * 
                                      from tabC 
                                      where tabB.colX = tabC.colX and
                                            tabC.colY = tabA.col2)
                         )
    
    : a column of tabA is referenced in sub-subselect 
    (is it allowable by standards ?) - in this case it's better 
    to don't add tabA to 1st subselect but add tabA to second one
    and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
    this gives us 2-tables join in 1st subquery instead of 3-tables join.
    (And I'm still not sure that using temp tables is best of what can be 
    done in all cases...)
    
    Instead of using isCorrelated in TE & RTE we can add 
    
    Index varlevel;
    
    to Var node to reflect (sub)query from where this Var is come
    (where is range table to find var's relation using varno). Upmost query
    will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on.
                            ^^^                          ^^^^^^^^^^^^
    (I don't see problems with distinguishing Vars of different children
    on the same level...)
    
    > 
    > Second, we need to hook the subselect to the main query.  I recommend we
    > add two fields to Query for this:
    > 
    >         Query *parentQuery;
    >         List *subqueries;
    
    Agreed. And maybe Index queryLevel.
    
    > In the parent query, to parse the WHERE clause, we create a new operator
    > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
                                                   ^^^^^^^^^^^^^^^^^^
    No. We have to handle (a,b,c) OP (select x, y, z ...) and 
    '_a_constant_' OP (select ...) - I don't know is last in standards,
    Sybase has this.
    
    Well,
    
    typedef enum OpType
    {
        OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR
    
    + OP_EXISTS, OP_ALL, OP_ANY
    
    } OpType;
    
    typedef struct Expr
    {
        NodeTag     type;
        Oid         typeOid;        /* oid of the type of this expr */
        OpType      opType;         /* type of the op */
        Node       *oper;           /* could be Oper or Func */
        List       *args;           /* list of argument nodes */
    } Expr;
    
    OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
               List, following your suggestion)
    
    OP_ALL, OP_ANY:
    
    oper is List of Oper nodes. We need in list because of data types of
    a, b, c (above) can be different and so Oper nodes will be different too.
    
    lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
    left side of subquery' operator.
    lsecond(args) is SubSelect.
    
    Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
    IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
    by parser into corresponding ANY and ALL. At the moment we can do:
    
    IN --> = ANY, NOT IN --> <> ALL
    
    but this will be "known bug": this breaks OO-nature of Postgres, because of
    operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
    Example: box data type. For boxes, = means equality of _areas_ and =~
    means that boxes are the same ==> =~ ANY should be used for IN.
    
    > right side is an index to a slot in the subqueries List.
    
    Vadim
    
    
  3. Re: subselects

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-01-09T22:31:41Z

    > 
    > Bruce Momjian wrote:
    > > 
    > > Vadim, I know you are still thinking about subselects, but I have some
    > > more clarification that may help.
    > > 
    > > We have to add phantom range table entries to correlated subselects so
    > > they will pass the parser.  We might as well add those fields to the
    > > target list of the subquery at the same time:
    > > 
    > >         select *
    > >         from taba
    > >         where col1 = (select col2
    > >                       from tabb
    > >                       where taba.col3 = tabb.col4)
    > > 
    > > becomes:
    > > 
    > >         select *
    > >         from taba
    > >         where col1 = (select col2, tabb.col4 <---
    > >                       from tabb, taba  <---
    > >                       where taba.col3 = tabb.col4)
    > > 
    > > We add a field to TargetEntry and RangeTblEntry to mark the fact that it
    > > was entered as a correlation entry:
    > > 
    > >         bool    isCorrelated;
    > 
    > No, I don't like to add anything in parser. Example:
    > 
    >         select *
    >         from tabA
    >         where col1 = (select col2
    >                       from tabB
    >                       where tabA.col3 = tabB.col4
    >                       and exists (select * 
    >                                   from tabC 
    >                                   where tabB.colX = tabC.colX and
    >                                         tabC.colY = tabA.col2)
    >                      )
    > 
    > : a column of tabA is referenced in sub-subselect 
    
    This is a strange case that I don't think we need to handle in our first
    implementation.
    
    > (is it allowable by standards ?) - in this case it's better 
    > to don't add tabA to 1st subselect but add tabA to second one
    > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
    > this gives us 2-tables join in 1st subquery instead of 3-tables join.
    > (And I'm still not sure that using temp tables is best of what can be 
    > done in all cases...)
    
    I don't see any use for temp tables in subselects anymore.  After having
    implemented UNIONS, I now see how much can be done in the upper
    optimizer.  I see you just putting the subquery PLAN into the proper
    place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
    
    > 
    > Instead of using isCorrelated in TE & RTE we can add 
    > 
    > Index varlevel;
    
    OK.  Sounds good.
    
    > 
    > to Var node to reflect (sub)query from where this Var is come
    > (where is range table to find var's relation using varno). Upmost query
    > will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on.
    >                         ^^^                          ^^^^^^^^^^^^
    > (I don't see problems with distinguishing Vars of different children
    > on the same level...)
    > 
    > > 
    > > Second, we need to hook the subselect to the main query.  I recommend we
    > > add two fields to Query for this:
    > > 
    > >         Query *parentQuery;
    > >         List *subqueries;
    > 
    > Agreed. And maybe Index queryLevel.
    
    Sure.  If it helps.
    
    > 
    > > In the parent query, to parse the WHERE clause, we create a new operator
    > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
    >                                                ^^^^^^^^^^^^^^^^^^
    > No. We have to handle (a,b,c) OP (select x, y, z ...) and 
    > '_a_constant_' OP (select ...) - I don't know is last in standards,
    > Sybase has this.
    
    I have never seen this in my eight years of SQL.  Perhaps we can leave
    this for later, maybe much later.
    
    > 
    > Well,
    > 
    > typedef enum OpType
    > {
    >     OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR
    > 
    > + OP_EXISTS, OP_ALL, OP_ANY
    > 
    > } OpType;
    > 
    > typedef struct Expr
    > {
    >     NodeTag     type;
    >     Oid         typeOid;        /* oid of the type of this expr */
    >     OpType      opType;         /* type of the op */
    >     Node       *oper;           /* could be Oper or Func */
    >     List       *args;           /* list of argument nodes */
    > } Expr;
    > 
    > OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
    >            List, following your suggestion)
    > 
    > OP_ALL, OP_ANY:
    > 
    > oper is List of Oper nodes. We need in list because of data types of
    > a, b, c (above) can be different and so Oper nodes will be different too.
    > 
    > lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
    > left side of subquery' operator.
    > lsecond(args) is SubSelect.
    > 
    > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
    > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
    > by parser into corresponding ANY and ALL. At the moment we can do:
    > 
    > IN --> = ANY, NOT IN --> <> ALL
    > 
    > but this will be "known bug": this breaks OO-nature of Postgres, because of
    > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
    > Example: box data type. For boxes, = means equality of _areas_ and =~
    > means that boxes are the same ==> =~ ANY should be used for IN.
    
    That is interesting, to use =~ for ANY.
    
    Yes, but how many operators take a SUBQUERY as an operand.  This is a
    special case to me.
    
    I think I see where you are trying to go.  You want subselects to behave
    like any other operator, with a subselect type, and you do all the
    subselect handling in the optimizer, with special Nodes and actions.
    
    I think this may be just too much of a leap.  We have such clean query
    logic for single queries, I can't imagine having an operator that has a
    Query operand, and trying to get everything to properly handle it. 
    UNIONS were very easy to implement as a List off of Query, with some
    foreach()'s in rewrite and the high optimizer.
    
    Subselects are SQL standard, and are never going to be over-ridden by a
    user.  Same with UNION.  They want UNION, they get UNION.  They want
    Subselect, we are going to spin through the Query structure and give
    them what they want.
    
    The complexities of subselects and correlated queries and range tables
    and stuff is so bizarre that trying to get it to work inside the type
    system could be a huge project.
    
    > 
    > > right side is an index to a slot in the subqueries List.
    
    I guess the question is what can we have by February 1?
    
    I have been reading some postings, and it seems to me that subselects
    are the litmus test for many evaluators when deciding if a database
    engine is full-featured.
    
    Sorry to be so straightforward, but I want to keep hashing this around
    until we get a conclusion, so coding can start.
    
    My suggestions have been, I believe, trying to get subselects working
    with the fullest functionality by adding the least amount of code, and
    keeping the logic clean.
    
    Have you checked out the UNION code?  It is very small, but it works.  I
    think it could make a good sample for subselects.
    
    -- 
    Bruce Momjian
    maillist@candle.pha.pa.us
    
    
  4. Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-10T17:19:08Z

    Bruce Momjian wrote:
    > 
    > > No, I don't like to add anything in parser. Example:
    > >
    > >         select *
    > >         from tabA
    > >         where col1 = (select col2
    > >                       from tabB
    > >                       where tabA.col3 = tabB.col4
    > >                       and exists (select *
    > >                                   from tabC
    > >                                   where tabB.colX = tabC.colX and
    > >                                         tabC.colY = tabA.col2)
    > >                      )
    > >
    > > : a column of tabA is referenced in sub-subselect
    > 
    > This is a strange case that I don't think we need to handle in our first
    > implementation.
    
    I don't know is this strange case or not :)
    But I would like to know is this allowed by standards - can someone
    comment on this ?
    And I don't see problems with handling this...
    
    > 
    > > (is it allowable by standards ?) - in this case it's better
    > > to don't add tabA to 1st subselect but add tabA to second one
    > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
    > > this gives us 2-tables join in 1st subquery instead of 3-tables join.
    > > (And I'm still not sure that using temp tables is best of what can be
    > > done in all cases...)
    > 
    > I don't see any use for temp tables in subselects anymore.  After having
    > implemented UNIONS, I now see how much can be done in the upper
    > optimizer.  I see you just putting the subquery PLAN into the proper
    > place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
    
    When saying about temp tables, I meant tables created by node Material
    for subquery plan. This is one of two ways - run subquery once for all
    possible upper plan tuples and then just join result table with upper
    query. Another way is re-run subquery for each upper query tuple,
    without temp table but may be with caching results by some ways.
    Actually, there is special case - when subquery can be alternatively 
    formulated as joins, - but this is just special case.
    
    > > > In the parent query, to parse the WHERE clause, we create a new operator
    > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
    > >                                                ^^^^^^^^^^^^^^^^^^
    > > No. We have to handle (a,b,c) OP (select x, y, z ...) and
    > > '_a_constant_' OP (select ...) - I don't know is last in standards,
    > > Sybase has this.
    > 
    > I have never seen this in my eight years of SQL.  Perhaps we can leave
    > this for later, maybe much later.
    
    Are you saying about (a, b, c) or about 'a_constant' ?
    Again, can someone comment on are they in standards or not ?
    Tom ?
    If yes then please add parser' support for them now...
    
    > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
    > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
    > > by parser into corresponding ANY and ALL. At the moment we can do:
    > >
    > > IN --> = ANY, NOT IN --> <> ALL
    > >
    > > but this will be "known bug": this breaks OO-nature of Postgres, because of
    > > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
    > > Example: box data type. For boxes, = means equality of _areas_ and =~
    > > means that boxes are the same ==> =~ ANY should be used for IN.
    > 
    > That is interesting, to use =~ for ANY.
    > 
    > Yes, but how many operators take a SUBQUERY as an operand.  This is a
    > special case to me.
    > 
    > I think I see where you are trying to go.  You want subselects to behave
    > like any other operator, with a subselect type, and you do all the
    > subselect handling in the optimizer, with special Nodes and actions.
    > 
    > I think this may be just too much of a leap.  We have such clean query
    > logic for single queries, I can't imagine having an operator that has a
    > Query operand, and trying to get everything to properly handle it.
    > UNIONS were very easy to implement as a List off of Query, with some
    > foreach()'s in rewrite and the high optimizer.
    > 
    > Subselects are SQL standard, and are never going to be over-ridden by a
    > user.  Same with UNION.  They want UNION, they get UNION.  They want
    > Subselect, we are going to spin through the Query structure and give
    > them what they want.
    > 
    > The complexities of subselects and correlated queries and range tables
    > and stuff is so bizarre that trying to get it to work inside the type
    > system could be a huge project.
    
    PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
    derived from the Berkeley Postgres database management system. While
    PostgreSQL retains the powerful object-relational data model, rich data types and
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    easy extensibility of Postgres, it replaces the PostQuel query language with an
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    extended subset of SQL.
    ^^^^^^^^^^^^^^^^^^^^^^
    
    Should we say users that subselect will work for standard data types only ?
    I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
    Is there difference between handling = ANY and ~ ANY ? I don't see any.
    Currently we can't get IN working properly for boxes (and may be for others too)
    and I don't like to try to resolve these problems now, but hope that someday
    we'll be able to do this. At the moment - just convert IN into = ANY and
    NOT IN into <> ALL in parser.
    
    (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
    use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
    
    > >
    > > > right side is an index to a slot in the subqueries List.
    > 
    > I guess the question is what can we have by February 1?
    > 
    > I have been reading some postings, and it seems to me that subselects
    > are the litmus test for many evaluators when deciding if a database
    > engine is full-featured.
    > 
    > Sorry to be so straightforward, but I want to keep hashing this around
    > until we get a conclusion, so coding can start.
    > 
    > My suggestions have been, I believe, trying to get subselects working
    > with the fullest functionality by adding the least amount of code, and
    > keeping the logic clean.
    > 
    > Have you checked out the UNION code?  It is very small, but it works.  I
    > think it could make a good sample for subselects.
    
    There is big difference between subqueries and queries in UNION - 
    there are not dependences between UNION queries.
    
    Ok, opened issues:
    
    1. Is using upper query' vars in all subquery levels in standard ?
    2. Is (a, b, c) OP (subselect) in standard ?
    3. What types of expressions (Var, Const, ...) are allowed on the left
       side of operator with subquery on the right ?
    4. What types of operators should we support (=, >, ..., like, ~, ...) ?
       (My vote for all boolean operators).
    
    And - did we get consensus on presentation subqueries stuff in Query,
    Expr and Var ?
    I would like to have something done in parser near Jan 17 to get
    subqueries working by Feb 1. I vote for support of all standard
    things (1. - 3.) in parser right now - if there will be no time
    to implement something like (a, b, c) then optimizer will call
    elog(WARN) (oh, sorry, - elog(ERROR)).
    
    Vadim
    
    
  5. Re: subselects

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 1998-01-10T18:01:03Z

    > > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
    > > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
    > > > by parser into corresponding ANY and ALL. At the moment we can do:
    > > >
    > > > IN --> = ANY, NOT IN --> <> ALL
    > > >
    > > > but this will be "known bug": this breaks OO-nature of Postgres, because of
    > > > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
    > > > Example: box data type. For boxes, = means equality of _areas_ and =~
    > > > means that boxes are the same ==> =~ ANY should be used for IN.
    > >
    > > That is interesting, to use =~ for ANY.
    
    If I understand the discussion, I would think is is fine to make an assumption about
    which operator is used to implement a subselect expression. If someone remaps an
    operator to mean something different, then they will get a different result (or a
    nonsensical one) from a subselect.
    
    I'd be happy to remap existing operators to fit into a convention which would work
    with subselects (especially if I got to help choose :).
    
    > > Subselects are SQL standard, and are never going to be over-ridden by a
    > > user.  Same with UNION.  They want UNION, they get UNION.  They want
    > > Subselect, we are going to spin through the Query structure and give
    > > them what they want.
    >
    > PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
    > derived from the Berkeley Postgres database management system. While
    > PostgreSQL retains the powerful object-relational data model, rich data types and
    >            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    > easy extensibility of Postgres, it replaces the PostQuel query language with an
    > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    > extended subset of SQL.
    > ^^^^^^^^^^^^^^^^^^^^^^
    >
    > Should we say users that subselect will work for standard data types only ?
    > I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
    > Is there difference between handling = ANY and ~ ANY ? I don't see any.
    > Currently we can't get IN working properly for boxes (and may be for others too)
    > and I don't like to try to resolve these problems now, but hope that someday
    > we'll be able to do this. At the moment - just convert IN into = ANY and
    > NOT IN into <> ALL in parser.
    >
    > (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
    > use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
    
    ?? I didn't know that. Wouldn't we want it to eventually use "=" through a sorted
    list? That would give more consistant behavior...
    
    > > I have been reading some postings, and it seems to me that subselects
    > > are the litmus test for many evaluators when deciding if a database
    > > engine is full-featured.
    > >
    > > Sorry to be so straightforward, but I want to keep hashing this around
    > > until we get a conclusion, so coding can start.
    > >
    > > My suggestions have been, I believe, trying to get subselects working
    > > with the fullest functionality by adding the least amount of code, and
    > > keeping the logic clean.
    > >
    > > Have you checked out the UNION code?  It is very small, but it works.  I
    > > think it could make a good sample for subselects.
    >
    > There is big difference between subqueries and queries in UNION -
    > there are not dependences between UNION queries.
    >
    > Ok, opened issues:
    >
    > 1. Is using upper query' vars in all subquery levels in standard ?
    
    I'm not certain. Let me know if you do not get an answer from someone else and I will
    research it.
    
    > 2. Is (a, b, c) OP (subselect) in standard ?
    
    Yes. In fact, it _is_ the standard, and "a OP (subselect)" is a special case where
    the parens are allowed to be omitted from a one element list.
    
    > 3. What types of expressions (Var, Const, ...) are allowed on the left
    >    side of operator with subquery on the right ?
    
    I think most expressions are allowed. The "constant OP (subselect)" case you were
    asking about is just a simplified case since "(a, b, constant) OP (subselect)" where
    a and b are column references should be allowed. Of course, our optimizer could
    perhaps change this to "(a, b) OP (subselect where x = constant)", or for the first
    example "EXISTS (subselect where x = constant)".
    
    > 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
    >    (My vote for all boolean operators).
    
    Sounds good. But I'll vote with Bruce (and I'll bet you already agree) that it is
    important to get an initial implementation for v6.3 which covers a little, some, or
    all of the usual SQL subselect constructs. If we have to revisit this for v6.4 then
    we will have the benefit of feedback from others in practical applications which
    always uncovers new things to consider.
    
    > And - did we get consensus on presentation subqueries stuff in Query,
    > Expr and Var ?
    > I would like to have something done in parser near Jan 17 to get
    > subqueries working by Feb 1. I vote for support of all standard
    > things (1. - 3.) in parser right now - if there will be no time
    > to implement something like (a, b, c) then optimizer will callelog(WARN) (oh,
    > sorry, - elog(ERROR)).
    
    Great. I'd like to help with the remaining parser issues; at the moment "row_expr"
    does the right thing with expression comparisions but just parses then ignores
    subselect expressions. Let me know what structures you want passed back and I'll put
    them in, or if you prefer put in the first one and I'll go through and clean up and
    add the rest.
    
                                                      - Tom
    
    
    
  6. Re: [HACKERS] Re: subselects

    Marc G. Fournier <scrappy@hub.org> — 1998-01-10T18:51:56Z

    On Sun, 11 Jan 1998, Vadim B. Mikheev wrote:
    
    > > > No, I don't like to add anything in parser. Example:
    > > >
    > > >         select *
    > > >         from tabA
    > > >         where col1 = (select col2
    > > >                       from tabB
    > > >                       where tabA.col3 = tabB.col4
    > > >                       and exists (select *
    > > >                                   from tabC
    > > >                                   where tabB.colX = tabC.colX and
    > > >                                         tabC.colY = tabA.col2)
    > > >                      )
    > > >
    > > > : a column of tabA is referenced in sub-subselect
    > > 
    > > This is a strange case that I don't think we need to handle in our first
    > > implementation.
    > 
    > I don't know is this strange case or not :)
    > But I would like to know is this allowed by standards - can someone
    > comment on this ?
    > And I don't see problems with handling this...
    
    	I don't know about "the standards", but in my mind, the above should
    work if subselects work...so what if you add a third or fourth level subselect
    to the overall query?  IMHO, the "outer most" (inner most?) subselect should
    be resolved to provide the "EXISTS" list, the the next should be resolved,
    etc...
    
    	Hell...looking at this, I'd almost think that you could use subselects to
    force a pseudo-ordering onto a large complex JOIN (ya ya, really messy though)
    
    Marc G. Fournier                                
    Systems Administrator @ hub.org 
    primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 
    
    
    
  7. Re: [HACKERS] Re: subselects

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 1998-01-10T19:31:29Z

    > Are you saying about (a, b, c) or about 'a_constant' ?
    > Again, can someone comment on are they in standards or not ?
    > Tom ?
    > If yes then please add parser' support for them now...
    
    As I mentioned a few minutes ago in my last message, I parse the row descriptors and
    the subselects but for subselect expressions (e.g. "(a,b) OP (subselect)" I currently
    ignore the result. I didn't want to pass things back as lists until something in the
    backend was ready to receive them.
    
    If it is OK, I'll go ahead and start passing back a list of expressions when a row
    descriptor is present. So, what you will find is lexpr or rexpr in the A_Expr node
    being a list rather than an atomic node.
    
    Also, I can start passing back the subselect expression as the rexpr; right now the
    parser calls elog() and quits.
    
    btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
    makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
    If lists are handled farther back, this routine should move to there also and the
    parser will just pass the lists. Note that some assumptions have to be made about the
    meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
    "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
    to disallow those cases or to look for specific appearance of the operator to guess
    the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
    it has "<>" or "!" then build as "or"s.
    
    Let me know what you want...
    
                                                           - Tom
    
    
    
  8. Re: [HACKERS] Re: subselects

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 1998-01-10T19:55:08Z

    > I would like to have something done in parser near Jan 17 to get
    > subqueries working by Feb 1.
    
    Here are some changes to gram.y and to keywords.c which start to pass through
    subselect constructs. I won't commit until/unless you have a chance to look at it and
    agree that this is something close to the right direction to head.
    
                                                                          - Tom
    
    postgres=> create table x (i int);
    CREATE
    postgres=> insert into x values (1);
    INSERT 18121 1
    postgres=> select i from x where i = 1;
    i
    -
    1
    (1 row)
    
    postgres=> select i from x where i in (select i from x);
    ERROR:  transformExpr: does not know how to transform node 604
    postgres=> select i from x where (i, 1) in (select i, 1 from x);
    ERROR:  transformExpr: does not know how to transform node 501
    postgres=>
    
  9. Re: [HACKERS] Re: subselects

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-01-11T02:37:34Z

    > 
    > This is a multi-part message in MIME format.
    > --------------130974A8F3C8025EB9E3F25C
    > Content-Type: text/plain; charset=us-ascii
    > Content-Transfer-Encoding: 7bit
    > 
    > > I would like to have something done in parser near Jan 17 to get
    > > subqueries working by Feb 1.
    > 
    > Here are some changes to gram.y and to keywords.c which start to pass through
    > subselect constructs. I won't commit until/unless you have a chance to look at it and
    > agree that this is something close to the right direction to head.
    > 
    
    Do you realize these are the files, and not context diffs?
    
    -- 
    Bruce Momjian
    maillist@candle.pha.pa.us
    
    
  10. Re: [HACKERS] Re: subselects

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 1998-01-11T03:31:33Z

    > > > I would like to have something done in parser near Jan 17 to get
    > > > subqueries working by Feb 1.
    > >
    > > Here are some changes to gram.y and to keywords.c which start to pass through
    > > subselect constructs. I won't commit until/unless you have a chance to look at it and
    > > agree that this is something close to the right direction to head.
    > >
    >
    > Do you realize these are the files, and not context diffs?
    
    Yup. Thought it would be easier for you, but probably should have sent a diff. Sorry.
    
                                                  - Tom
    
    
    
  11. Re: [HACKERS] Re: subselects

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 1998-01-11T05:58:01Z

    Here are context diffs of gram.y and keywords.c; sorry about sending the full files.
    These start sending lists of arguments toward the backend from the parser to
    implement row descriptors and subselects.
    
    They should apply OK even over Bruce's recent changes...
    
                                                 - Tom
    
  12. Re: subselects

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-01-11T05:59:23Z

    > I would like to have something done in parser near Jan 17 to get
    > subqueries working by Feb 1. I vote for support of all standard
    > things (1. - 3.) in parser right now - if there will be no time
    > to implement something like (a, b, c) then optimizer will call
    > elog(WARN) (oh, sorry, - elog(ERROR)).
    
    First, let me say I am glad we are still on schedule for Feb 1.  I was
    panicking because I thought we wouldn't make it in time.
    
    
    > > > (is it allowable by standards ?) - in this case it's better
    > > > to don't add tabA to 1st subselect but add tabA to second one
    > > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
    > > > this gives us 2-tables join in 1st subquery instead of 3-tables join.
    > > > (And I'm still not sure that using temp tables is best of what can be
    > > > done in all cases...)
    > > 
    > > I don't see any use for temp tables in subselects anymore.  After having
    > > implemented UNIONS, I now see how much can be done in the upper
    > > optimizer.  I see you just putting the subquery PLAN into the proper
    > > place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
    > 
    > When saying about temp tables, I meant tables created by node Material
    > for subquery plan. This is one of two ways - run subquery once for all
    > possible upper plan tuples and then just join result table with upper
    > query. Another way is re-run subquery for each upper query tuple,
    > without temp table but may be with caching results by some ways.
    > Actually, there is special case - when subquery can be alternatively 
    > formulated as joins, - but this is just special case.
    
    This is interesting.  It really only applies for correlated subqueries,
    and certainly it may help sometimes to just evaluate the subquery for
    valid values that are going to come from the upper query than for all
    possible values.  Perhaps we can use the 'cost' value of each query to
    decide how to handle this.
    
    > 
    > > > > In the parent query, to parse the WHERE clause, we create a new operator
    > > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
    > > >                                                ^^^^^^^^^^^^^^^^^^
    > > > No. We have to handle (a,b,c) OP (select x, y, z ...) and
    > > > '_a_constant_' OP (select ...) - I don't know is last in standards,
    > > > Sybase has this.
    > > 
    > > I have never seen this in my eight years of SQL.  Perhaps we can leave
    > > this for later, maybe much later.
    > 
    > Are you saying about (a, b, c) or about 'a_constant' ?
    > Again, can someone comment on are they in standards or not ?
    > Tom ?
    > If yes then please add parser' support for them now...
    
    OK, Thomas says it is, so we will put in as much code as we can to handle
    it.
    
    > Should we say users that subselect will work for standard data types only ?
    > I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
    > Is there difference between handling = ANY and ~ ANY ? I don't see any.
    > Currently we can't get IN working properly for boxes (and may be for others too)
    > and I don't like to try to resolve these problems now, but hope that someday
    > we'll be able to do this. At the moment - just convert IN into = ANY and
    > NOT IN into <> ALL in parser.
    
    OK.
    
    > 
    > (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
    > use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
    
    I did not know that either.
    
    > There is big difference between subqueries and queries in UNION - 
    > there are not dependences between UNION queries.
    
    Yes, I know UNIONS are trivial compared to subselects.
    
    > 
    > Ok, opened issues:
    > 
    > 1. Is using upper query' vars in all subquery levels in standard ?
    > 2. Is (a, b, c) OP (subselect) in standard ?
    > 3. What types of expressions (Var, Const, ...) are allowed on the left
    >    side of operator with subquery on the right ?
    > 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
    >    (My vote for all boolean operators).
    > 
    > And - did we get consensus on presentation subqueries stuff in Query,
    > Expr and Var ?
    
    OK, here are my concrete ideas on changes and structures.
    
    I think we all agreed that Query needs new fields:
    
            Query *parentQuery;
            List *subqueries;
    
    Maybe query level too, but I don't think so (see later ideas on Var).
    
    We need a new Node structure, call it Sublink:
    
    	int 	linkType	(IN, NOTIN, ANY, EXISTS, OPERATOR...)
    	Oid	operator	/* subquery must return single row */
    	List	*lefthand;	/* parent stuff */
    	Node 	*subquery;	/* represents nodes from parser */
    	Index	Subindex;	/* filled in to index Query->subqueries */
    
    Of course, the names are just suggestions.  Every time we run through
    the parsenodes of a query to create a Query* structure, when we do the
    WHERE clause, if we come upon one of these Sublink nodes (created in the
    parser), we move the supplied Query* in Sublink->subquery to a local
    List variable, and we set Subquery->subindex to equal the index of the
    new query, i.e. is it the first subquery we found, 1, or the second, 2,
    etc.
    
    After we have created the parent Query structure, we run through our
    local List variable of subquery parsenodes we created above, and add
    Query* entries to Query->subqueries.  In each subquery Query*, we set
    the parentQuery pointer.
    
    Also, when parsing the subqueries, we need to keep track of correlated
    references.  I recommend we add a field to the Var structure:
    
    	Index	sublevel;	/* range table reference:
    				   = 0  current level of query
    				   < 0  parent above this many levels
    				   > 0  index into subquery list
    				 */
    
    This way, a Var node with sublevel 0 is the current level, and is true
    in most cases.  This helps us not have to change much code.  sublevel =
    -1 means it references the range table in the parent query. sublevel =
    -2 means the parent's parent. sublevel = 2 means it references the range
    table of the second entry in Query->subqueries.  Varno and varattno are
    still meaningful.  Of course, we can't reference variables in the
    subqueries from the parent in the parser code, but Vadim may want to.
    
    When doing a Var lookup in the parser, we look in the current level
    first, but if not found, if it is a subquery, we can look at the parent
    and parent's parent to set the sublevel, varno, and varatno properly.
    
    We create no phantom range table entries in the subquery, and no phantom
    target list entries.   We can leave that all for the upper optimizer.
    
    -- 
    Bruce Momjian
    maillist@candle.pha.pa.us
    
    
  13. Re: [HACKERS] Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-12T05:09:20Z

    Bruce Momjian wrote:
    > 
    > We need a new Node structure, call it Sublink:
    > 
    >         int     linkType        (IN, NOTIN, ANY, EXISTS, OPERATOR...)
    >         Oid     operator        /* subquery must return single row */
    >         List    *lefthand;      /* parent stuff */
    >         Node    *subquery;      /* represents nodes from parser */
    >         Index   Subindex;       /* filled in to index Query->subqueries */
    
    Ok, I agreed that it's better to have new node and don't put subquery stuff
    into Expr node.
    
    int linkType
            is one of EXISTS, ANY, ALL, EXPR. EXPR is for the case of expression
            subqueries (following Sybase naming) which must return single row -
            (a, b, c) = (subquery).
            Note again, that there are no linkType for IN and NOTIN here. 
            User' IN and NOT IN must be converted to = ANY and <> ALL by parser.
    
    We need not in Oid operator! In all cases we need in
    
    List *oper
            list of Oper nodes for each of a, b, c, ... and operator (=, ...)
            corresponding to data type of a, b, c, ...
    
    List *lefthand
            is list of Var/Const nodes - representation of (a, b, c, ...)
    
    What is Node *subquery ?
    In optimizer we need either in Subindex (to get subquery from Query->subqueries
    when beeing in Sublink) or in Node *subquery inside Sublink itself.
    BTW, after some thought I don't see how Query->subqueries will be usefull.
    So, may be just add bool hassubqueries to Query (and Query *parentQuery)
    and use Query *subquery in Sublink, but not subindex ?
    
    > 
    > Also, when parsing the subqueries, we need to keep track of correlated
    > references.  I recommend we add a field to the Var structure:
    > 
    >         Index   sublevel;       /* range table reference:
    >                                    = 0  current level of query
    >                                    < 0  parent above this many levels
    >                                    > 0  index into subquery list
    >                                  */
    > 
    > This way, a Var node with sublevel 0 is the current level, and is true
    > in most cases.  This helps us not have to change much code.  sublevel =
    > -1 means it references the range table in the parent query. sublevel =
    > -2 means the parent's parent. sublevel = 2 means it references the range
    > table of the second entry in Query->subqueries.  Varno and varattno are
    > still meaningful.  Of course, we can't reference variables in the
    > subqueries from the parent in the parser code, but Vadim may want to.
                                                         ^^^^^^^^^^^^^^^^^
    No. So, just use sublevel >= 0: 0 - current level, 1 - one level up, ...
    sublevel is for optimizer only - executor will not use it.
    
    > 
    > When doing a Var lookup in the parser, we look in the current level
    > first, but if not found, if it is a subquery, we can look at the parent
    > and parent's parent to set the sublevel, varno, and varatno properly.
    > 
    > We create no phantom range table entries in the subquery, and no phantom
    > target list entries.   We can leave that all for the upper optimizer.
    
    Ok.
    
    Vadim
    
    
  14. Re: [HACKERS] Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-12T09:34:45Z

    Thomas G. Lockhart wrote:
    > 
    > btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
    > makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
    > If lists are handled farther back, this routine should move to there also and the
    > parser will just pass the lists. Note that some assumptions have to be made about the
    > meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
    > "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
    > to disallow those cases or to look for specific appearance of the operator to guess
    > the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
    > it has "<>" or "!" then build as "or"s.
    
    Oh, god! I never thought about this!
    Ok, I have to agree:
    
    1. Only <, <=, =, >, >=, <> is allowed with subselects
    2. Use OR's for <>, and so - we need in bool useor in SubLink 
       for <>, <> ANY and <> ALL:
    
    typedef struct SubLink {
    	NodeTag		type;
    	int		linkType; /* EXISTS, ALL, ANY, EXPR */
    	bool		useor;    /* TRUE for <> */
    	List	        *lefthand; /* List of Var/Const nodes on the left */
    	List	        *oper;     /* List of Oper nodes */
    	Query	        *subquery; /* */
    } SubLink;
    
    Vadim
    
    
  15. Re: [HACKERS] Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-12T09:40:48Z

    Thomas G. Lockhart wrote:
    > 
    > btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
    > makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
    > If lists are handled farther back, this routine should move to there also and the
    > parser will just pass the lists. Note that some assumptions have to be made about the
    > meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
    > "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
    > to disallow those cases or to look for specific appearance of the operator to guess
    > the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
    > it has "<>" or "!" then build as "or"s.
    
    Sorry, I forgot something: is (a, b) OP (x, y) in standard ?
    If not then I suggest to don't implement it at all and allow
    (a, b) OP [ANY|ALL] (subselect) only.
    
    Vadim
    
    
  16. Re: [HACKERS] Re: subselects

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 1998-01-12T13:41:31Z

    > > btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
    > > makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
    > > If lists are handled farther back, this routine should move to there also and the
    > > parser will just pass the lists. Note that some assumptions have to be made about the
    > > meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
    > > "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
    > > to disallow those cases or to look for specific appearance of the operator to guess
    > > the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
    > > it has "<>" or "!" then build as "or"s.
    >
    > Sorry, I forgot something: is (a, b) OP (x, y) in standard ?
    
    Yes. The problem wouldn't be very interesting otherwise :)
    
                                                   - Tom
    
    > If not then I suggest to don't implement it at all and allow
    > (a, b) OP [ANY|ALL] (subselect) only.
    
    
    
    
    
  17. Re: [HACKERS] Re: subselects

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-01-12T13:58:25Z

    > 
    > Thomas G. Lockhart wrote:
    > > 
    > > btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
    > > makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
    > > If lists are handled farther back, this routine should move to there also and the
    > > parser will just pass the lists. Note that some assumptions have to be made about the
    > > meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
    > > "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
    > > to disallow those cases or to look for specific appearance of the operator to guess
    > > the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
    > > it has "<>" or "!" then build as "or"s.
    > 
    > Oh, god! I never thought about this!
    > Ok, I have to agree:
    > 
    > 1. Only <, <=, =, >, >=, <> is allowed with subselects
    > 2. Use OR's for <>, and so - we need in bool useor in SubLink 
    >    for <>, <> ANY and <> ALL:
    
    Ah, but this is just a problem when there are multiple fields on the
    left.
    
    > 
    > typedef struct SubLink {
    > 	NodeTag		type;
    > 	int		linkType; /* EXISTS, ALL, ANY, EXPR */
    > 	bool		useor;    /* TRUE for <> */
    > 	List	        *lefthand; /* List of Var/Const nodes on the left */
    > 	List	        *oper;     /* List of Oper nodes */
    > 	Query	        *subquery; /* */
    > } SubLink;
    
    -- 
    Bruce Momjian
    maillist@candle.pha.pa.us
    
    
  18. Re: [HACKERS] Re: subselects

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-01-12T14:23:49Z

    > 
    > Bruce Momjian wrote:
    > > 
    > > We need a new Node structure, call it Sublink:
    > > 
    > >         int     linkType        (IN, NOTIN, ANY, EXISTS, OPERATOR...)
    > >         Oid     operator        /* subquery must return single row */
    > >         List    *lefthand;      /* parent stuff */
    > >         Node    *subquery;      /* represents nodes from parser */
    > >         Index   Subindex;       /* filled in to index Query->subqueries */
    > 
    > Ok, I agreed that it's better to have new node and don't put subquery stuff
    > into Expr node.
    > 
    > int linkType
    >         is one of EXISTS, ANY, ALL, EXPR. EXPR is for the case of expression
    >         subqueries (following Sybase naming) which must return single row -
    >         (a, b, c) = (subquery).
    >         Note again, that there are no linkType for IN and NOTIN here. 
    >         User' IN and NOT IN must be converted to = ANY and <> ALL by parser.
    > 
    > We need not in Oid operator! In all cases we need in
    > 
    > List *oper
    >         list of Oper nodes for each of a, b, c, ... and operator (=, ...)
    >         corresponding to data type of a, b, c, ...
    > 
    > List *lefthand
    >         is list of Var/Const nodes - representation of (a, b, c, ...)
    
    I see, the opoids would be different for '=' if different variable types
    are used in (a,b,c) in (subselect).  Got it.
    
    > 
    > What is Node *subquery ?
    > In optimizer we need either in Subindex (to get subquery from Query->subqueries
    > when beeing in Sublink) or in Node *subquery inside Sublink itself.
    > BTW, after some thought I don't see how Query->subqueries will be usefull.
    > So, may be just add bool hassubqueries to Query (and Query *parentQuery)
    > and use Query *subquery in Sublink, but not subindex ?
    
    OK, I originally created it because the parser would have trouble
    filling in a List* field in SelectStmt while it was parsing a WHERE
    clause.  I decided to just stick the SelectStmt* into Sublink->subquery.
    
    While we are going through the parse output to fill in the Query*, I
    thought we should move the actual subquery parse output to a separate
    place, and once the Query* was completed, spin through the saved
    subquery parse list and stuff Query->subqueries with a list of Query*
    for the subqueries.  I thought this would be easier, because we would
    then have all the subqueries in a nice list that we can manage easier.
    
    In fact, we can fill Query->subqueries with SelectStmt* as we process
    the WHERE clause, then convert them to Query* at the end.
    
    If you would rather keep the subquery Query* entries in the Sublink
    structure, we can do that.  The only issue I see is that when you want
    to get to them, you have to wade through the WHERE clause to find them. 
    For example, we will have to run the subquery Query* through the rewrite
    system.  Right now, for UNION, I have a nice union List* in Query, and I
    just spin through it in postgres.c for each Union query.  If we keep the
    subquery Query* inside Sublink, we have to have some logic to go through
    and find them.
    
    If we just have an Index in Sublink to the Query->subqueries, we can use
    the nth() macro to find them quite easily.
    
    But it is up to you.  I really don't know how you are going to handle
    things like:
    
    	select *
    	from taba
    	where x = 3 and y = 5 and (z=6 or q in (select g from tabb ))
    
    My logic was to break the problem down to single queries as much as
    possible, so we would be breaking the problem up into pieces.  Whatever
    is easier for you.
    
    > 
    > > 
    > > Also, when parsing the subqueries, we need to keep track of correlated
    > > references.  I recommend we add a field to the Var structure:
    > > 
    > >         Index   sublevel;       /* range table reference:
    > >                                    = 0  current level of query
    > >                                    < 0  parent above this many levels
    > >                                    > 0  index into subquery list
    > >                                  */
    > > 
    > > This way, a Var node with sublevel 0 is the current level, and is true
    > > in most cases.  This helps us not have to change much code.  sublevel =
    > > -1 means it references the range table in the parent query. sublevel =
    > > -2 means the parent's parent. sublevel = 2 means it references the range
    > > table of the second entry in Query->subqueries.  Varno and varattno are
    > > still meaningful.  Of course, we can't reference variables in the
    > > subqueries from the parent in the parser code, but Vadim may want to.
    >                                                      ^^^^^^^^^^^^^^^^^
    > No. So, just use sublevel >= 0: 0 - current level, 1 - one level up, ...
    > sublevel is for optimizer only - executor will not use it.
    
    OK, if you don't need to reference range tables DOWN in subqueries, we
    can use positive numbers.
    
    > > When doing a Var lookup in the parser, we look in the current level
    > > first, but if not found, if it is a subquery, we can look at the parent
    > > and parent's parent to set the sublevel, varno, and varatno properly.
    > > 
    > > We create no phantom range table entries in the subquery, and no phantom
    > > target list entries.   We can leave that all for the upper optimizer.
    > 
    > Ok.
    > 
    > Vadim
    > 
    
    
    -- 
    Bruce Momjian
    maillist@candle.pha.pa.us
    
    
  19. Re: [HACKERS] Re: subselects

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-01-12T14:25:54Z

    > typedef struct SubLink {
    > 	NodeTag		type;
    > 	int		linkType; /* EXISTS, ALL, ANY, EXPR */
    > 	bool		useor;    /* TRUE for <> */
    > 	List	        *lefthand; /* List of Var/Const nodes on the left */
    > 	List	        *oper;     /* List of Oper nodes */
    > 	Query	        *subquery; /* */
    > } SubLink;
    
    If you want Query* inside Sublink, rather than a separate Query* field,
    this can be our SubLink structure.
    
    -- 
    Bruce Momjian
    maillist@candle.pha.pa.us
    
    
  20. Re: [HACKERS] Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-13T14:20:25Z

    Ok. I don't see how Query->subqueries could me help, but I foresee
    that Query->sublinks can do it. Could you add this ? 
    
    Bruce Momjian wrote:
    > 
    > >
    > > What is Node *subquery ?
    > > In optimizer we need either in Subindex (to get subquery from Query->subqueries
    > > when beeing in Sublink) or in Node *subquery inside Sublink itself.
    > > BTW, after some thought I don't see how Query->subqueries will be usefull.
    > > So, may be just add bool hassubqueries to Query (and Query *parentQuery)
    > > and use Query *subquery in Sublink, but not subindex ?
    > 
    > OK, I originally created it because the parser would have trouble
    > filling in a List* field in SelectStmt while it was parsing a WHERE
    > clause.  I decided to just stick the SelectStmt* into Sublink->subquery.
    > 
    > While we are going through the parse output to fill in the Query*, I
    > thought we should move the actual subquery parse output to a separate
    > place, and once the Query* was completed, spin through the saved
    > subquery parse list and stuff Query->subqueries with a list of Query*
    > for the subqueries.  I thought this would be easier, because we would
    > then have all the subqueries in a nice list that we can manage easier.
    > 
    > In fact, we can fill Query->subqueries with SelectStmt* as we process
    > the WHERE clause, then convert them to Query* at the end.
    > 
    > If you would rather keep the subquery Query* entries in the Sublink
    > structure, we can do that.  The only issue I see is that when you want
    > to get to them, you have to wade through the WHERE clause to find them.
    > For example, we will have to run the subquery Query* through the rewrite
    > system.  Right now, for UNION, I have a nice union List* in Query, and I
    > just spin through it in postgres.c for each Union query.  If we keep the
    > subquery Query* inside Sublink, we have to have some logic to go through
    > and find them.
    > 
    > If we just have an Index in Sublink to the Query->subqueries, we can use
    > the nth() macro to find them quite easily.
    > 
    > But it is up to you.  I really don't know how you are going to handle
    > things like:
    > 
    >         select *
    >         from taba
    >         where x = 3 and y = 5 and (z=6 or q in (select g from tabb ))
    
    No problems.
    
    > 
    > My logic was to break the problem down to single queries as much as
    > possible, so we would be breaking the problem up into pieces.  Whatever
    > is easier for you.
    
    Vadim
    
    
  21. Re: [HACKERS] Re: subselects

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-01-13T14:48:00Z

    > 
    > Ok. I don't see how Query->subqueries could me help, but I foresee
    > that Query->sublinks can do it. Could you add this ? 
    
    OK, so instead of moving the query out of the SubLink structure, you
    want the Query* in the Sublink structure, and a List* of SubLink
    pointers in the query structure?
    
    	Query
    	{
    		...
    		List *sublink;  /* list of pointers to Sublinks
    		...
    	}
    
    I can do that.  Let me know.
    -- 
    Bruce Momjian
    maillist@candle.pha.pa.us
    
    
  22. Re: [HACKERS] Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-13T14:51:45Z

    Thomas G. Lockhart wrote:
    > 
    > > > btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
    > > > makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
    > > > If lists are handled farther back, this routine should move to there also and the
    > > > parser will just pass the lists. Note that some assumptions have to be made about the
    > > > meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
    > > > "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
    > > > to disallow those cases or to look for specific appearance of the operator to guess
    > > > the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
    > > > it has "<>" or "!" then build as "or"s.
    > >
    > > Sorry, I forgot something: is (a, b) OP (x, y) in standard ?
    > 
    > Yes. The problem wouldn't be very interesting otherwise :)
    
    Could we restrict OPs to standard ones (like we do for subselects) - I don't
    like assumption about ORs for operators with "!" ?
    "Assume as little as possible" is good rule...
    
    Vadim
    
    
  23. Re: [HACKERS] Re: subselects

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 1998-01-13T15:24:30Z

    > > > > Note that some assumptions have to be made about the
    > > > > meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
    > > > > "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
    > > > > to disallow those cases or to look for specific appearance of the operator to guess
    > > > > the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
    > > > > it has "<>" or "!" then build as "or"s.
    >
    > Could we restrict OPs to standard ones (like we do for subselects) - I don't
    > like assumption about ORs for operators with "!" ?
    > "Assume as little as possible" is good rule...
    
    Yes, I agree. The suggestion about "!" was made without thinking very hard just to raise the
    possibility. Extending to other operators in a reliable way is an interesting problem, but is
    not required and can be explicitly disallowed for now.
    
                                                    - Tom
    
    
    
  24. Re: [HACKERS] Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-14T03:09:02Z

    Bruce Momjian wrote:
    > 
    > >
    > > Ok. I don't see how Query->subqueries could me help, but I foresee
    > > that Query->sublinks can do it. Could you add this ?
    > 
    > OK, so instead of moving the query out of the SubLink structure, you
    > want the Query* in the Sublink structure, and a List* of SubLink
    > pointers in the query structure?
    
    Yes.
    
    > 
    >         Query
    >         {
    >                 ...
    >                 List *sublink;  /* list of pointers to Sublinks
    >                 ...
    >         }
    > 
    > I can do that.  Let me know.
    
    Thanks!
    
    Are there any opened issues ?
    
    Vadim
    
    
  25. Re: [HACKERS] Re: subselects

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-01-15T23:18:31Z

    > 
    > Bruce Momjian wrote:
    > > 
    > > >
    > > > Ok. I don't see how Query->subqueries could me help, but I foresee
    > > > that Query->sublinks can do it. Could you add this ?
    > > 
    > > OK, so instead of moving the query out of the SubLink structure, you
    > > want the Query* in the Sublink structure, and a List* of SubLink
    > > pointers in the query structure?
    > 
    > Yes.
    > 
    > > 
    > >         Query
    > >         {
    > >                 ...
    > >                 List *sublink;  /* list of pointers to Sublinks
    > >                 ...
    > >         }
    > > 
    > > I can do that.  Let me know.
    > 
    > Thanks!
    > 
    > Are there any opened issues ?
    
    OK, what do you need me to do.  Do you want me to create the Sublink
    support stuff, fill them in in the parser, and pass them through the
    rewrite section and into the optimizer.  I will prepare a list of
    changes.
    
    
    -- 
    Bruce Momjian
    maillist@candle.pha.pa.us
    
    
  26. Re: [HACKERS] Re: subselects

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-01-15T23:26:41Z

    > typedef struct SubLink {
    > 	NodeTag		type;
    > 	int		linkType; /* EXISTS, ALL, ANY, EXPR */
    > 	bool		useor;    /* TRUE for <> */
    > 	List	        *lefthand; /* List of Var/Const nodes on the left */
    > 	List	        *oper;     /* List of Oper nodes */
    > 	Query	        *subquery; /* */
    > } SubLink;
    
    OK, we add this structure above.  During parsing, *subquery actually
    will hold Node *parsetree, not Query *.
    
    And add to Query:
    
    	bool	hasSubLinks;
    
    Also need a function to return a List* of SubLink*.  I just did a
    similar thing with Aggreg*.  And Var gets:
    
    	int uplevels;
    
    Is that it?
    
    
    -- 
    Bruce Momjian
    maillist@candle.pha.pa.us
    
    
  27. Re: [HACKERS] Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-16T09:34:15Z

    Bruce Momjian wrote:
    > 
    > > typedef struct SubLink {
    > >       NodeTag         type;
    > >       int             linkType; /* EXISTS, ALL, ANY, EXPR */
    > >       bool            useor;    /* TRUE for <> */
    > >       List            *lefthand; /* List of Var/Const nodes on the left */
    > >       List            *oper;     /* List of Oper nodes */
    > >       Query           *subquery; /* */
    > > } SubLink;
    > 
    > OK, we add this structure above.  During parsing, *subquery actually
    > will hold Node *parsetree, not Query *.
                ^^^^^^^^^^^^^^^
    But optimizer will get node Query here, yes ?
    
    > 
    > And add to Query:
    > 
    >         bool    hasSubLinks;
    > 
    > Also need a function to return a List* of SubLink*.  I just did a
    > similar thing with Aggreg*.  And Var gets:
    > 
    >         int uplevels;
    > 
    > Is that it?
    
    Yes.
    
    Vadim
    
    
  28. Re: [HACKERS] Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-16T09:37:20Z

    Bruce Momjian wrote:
    > 
    > >
    > > Are there any opened issues ?
    > 
    > OK, what do you need me to do.  Do you want me to create the Sublink
    > support stuff, fill them in in the parser, and pass them through the
    > rewrite section and into the optimizer.  I will prepare a list of
    > changes.
    
    Please do this. I'm ready to start coding of things in optimizer.
    
    Vadim