Thread

  1. Re: subselects

    Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-10T17:19:08Z

    Bruce Momjian wrote:
    > 
    > > No, I don't like to add anything in parser. Example:
    > >
    > >         select *
    > >         from tabA
    > >         where col1 = (select col2
    > >                       from tabB
    > >                       where tabA.col3 = tabB.col4
    > >                       and exists (select *
    > >                                   from tabC
    > >                                   where tabB.colX = tabC.colX and
    > >                                         tabC.colY = tabA.col2)
    > >                      )
    > >
    > > : a column of tabA is referenced in sub-subselect
    > 
    > This is a strange case that I don't think we need to handle in our first
    > implementation.
    
    I don't know is this strange case or not :)
    But I would like to know is this allowed by standards - can someone
    comment on this ?
    And I don't see problems with handling this...
    
    > 
    > > (is it allowable by standards ?) - in this case it's better
    > > to don't add tabA to 1st subselect but add tabA to second one
    > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
    > > this gives us 2-tables join in 1st subquery instead of 3-tables join.
    > > (And I'm still not sure that using temp tables is best of what can be
    > > done in all cases...)
    > 
    > I don't see any use for temp tables in subselects anymore.  After having
    > implemented UNIONS, I now see how much can be done in the upper
    > optimizer.  I see you just putting the subquery PLAN into the proper
    > place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
    
    When saying about temp tables, I meant tables created by node Material
    for subquery plan. This is one of two ways - run subquery once for all
    possible upper plan tuples and then just join result table with upper
    query. Another way is re-run subquery for each upper query tuple,
    without temp table but may be with caching results by some ways.
    Actually, there is special case - when subquery can be alternatively 
    formulated as joins, - but this is just special case.
    
    > > > In the parent query, to parse the WHERE clause, we create a new operator
    > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
    > >                                                ^^^^^^^^^^^^^^^^^^
    > > No. We have to handle (a,b,c) OP (select x, y, z ...) and
    > > '_a_constant_' OP (select ...) - I don't know is last in standards,
    > > Sybase has this.
    > 
    > I have never seen this in my eight years of SQL.  Perhaps we can leave
    > this for later, maybe much later.
    
    Are you saying about (a, b, c) or about 'a_constant' ?
    Again, can someone comment on are they in standards or not ?
    Tom ?
    If yes then please add parser' support for them now...
    
    > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
    > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
    > > by parser into corresponding ANY and ALL. At the moment we can do:
    > >
    > > IN --> = ANY, NOT IN --> <> ALL
    > >
    > > but this will be "known bug": this breaks OO-nature of Postgres, because of
    > > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
    > > Example: box data type. For boxes, = means equality of _areas_ and =~
    > > means that boxes are the same ==> =~ ANY should be used for IN.
    > 
    > That is interesting, to use =~ for ANY.
    > 
    > Yes, but how many operators take a SUBQUERY as an operand.  This is a
    > special case to me.
    > 
    > I think I see where you are trying to go.  You want subselects to behave
    > like any other operator, with a subselect type, and you do all the
    > subselect handling in the optimizer, with special Nodes and actions.
    > 
    > I think this may be just too much of a leap.  We have such clean query
    > logic for single queries, I can't imagine having an operator that has a
    > Query operand, and trying to get everything to properly handle it.
    > UNIONS were very easy to implement as a List off of Query, with some
    > foreach()'s in rewrite and the high optimizer.
    > 
    > Subselects are SQL standard, and are never going to be over-ridden by a
    > user.  Same with UNION.  They want UNION, they get UNION.  They want
    > Subselect, we are going to spin through the Query structure and give
    > them what they want.
    > 
    > The complexities of subselects and correlated queries and range tables
    > and stuff is so bizarre that trying to get it to work inside the type
    > system could be a huge project.
    
    PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
    derived from the Berkeley Postgres database management system. While
    PostgreSQL retains the powerful object-relational data model, rich data types and
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    easy extensibility of Postgres, it replaces the PostQuel query language with an
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    extended subset of SQL.
    ^^^^^^^^^^^^^^^^^^^^^^
    
    Should we say users that subselect will work for standard data types only ?
    I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
    Is there difference between handling = ANY and ~ ANY ? I don't see any.
    Currently we can't get IN working properly for boxes (and may be for others too)
    and I don't like to try to resolve these problems now, but hope that someday
    we'll be able to do this. At the moment - just convert IN into = ANY and
    NOT IN into <> ALL in parser.
    
    (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
    use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
    
    > >
    > > > right side is an index to a slot in the subqueries List.
    > 
    > I guess the question is what can we have by February 1?
    > 
    > I have been reading some postings, and it seems to me that subselects
    > are the litmus test for many evaluators when deciding if a database
    > engine is full-featured.
    > 
    > Sorry to be so straightforward, but I want to keep hashing this around
    > until we get a conclusion, so coding can start.
    > 
    > My suggestions have been, I believe, trying to get subselects working
    > with the fullest functionality by adding the least amount of code, and
    > keeping the logic clean.
    > 
    > Have you checked out the UNION code?  It is very small, but it works.  I
    > think it could make a good sample for subselects.
    
    There is big difference between subqueries and queries in UNION - 
    there are not dependences between UNION queries.
    
    Ok, opened issues:
    
    1. Is using upper query' vars in all subquery levels in standard ?
    2. Is (a, b, c) OP (subselect) in standard ?
    3. What types of expressions (Var, Const, ...) are allowed on the left
       side of operator with subquery on the right ?
    4. What types of operators should we support (=, >, ..., like, ~, ...) ?
       (My vote for all boolean operators).
    
    And - did we get consensus on presentation subqueries stuff in Query,
    Expr and Var ?
    I would like to have something done in parser near Jan 17 to get
    subqueries working by Feb 1. I vote for support of all standard
    things (1. - 3.) in parser right now - if there will be no time
    to implement something like (a, b, c) then optimizer will call
    elog(WARN) (oh, sorry, - elog(ERROR)).
    
    Vadim