Thread

Re: subselects

Vadim B. Mikheev <vadim@sable.krasnoyarsk.su> — 1998-01-10T17:19:08Z
Bruce Momjian wrote:
> 
> > No, I don't like to add anything in parser. Example:
> >
> >         select *
> >         from tabA
> >         where col1 = (select col2
> >                       from tabB
> >                       where tabA.col3 = tabB.col4
> >                       and exists (select *
> >                                   from tabC
> >                                   where tabB.colX = tabC.colX and
> >                                         tabC.colY = tabA.col2)
> >                      )
> >
> > : a column of tabA is referenced in sub-subselect
> 
> This is a strange case that I don't think we need to handle in our first
> implementation.

I don't know is this strange case or not :)
But I would like to know is this allowed by standards - can someone
comment on this ?
And I don't see problems with handling this...

> 
> > (is it allowable by standards ?) - in this case it's better
> > to don't add tabA to 1st subselect but add tabA to second one
> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
> > this gives us 2-tables join in 1st subquery instead of 3-tables join.
> > (And I'm still not sure that using temp tables is best of what can be
> > done in all cases...)
> 
> I don't see any use for temp tables in subselects anymore.  After having
> implemented UNIONS, I now see how much can be done in the upper
> optimizer.  I see you just putting the subquery PLAN into the proper
> place in the plan tree, with some proper JOIN nodes for IN, NOT IN.

When saying about temp tables, I meant tables created by node Material
for subquery plan. This is one of two ways - run subquery once for all
possible upper plan tuples and then just join result table with upper
query. Another way is re-run subquery for each upper query tuple,
without temp table but may be with caching results by some ways.
Actually, there is special case - when subquery can be alternatively 
formulated as joins, - but this is just special case.

> > > In the parent query, to parse the WHERE clause, we create a new operator
> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
> >                                                ^^^^^^^^^^^^^^^^^^
> > No. We have to handle (a,b,c) OP (select x, y, z ...) and
> > '_a_constant_' OP (select ...) - I don't know is last in standards,
> > Sybase has this.
> 
> I have never seen this in my eight years of SQL.  Perhaps we can leave
> this for later, maybe much later.

Are you saying about (a, b, c) or about 'a_constant' ?
Again, can someone comment on are they in standards or not ?
Tom ?
If yes then please add parser' support for them now...

> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
> > by parser into corresponding ANY and ALL. At the moment we can do:
> >
> > IN --> = ANY, NOT IN --> <> ALL
> >
> > but this will be "known bug": this breaks OO-nature of Postgres, because of
> > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
> > Example: box data type. For boxes, = means equality of _areas_ and =~
> > means that boxes are the same ==> =~ ANY should be used for IN.
> 
> That is interesting, to use =~ for ANY.
> 
> Yes, but how many operators take a SUBQUERY as an operand.  This is a
> special case to me.
> 
> I think I see where you are trying to go.  You want subselects to behave
> like any other operator, with a subselect type, and you do all the
> subselect handling in the optimizer, with special Nodes and actions.
> 
> I think this may be just too much of a leap.  We have such clean query
> logic for single queries, I can't imagine having an operator that has a
> Query operand, and trying to get everything to properly handle it.
> UNIONS were very easy to implement as a List off of Query, with some
> foreach()'s in rewrite and the high optimizer.
> 
> Subselects are SQL standard, and are never going to be over-ridden by a
> user.  Same with UNION.  They want UNION, they get UNION.  They want
> Subselect, we are going to spin through the Query structure and give
> them what they want.
> 
> The complexities of subselects and correlated queries and range tables
> and stuff is so bizarre that trying to get it to work inside the type
> system could be a huge project.

PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
derived from the Berkeley Postgres database management system. While
PostgreSQL retains the powerful object-relational data model, rich data types and
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
easy extensibility of Postgres, it replaces the PostQuel query language with an
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
extended subset of SQL.
^^^^^^^^^^^^^^^^^^^^^^

Should we say users that subselect will work for standard data types only ?
I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
Is there difference between handling = ANY and ~ ANY ? I don't see any.
Currently we can't get IN working properly for boxes (and may be for others too)
and I don't like to try to resolve these problems now, but hope that someday
we'll be able to do this. At the moment - just convert IN into = ANY and
NOT IN into <> ALL in parser.

(BTW, do you know how DISTINCT is implemented ? It doesn't use = but
use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)

> >
> > > right side is an index to a slot in the subqueries List.
> 
> I guess the question is what can we have by February 1?
> 
> I have been reading some postings, and it seems to me that subselects
> are the litmus test for many evaluators when deciding if a database
> engine is full-featured.
> 
> Sorry to be so straightforward, but I want to keep hashing this around
> until we get a conclusion, so coding can start.
> 
> My suggestions have been, I believe, trying to get subselects working
> with the fullest functionality by adding the least amount of code, and
> keeping the logic clean.
> 
> Have you checked out the UNION code?  It is very small, but it works.  I
> think it could make a good sample for subselects.

There is big difference between subqueries and queries in UNION - 
there are not dependences between UNION queries.

Ok, opened issues:

1. Is using upper query' vars in all subquery levels in standard ?
2. Is (a, b, c) OP (subselect) in standard ?
3. What types of expressions (Var, Const, ...) are allowed on the left
   side of operator with subquery on the right ?
4. What types of operators should we support (=, >, ..., like, ~, ...) ?
   (My vote for all boolean operators).

And - did we get consensus on presentation subqueries stuff in Query,
Expr and Var ?
I would like to have something done in parser near Jan 17 to get
subqueries working by Feb 1. I vote for support of all standard
things (1. - 3.) in parser right now - if there will be no time
to implement something like (a, b, c) then optimizer will call
elog(WARN) (oh, sorry, - elog(ERROR)).

Vadim