Thread

Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY

secret <secret@kearneydev.com> — 1999-05-19T15:28:42Z
Thomas Lockhart wrote:

> > > The Pratical SQL Handbook at page 171 says:
> > > Since nulls represent "the great unknown", there is no way to know
> > > whether one null is equal to any other null. Each unknown value
> > > may or may not be different from another.
>
> Although I've noticed some questionable statements quoted from this
> book, this looks good...
>
> > > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
> > This is something I have complained about time and again. It is time
> > something is changed about it, otherwise Postgres will NEVER be a
> > standard-compliant RDBMS.
>
> Postgres conforms to SQL92 in this regard. Date and Darwen, "A Guide
> to the SQL Standard", 3rd ed., are explicit about this near the top of
> page 249:
>
> Duplicates are relevant to the ... GROUP BY ... operations ...
> ... GROUP BY groups rows together on the basis of duplicate values in
> the set of grouping columns (and those sets of grouping column values
> can be regarded as "rows" for present purposes). The point is,
> however, the definition of duplicate rows requires some refinement in
> the presence of nulls. Let "left" and "right" be as defined
> (previously). Then "left" and "right" are defined to be "duplicates"
> of one another if and only if, for all "i" in the range 1 to "n",
> either "left_i" = "right_i" is TRUE, or "left_i" and "right_i" are
> both null.
>
> There is a single exception to Postgres' SQL92 conformance wrt NULLs
> afaik, involving DISTINCT column constraints which I discuss below.
>
> > > However, if the grouping column contains more than one null,
> > > all of them are put into a single group.
> > > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
> > The SQL92 text says:
> >      A null value is an implementation-dependent special value that
> >      is distinct from all non-null values of the associated data type.
> >      There is effectively only one null value and that value is a member
> >      of every SQL data type. There is no <literal> for a null value,
> >      although the keyword NULL is used in some places to indicate that a
> >      null value is desired.
> > Thus, by rights, NULL=NULL should be true, because there is only one null
> > value.
>
> No! An explicit "unknown" = "unknown" in a constraint clause should
> always evaluate to FALSE (we'll get to GROUP BY later). SQL92 and all
> of my reference books are clear about this. Date and Darwen have a
> good discussion of the shortcomings of NULL in SQL92, pointing out
> that with NULL handling one would really like a distinct UNKNOWN added
> to the possible boolean values TRUE and FALSE so that SQL would have
> true three-value logic.
>
> > About the <group by clause>, the text says:
> >     1) The result of the <group by clause> is a partitioning of T into
> >        a set of groups. The set is the minimum number of groups such
> >        that, for each grouping column of each group of more than one
> >        row, no two values of that grouping column are distinct.
>
> Interesting. Note that SQL92 asks that any column with the DISTINCT
> constraint contain *only one* NULL value in the entire column. Date
> and Darwen point out that this is inconsistant with the fundamental
> notion of "unknown" and renders DISTINCT constraints without NOT NULL
> to be effectively useless. They recommend against having any DISTINCT
> column without having an additional NOT NULL constraint. We've had
> this discussion wrt Postgres, and concluded that we would diverge from
> the standard by allowing multiple NULL fields in DISTINCT columns, to
> make DISTINCT a useful feature with NULLs. It probably didn't hurt
> that Postgres already behaved this way :)
>
> afaik this last point is the *only* place where Postgres intentionally
> diverges from SQL92, and it was done (or rather retained from existing
> behavior) to make a useless feature useful.
>
> > One should note, however, that when the actual comparison operator "=" is
> > used, the standard says that if one of the operands is null, the result of
> > the comparison is unknown. One should make a distinction between making
> > comparisons within group by, uniqueness, and other database-logic
> > operations, and between making the actual comparison (though in my opinion,
> > this should not be so. Comparing a null value to something should be always
> > false unless the other something is also null. But that's my opinion and
> > not the standard's).
>
> One can't take a portion of SQL92 statements wrt NULLs and apply it to
> all uses of NULL, because SQL92 is not internally consistant in this
> regard.
>
> In most GROUP BY situations, a corresponding WHERE col IS NOT NULL is
> probably a good idea.
>
> Regards.
>
>                          - Thomas
>
> --
> Thomas Lockhart                         lockhart@alumni.caltech.edu
> South Pasadena, California

    Sigh.  PostgreSQL seems pretty inconsitant in this... GROUP BY with 1 column
produces NULLs grouped, with 2 colums it usually seems not to(although I somehow
came up with an example where it did, grr... but lets ignore this since it's
supposed to "not work" that way.)... Oracle8, DB/2, and Sybase all group NULLs
together, for compatibility sake wouldn't it be reasonable for PostgreSQL to do
the same?  Else porting applications could fail miserably when one hits this
inconsistency.

--David