Thread

  1. Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY

    secret <secret@kearneydev.com> — 1999-05-19T15:28:42Z

    Thomas Lockhart wrote:
    
    > > > The Pratical SQL Handbook at page 171 says:
    > > > Since nulls represent "the great unknown", there is no way to know
    > > > whether one null is equal to any other null. Each unknown value
    > > > may or may not be different from another.
    >
    > Although I've noticed some questionable statements quoted from this
    > book, this looks good...
    >
    > > > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
    > > This is something I have complained about time and again. It is time
    > > something is changed about it, otherwise Postgres will NEVER be a
    > > standard-compliant RDBMS.
    >
    > Postgres conforms to SQL92 in this regard. Date and Darwen, "A Guide
    > to the SQL Standard", 3rd ed., are explicit about this near the top of
    > page 249:
    >
    > Duplicates are relevant to the ... GROUP BY ... operations ...
    > ... GROUP BY groups rows together on the basis of duplicate values in
    > the set of grouping columns (and those sets of grouping column values
    > can be regarded as "rows" for present purposes). The point is,
    > however, the definition of duplicate rows requires some refinement in
    > the presence of nulls. Let "left" and "right" be as defined
    > (previously). Then "left" and "right" are defined to be "duplicates"
    > of one another if and only if, for all "i" in the range 1 to "n",
    > either "left_i" = "right_i" is TRUE, or "left_i" and "right_i" are
    > both null.
    >
    > There is a single exception to Postgres' SQL92 conformance wrt NULLs
    > afaik, involving DISTINCT column constraints which I discuss below.
    >
    > > > However, if the grouping column contains more than one null,
    > > > all of them are put into a single group.
    > > > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
    > > The SQL92 text says:
    > >      A null value is an implementation-dependent special value that
    > >      is distinct from all non-null values of the associated data type.
    > >      There is effectively only one null value and that value is a member
    > >      of every SQL data type. There is no <literal> for a null value,
    > >      although the keyword NULL is used in some places to indicate that a
    > >      null value is desired.
    > > Thus, by rights, NULL=NULL should be true, because there is only one null
    > > value.
    >
    > No! An explicit "unknown" = "unknown" in a constraint clause should
    > always evaluate to FALSE (we'll get to GROUP BY later). SQL92 and all
    > of my reference books are clear about this. Date and Darwen have a
    > good discussion of the shortcomings of NULL in SQL92, pointing out
    > that with NULL handling one would really like a distinct UNKNOWN added
    > to the possible boolean values TRUE and FALSE so that SQL would have
    > true three-value logic.
    >
    > > About the <group by clause>, the text says:
    > >     1) The result of the <group by clause> is a partitioning of T into
    > >        a set of groups. The set is the minimum number of groups such
    > >        that, for each grouping column of each group of more than one
    > >        row, no two values of that grouping column are distinct.
    >
    > Interesting. Note that SQL92 asks that any column with the DISTINCT
    > constraint contain *only one* NULL value in the entire column. Date
    > and Darwen point out that this is inconsistant with the fundamental
    > notion of "unknown" and renders DISTINCT constraints without NOT NULL
    > to be effectively useless. They recommend against having any DISTINCT
    > column without having an additional NOT NULL constraint. We've had
    > this discussion wrt Postgres, and concluded that we would diverge from
    > the standard by allowing multiple NULL fields in DISTINCT columns, to
    > make DISTINCT a useful feature with NULLs. It probably didn't hurt
    > that Postgres already behaved this way :)
    >
    > afaik this last point is the *only* place where Postgres intentionally
    > diverges from SQL92, and it was done (or rather retained from existing
    > behavior) to make a useless feature useful.
    >
    > > One should note, however, that when the actual comparison operator "=" is
    > > used, the standard says that if one of the operands is null, the result of
    > > the comparison is unknown. One should make a distinction between making
    > > comparisons within group by, uniqueness, and other database-logic
    > > operations, and between making the actual comparison (though in my opinion,
    > > this should not be so. Comparing a null value to something should be always
    > > false unless the other something is also null. But that's my opinion and
    > > not the standard's).
    >
    > One can't take a portion of SQL92 statements wrt NULLs and apply it to
    > all uses of NULL, because SQL92 is not internally consistant in this
    > regard.
    >
    > In most GROUP BY situations, a corresponding WHERE col IS NOT NULL is
    > probably a good idea.
    >
    > Regards.
    >
    >                          - Thomas
    >
    > --
    > Thomas Lockhart                         lockhart@alumni.caltech.edu
    > South Pasadena, California
    
        Sigh.  PostgreSQL seems pretty inconsitant in this... GROUP BY with 1 column
    produces NULLs grouped, with 2 colums it usually seems not to(although I somehow
    came up with an example where it did, grr... but lets ignore this since it's
    supposed to "not work" that way.)... Oracle8, DB/2, and Sybase all group NULLs
    together, for compatibility sake wouldn't it be reasonable for PostgreSQL to do
    the same?  Else porting applications could fail miserably when one hits this
    inconsistency.
    
    --David