Thread

  1. Re: [HACKERS] Counting bool flags in a complex query

    Michael Richards <miker@scifair.acadiau.ca> — 1999-07-16T08:37:20Z

    On Thu, 15 Jul 1999, Tom Lane wrote:
    
    > Michael Richards <miker@scifair.acadiau.ca> writes:
    > > I'm not sure this is correct, but I think I see a bug of some sort...
    > 
    > I committed a fix last night; it will be in 6.5.1.
    
    I've found what I believe is another set of bugs:
    This is my monster query again...
    
    My folder numbers are: negative numbers are system folders such as New
    mail, trash, drafts and sentmail. I wanted to order the tuples so that the
    folderids were sorted from -1 to -4, then 1 to x. This way the system
    folders would always appear first in the list.
    
    This may not be valid SQL, as none of my books mention it. Is it possible
    to order by an expression?
    
    Here are some examples which some some odd behaviour. My suspected bug
    findings are at the end:
    
    SELECT folderid,foldername,count(*) as "messgaes",sum(bool2int(flagnew))
    as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE
    usermail.loginid='michael' and folders.loginid=usermail.loginid AND
    usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
    SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
    NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
    folder=folderid) order by (folderid>0);
    folderid|foldername      |messgaes|newmessages|   size
    --------+----------------+--------+-----------+-------
          -4|Deleted Messages|     110|         50| 245627
          -2|Sent Mail       |       7|          2|  10878
          -1|New Mail Folder |      73|          1|8831226
           1|OOL             |       7|          0|   8470
           2|suggestions     |      26|          0|  35433
           3|Acadia          |       5|          0|  17703
           4|advertising     |       4|          2|   5394
           5|dealt with      |       3|          0|   2883
          36|dauphne         |       9|          0|  66850
          -3|Saved Drafts    |       0|          0|      0
    (10 rows)
    
    It looks like the order by is only being applied to the original select,
    not the unioned select. Some authority should check on it, but by thought
    it that a union does not necessarily maintain the order, so the entire
    select should be applied to the order.
    
    I'm not so good at interpreting the query plan, but here it is:
    Unique  (cost=8.10 rows=0 width=0)
      ->  Sort  (cost=8.10 rows=0 width=0)
        ->  Append  (cost=8.10 rows=0 width=0)
          ->  Aggregate  (cost=6.05 rows=1 width=49)
            ->  Group  (cost=6.05 rows=1 width=49)
              ->  Sort  (cost=6.05 rows=1 width=49)
                ->  Nested Loop  (cost=6.05 rows=1 width=49)
                  ->  Index Scan using usermail_pkey on usermail  (cost=2.05 rows=2 width=21)
                  ->  Index Scan using folders_pkey on folders  (cost=2.00 rows=8448 width=28)
            -> Index Scan using folders_pkey on folders (cost=2.05 rows=2 width=16)
                 SubPlan
                   -> Index Scan using usermail_pkey on usermail (cost=2.05 rows=1 width=4)
    
    I would have expected the folderid -3 to appear as the 3rd one in this
    case.
    
    I'm probably going to change the numbering scheme of the system folders so
    they will sort correctly without a kluge such as:
    create function ordfolderid(int) returns int as 'select $1*-1 where $1<0
    union select $1+1*10 where $1>=0' language 'sql';
    
    Then running the order clause as: 
    order by (folderid<0),ordfolderid(folderid)
    My thought behind this kludge is that the table should first be ordered by
    the t/f value of the fact folderid<0, then within each of the true and
    false sortings, subsort those by the value of folderid.
    
    Complicated enough for you?
    
    Well, in my playing I notice what appears to be more of a bug...
    SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew))
    as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE
    usermail.loginid='michael' and folders.loginid=usermail.loginid AND
    usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
    SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
    NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
    folder=folderid) order by (folderid<0);
    folderid|foldername      |messgaes|newmessages|   size
    --------+----------------+--------+-----------+-------
           1|OOL             |       7|          0|   8470
           2|suggestions     |      26|          0|  35433
           3|Acadia          |       5|          0|  17703
           4|advertising     |       4|          2|   5394
           5|dealt with      |       3|          0|   2883
          36|dauphne         |       9|          0|  66850
          -4|Deleted Messages|     110|         50| 245627
          -2|Sent Mail       |       7|          2|  10878
          -1|New Mail Folder |      73|          1|8831226
          -3|Saved Drafts    |       0|          0|      0
    (10 rows)
    
    SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew))
    as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE
    usermail.loginid='michael' and folders.loginid=usermail.loginid AND
    usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
    SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
    NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
    folder=folderid) order by (messages<10);
    ERROR:  attribute 'messages' not found
    
    Using a column name within an expression in the order by does not seem to
    work...
    Or a much simpler example to illustrate the bug:
    fastmail=> select 1 as "test" order by (test<9);
    ERROR:  attribute 'test' not found
    
    fastmail=> select 1 as "test" order by test;
    test
    ----
       1
    (1 row)
    
    
    I was almost able to make it work properly aside from the sorting issue
    with my kludged up routine... This is so nasty that I most definitely
    don't want to put it into production:
    
    SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew))
    as "newmessages",sum(contentlength) as "size",(folderid>=0) FROM
    usermail,folders WHERE usermail.loginid='michael' and
    folders.loginid=usermail.loginid AND usermail.folder = folders.folderid
    GROUP BY folderid,foldername UNION SELECT
    folderid,foldername,0,0,0,(folderid>=0) FROM folders WHERE
    loginid='michael' AND NOT EXISTS (SELECT folder FROM usermail WHERE
    loginid='michael' AND folder=folderid) order by 6,ordfolderid(folderid);
    folderid|foldername      |messages|newmessages|   size|?column?
    --------+----------------+--------+-----------+-------+--------
          -1|New Mail Folder |      73|          1|8831226|f       
          -2|Sent Mail       |       7|          2|  10878|f       
          -4|Deleted Messages|     110|         50| 245627|f       
          -3|Saved Drafts    |       0|          0|      0|f       
           1|OOL             |       7|          0|   8470|t       
           2|suggestions     |      26|          0|  35433|t       
           3|Acadia          |       5|          0|  17703|t       
           4|advertising     |       4|          2|   5394|t       
           5|dealt with      |       3|          0|   2883|t       
          36|dauphne         |       9|          0|  66850|t       
    (10 rows)
    
    Do I need outer joins to make this work instead of the screwed up union
    method I'm trying here, or is it just a series of bugs?
    
    -Michael