Thread

  1. New WIP patch for cross column statistics Re: TEXT vs PG_NODE_TREE in system columns (cross column and expression statistics patch)

    Boszormenyi Zoltan <zb@cybertec.at> — 2011-08-04T12:13:18Z

    Hi,
    
    2011-04-28 17:20 keltezéssel, Alvaro Herrera írta:
    > Excerpts from Boszormenyi Zoltan's message of jue abr 28 11:03:56 -0300 2011:
    >> Hi,
    >>
    >> attached is the WIP patch for cross-column statistics and
    >> extra expression statistics.
    >>
    >> My question is that why pg_node_tree is unusable as
    >> syscache attribute? I attempted to alias it as text in the patch
    >> but I get the following error if I try to use it by setting
    >> USE_SYSCACHE_FOR_SEARCH to 1 in selfuncs.c.
    >> Directly using the underlying pg_statistic3 doesn't cause an error.
    > Two comments:
    > 1. it seems that expression stats are mostly separate from cross-column
    > stats; does it really make sense to submit the two in the same patch?
    >
    > 2. there are almost no code comments anywhere
    >
    > 3. (bonus) if you're going to copy/paste pg_attribute.h verbatim into
    > the new files, please remove the bits you currently have in "#if 0".
    > (Not to mention the fact that the new catalogs seem rather poorly
    > named).
    
    OK, we went to a different route this time. Here is what we came
    up with. Attached are two patches.
    
    attnum-int2vector.patch implements:
    
    - int2vector support routines and catalog entries for them
    - pg_statistic is modified so "staattnum int2" it converted to
      "staattnums int2vector". RemoveStatistics() is modified to take
      an array of AttrNumber and the length of it.
    - pg_attribute.attstattarget is moved to pg_statistic.statarget,
      pg_statistic gains a new "stavalid" bool field. Two support routines
      are added: AddStatistics() and InvalidateStatistics(). Entries
      in pg_statistic for table columns are created upon table creation
      and ALTER TABLE ADD COLUMN and maintained for the lifetime
      of the column. Exceptions are system tables: calling AddStatistics()
      for them during initdb is a Catch-22 when pg_statistic doesn't yet
      exist. For these, ANALYZE creates the pg_statistic record just
      as before. ALTER TABLE ALTER COLUMN SET DATA TYPE
      only invalidates the record by setting "stavalid" to false.
    - Factor out common code for getting the statistics tuple into a
      new function called validate_statistics().
    
    cross-col-syntax.patch builds on the first patch and implements:
    
    CREATE CROSS COLUMN STATISTICS ON TABLE tabname (col, ...)
         [ WITH ( statistics_target ) ] ;
    
    DROP CROSS COLUMN STATISTICS ON TABLE tabname (col, ...) ;
    
    CREATE CROSS COLUMN STATISTICS ON INDEX idxname
         [ WITH ( statistics_target ) ] ;
    
    DROP CROSS COLUMN STATISTICS ON INDEX idxname ;
    
    and puts new records into pg_statistic with array_length(staattnums, 1) > 1.
    Note: this patch should record dependencies on the respective table or
    index and the fields but doesn't.
    
    The data structure for storing the N-dimension histogram is not yet decided.
    
    Comments?
    
    Best regards,
    Zoltán böszörményi
    
    -- 
    ----------------------------------
    Zoltán Böszörményi
    Cybertec Schönig & Schönig GmbH
    Gröhrmühlgasse 26
    A-2700 Wiener Neustadt, Austria
    Web: http://www.postgresql-support.de
         http://www.postgresql.at/