Thread

  1. Data archiving/warehousing idea

    Chris Dunlop <chris@onthe.net.au> — 2007-02-01T02:01:34Z

    G'day hackers,
    
    I had some hand-wavy thoughts about some potential gains for
    postgres in the data archiving/warehousing area.  I'm not able
    to do any work myself on this, and don't actually have a
    pressing need for it so I'm not "requesting" someone do it, but
    I thought it might be worth discussing (if it hasn't been
    already - I couldn't find anything in the mail archives, but
    that doesn't mean it's not there...)
    
    The main idea is that, there might be space utilisation and
    performance advantages if postgres had "hard" read-only tables,
    i.e. tables which were guaranteed (by postgres) to never have
    their data changed (insert/update/delete). 
    
    This could potentially save disk space by allowing "book
    keeping" elements in the page/tuple headers to be removed, e.g.
    visibility information etc.  Also, some indexes could
    potentially be packed tighter if we know the data will never
    change (of course this is already available using the fillfactor
    control).
    
    There could be performance advantages from areas like:
    
      * more efficient disk buffering due to reduced disk space
        requirements per above.
    
      * no need to visit tuple store for visibility info
        during index scan
    
      * greatly reduced or even completely removed locking.  If the
        table is guaranteed read-only, there's no need to lock?
    
      * Planner optimisation?  E.g. changing the cost of
        index and sequential scans for the table due to the previous
        points, and there might be table stats which would be very
        useful to the planner but which are too expensive to
        maintain for changing data.
    
    The idea would be to introduce a statement something like:
    
      ALTER TABLE foo SET ARCHIVE;
    
    This would tell postgres to rewrite the on-disk table to the
    "read only" format, rewrite the indexes for maximum packing
    and collect stats for the planner etc.
    
    Thoughts?  Brickbats?
    
    
    Cheers,
    
    Chris.