Thread

  1. Re: Feature suggestions (long)

    Don Baccus <dhogaza@pacifier.com> — 2003-05-17T11:04:11Z

    On Saturday 17 May 2003 10:51 am, Matthew Kirkwood wrote:
    > On Sat, 17 May 2003, Alvaro Herrera wrote:
    > > > > I'm going suggest a feature like what Oracle calls "partitions" and
    > > > > later on something with indexes. The idea is to generate some
    > > > > discussion to see if they are worthy of being added to the TODO list.
    > > >
    > > > Why bother?
    > >
    > > Maybe one can put different partitions in different tablespaces?
    >
    > One can.  The tablespace a partition is in can even be
    > offline if Oracle can prove that a query doesn't require
    > that partition.
    
    People use this feature for warehousing old data that they don't want to purge 
    from the database.   For very large databases (of course that definition 
    changes with each new generation of computer) this can greatly improve the 
    performance of queries on the active portion of the data.
    
    
    
  2. Feature suggestions (long)

    Martijn van Oosterhout <kleptog@svana.org> — 2003-05-17T15:00:40Z

    [Please CC any replies to me. Thanks.]
    
    I'm going suggest a feature like what Oracle calls "partitions" and later on
    something with indexes. The idea is to generate some discussion to see if
    they are worthy of being added to the TODO list.
    
    Partitions
    ==========
    The idea is to split data across tables according to the contents of a
    tuple. So you could split the contents of a table based on years or some
    such. I'm not going to go too much into why it's a good idea, there are
    several links on the web discussing it:
    
    http://www.nyoug.org/200212nanda.pdf
    
    Anyway, what I'm proposing is different from what Oracle does but flexible
    enough such that what Oracle does could the implemented/emulated easily
    enough. I was thinking along the lines of:
    
    CREATE PARTITION sales_2003 ON sales WHERE saledate >= '2003-01-01' AND saledate < '2004-01-01';
    
    This would produce a sequence of events equivalent to:
    
    CREATE TABLE sales_2003 () INHERITS ( sales );
    INSERT INTO sales_2003 SELECT * FROM sales WHERE <condition>;
    DELETE FROM sales WHERE <condition>;
    
    CREATE RULE rule1 AS ON INSERT TO sales 
    WHERE <condition> DO INSTEAD 
    INSERT INTO sales_2003 (NEW.*);
    
    CREATE RULE rule2 AS ON DELETE TO sales
    WHERE <condition> DO INSTEAD
    DELETE FROM sales_2003 WHERE <???>;
    
    <a pile more RULEs>
    
    err... It gets a bit complex here. My point is that it can be done but with
    proper support it can be done more efficiently, reliably and usably.
    
    Internally you could treat them more like real inherited tables. Inserts
    would be fairly straight forward (use the conditions to determine the
    resulting table). Deletes require no work at all. Updates are trickier. The
    problem is that after an update the tuple may end up needing to be in
    another table. Adding/dropping/altering columns would have to be handled
    similarly to inherited tables.
    
    Looking at heap_update or ExecUpdate it's not clear whether it would be a
    problem if the old and new tuples are in a separate tables. As long as the
    descriptors are the same it shouldn't be too much of a deal hopefully.
    
    So far the storage doesn't seem too difficult. I'm not sure how to deal with
    inheritance and this. I'd suggest making partitions work only on individual
    tables and not inherited.
    
    Next stage would be teaching the planner. The conditions would be
    pseudo-constraints on the partitions. Hence if the conditions and the
    constraints form a non-intersecting set, you can skip that partition
    altogether.
    
    Now, semantically, should you be allowed to do inserts, updates and deletes
    on partitions directly, or should they be hidden? If/when we get
    table-spaces, you should be able to move them around.
    
    Deleting a partition would amount to moving all the tuples back to the main
    table. Of course, if there are other partitions the conditions will have to
    be re-evaluated to place the tuple into one of the other partitions.
    
    What do you do if multiple partitions have conflicting conditions? I'd be
    tempted to do a first-come-first-serve basis, though maybe some kind of
    priority? One advantage of the Oracle approach is that this issue doesn't
    come up. We certainly can't write a system to detect anything but the
    obvious cases.
    
    Of course, making individual indexes for all the partitions could get
    tedious, even if the system did it automatically; which me brings to my next
    suggestion:
    
    Multi-table indexes
    ===================
    
    Currently an index only applies to a single table. If you use inheritance
    all the indexes can get cumbersome. With something like the above partitions
    it could get completely out of hand if used heavily. What I'd like is if
    instead of an index mapping <index keys> -> ctid, it does <index keys> ->
    (tableoid,ctid). At the same time it lets us build UNIQUE indexes for
    inheritance (for primary keys).
    
    Currently, an index is represented by a row in pg_index. To implement this
    you would need to arrange a way for an index to list all its constituent
    tables so that when doing an update you know which to indexes to update.
    Multiple rows seems the easiest way but it does violate a uniqueness
    constraint.
    
    Obviously, the columns indexed can only be present in the root table.
    However, the index would be usable for queries on subtables too. The planner
    would simply have to remember to push down 'tableoid IN (oid,oid,oid,...)'
    conditions.
    
    The major problem here is statistics. What you really need is some sort of
    aggregate statistics over an entire inheritance tree. Without something like
    this the planner can't make sensible decisions on when to use the index.
    ANALYZE currently doesn't do this. It's not entirely clear how it would go
    about it either. This doesn't apply to UNIQUE indexes though.
    
    Another issue is when a table is deleted. Normally you can just purge the
    index when the table is deleted. With these indexes you can't since the
    remaining data is still useful. So the index access method would have to be
    careful not to return tuples from tables that don't exist anymore. VACUUM
    would hopefully eventually clean them up.
    
    There would also be an issue of the IndexScan node returning the right
    fields depending on which part of the subtree is being queried. For example,
    if B were inherited from A then an IndexScan on A* could return fields not
    in A if the query only ever referenced B. This may already be solved as
    the IndexScan node can scan multiple indexes.
    
    =
    
    In any case, the second feature seems like it will be required (or at least
    something similar) to handle proper referential integrity and primary keys
    for inherited tables. But if something like the former is implemented
    (hopefully, as using rules would get very cumbersome) then the latter would
    be necessary.
    
    Have a good weekend.
    -- 
    Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
    > "the West won the world not by the superiority of its ideas or values or
    > religion but rather by its superiority in applying organised violence.
    > Westerners often forget this fact, non-Westerners never do."
    >   - Samuel P. Huntington
    
  3. Re: Feature suggestions (long)

    Tom Lane <tgl@sss.pgh.pa.us> — 2003-05-17T16:05:23Z

    Martijn van Oosterhout <kleptog@svana.org> writes:
    > I'm going suggest a feature like what Oracle calls "partitions" and later on
    > something with indexes. The idea is to generate some discussion to see if
    > they are worthy of being added to the TODO list.
    
    Why bother?  Make partial indexes corresponding to what you are calling
    the partitions of the table, and (I claim) you can get every possible
    benefit of a partitioning scheme.  Plus more, because there's nothing
    constraining the partial indexes to be nonoverlapping, so you can get
    efficient plans for sets of queries that no partitioning scheme would
    win for.
    
    			regards, tom lane
    
    
  4. Re: Feature suggestions (long)

    Alvaro Herrera <alvherre@dcc.uchile.cl> — 2003-05-17T16:47:20Z

    On Sat, May 17, 2003 at 12:05:23PM -0400, Tom Lane wrote:
    > Martijn van Oosterhout <kleptog@svana.org> writes:
    > > I'm going suggest a feature like what Oracle calls "partitions" and later on
    > > something with indexes. The idea is to generate some discussion to see if
    > > they are worthy of being added to the TODO list.
    > 
    > Why bother?
    
    Maybe one can put different partitions in different tablespaces?
    
    -- 
    Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
    Essentially, you're proposing Kevlar shoes as a solution for the problem
    that you want to walk around carrying a loaded gun aimed at your foot.
    (Tom Lane)
    
    
  5. Re: Feature suggestions (long)

    Matthew Kirkwood <matthew@hairy.beasts.org> — 2003-05-17T17:51:48Z

    On Sat, 17 May 2003, Alvaro Herrera wrote:
    
    > > > I'm going suggest a feature like what Oracle calls "partitions" and later on
    > > > something with indexes. The idea is to generate some discussion to see if
    > > > they are worthy of being added to the TODO list.
    > >
    > > Why bother?
    >
    > Maybe one can put different partitions in different tablespaces?
    
    One can.  The tablespace a partition is in can even be
    offline if Oracle can prove that a query doesn't require
    that partition.
    
    Matthew.
    
    
    
  6. Re: Feature suggestions (long)

    Don Baccus <dhogaza@pacifier.com> — 2003-05-17T18:42:01Z

    On Saturday 17 May 2003 06:31 pm, Martijn van Oosterhout wrote:
    
    > Hmm, no comments on the UNIQUE-index-over-multiple-tables. I would have
    > thought that would've been the more interesting one.
    
    This would enable making PG inheritence one step closer to being truly useful, 
    as one could enforce a primary key on a table and all its children.
    
    
    
    
  7. Re: Feature suggestions (long)

    Martijn van Oosterhout <kleptog@svana.org> — 2003-05-18T01:31:54Z

    On Sat, May 17, 2003 at 04:04:11AM -0700, Don Baccus wrote:
    > On Saturday 17 May 2003 10:51 am, Matthew Kirkwood wrote:
    > > On Sat, 17 May 2003, Alvaro Herrera wrote:
    > > > > > I'm going suggest a feature like what Oracle calls "partitions" and
    > > > > > later on something with indexes. The idea is to generate some
    > > > > > discussion to see if they are worthy of being added to the TODO list.
    > > > >
    > > > > Why bother?
    > > >
    > > > Maybe one can put different partitions in different tablespaces?
    > >
    > > One can.  The tablespace a partition is in can even be
    > > offline if Oracle can prove that a query doesn't require
    > > that partition.
    > 
    > People use this feature for warehousing old data that they don't want to purge 
    > from the database.   For very large databases (of course that definition 
    > changes with each new generation of computer) this can greatly improve the 
    > performance of queries on the active portion of the data.
    
    Or can't delete (eg. financial records). The reason I'm looking at it is for
    queries where you want a report depending on all the data for 2002. Since
    this data is mixed in with all the data for upto 7 previous years, it's too
    big for a index scan but doing a seq. scan across the whole table is
    very expensive.
    
    What this buys you is being able to seq. scan over portions of a table,
    rather than the whole table. Currently we manage this manually by moving
    tuples around after the fact.
    
    Hmm, no comments on the UNIQUE-index-over-multiple-tables. I would have
    thought that would've been the more interesting one.
    
    Have a nice weekend.
    -- 
    Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
    > "the West won the world not by the superiority of its ideas or values or
    > religion but rather by its superiority in applying organized violence.
    > Westerners often forget this fact, non-Westerners never do."
    >   - Samuel P. Huntington
    
  8. Re: Feature suggestions (long)

    Martijn van Oosterhout <kleptog@svana.org> — 2003-05-18T01:56:29Z

    On Sat, May 17, 2003 at 11:42:01AM -0700, Don Baccus wrote:
    > On Saturday 17 May 2003 06:31 pm, Martijn van Oosterhout wrote:
    > 
    > > Hmm, no comments on the UNIQUE-index-over-multiple-tables. I would have
    > > thought that would've been the more interesting one.
    > 
    > This would enable making PG inheritence one step closer to being truly useful, 
    > as one could enforce a primary key on a table and all its children.
    
    Actually, what I described in my original post was a bit more general than
    just UNIQUE indexes. From what I wrote it doesn't seem as if it'd be too
    complicated (though I won't pretend to fully understand the index code).
    
    But there must be a few hurdles to overcome, which is why I expected someone
    to point out what I'd miseed.
    
    Have a nice weekend,
    -- 
    Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
    > "the West won the world not by the superiority of its ideas or values or
    > religion but rather by its superiority in applying organized violence.
    > Westerners often forget this fact, non-Westerners never do."
    >   - Samuel P. Huntington
    
  9. Re: Feature suggestions (long)

    Mark Kirkwood <markir@paradise.net.nz> — 2003-05-18T05:02:39Z

    
    Matthew Kirkwood wrote:
    
    >On Sat, 17 May 2003, Alvaro Herrera wrote:
    >
    >  
    >
    >>>>I'm going suggest a feature like what Oracle calls "partitions" and later on
    >>>>something with indexes. The idea is to generate some discussion to see if
    >>>>they are worthy of being added to the TODO list.
    >>>>        
    >>>>
    >>>Why bother?
    >>>      
    >>>
    >>Maybe one can put different partitions in different tablespaces?
    >>    
    >>
    >
    >One can.  The tablespace a partition is in can even be
    >offline if Oracle can prove that a query doesn't require
    >that partition.
    >  
    >
    
    Being able to "segment" tables for admin purposes (like archiving) might 
    be beneficial.
    
    I guess this could be orthogonal to clustering or partitioning schemes 
    that mininize table/index access for queries.
    
    (Oracle has sort of rolled these ideas together with their partitioning)
    
    Mark
    
    
    
    
    
    
    
  10. Re: Feature suggestions (long)

    Martijn van Oosterhout <kleptog@svana.org> — 2003-05-18T06:21:55Z

    On Sun, May 18, 2003 at 05:02:39PM +1200, Mark Kirkwood wrote:
    > Being able to "segment" tables for admin purposes (like archiving) might 
    > be beneficial.
    > 
    > I guess this could be orthogonal to clustering or partitioning schemes 
    > that mininize table/index access for queries.
    
    Segmenting tables is what inheritence can buy you now (although somewhat
    suboptimally). You can create subtables of your super-table and just
    remember which table is supposed to contain which tuples. By adding CHECK
    constraints you can even make the system error out on you if you make a
    mistake.
    
    Segmenting tables for archiving is only useful if the planner rarely uses
    them. If the planner is still going to generate a seq. scan on the "archive"
    every time you query the table, then it's not really archived in any sense.
    
    Which bring another question. If I have a CHECK constraint on a table saying
    'id < 5000' and I do a query 'WHERE id = 6000', is the query optimiser
    clever enough to notice that there can be no matches. Preliminary testing
    indicates no (although I see it's 7.2).
    
    > (Oracle has sort of rolled these ideas together with their partitioning)
    
    My guess is because segmenting without the planner cleverness is a bit
    useless.
    -- 
    Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
    > "the West won the world not by the superiority of its ideas or values or
    > religion but rather by its superiority in applying organized violence.
    > Westerners often forget this fact, non-Westerners never do."
    >   - Samuel P. Huntington
    
  11. Re: Feature suggestions (long)

    Greg Stark <gsstark@mit.edu> — 2003-05-18T13:24:06Z

    Tom Lane <tgl@sss.pgh.pa.us> writes:
    
    > Martijn van Oosterhout <kleptog@svana.org> writes:
    > > I'm going suggest a feature like what Oracle calls "partitions" and later on
    > > something with indexes. The idea is to generate some discussion to see if
    > > they are worthy of being added to the TODO list.
    > 
    > Why bother?  Make partial indexes corresponding to what you are calling
    > the partitions of the table, and (I claim) you can get every possible
    > benefit of a partitioning scheme.  Plus more, because there's nothing
    > constraining the partial indexes to be nonoverlapping, so you can get
    > efficient plans for sets of queries that no partitioning scheme would
    > win for.
    
    Partial indexes get some of the advantages of partitioned tables, but
    certainly not all. 
    
    A big advantage of partitioned tables is being able to manage the data in
    whole chunks very efficiently.
    
    . Archive all records for a single year becomes a seqential scan instead of an
      index scan.
    
    . Delete all records for a single year becomes a truncate table instead of an
      update.
    
    This will become even more important when postgres gets some other features
    like being able to set the physical storage location of each table and the
    equivalent of what oracle calls "transportable tablespaces". 
    
    Then you can move individual partitions to slow read-only media and keep the
    currently active partition on the fast read-write media. Or back up the old
    partition and drop it but load it on the DSS system using a simple binary
    copy.
    
    This isn't theoretical. I've done exactly this before. We had a table that
    grew by about a million records per day. When we used dml statements to
    archive the old records to the DSS system it took over a day and frequently
    failed. When we switched to partitioned tables we were able to run it reliably
    daily during prime time without impacting performance and have up-to-date data
    in the DSS system.
    
    -- 
    greg
    
    
    
  12. Re: Feature suggestions (long)

    Mark Kirkwood <markir@paradise.net.nz> — 2003-05-19T07:55:07Z

    
    Martijn van Oosterhout wrote:
    
    >>(Oracle has sort of rolled these ideas together with their partitioning)
    >>    
    >>
    >
    >My guess is because segmenting without the planner cleverness is a bit
    >useless.
    >
    Yes!  you want the planner to be *aware* of any "segmenting", but there 
    are other was to reduce the "amount of table" that is scanned (e.g. 
    partial indexes and clustering), so the "Oracle solution" might not be 
    the only  (or best) way to approach :
    
    1   query plans for large tables
    2   administration of large tables
    
    cheers
    
    Mark
    
    
    
    
  13. Re: Feature suggestions (long)

    Jim Nasby <jim@nasby.net> — 2003-05-19T19:10:35Z

    On Sat, May 17, 2003 at 04:04:11AM -0700, Don Baccus wrote:
    > On Saturday 17 May 2003 10:51 am, Matthew Kirkwood wrote:
    > > On Sat, 17 May 2003, Alvaro Herrera wrote:
    > > > > > I'm going suggest a feature like what Oracle calls "partitions" and
    > > > > > later on something with indexes. The idea is to generate some
    > > > > > discussion to see if they are worthy of being added to the TODO list.
    > > > >
    > > > > Why bother?
    > > >
    > > > Maybe one can put different partitions in different tablespaces?
    > >
    > > One can.  The tablespace a partition is in can even be
    > > offline if Oracle can prove that a query doesn't require
    > > that partition.
    > 
    > People use this feature for warehousing old data that they don't want to purge 
    > from the database.   For very large databases (of course that definition 
    > changes with each new generation of computer) this can greatly improve the 
    > performance of queries on the active portion of the data.
     
    This would be very useful for http://stats.distributed.net. Right now
    I'm trying to 'partition' our main (120M row) table by project_id.
    Sybase handled this very well, because it has good support for
    clustered indexes/index organized tables. PGSQL lacks this, which means
    large reads from that table are much more painful than on sybase,
    because you either tablescan (ugh) or go though the overhead of an index
    read for every row retrieved (not much better).
    
    Partitioning would fit the bill here perfectly (though in my case
    good/true clustering would work just as well).
    -- 
    Jim C. Nasby (aka Decibel!)                    jim@nasby.net
    Member: Triangle Fraternity, Sports Car Club of America
    Give your computer some brain candy! www.distributed.net Team #1828
    
    Windows: "Where do you want to go today?"
    Linux: "Where do you want to go tomorrow?"
    FreeBSD: "Are you guys coming, or what?"