Thread
-
Re: Feature suggestions (long)
Don Baccus <dhogaza@pacifier.com> — 2003-05-17T11:04:11Z
On Saturday 17 May 2003 10:51 am, Matthew Kirkwood wrote: > On Sat, 17 May 2003, Alvaro Herrera wrote: > > > > I'm going suggest a feature like what Oracle calls "partitions" and > > > > later on something with indexes. The idea is to generate some > > > > discussion to see if they are worthy of being added to the TODO list. > > > > > > Why bother? > > > > Maybe one can put different partitions in different tablespaces? > > One can. The tablespace a partition is in can even be > offline if Oracle can prove that a query doesn't require > that partition. People use this feature for warehousing old data that they don't want to purge from the database. For very large databases (of course that definition changes with each new generation of computer) this can greatly improve the performance of queries on the active portion of the data.
-
Feature suggestions (long)
Martijn van Oosterhout <kleptog@svana.org> — 2003-05-17T15:00:40Z
[Please CC any replies to me. Thanks.] I'm going suggest a feature like what Oracle calls "partitions" and later on something with indexes. The idea is to generate some discussion to see if they are worthy of being added to the TODO list. Partitions ========== The idea is to split data across tables according to the contents of a tuple. So you could split the contents of a table based on years or some such. I'm not going to go too much into why it's a good idea, there are several links on the web discussing it: http://www.nyoug.org/200212nanda.pdf Anyway, what I'm proposing is different from what Oracle does but flexible enough such that what Oracle does could the implemented/emulated easily enough. I was thinking along the lines of: CREATE PARTITION sales_2003 ON sales WHERE saledate >= '2003-01-01' AND saledate < '2004-01-01'; This would produce a sequence of events equivalent to: CREATE TABLE sales_2003 () INHERITS ( sales ); INSERT INTO sales_2003 SELECT * FROM sales WHERE <condition>; DELETE FROM sales WHERE <condition>; CREATE RULE rule1 AS ON INSERT TO sales WHERE <condition> DO INSTEAD INSERT INTO sales_2003 (NEW.*); CREATE RULE rule2 AS ON DELETE TO sales WHERE <condition> DO INSTEAD DELETE FROM sales_2003 WHERE <???>; <a pile more RULEs> err... It gets a bit complex here. My point is that it can be done but with proper support it can be done more efficiently, reliably and usably. Internally you could treat them more like real inherited tables. Inserts would be fairly straight forward (use the conditions to determine the resulting table). Deletes require no work at all. Updates are trickier. The problem is that after an update the tuple may end up needing to be in another table. Adding/dropping/altering columns would have to be handled similarly to inherited tables. Looking at heap_update or ExecUpdate it's not clear whether it would be a problem if the old and new tuples are in a separate tables. As long as the descriptors are the same it shouldn't be too much of a deal hopefully. So far the storage doesn't seem too difficult. I'm not sure how to deal with inheritance and this. I'd suggest making partitions work only on individual tables and not inherited. Next stage would be teaching the planner. The conditions would be pseudo-constraints on the partitions. Hence if the conditions and the constraints form a non-intersecting set, you can skip that partition altogether. Now, semantically, should you be allowed to do inserts, updates and deletes on partitions directly, or should they be hidden? If/when we get table-spaces, you should be able to move them around. Deleting a partition would amount to moving all the tuples back to the main table. Of course, if there are other partitions the conditions will have to be re-evaluated to place the tuple into one of the other partitions. What do you do if multiple partitions have conflicting conditions? I'd be tempted to do a first-come-first-serve basis, though maybe some kind of priority? One advantage of the Oracle approach is that this issue doesn't come up. We certainly can't write a system to detect anything but the obvious cases. Of course, making individual indexes for all the partitions could get tedious, even if the system did it automatically; which me brings to my next suggestion: Multi-table indexes =================== Currently an index only applies to a single table. If you use inheritance all the indexes can get cumbersome. With something like the above partitions it could get completely out of hand if used heavily. What I'd like is if instead of an index mapping <index keys> -> ctid, it does <index keys> -> (tableoid,ctid). At the same time it lets us build UNIQUE indexes for inheritance (for primary keys). Currently, an index is represented by a row in pg_index. To implement this you would need to arrange a way for an index to list all its constituent tables so that when doing an update you know which to indexes to update. Multiple rows seems the easiest way but it does violate a uniqueness constraint. Obviously, the columns indexed can only be present in the root table. However, the index would be usable for queries on subtables too. The planner would simply have to remember to push down 'tableoid IN (oid,oid,oid,...)' conditions. The major problem here is statistics. What you really need is some sort of aggregate statistics over an entire inheritance tree. Without something like this the planner can't make sensible decisions on when to use the index. ANALYZE currently doesn't do this. It's not entirely clear how it would go about it either. This doesn't apply to UNIQUE indexes though. Another issue is when a table is deleted. Normally you can just purge the index when the table is deleted. With these indexes you can't since the remaining data is still useful. So the index access method would have to be careful not to return tuples from tables that don't exist anymore. VACUUM would hopefully eventually clean them up. There would also be an issue of the IndexScan node returning the right fields depending on which part of the subtree is being queried. For example, if B were inherited from A then an IndexScan on A* could return fields not in A if the query only ever referenced B. This may already be solved as the IndexScan node can scan multiple indexes. = In any case, the second feature seems like it will be required (or at least something similar) to handle proper referential integrity and primary keys for inherited tables. But if something like the former is implemented (hopefully, as using rules would get very cumbersome) then the latter would be necessary. Have a good weekend. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "the West won the world not by the superiority of its ideas or values or > religion but rather by its superiority in applying organised violence. > Westerners often forget this fact, non-Westerners never do." > - Samuel P. Huntington
-
Re: Feature suggestions (long)
Tom Lane <tgl@sss.pgh.pa.us> — 2003-05-17T16:05:23Z
Martijn van Oosterhout <kleptog@svana.org> writes: > I'm going suggest a feature like what Oracle calls "partitions" and later on > something with indexes. The idea is to generate some discussion to see if > they are worthy of being added to the TODO list. Why bother? Make partial indexes corresponding to what you are calling the partitions of the table, and (I claim) you can get every possible benefit of a partitioning scheme. Plus more, because there's nothing constraining the partial indexes to be nonoverlapping, so you can get efficient plans for sets of queries that no partitioning scheme would win for. regards, tom lane
-
Re: Feature suggestions (long)
Alvaro Herrera <alvherre@dcc.uchile.cl> — 2003-05-17T16:47:20Z
On Sat, May 17, 2003 at 12:05:23PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > I'm going suggest a feature like what Oracle calls "partitions" and later on > > something with indexes. The idea is to generate some discussion to see if > > they are worthy of being added to the TODO list. > > Why bother? Maybe one can put different partitions in different tablespaces? -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Essentially, you're proposing Kevlar shoes as a solution for the problem that you want to walk around carrying a loaded gun aimed at your foot. (Tom Lane)
-
Re: Feature suggestions (long)
Matthew Kirkwood <matthew@hairy.beasts.org> — 2003-05-17T17:51:48Z
On Sat, 17 May 2003, Alvaro Herrera wrote: > > > I'm going suggest a feature like what Oracle calls "partitions" and later on > > > something with indexes. The idea is to generate some discussion to see if > > > they are worthy of being added to the TODO list. > > > > Why bother? > > Maybe one can put different partitions in different tablespaces? One can. The tablespace a partition is in can even be offline if Oracle can prove that a query doesn't require that partition. Matthew.
-
Re: Feature suggestions (long)
Don Baccus <dhogaza@pacifier.com> — 2003-05-17T18:42:01Z
On Saturday 17 May 2003 06:31 pm, Martijn van Oosterhout wrote: > Hmm, no comments on the UNIQUE-index-over-multiple-tables. I would have > thought that would've been the more interesting one. This would enable making PG inheritence one step closer to being truly useful, as one could enforce a primary key on a table and all its children.
-
Re: Feature suggestions (long)
Martijn van Oosterhout <kleptog@svana.org> — 2003-05-18T01:31:54Z
On Sat, May 17, 2003 at 04:04:11AM -0700, Don Baccus wrote: > On Saturday 17 May 2003 10:51 am, Matthew Kirkwood wrote: > > On Sat, 17 May 2003, Alvaro Herrera wrote: > > > > > I'm going suggest a feature like what Oracle calls "partitions" and > > > > > later on something with indexes. The idea is to generate some > > > > > discussion to see if they are worthy of being added to the TODO list. > > > > > > > > Why bother? > > > > > > Maybe one can put different partitions in different tablespaces? > > > > One can. The tablespace a partition is in can even be > > offline if Oracle can prove that a query doesn't require > > that partition. > > People use this feature for warehousing old data that they don't want to purge > from the database. For very large databases (of course that definition > changes with each new generation of computer) this can greatly improve the > performance of queries on the active portion of the data. Or can't delete (eg. financial records). The reason I'm looking at it is for queries where you want a report depending on all the data for 2002. Since this data is mixed in with all the data for upto 7 previous years, it's too big for a index scan but doing a seq. scan across the whole table is very expensive. What this buys you is being able to seq. scan over portions of a table, rather than the whole table. Currently we manage this manually by moving tuples around after the fact. Hmm, no comments on the UNIQUE-index-over-multiple-tables. I would have thought that would've been the more interesting one. Have a nice weekend. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "the West won the world not by the superiority of its ideas or values or > religion but rather by its superiority in applying organized violence. > Westerners often forget this fact, non-Westerners never do." > - Samuel P. Huntington
-
Re: Feature suggestions (long)
Martijn van Oosterhout <kleptog@svana.org> — 2003-05-18T01:56:29Z
On Sat, May 17, 2003 at 11:42:01AM -0700, Don Baccus wrote: > On Saturday 17 May 2003 06:31 pm, Martijn van Oosterhout wrote: > > > Hmm, no comments on the UNIQUE-index-over-multiple-tables. I would have > > thought that would've been the more interesting one. > > This would enable making PG inheritence one step closer to being truly useful, > as one could enforce a primary key on a table and all its children. Actually, what I described in my original post was a bit more general than just UNIQUE indexes. From what I wrote it doesn't seem as if it'd be too complicated (though I won't pretend to fully understand the index code). But there must be a few hurdles to overcome, which is why I expected someone to point out what I'd miseed. Have a nice weekend, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "the West won the world not by the superiority of its ideas or values or > religion but rather by its superiority in applying organized violence. > Westerners often forget this fact, non-Westerners never do." > - Samuel P. Huntington
-
Re: Feature suggestions (long)
Mark Kirkwood <markir@paradise.net.nz> — 2003-05-18T05:02:39Z
Matthew Kirkwood wrote: >On Sat, 17 May 2003, Alvaro Herrera wrote: > > > >>>>I'm going suggest a feature like what Oracle calls "partitions" and later on >>>>something with indexes. The idea is to generate some discussion to see if >>>>they are worthy of being added to the TODO list. >>>> >>>> >>>Why bother? >>> >>> >>Maybe one can put different partitions in different tablespaces? >> >> > >One can. The tablespace a partition is in can even be >offline if Oracle can prove that a query doesn't require >that partition. > > Being able to "segment" tables for admin purposes (like archiving) might be beneficial. I guess this could be orthogonal to clustering or partitioning schemes that mininize table/index access for queries. (Oracle has sort of rolled these ideas together with their partitioning) Mark
-
Re: Feature suggestions (long)
Martijn van Oosterhout <kleptog@svana.org> — 2003-05-18T06:21:55Z
On Sun, May 18, 2003 at 05:02:39PM +1200, Mark Kirkwood wrote: > Being able to "segment" tables for admin purposes (like archiving) might > be beneficial. > > I guess this could be orthogonal to clustering or partitioning schemes > that mininize table/index access for queries. Segmenting tables is what inheritence can buy you now (although somewhat suboptimally). You can create subtables of your super-table and just remember which table is supposed to contain which tuples. By adding CHECK constraints you can even make the system error out on you if you make a mistake. Segmenting tables for archiving is only useful if the planner rarely uses them. If the planner is still going to generate a seq. scan on the "archive" every time you query the table, then it's not really archived in any sense. Which bring another question. If I have a CHECK constraint on a table saying 'id < 5000' and I do a query 'WHERE id = 6000', is the query optimiser clever enough to notice that there can be no matches. Preliminary testing indicates no (although I see it's 7.2). > (Oracle has sort of rolled these ideas together with their partitioning) My guess is because segmenting without the planner cleverness is a bit useless. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "the West won the world not by the superiority of its ideas or values or > religion but rather by its superiority in applying organized violence. > Westerners often forget this fact, non-Westerners never do." > - Samuel P. Huntington
-
Re: Feature suggestions (long)
Greg Stark <gsstark@mit.edu> — 2003-05-18T13:24:06Z
Tom Lane <tgl@sss.pgh.pa.us> writes: > Martijn van Oosterhout <kleptog@svana.org> writes: > > I'm going suggest a feature like what Oracle calls "partitions" and later on > > something with indexes. The idea is to generate some discussion to see if > > they are worthy of being added to the TODO list. > > Why bother? Make partial indexes corresponding to what you are calling > the partitions of the table, and (I claim) you can get every possible > benefit of a partitioning scheme. Plus more, because there's nothing > constraining the partial indexes to be nonoverlapping, so you can get > efficient plans for sets of queries that no partitioning scheme would > win for. Partial indexes get some of the advantages of partitioned tables, but certainly not all. A big advantage of partitioned tables is being able to manage the data in whole chunks very efficiently. . Archive all records for a single year becomes a seqential scan instead of an index scan. . Delete all records for a single year becomes a truncate table instead of an update. This will become even more important when postgres gets some other features like being able to set the physical storage location of each table and the equivalent of what oracle calls "transportable tablespaces". Then you can move individual partitions to slow read-only media and keep the currently active partition on the fast read-write media. Or back up the old partition and drop it but load it on the DSS system using a simple binary copy. This isn't theoretical. I've done exactly this before. We had a table that grew by about a million records per day. When we used dml statements to archive the old records to the DSS system it took over a day and frequently failed. When we switched to partitioned tables we were able to run it reliably daily during prime time without impacting performance and have up-to-date data in the DSS system. -- greg
-
Re: Feature suggestions (long)
Mark Kirkwood <markir@paradise.net.nz> — 2003-05-19T07:55:07Z
Martijn van Oosterhout wrote: >>(Oracle has sort of rolled these ideas together with their partitioning) >> >> > >My guess is because segmenting without the planner cleverness is a bit >useless. > Yes! you want the planner to be *aware* of any "segmenting", but there are other was to reduce the "amount of table" that is scanned (e.g. partial indexes and clustering), so the "Oracle solution" might not be the only (or best) way to approach : 1 query plans for large tables 2 administration of large tables cheers Mark
-
Re: Feature suggestions (long)
Jim Nasby <jim@nasby.net> — 2003-05-19T19:10:35Z
On Sat, May 17, 2003 at 04:04:11AM -0700, Don Baccus wrote: > On Saturday 17 May 2003 10:51 am, Matthew Kirkwood wrote: > > On Sat, 17 May 2003, Alvaro Herrera wrote: > > > > > I'm going suggest a feature like what Oracle calls "partitions" and > > > > > later on something with indexes. The idea is to generate some > > > > > discussion to see if they are worthy of being added to the TODO list. > > > > > > > > Why bother? > > > > > > Maybe one can put different partitions in different tablespaces? > > > > One can. The tablespace a partition is in can even be > > offline if Oracle can prove that a query doesn't require > > that partition. > > People use this feature for warehousing old data that they don't want to purge > from the database. For very large databases (of course that definition > changes with each new generation of computer) this can greatly improve the > performance of queries on the active portion of the data. This would be very useful for http://stats.distributed.net. Right now I'm trying to 'partition' our main (120M row) table by project_id. Sybase handled this very well, because it has good support for clustered indexes/index organized tables. PGSQL lacks this, which means large reads from that table are much more painful than on sybase, because you either tablescan (ugh) or go though the overhead of an index read for every row retrieved (not much better). Partitioning would fit the bill here perfectly (though in my case good/true clustering would work just as well). -- Jim C. Nasby (aka Decibel!) jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"