Thread

Re: Feature suggestions (long)

Don Baccus <dhogaza@pacifier.com> — 2003-05-17T11:04:11Z

On Saturday 17 May 2003 10:51 am, Matthew Kirkwood wrote:
> On Sat, 17 May 2003, Alvaro Herrera wrote:
> > > > I'm going suggest a feature like what Oracle calls "partitions" and
> > > > later on something with indexes. The idea is to generate some
> > > > discussion to see if they are worthy of being added to the TODO list.
> > >
> > > Why bother?
> >
> > Maybe one can put different partitions in different tablespaces?
>
> One can.  The tablespace a partition is in can even be
> offline if Oracle can prove that a query doesn't require
> that partition.

People use this feature for warehousing old data that they don't want to purge 
from the database.   For very large databases (of course that definition 
changes with each new generation of computer) this can greatly improve the 
performance of queries on the active portion of the data.

Feature suggestions (long)

Martijn van Oosterhout <kleptog@svana.org> — 2003-05-17T15:00:40Z

[Please CC any replies to me. Thanks.]

I'm going suggest a feature like what Oracle calls "partitions" and later on
something with indexes. The idea is to generate some discussion to see if
they are worthy of being added to the TODO list.

Partitions
==========
The idea is to split data across tables according to the contents of a
tuple. So you could split the contents of a table based on years or some
such. I'm not going to go too much into why it's a good idea, there are
several links on the web discussing it:

http://www.nyoug.org/200212nanda.pdf

Anyway, what I'm proposing is different from what Oracle does but flexible
enough such that what Oracle does could the implemented/emulated easily
enough. I was thinking along the lines of:

CREATE PARTITION sales_2003 ON sales WHERE saledate >= '2003-01-01' AND saledate < '2004-01-01';

This would produce a sequence of events equivalent to:

CREATE TABLE sales_2003 () INHERITS ( sales );
INSERT INTO sales_2003 SELECT * FROM sales WHERE <condition>;
DELETE FROM sales WHERE <condition>;

CREATE RULE rule1 AS ON INSERT TO sales 
WHERE <condition> DO INSTEAD 
INSERT INTO sales_2003 (NEW.*);

CREATE RULE rule2 AS ON DELETE TO sales
WHERE <condition> DO INSTEAD
DELETE FROM sales_2003 WHERE <???>;

<a pile more RULEs>

err... It gets a bit complex here. My point is that it can be done but with
proper support it can be done more efficiently, reliably and usably.

Internally you could treat them more like real inherited tables. Inserts
would be fairly straight forward (use the conditions to determine the
resulting table). Deletes require no work at all. Updates are trickier. The
problem is that after an update the tuple may end up needing to be in
another table. Adding/dropping/altering columns would have to be handled
similarly to inherited tables.

Looking at heap_update or ExecUpdate it's not clear whether it would be a
problem if the old and new tuples are in a separate tables. As long as the
descriptors are the same it shouldn't be too much of a deal hopefully.

So far the storage doesn't seem too difficult. I'm not sure how to deal with
inheritance and this. I'd suggest making partitions work only on individual
tables and not inherited.

Next stage would be teaching the planner. The conditions would be
pseudo-constraints on the partitions. Hence if the conditions and the
constraints form a non-intersecting set, you can skip that partition
altogether.

Now, semantically, should you be allowed to do inserts, updates and deletes
on partitions directly, or should they be hidden? If/when we get
table-spaces, you should be able to move them around.

Deleting a partition would amount to moving all the tuples back to the main
table. Of course, if there are other partitions the conditions will have to
be re-evaluated to place the tuple into one of the other partitions.

What do you do if multiple partitions have conflicting conditions? I'd be
tempted to do a first-come-first-serve basis, though maybe some kind of
priority? One advantage of the Oracle approach is that this issue doesn't
come up. We certainly can't write a system to detect anything but the
obvious cases.

Of course, making individual indexes for all the partitions could get
tedious, even if the system did it automatically; which me brings to my next
suggestion:

Multi-table indexes
===================

Currently an index only applies to a single table. If you use inheritance
all the indexes can get cumbersome. With something like the above partitions
it could get completely out of hand if used heavily. What I'd like is if
instead of an index mapping <index keys> -> ctid, it does <index keys> ->
(tableoid,ctid). At the same time it lets us build UNIQUE indexes for
inheritance (for primary keys).

Currently, an index is represented by a row in pg_index. To implement this
you would need to arrange a way for an index to list all its constituent
tables so that when doing an update you know which to indexes to update.
Multiple rows seems the easiest way but it does violate a uniqueness
constraint.

Obviously, the columns indexed can only be present in the root table.
However, the index would be usable for queries on subtables too. The planner
would simply have to remember to push down 'tableoid IN (oid,oid,oid,...)'
conditions.

The major problem here is statistics. What you really need is some sort of
aggregate statistics over an entire inheritance tree. Without something like
this the planner can't make sensible decisions on when to use the index.
ANALYZE currently doesn't do this. It's not entirely clear how it would go
about it either. This doesn't apply to UNIQUE indexes though.

Another issue is when a table is deleted. Normally you can just purge the
index when the table is deleted. With these indexes you can't since the
remaining data is still useful. So the index access method would have to be
careful not to return tuples from tables that don't exist anymore. VACUUM
would hopefully eventually clean them up.

There would also be an issue of the IndexScan node returning the right
fields depending on which part of the subtree is being queried. For example,
if B were inherited from A then an IndexScan on A* could return fields not
in A if the query only ever referenced B. This may already be solved as
the IndexScan node can scan multiple indexes.

=

In any case, the second feature seems like it will be required (or at least
something similar) to handle proper referential integrity and primary keys
for inherited tables. But if something like the former is implemented
(hopefully, as using rules would get very cumbersome) then the latter would
be necessary.

Have a good weekend.
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organised violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Re: Feature suggestions (long)

Tom Lane <tgl@sss.pgh.pa.us> — 2003-05-17T16:05:23Z

Martijn van Oosterhout <kleptog@svana.org> writes:
> I'm going suggest a feature like what Oracle calls "partitions" and later on
> something with indexes. The idea is to generate some discussion to see if
> they are worthy of being added to the TODO list.

Why bother?  Make partial indexes corresponding to what you are calling
the partitions of the table, and (I claim) you can get every possible
benefit of a partitioning scheme.  Plus more, because there's nothing
constraining the partial indexes to be nonoverlapping, so you can get
efficient plans for sets of queries that no partitioning scheme would
win for.

			regards, tom lane

Re: Feature suggestions (long)

Alvaro Herrera <alvherre@dcc.uchile.cl> — 2003-05-17T16:47:20Z

On Sat, May 17, 2003 at 12:05:23PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > I'm going suggest a feature like what Oracle calls "partitions" and later on
> > something with indexes. The idea is to generate some discussion to see if
> > they are worthy of being added to the TODO list.
> 
> Why bother?

Maybe one can put different partitions in different tablespaces?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)

Re: Feature suggestions (long)

Matthew Kirkwood <matthew@hairy.beasts.org> — 2003-05-17T17:51:48Z

On Sat, 17 May 2003, Alvaro Herrera wrote:

> > > I'm going suggest a feature like what Oracle calls "partitions" and later on
> > > something with indexes. The idea is to generate some discussion to see if
> > > they are worthy of being added to the TODO list.
> >
> > Why bother?
>
> Maybe one can put different partitions in different tablespaces?

One can.  The tablespace a partition is in can even be
offline if Oracle can prove that a query doesn't require
that partition.

Matthew.

Re: Feature suggestions (long)

Don Baccus <dhogaza@pacifier.com> — 2003-05-17T18:42:01Z

On Saturday 17 May 2003 06:31 pm, Martijn van Oosterhout wrote:

> Hmm, no comments on the UNIQUE-index-over-multiple-tables. I would have
> thought that would've been the more interesting one.

This would enable making PG inheritence one step closer to being truly useful, 
as one could enforce a primary key on a table and all its children.

Re: Feature suggestions (long)

Martijn van Oosterhout <kleptog@svana.org> — 2003-05-18T01:31:54Z

On Sat, May 17, 2003 at 04:04:11AM -0700, Don Baccus wrote:
> On Saturday 17 May 2003 10:51 am, Matthew Kirkwood wrote:
> > On Sat, 17 May 2003, Alvaro Herrera wrote:
> > > > > I'm going suggest a feature like what Oracle calls "partitions" and
> > > > > later on something with indexes. The idea is to generate some
> > > > > discussion to see if they are worthy of being added to the TODO list.
> > > >
> > > > Why bother?
> > >
> > > Maybe one can put different partitions in different tablespaces?
> >
> > One can.  The tablespace a partition is in can even be
> > offline if Oracle can prove that a query doesn't require
> > that partition.
> 
> People use this feature for warehousing old data that they don't want to purge 
> from the database.   For very large databases (of course that definition 
> changes with each new generation of computer) this can greatly improve the 
> performance of queries on the active portion of the data.

Or can't delete (eg. financial records). The reason I'm looking at it is for
queries where you want a report depending on all the data for 2002. Since
this data is mixed in with all the data for upto 7 previous years, it's too
big for a index scan but doing a seq. scan across the whole table is
very expensive.

What this buys you is being able to seq. scan over portions of a table,
rather than the whole table. Currently we manage this manually by moving
tuples around after the fact.

Hmm, no comments on the UNIQUE-index-over-multiple-tables. I would have
thought that would've been the more interesting one.

Have a nice weekend.
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Re: Feature suggestions (long)

Martijn van Oosterhout <kleptog@svana.org> — 2003-05-18T01:56:29Z

On Sat, May 17, 2003 at 11:42:01AM -0700, Don Baccus wrote:
> On Saturday 17 May 2003 06:31 pm, Martijn van Oosterhout wrote:
> 
> > Hmm, no comments on the UNIQUE-index-over-multiple-tables. I would have
> > thought that would've been the more interesting one.
> 
> This would enable making PG inheritence one step closer to being truly useful, 
> as one could enforce a primary key on a table and all its children.

Actually, what I described in my original post was a bit more general than
just UNIQUE indexes. From what I wrote it doesn't seem as if it'd be too
complicated (though I won't pretend to fully understand the index code).

But there must be a few hurdles to overcome, which is why I expected someone
to point out what I'd miseed.

Have a nice weekend,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Re: Feature suggestions (long)

Mark Kirkwood <markir@paradise.net.nz> — 2003-05-18T05:02:39Z


Matthew Kirkwood wrote:

>On Sat, 17 May 2003, Alvaro Herrera wrote:
>
>  
>
>>>>I'm going suggest a feature like what Oracle calls "partitions" and later on
>>>>something with indexes. The idea is to generate some discussion to see if
>>>>they are worthy of being added to the TODO list.
>>>>        
>>>>
>>>Why bother?
>>>      
>>>
>>Maybe one can put different partitions in different tablespaces?
>>    
>>
>
>One can.  The tablespace a partition is in can even be
>offline if Oracle can prove that a query doesn't require
>that partition.
>  
>

Being able to "segment" tables for admin purposes (like archiving) might 
be beneficial.

I guess this could be orthogonal to clustering or partitioning schemes 
that mininize table/index access for queries.

(Oracle has sort of rolled these ideas together with their partitioning)

Mark

Re: Feature suggestions (long)

Martijn van Oosterhout <kleptog@svana.org> — 2003-05-18T06:21:55Z

On Sun, May 18, 2003 at 05:02:39PM +1200, Mark Kirkwood wrote:
> Being able to "segment" tables for admin purposes (like archiving) might 
> be beneficial.
> 
> I guess this could be orthogonal to clustering or partitioning schemes 
> that mininize table/index access for queries.

Segmenting tables is what inheritence can buy you now (although somewhat
suboptimally). You can create subtables of your super-table and just
remember which table is supposed to contain which tuples. By adding CHECK
constraints you can even make the system error out on you if you make a
mistake.

Segmenting tables for archiving is only useful if the planner rarely uses
them. If the planner is still going to generate a seq. scan on the "archive"
every time you query the table, then it's not really archived in any sense.

Which bring another question. If I have a CHECK constraint on a table saying
'id < 5000' and I do a query 'WHERE id = 6000', is the query optimiser
clever enough to notice that there can be no matches. Preliminary testing
indicates no (although I see it's 7.2).

> (Oracle has sort of rolled these ideas together with their partitioning)

My guess is because segmenting without the planner cleverness is a bit
useless.
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Re: Feature suggestions (long)

Greg Stark <gsstark@mit.edu> — 2003-05-18T13:24:06Z

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Martijn van Oosterhout <kleptog@svana.org> writes:
> > I'm going suggest a feature like what Oracle calls "partitions" and later on
> > something with indexes. The idea is to generate some discussion to see if
> > they are worthy of being added to the TODO list.
> 
> Why bother?  Make partial indexes corresponding to what you are calling
> the partitions of the table, and (I claim) you can get every possible
> benefit of a partitioning scheme.  Plus more, because there's nothing
> constraining the partial indexes to be nonoverlapping, so you can get
> efficient plans for sets of queries that no partitioning scheme would
> win for.

Partial indexes get some of the advantages of partitioned tables, but
certainly not all. 

A big advantage of partitioned tables is being able to manage the data in
whole chunks very efficiently.

. Archive all records for a single year becomes a seqential scan instead of an
  index scan.

. Delete all records for a single year becomes a truncate table instead of an
  update.

This will become even more important when postgres gets some other features
like being able to set the physical storage location of each table and the
equivalent of what oracle calls "transportable tablespaces". 

Then you can move individual partitions to slow read-only media and keep the
currently active partition on the fast read-write media. Or back up the old
partition and drop it but load it on the DSS system using a simple binary
copy.

This isn't theoretical. I've done exactly this before. We had a table that
grew by about a million records per day. When we used dml statements to
archive the old records to the DSS system it took over a day and frequently
failed. When we switched to partitioned tables we were able to run it reliably
daily during prime time without impacting performance and have up-to-date data
in the DSS system.

-- 
greg

Re: Feature suggestions (long)

Mark Kirkwood <markir@paradise.net.nz> — 2003-05-19T07:55:07Z


Martijn van Oosterhout wrote:

>>(Oracle has sort of rolled these ideas together with their partitioning)
>>    
>>
>
>My guess is because segmenting without the planner cleverness is a bit
>useless.
>
Yes!  you want the planner to be *aware* of any "segmenting", but there 
are other was to reduce the "amount of table" that is scanned (e.g. 
partial indexes and clustering), so the "Oracle solution" might not be 
the only  (or best) way to approach :

1   query plans for large tables
2   administration of large tables

cheers

Mark

Re: Feature suggestions (long)

Jim Nasby <jim@nasby.net> — 2003-05-19T19:10:35Z

On Sat, May 17, 2003 at 04:04:11AM -0700, Don Baccus wrote:
> On Saturday 17 May 2003 10:51 am, Matthew Kirkwood wrote:
> > On Sat, 17 May 2003, Alvaro Herrera wrote:
> > > > > I'm going suggest a feature like what Oracle calls "partitions" and
> > > > > later on something with indexes. The idea is to generate some
> > > > > discussion to see if they are worthy of being added to the TODO list.
> > > >
> > > > Why bother?
> > >
> > > Maybe one can put different partitions in different tablespaces?
> >
> > One can.  The tablespace a partition is in can even be
> > offline if Oracle can prove that a query doesn't require
> > that partition.
> 
> People use this feature for warehousing old data that they don't want to purge 
> from the database.   For very large databases (of course that definition 
> changes with each new generation of computer) this can greatly improve the 
> performance of queries on the active portion of the data.
 
This would be very useful for http://stats.distributed.net. Right now
I'm trying to 'partition' our main (120M row) table by project_id.
Sybase handled this very well, because it has good support for
clustered indexes/index organized tables. PGSQL lacks this, which means
large reads from that table are much more painful than on sybase,
because you either tablescan (ugh) or go though the overhead of an index
read for every row retrieved (not much better).

Partitioning would fit the bill here perfectly (though in my case
good/true clustering would work just as well).
-- 
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"