Re: Big 7.1 open items

Jan Wieck <janwieck@t-online.de>

From: JanWieck@t-online.de (Jan Wieck)
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Jan Wieck <JanWieck@Yahoo.com>, Hiroshi Inoue <Inoue@tpf.co.jp>, Bruce Momjian <maillist@candle.pha.pa.us>, PostgreSQL-development <pgsql-hackers@postgresql.org>, "Ross J. Reedstrom" <reedstrm@rice.edu>
Date: 2000-06-18T00:10:09Z
Lists: pgsql-hackers
Tom Lane wrote:
> JanWieck@t-online.de (Jan Wieck) writes:
> > Tom Lane wrote:
> >> It gets a little trickier if you want to be able to split
> >> multi-gig tables across several tablespaces, though, since
> >> you couldn't just append ".N" to the base table path in that
> >> scenario.
> >>
> >> I'd be interested to know what sort of facilities Oracle
> >> provides for managing huge tables...
>
> >     Oracle  tablespaces  are  a  collection of 1...n preallocated
> >     files.   Each  table  then  is  bound  to  a  tablespace  and
> >     allocates extents (chunks) from those files.
>
> OK, to get back to the point here: so in Oracle, tables can't cross
> tablespace boundaries, but a tablespace itself could span multiple
> disks?

    They can. The path in

        ALTER TABLESPACE <tsname> ADD DATAFILE ...

    can point to any location the db system has access to.

>
> Not sure if I like that better or worse than equating a tablespace
> with a directory (so, presumably, all the files within it live on
> one filesystem) and then trying to make tables able to span
> tablespaces.  We will need to do one or the other though, if we want
> to have any significant improvement over the current state of affairs
> for large tables.
>
> One way is to play the flip-the-path-ordering game some more,
> and access multiple-segment tables with pathnames like this:
>
>    .../TABLESPACE/RELATION       -- first or only segment
>    .../TABLESPACE/N/RELATION     -- N'th extension segment
>
> [...]

    In most cases all objects in one database are bound to one or
    two tablespaces (data and indices). So you do  an  estimation
    of  the  size  required, create the tablespaces (and probably
    all their extension files), then create the schema  and  load
    it.  The  only reason not to do so is if your DB exceeds some
    size where you have to fear not beeing able to finish  online
    backups  before  getting into Online-Relolog stuck. Has to do
    the the online backup procedure of Oracle.

> This isn't any harder for md.c to deal with than what we do now,
> but by making the /N subdirectories be symlinks, the dbadmin could
> easily arrange for extension segments to go on different filesystems.
> Also, since /N subdirectory symlinks can be added as needed,
> expanding available space by attaching more disks isn't hard.
> (If the admin hasn't pre-made a /N symlink when it's needed,
> I'd envision the backend just automatically creating a plain
> subdirectory so that it can extend the table.)

    So the admin allways have to leave enough  freespace  in  the
    default  location to keep the DB running until he can take it
    offline, move the autocreated files and create the  symlinks.
    What a pain for 24/7 systems.

> We'd still want to create some tools to help the dbadmin with slinging
> all these symlinks around, of course.  But I think it's critical to keep
> the low-level file access protocol simple and reliable, which really
> means minimizing the amount of information the backend needs to know to
> figure out which file to write a page in.  With something like the above
> you only need to know the tablespace name (or more likely OID), the
> relation OID (+name or not, depending on outcome of other argument),
> and the offset in the table.  No worse than now from the software's
> point of view.

    Exactly  the  "low-level  file  access"  protocol  is  highly
    complicated in Postgres. Because nearly  every  object  needs
    his  own file, we need to deal with virtual file descriptors.
    With an Oracle-like tablespace concept and a fixed  limit  of
    total   tablespace   files  (this  time  OS  or  installation
    specific), we could keep them all open all the time.  IMHO  a
    big win.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #