Thread

  1. Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-13T09:05:53Z

    Here is the list I have gotten of open 7.1 items:
    	
    	bit type
    	inheritance
    	drop column
    	vacuum index speed
    	cached query plans
    	memory context cleanup
    	TOAST
    	WAL
    	fmgr redesign
    	encrypt pg_shadow passwords
    	redesign pg_hba.conf password file option
    	new location for config files
    
    I have some of my own that are not on the list, as do others who are
    working on their own items.  Just thought a list of major items that
    need work would be helpful.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  2. Re: Big 7.1 open items

    Karel Zak <zakkr@zf.jcu.cz> — 2000-06-13T09:28:37Z

    On Tue, 13 Jun 2000, Bruce Momjian wrote:
    
    > Here is the list I have gotten of open 7.1 items:
    > 	
    > 	bit type
    > 	inheritance
    > 	drop column
    > 	vacuum index speed
    > 	cached query plans
    	^^^^^^^^^^^^^^^^^
    
    I have already done it, and I will send a patch for _testing_ next week
    (or later), but I think it will not be for 7.1, but for 7.2.
    
    > 	memory context cleanup
    > 	TOAST
    > 	WAL
    > 	fmgr redesign
    > 	encrypt pg_shadow passwords
    > 	redesign pg_hba.conf password file option
    > 	new location for config files
    
    	+ new ACL? (please :-)
    
    
    BTW. --- really cool list.
    
    						Karel
    
    
    
  3. Re: Big 7.1 open items

    Vince Vielhaber <vev@michvhf.com> — 2000-06-13T10:19:12Z

    On Tue, 13 Jun 2000, Bruce Momjian wrote:
    
    > Here is the list I have gotten of open 7.1 items:
    > 	
    > 	encrypt pg_shadow passwords
    
    This will be for 7.1?  For some reason I thought it was being pushed 
    off to 7.2.
    
    Vince.
    -- 
    ==========================================================================
    Vince Vielhaber -- KA8CSH    email: vev@michvhf.com    http://www.pop4.net
     128K ISDN from $22.00/mo - 56K Dialup from $16.00/mo at Pop4 Networking
            Online Campground Directory    http://www.camping-usa.com
           Online Giftshop Superstore    http://www.cloudninegifts.com
    ==========================================================================
    
    
    
    
    
  4. Re: Big 7.1 open items

    Sergio A. Kessler <sak@tribctas.gba.gov.ar> — 2000-06-13T13:00:47Z

    On Tue, 13 Jun 2000 05:05:53 -0400 (EDT), Bruce Momjian
    <pgman@candle.pha.pa.us> wrote:
    
    [...]
    >	new location for config files
    
    can I suggest /etc/postgresql ?
    
    
    sergio
    
    
    
  5. Re: Big 7.1 open items

    Peter Eisentraut <e99re41@docs.uu.se> — 2000-06-13T13:06:11Z

    On Tue, 13 Jun 2000, Bruce Momjian wrote:
    
    > Here is the list I have gotten of open 7.1 items:
    > 	
    > 	bit type
    > 	inheritance
    > 	drop column
    > 	vacuum index speed
    > 	cached query plans
    > 	memory context cleanup
    > 	TOAST
    > 	WAL
    > 	fmgr redesign
    > 	encrypt pg_shadow passwords
    > 	redesign pg_hba.conf password file option
    
    Any details?
    
    > 	new location for config files
    
    Are you referring to pushing internal files to `$PGDATA/global'?
    
    
    -- 
    Peter Eisentraut                  Sernanders väg 10:115
    peter_e@gmx.net                   75262 Uppsala
    http://yi.org/peter-e/            Sweden
    
    
    
  6. Re: Big 7.1 open items

    Marc G. Fournier <scrappy@hub.org> — 2000-06-13T13:52:57Z

    On Tue, 13 Jun 2000, Sergio A. Kessler wrote:
    
    > On Tue, 13 Jun 2000 05:05:53 -0400 (EDT), Bruce Momjian
    > <pgman@candle.pha.pa.us> wrote:
    > 
    > [...]
    > >	new location for config files
    > 
    > can I suggest /etc/postgresql ?
    
    you can ... but everything related to postgresql has always been designed
    not to require any special permissions to install, and /etc/postgresql
    would definitely require root access to install :(
    
    
    
    
  7. Re: Big 7.1 open items

    Vince Vielhaber <vev@michvhf.com> — 2000-06-13T13:59:42Z

    On Tue, 13 Jun 2000, The Hermit Hacker wrote:
    
    > On Tue, 13 Jun 2000, Sergio A. Kessler wrote:
    > 
    > > On Tue, 13 Jun 2000 05:05:53 -0400 (EDT), Bruce Momjian
    > > <pgman@candle.pha.pa.us> wrote:
    > > 
    > > [...]
    > > >	new location for config files
    > > 
    > > can I suggest /etc/postgresql ?
    > 
    > you can ... but everything related to postgresql has always been designed
    > not to require any special permissions to install, and /etc/postgresql
    > would definitely require root access to install :(
    
    ~postgres/etc ??
    
    Vince.
    
    
    
    
    
  8. Re: Big 7.1 open items

    Peter Eisentraut <e99re41@docs.uu.se> — 2000-06-13T14:01:11Z

    On Tue, 13 Jun 2000, Vince Vielhaber wrote:
    
    > > > >	new location for config files
    > > > 
    > > > can I suggest /etc/postgresql ?
    > > 
    > > you can ... but everything related to postgresql has always been designed
    > > not to require any special permissions to install, and /etc/postgresql
    > > would definitely require root access to install :(
    > 
    > ~postgres/etc ??
    
    You need root access to create a postgres user. What's wrong with just
    keeping it in $PGDATA and making symlinks wherever you would prefer it?
    
    
    
    
  9. Re: Big 7.1 open items

    Marc G. Fournier <scrappy@hub.org> — 2000-06-13T14:31:33Z

    that one works ...
    
    On Tue, 13 Jun 2000, Vince Vielhaber wrote:
    
    > On Tue, 13 Jun 2000, The Hermit Hacker wrote:
    > 
    > > On Tue, 13 Jun 2000, Sergio A. Kessler wrote:
    > > 
    > > > On Tue, 13 Jun 2000 05:05:53 -0400 (EDT), Bruce Momjian
    > > > <pgman@candle.pha.pa.us> wrote:
    > > > 
    > > > [...]
    > > > >	new location for config files
    > > > 
    > > > can I suggest /etc/postgresql ?
    > > 
    > > you can ... but everything related to postgresql has always been designed
    > > not to require any special permissions to install, and /etc/postgresql
    > > would definitely require root access to install :(
    > 
    > ~postgres/etc ??
    > 
    
    Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
    Systems Administrator @ hub.org 
    primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 
    
    
    
  10. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-13T14:50:41Z

    Vince Vielhaber <vev@michvhf.com> writes:
    >> encrypt pg_shadow passwords
    
    > This will be for 7.1?  For some reason I thought it was being pushed 
    > off to 7.2.
    
    I don't know of anything that would force delaying it --- it's not
    dependent on querytree redesign, for example.  The real question is,
    do we have anyone who's committed to do the work?  I heard a lot of
    discussion but I didn't hear anyone taking responsibility for it...
    
    			regards, tom lane
    
    
  11. Re: Big 7.1 open items

    Vince Vielhaber <vev@michvhf.com> — 2000-06-13T14:54:27Z

    On Tue, 13 Jun 2000, Tom Lane wrote:
    
    > Vince Vielhaber <vev@michvhf.com> writes:
    > >> encrypt pg_shadow passwords
    > 
    > > This will be for 7.1?  For some reason I thought it was being pushed 
    > > off to 7.2.
    > 
    > I don't know of anything that would force delaying it --- it's not
    > dependent on querytree redesign, for example.  The real question is,
    > do we have anyone who's committed to do the work?  I heard a lot of
    > discussion but I didn't hear anyone taking responsibility for it...
    
    I offered to do the work and I have the md5 routine here and tested on 
    a number of platforms.  But as I said, I thought someone wanted to delay
    it until 7.2, if that's not the case then I'll get to it.  There was also
    a lack of interest in testing it, but I think we have most platforms 
    covered.
    
    Vince.
    
    
    
    
    
  12. Re: Big 7.1 open items

    Vince Vielhaber <vev@michvhf.com> — 2000-06-13T14:59:02Z

    On Tue, 13 Jun 2000, Ed Loehr wrote:
    
    > Vince Vielhaber wrote:
    > > 
    > > > > [...]
    > > > > >   new location for config files
    > > > >
    > > > > can I suggest /etc/postgresql ?
    > > >
    > > > you can ... but everything related to postgresql has always been designed
    > > > not to require any special permissions to install, and /etc/postgresql
    > > > would definitely require root access to install :(
    > > 
    > > ~postgres/etc ??
    > 
    > I would suggest you don't *require* or assume the creation of a postgres
    > user, except as an overridable default.
    
    I *knew* somebody would bring this up.  Before I sent that I tried to 
    describe the intent a few ways and just opted for simple.  PostgreSQL
    has to run as SOMEONE.  Substitute that SOMEONE for ~postgres above.
    
    Vince.
    
    
    
    
    
  13. Re: Big 7.1 open items

    Ed Loehr <eloehr@austin.rr.com> — 2000-06-13T15:00:52Z

    Vince Vielhaber wrote:
    > 
    > > > [...]
    > > > >   new location for config files
    > > >
    > > > can I suggest /etc/postgresql ?
    > >
    > > you can ... but everything related to postgresql has always been designed
    > > not to require any special permissions to install, and /etc/postgresql
    > > would definitely require root access to install :(
    > 
    > ~postgres/etc ??
    
    I would suggest you don't *require* or assume the creation of a postgres
    user, except as an overridable default.
    
    Regards,
    Ed Loehr
    
    
  14. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-13T15:03:39Z

    The Hermit Hacker <scrappy@hub.org> writes:
    >>>> new location for config files
    >> 
    >> can I suggest /etc/postgresql ?
    
    > you can ... but everything related to postgresql has always been designed
    > not to require any special permissions to install, and /etc/postgresql
    > would definitely require root access to install :(
    
    Even more to the point, the config files are always kept in the data
    directory so that it's possible to run multiple installations on the
    same machine.  Keeping the config files under /etc (or any other fixed
    location) would destroy that capability.
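    
    A minimal Python sketch of that point (not PostgreSQL code): when each
    config file is resolved relative to its own data directory, two
    installations never collide, whereas one fixed location gives them a
    single shared file.  The directory names and the helper config_path are
    hypothetical, chosen only for illustration.

```python
import posixpath


def config_path(pgdata, name="pg_hba.conf"):
    """Return the config file path belonging to one data directory."""
    return posixpath.join(pgdata, name)


# Two hypothetical installations on the same machine.
installations = ["/usr/local/pgsql/data", "/home/test/pgdata"]

# Per-data-directory lookup: every installation gets its own file.
per_install = [config_path(d) for d in installations]

# Fixed location (assumed /etc/postgresql): both map to the same file.
fixed = ["/etc/postgresql/pg_hba.conf" for _ in installations]

print(len(set(per_install)), "distinct files with per-directory configs")
print(len(set(fixed)), "distinct file(s) with a fixed location")
```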
    
    			regards, tom lane
    
    
  15. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-13T15:16:57Z

    Vince Vielhaber <vev@michvhf.com> writes:
    >> do we have anyone who's committed to do the work?  I heard a lot of
    >> discussion but I didn't hear anyone taking responsibility for it...
    
    > I offered to do the work and I have the md5 routine here and tested on 
    > a number of platforms.  But as I said, I thought someone wanted to delay
    > it until 7.2, if that's not the case then I'll get to it.
    
    Far as I can see, you should go for it.
    
    			regards, tom lane
    
    
  16. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-13T15:53:47Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > Here is the list I have gotten of open 7.1 items:
    
    There were a whole bunch of issues about the type system --- automatic
    coercion rules, default type selection for both numeric and string
    literals, etc.  Not sure how to describe this in five words or less...
    
    			regards, tom lane
    
    
  17. Re: Big 7.1 open items

    Kaare Rasmussen <kar@webline.dk> — 2000-06-13T23:36:38Z

    > Here is the list I have gotten of open 7.1 items:
    
    I thought that someone was working on
    outer joins
    better views (or rewriting the rules system, not sure what the direction was)
    better SQL92 compliance
    also, I think that at some time there was discussion about a better interface
    for procedures, enabling them to work on several tuples. May be wrong though.
    
    But if all, or just most, of the items on your list will be finished, it ought
    to be an 8.0 release :-)
    
    -- 
    Kaare Rasmussen            --Linux, spil,--        Tlf:        3816 2582
    Kaki Data                tshirts, merchandize      Fax:        3816 2582
    Howitzvej 75               Åben 14.00-18.00        Email: kar@webline.dk
    2000 Frederiksberg        Lørdag 11.00-17.00       Web:      www.suse.dk
    
    
  18. Re: Big 7.1 open items

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 2000-06-14T01:29:30Z

    Since there are several people interested in contributing, we should
    list:
    
      Support multiple simultaneous character sets, per SQL92
    
                     - Thomas
    
    
  19. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-14T02:24:29Z

    > > 	memory context cleanup
    > > 	TOAST
    > > 	WAL
    > > 	fmgr redesign
    > > 	encrypt pg_shadow passwords
    > > 	redesign pg_hba.conf password file option
    > > 	new location for config files
    > 
    > 	+ new ACL? (please :-)
    > 
    > 
    > BTW. --- really cool list.
    
    Updated TODO.
    
    
    
  20. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-14T02:24:43Z

    > 	+ new ACL? (please :-)
    
    Updated TODO.
    
    
    
  21. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-14T02:35:48Z

    > On Tue, 13 Jun 2000, Bruce Momjian wrote:
    > 
    > > Here is the list I have gotten of open 7.1 items:
    > > 	
    > > 	bit type
    > > 	inheritance
    > > 	drop column
    > > 	vacuum index speed
    > > 	cached query plans
    > > 	memory context cleanup
    > > 	TOAST
    > > 	WAL
    > > 	fmgr redesign
    > > 	encrypt pg_shadow passwords
    > > 	redesign pg_hba.conf password file option
    > 
    > Any details?
    
    I would like to remove our pg_passwd script, which allows
    username/password pairs to be specified in a file, and either change that
    file to hold lists of users or allow lists of users in pg_hba.conf.
    
    
    > 
    > > 	new location for config files
    > 
    > Are you referring to pushing internal files to `$PGDATA/global'?
    
    Yes.
    
    
    
  22. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-14T02:38:58Z

    > that one works ...
    > 
    > On Tue, 13 Jun 2000, Vince Vielhaber wrote:
    > 
    > > On Tue, 13 Jun 2000, The Hermit Hacker wrote:
    > > 
    > > > On Tue, 13 Jun 2000, Sergio A. Kessler wrote:
    > > > 
    > > > > On Tue, 13 Jun 2000 05:05:53 -0400 (EDT), Bruce Momjian
    > > > > <pgman@candle.pha.pa.us> wrote:
    > > > > 
    > > > > [...]
    > > > > >	new location for config files
    > > > > 
    > > > > can I suggest /etc/postgresql ?
    > > > 
    > > > you can ... but everything related to postgresql has always been designed
    > > > not to require any special permissions to install, and /etc/postgresql
    > > > would definitely require root access to install :(
    > > 
    > > ~postgres/etc ??
    
    Remember, that file has to be specific for each data tree, so it has to
    be under /data.
    
    
    
  23. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-14T02:40:21Z

    > Vince Vielhaber <vev@michvhf.com> writes:
    > >> encrypt pg_shadow passwords
    > 
    > > This will be for 7.1?  For some reason I thought it was being pushed 
    > > off to 7.2.
    > 
    > I don't know of anything that would force delaying it --- it's not
    > dependent on querytree redesign, for example.  The real question is,
    > do we have anyone who's committed to do the work?  I heard a lot of
    > discussion but I didn't hear anyone taking responsibility for it...
    
    Agreed.  No reason not to be in 7.1.
    
    
    
  24. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-14T02:44:46Z

    I just kept your e-mails.  I will make a TODO.detail mailbox with them.
    
    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > Here is the list I have gotten of open 7.1 items:
    > 
    > There were a whole bunch of issues about the type system --- automatic
    > coercion rules, default type selection for both numeric and string
    > literals, etc.  Not sure how to describe this in five words or less...
    > 
    > 			regards, tom lane
    > 
    
    
    
    
  25. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-14T02:53:18Z

    > > Here is the list I have gotten of open 7.1 items:
    > 
    > I thought that someone was working on
    > outer joins
    > better views (or rewriting the rules system, not sure what the direction was)
    > better SQL92 compliance
    > also, I think that at some time there was discussion about a better interface
    > for procedures, enabling them to work on several tuples. May be wrong though.
    > 
    > But if all, or just most, of the items on your list will be finished, it ought
    > to be an 8.0 release :-)
    > 
    
    Most of these are planned for 7.2.
    
    
    
  26. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-14T02:56:17Z

    Added to TODO.
    
    > Since there are several people interested in contributing, we should
    > list:
    > 
    >   Support multiple simultaneous character sets, per SQL92
    > 
    >                  - Thomas
    > 
    
    
    
    
  27. Re: Big 7.1 open items

    Oliver Elphick <olly@lfix.co.uk> — 2000-06-14T13:13:09Z

      >On Tue, 13 Jun 2000, Bruce Momjian wrote:
      >
      >> Here is the list I have gotten of open 7.1 items:
    
    Rolling back a transaction after dropping a table creates a corrupted
    database.  (Yes, I know it warns you not to do that, but users are
    fallible and sometimes just plain stupid.)  Although the system catalog
    entries are rolled back, the file on disk is permanently destroyed.
    
    I suggest that DROP TABLE in a transaction should not be allowed.
    
    
    
    -- 
    Oliver Elphick                                Oliver.Elphick@lfix.co.uk
    Isle of Wight                              http://www.lfix.co.uk/oliver
    PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47  6B 7E 39 CC 56 E4 C1 47
    GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
                     ========================================
         "I beseech you therefore, brethren, by the mercies of 
          God, that ye present your bodies a living sacrifice, 
          holy, acceptable unto God, which is your reasonable 
          service."       Romans 12:1 
    
    
    
    
  28. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-14T15:36:20Z

    "Oliver Elphick" <olly@lfix.co.uk> writes:
    > I suggest that DROP TABLE in a transaction should not be allowed.
    
    I had actually made it do that for a short time early this year,
    and was shouted down.  On reflection I have to agree; it's too useful
    to be able to do
    
    	begin;
    	drop table foo;
    	create table foo(new schema);
    	...
    	end;
    
    You do indeed lose big if you suffer an error partway through, but
    the answer to that is to fix our file naming conventions so that we
    can support rollback of drop table.
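    
    A toy Python model of that idea, under stated assumptions: every table
    gets a unique file name unrelated to its logical name (the oid_<n>
    format and the ToyTransaction class are hypothetical), and file removal
    is deferred until the outcome of the transaction is known.  It is a
    sketch of the technique, not a description of the backend.

```python
import itertools
import os
import tempfile

_ids = itertools.count(1000)  # stand-in for OID assignment (hypothetical)


class ToyTransaction:
    def __init__(self, root, catalog):
        self.root = root
        self.catalog = catalog            # table name -> file name
        self._snapshot = dict(catalog)    # restored on rollback
        self._drop_at_commit = []         # old files, removed only on commit
        self._drop_at_abort = []          # new files, removed only on rollback

    def create_table(self, name):
        fname = f"oid_{next(_ids)}"       # unique, independent of the table name
        open(os.path.join(self.root, fname), "w").close()
        self.catalog[name] = fname
        self._drop_at_abort.append(fname)

    def drop_table(self, name):
        # Defer the unlink: the old file must survive a possible rollback.
        self._drop_at_commit.append(self.catalog.pop(name))

    def commit(self):
        self._remove(self._drop_at_commit)

    def rollback(self):
        self.catalog.clear()
        self.catalog.update(self._snapshot)
        self._remove(self._drop_at_abort)

    def _remove(self, doomed):
        for fname in doomed:
            os.remove(os.path.join(self.root, fname))


root = tempfile.mkdtemp()
catalog = {}
setup = ToyTransaction(root, catalog)
setup.create_table("foo")
setup.commit()
old_file = catalog["foo"]

txn = ToyTransaction(root, catalog)
txn.drop_table("foo")
txn.create_table("foo")   # same logical name, different file on disk
txn.rollback()            # simulate an error partway through

print("catalog restored:", catalog["foo"] == old_file)
print("old file intact:", os.path.exists(os.path.join(root, old_file)))
```

    Because the replacement table never needs the old table's file name,
    nothing has to be deleted before the transaction commits.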
    
    Also note the complaints we've been getting about CREATE USER not
    working inside a transaction block.  That is a case where someone
    (Peter IIRC) took the more hard-line approach of emitting an error
    instead of a warning.  I think it was not the right choice to make.
    
    			regards, tom lane
    
    
  29. Re: Big 7.1 open items

    Peter Eisentraut <peter_e@gmx.net> — 2000-06-14T16:36:25Z

    Tom Lane writes:
    
    > Also note the complaints we've been getting about CREATE USER not
    > working inside a transaction block.  That is a case where someone
    > (Peter IIRC) took the more hard-line approach of emitting an error
    > instead of a warning.  I think it was not the right choice to make.
    
    Probably. Remember that you can claim your lunch any time. :)
    
    In all truth, the problem is that the ODBC driver isn't very flexible
    about putting BEGIN/END blocks around things. Perhaps that is also
    something to look at.
    
    
    
    
    
  30. Re: Big 7.1 open items

    Jan Wieck <janwieck@t-online.de> — 2000-06-14T20:43:39Z

    Tom Lane wrote:
    > "Oliver Elphick" <olly@lfix.co.uk> writes:
    > > I suggest that DROP TABLE in a transaction should not be allowed.
    >
    > I had actually made it do that for a short time early this year,
    > and was shouted down.  On reflection I have to agree; it's too useful
    > to be able to do
    >
    >    begin;
    >    drop table foo;
    >    create table foo(new schema);
    >    ...
    >    end;
    >
    > You do indeed lose big if you suffer an error partway through, but
    > the answer to that is to fix our file naming conventions so that we
    > can support rollback of drop table.
    
        IMHO this belongs to the discussion about keeping separate what
        is separate (having indices, toast relations, etc. in separate
        directories and whatnot).
    
        I've never been really happy with the file naming conventions.
        Requiring a filesystem entry to have the same name as the DB
        object associated with it isn't right.  I know, some people love
        to be able to easily identify the files with ls(1).  OTOH, what
        is that good for?
    
        Well, someone can easily see how big the disk footprint of his
        data is.  Wow - what an info.  Anything else?
    
        Why not change the naming to something like this:
    
            <dbroot>/catalog_tables/pg_...
            <dbroot>/catalog_index/pg_...
            <dbroot>/user_tables/oid_...
            <dbroot>/user_index/oid_...
            <dbroot>/temp_tables/oid_...
            <dbroot>/temp_index/oid_...
            <dbroot>/toast_tables/oid_...
            <dbroot>/toast_index/oid_...
            <dbroot>/whatnot_???/...
    
        This way, it would be much easier to separate all the different
        object types onto different physical media.  We would lose some
        transparency, but I've always wondered what people USE that for
        (except for just wanting to know).  For convenience we could
        implement another little utility that tells the object size,
        like
    
            DESCRIBE TABLE/VIEW/whatnot <object-name>
    
        that returns the physical location and storage details of the
        object.  And psql could use it to print this info additionally
        in the \d commands.  It would give unprivileged users access to
        this info, but so be it; it's not a security issue IMHO.
    
        The subdirectory an object goes into has to be controlled by the
        relkind.  So we need to tidy that up a little too.  I think it's
        worth it.
    
        The object's storage location (the bare file name) would now
        contain the OID.  So we avoid naming conflicts for temp tables,
        naming conflicts during DROP/CREATE in a transaction, and the
        like.
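    
        The layout above can be sketched in Python as a pure function
        from an object's class and OID to a path.  This is only an
        illustration: the function storage_path, the owner/kind labels,
        and the decimal oid_<n> leaf format are assumptions (the
        proposal itself leaves the leaf names elided), and real relkind
        codes are not modelled.

```python
import posixpath

# Subdirectories taken from the proposed layout.
SUBDIRS = {
    "catalog_tables", "catalog_index",
    "user_tables", "user_index",
    "temp_tables", "temp_index",
    "toast_tables", "toast_index",
}


def storage_path(dbroot, owner, kind, oid=None, relname=None):
    """Map an object class plus identifier to its file under dbroot."""
    subdir = f"{owner}_{kind}"
    if subdir not in SUBDIRS:
        raise ValueError(f"unknown storage class: {subdir}")
    if owner == "catalog":
        # Catalog files keep their pg_... names in the proposal.
        leaf = relname
    else:
        # Everything else is named after the OID (format assumed here).
        leaf = f"oid_{oid}"
    if leaf is None:
        raise ValueError("catalog objects need a relname")
    return posixpath.join(dbroot, subdir, leaf)


print(storage_path("/db", "user", "tables", oid=16384))
print(storage_path("/db", "catalog", "tables", relname="pg_class"))
```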
    
        Comments?
    
    
    Jan
    
    --
    
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me.                                  #
    #================================================== JanWieck@Yahoo.com #
    
    
    
    
  31. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-14T23:13:47Z

    > Tom Lane wrote:
    > > "Oliver Elphick" <olly@lfix.co.uk> writes:
    > > > I suggest that DROP TABLE in a transaction should not be allowed.
    > >
    > > I had actually made it do that for a short time early this year,
    > > and was shouted down.  On reflection I have to agree; it's too useful
    > > to be able to do
    > >
    > >    begin;
    > >    drop table foo;
    > >    create table foo(new schema);
    > >    ...
    > >    end;
    > >
    > > You do indeed lose big if you suffer an error partway through, but
    > > the answer to that is to fix our file naming conventions so that we
    > > can support rollback of drop table.
    > 
    >     Belongs  IMHO  to  the  discussion  to  keep separate what is
    >     separate  (having  indices/toast-relations/etc.  in  separate
    >     directories and whatnot).
    > 
    >     I've   never   been   really   happy  with  the  file  naming
    >     conventions. The need of a filesystem entry to have the  same
    >     name of the DB object that is associated with it isn't right.
    >     I know, some people love to be able to  easily  identify  the
    >     files with ls(1). OTOH what is that good for?
    
    Well, I have no problem just appending some serial number to the end of
    our existing names.  That solves both purposes, no?  Seems Vadim is
    going to have a new storage manager in 7.2 anyway.
    
    If/when we lose file name/object mapping, we will have to write
    command-line utilities to report the mappings so people can do
    administration properly.  It certainly makes it hard for administrators.
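    
    A reporting utility of that kind could be very small.  The Python sketch
    below assumes a plain-text map file with one "filename relation" pair
    per line; that file and its format are entirely hypothetical (nothing in
    this thread defines one), and load_mapping is an illustrative helper.
    Reading a flat file means no running server is needed.

```python
import io


def load_mapping(lines):
    """Parse 'filename relation' lines into a relation -> filename dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        filename, relname = line.split(None, 1)
        mapping[relname] = filename
    return mapping


# Hypothetical map file contents, inlined so the example is self-contained.
sample = io.StringIO(
    "# file      relation\n"
    "oid_16384   foo\n"
    "oid_16385   foo_pkey\n"
)
mapping = load_mapping(sample)
print(mapping["foo"])
```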
    
    > 
    >     Well, someone can easily see how big the disk footprint of
    >     his data is.  Wow - what an info.  Anything else?
    > 
    >     Why not change the naming to something like this:
    > 
    >         <dbroot>/catalog_tables/pg_...
    >         <dbroot>/catalog_index/pg_...
    >         <dbroot>/user_tables/oid_...
    >         <dbroot>/user_index/oid_...
    >         <dbroot>/temp_tables/oid_...
    >         <dbroot>/temp_index/oid_...
    >         <dbroot>/toast_tables/oid_...
    >         <dbroot>/toast_index/oid_...
    >         <dbroot>/whatnot_???/...
    > 
    >     This way, it would be much easier to separate all the
    >     different object types onto different physical media.  We
    >     would lose some transparency, but I've always wondered what
    >     people USE that for (except for just wanting to know).  For
    >     convenience we could implement another little utility that
    >     tells the object size, like
    
    Yes, we could do that.
    
    > 
    >         DESCRIBE TABLE/VIEW/whatnot <object-name>
    > 
    >     that returns the physical location and storage details of the
    >     object.  And psql could use it to print this info additionally
    >     in the \d commands.  It would give unprivileged users access to
    >     this info, but so be it; it's not a security issue IMHO.
    
    You need something that works from the command line, and something that
    works if PostgreSQL is not running.  How would you restore one file from
    a tape?  I guess you could bring back the whole thing, then do the
    query, and move the proper table file back in, but that is a pain.
    
    
    
    
    
  32. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-14T23:51:51Z

    At 07:13 PM 6/14/00 -0400, Bruce Momjian wrote:
    
    >> 
    >>     This way, it  would  be  much  easier  to  separate  all  the
    >>     different  object types to different physical media. We would
    >>     lose some  transparency,  but  I've  always  wondered  what
    >>     people  USE  that  for  (except  for  just  wanna  know). For
    >>     convenience we could implement another  little  utility  that
    >>     tells the object size like
    >
    >Yes, we could do that.
    
    It's a poor man's substitute for a proper create tablespace on
    storage 'filesystem' - style DDL statement, but it's a step in
    the right direction.
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  33. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-15T02:07:15Z

    JanWieck@t-online.de (Jan Wieck) writes:
    >     I've   never   been   really   happy  with  the  file  naming
    >     conventions. The need of a filesystem entry to have the  same
    >     name of the DB object that is associated with it isn't right.
    >     I know, some people love to be able to  easily  identify  the
    >     files with ls(1). OTOH what is that good for?
    
    I agree with Jan on this: let's just change the file names over to
    be OIDs.  Then we can have rollbackable DROP and RENAME TABLE easily.
    Naming the files after the logical names of the tables is nice if it
    doesn't cost anything, but it is *not* worth the trouble to preserve
    a relationship between filename and tablename when it is costing us.
    And it's costing us big time.  That single feature is hurting us on
    functionality, robustness, and portability, and for what benefit?
    Not nearly enough.  It's time to just let go of it.
    
    >     Why not changing the naming to be something like this:
    
    >         <dbroot>/catalog_tables/pg_...
    >         <dbroot>/catalog_index/pg_...
    >         <dbroot>/user_tables/oid_...
    >         <dbroot>/user_index/oid_...
    >         <dbroot>/temp_tables/oid_...
    >         <dbroot>/temp_index/oid_...
    >         <dbroot>/toast_tables/oid_...
    >         <dbroot>/toast_index/oid_...
    >         <dbroot>/whatnot_???/...
    
    I don't see a lot of value in that.  Better to do something like
    tablespaces:
    
    	<dbroot>/<oidoftablespace>/<oidofobject>
    
    			regards, tom lane
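
A minimal sketch of the tablespace-style layout proposed above. The helper name `relation_path`, the directory, and the OID values are made up for illustration and do not come from the thread. The point it tries to show is only that the path is built from two numbers and never from the table's logical name.

```python
from pathlib import PurePosixPath

def relation_path(dbroot: str, tablespace_oid: int, object_oid: int) -> PurePosixPath:
    """Build <dbroot>/<oidoftablespace>/<oidofobject> (hypothetical helper)."""
    return PurePosixPath(dbroot) / str(tablespace_oid) / str(object_oid)

# Made-up catalog: object OID -> logical table name.
catalog = {18234: "orders"}

before = relation_path("/data/base/mydb", 1663, 18234)
catalog[18234] = "orders_old"   # a rename touches only the catalog entry
after = relation_path("/data/base/mydb", 1663, 18234)
```

Under these assumptions `before` and `after` are the same path, so a rename never has to touch the filesystem.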
    
    
  34. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-15T02:21:30Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > You need something that works from the command line, and something that
    > works if PostgreSQL is not running.  How would you restore one file from
    > a tape.
    
    "Restore one file from a tape"?  How are you going to do that anyway?
    You can't save and restore portions of a database like that, because
    of transaction commit status problems.  To restore table X correctly,
    you'd have to restore pg_log as well, and then your other tables are
    hosed --- unless you also restore all of them from the backup.  Only
    a complete database restore from tape would work, and for that you
    don't need to tell which file is which.  So the above argument is a
    red herring.
    
    I realize it's nice to be able to tell which table file is which by
    eyeball, but the price we are paying for that small convenience is
    just too high.  Give that up, and we can have rollbackable DROP and
    RENAME now (I'll personally commit to making it happen for 7.1).
    Continue to insist on it, and I don't think we'll *ever* have those
    features in a really robust form.  It's just not possible to do
    multiple file renames atomically.
    
    			regards, tom lane
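
The atomicity problem can be sketched as follows. This is a toy model, not backend code: it assumes a table stored as two segment files (`foo` and `foo.1`, names chosen for illustration) and simulates a crash between the two renames.

```python
import os
import tempfile

def rename_by_filename(directory, old, new, fail_after_first=False):
    """Name-based layout: a table rename must rename every file of the table."""
    for suffix in ("", ".1"):                  # two segment files (illustrative)
        os.rename(os.path.join(directory, old + suffix),
                  os.path.join(directory, new + suffix))
        if fail_after_first:
            raise RuntimeError("simulated crash between the two renames")

with tempfile.TemporaryDirectory() as d:
    for name in ("foo", "foo.1"):
        open(os.path.join(d, name), "w").close()
    try:
        rename_by_filename(d, "foo", "bar", fail_after_first=True)
    except RuntimeError:
        pass
    half_renamed = sorted(os.listdir(d))       # one file renamed, one not

# OID-based layout: the files never move. The rename is a single catalog
# update, which a transaction can commit or roll back as a unit.
catalog = {18234: "foo"}                       # made-up OID
catalog[18234] = "bar"
```

After the simulated crash the directory holds `bar` and `foo.1`, a state that matches neither the old name nor the new one.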
    
    
  35. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-15T02:28:53Z

    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > You need something that works from the command line, and something that
    > > works if PostgreSQL is not running.  How would you restore one file from
    > > a tape.
    > 
    > "Restore one file from a tape"?  How are you going to do that anyway?
    > You can't save and restore portions of a database like that, because
    > of transaction commit status problems.  To restore table X correctly,
    > you'd have to restore pg_log as well, and then your other tables are
    > hosed --- unless you also restore all of them from the backup.  Only
    > a complete database restore from tape would work, and for that you
    > don't need to tell which file is which.  So the above argument is a
    > red herring.
    > 
    > I realize it's nice to be able to tell which table file is which by
    > eyeball, but the price we are paying for that small convenience is
    > just too high.  Give that up, and we can have rollbackable DROP and
    > RENAME now (I'll personally commit to making it happen for 7.1).
    > Continue to insist on it, and I don't think we'll *ever* have those
    > features in a really robust form.  It's just not possible to do
    > multiple file renames atomically.
    > 
    
    OK, I am flexible.  (Yea, right.)  :-)
    
    But seriously, let me give some background.  I used Ingres, that used
    the VMS file system, but used strange sequential AAAF324 numbers for
    tables.  When someone deleted a table, or we were looking at what tables
    were using disk space, it was impossible to find the Ingres table names
    that went with the file.  There was a system table that showed it, but
    it was poorly documented, and if you deleted the table, there was no way
    to look on the tape to find out which file to restore.
    
    As far as pg_log, you certainly would not expect to get any information
    back from the time of the backup table to current, so the current pg_log
    would be just fine.
    
    Basically, I guess we have to do it, but we have to print proper
    error messages in the cases where the backend currently just prints
    the file name.  Also, we now have to replace the 'ls -l' command with
    something that will be meaningful.
    
    Right now, we use 'ps' with args to display backend information, and ls
    -l to show disk information.  We are going to lose that here.
    
    
    
    
    
  36. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-15T02:36:19Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > But seriously, let me give some background.  I used Ingres, that used
    > the VMS file system, but used strange sequential AAAF324 numbers for
    > tables.  When someone deleted a table, or we were looking at what tables
    > were using disk space, it was impossible to find the Ingres table names
    > that went with the file.  There was a system table that showed it, but
    > it was poorly documented, and if you deleted the table, there was no way
    > to look on the tape to find out which file to restore.
    
    Fair enough, but it seems to me that the answer is to expend some effort
    on system admin support tools.  We could do a lot in that line with less
    effort than trying to make a fundamentally mismatched filesystem
    representation do what we need.
    
    			regards, tom lane
    
    
  37. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-15T02:44:16Z

    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > But seriously, let me give some background.  I used Ingres, that used
    > > the VMS file system, but used strange sequential AAAF324 numbers for
    > > tables.  When someone deleted a table, or we were looking at what tables
    > > were using disk space, it was impossible to find the Ingres table names
    > > that went with the file.  There was a system table that showed it, but
    > > it was poorly documented, and if you deleted the table, there was no way
    > > to look on the tape to find out which file to restore.
    > 
    > Fair enough, but it seems to me that the answer is to expend some effort
    > on system admin support tools.  We could do a lot in that line with less
    > effort than trying to make a fundamentally mismatched filesystem
    > representation do what we need.
    
    That was my point --- that in doing this change, we are taking on more
    TODO items, that may detract from our main TODO items.  I am also
    concerned that the filename/tablename mapping is supported by so many
    Unix tools like ls, lsof/fstat, and tar, that we could be in for
    needing to support tons of utilities to enable administrators to do what
    they can so easily do now.
    
    Even gdb shows us the filename/tablename in backtraces.  We are never
    going to be able to reproduce that.  I guess I didn't want to bite off
    that much work until we had a _convincing_ need.  I guess I don't
    consider table schema commands inside transactions and such to be as big
    an item as the utility features we will need to build.
    
    
    
  38. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-15T02:46:39Z

    At 10:28 PM 6/14/00 -0400, Bruce Momjian wrote:
    
    >As far as pg_log, you certainly would not expect to get any information
    >back from the time of the backup table to current, so the current pg_log
    >would be just fine.
    
    In reality, very few people are going to be interested in restoring
    a table in a way that breaks referential integrity and other 
    normal assumptions about what exists in the database.  The reality
    is that most people are going to engage in a little time travel
    to a past, consistent backup rather than do as you suggest.
    
    This is going to be more and more true as Postgres gains more and
    more acceptance in (no offense intended) the real world.
    
    >Right now, we use 'ps' with args to display backend information, and ls
    >-l to show disk information.  We are going to lose that here.
    
    Dependence on "ls -l" is, IMO, a very weak argument.
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  39. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-15T03:13:52Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > That was my point --- that in doing this change, we are taking on more
    > TODO items, that may detract from our main TODO items.
    
    True, but they are also TODO items that could be handled by people other
    than the inner circle of key developers.  The actual rejiggering of
    table-to-filename mapping is going to have to be done by one of the
    small number of people who are fully up to speed on backend internals.
    But we've got a lot more folks who would be able (and, hopefully,
    willing) to design and code whatever tools are needed to make the
    dbadmin's job easier in the face of the new filesystem layout.  I'd
    rather not expend a lot of core time to avoid needing those tools,
    especially when I feel the old approach is fatally flawed anyway.
    
    > Even gdb shows us the filename/tablename in backtraces.  We are never
    > going to be able to reproduce that.
    
    Backtraces from *what*, exactly?  99% of the backend is still going
    to be dealing with the same data as ever.  It might be that poking
    around in fd.c will be a little harder, but considering that fd.c
    doesn't really know or care what the files it's manipulating are
    anyway, I'm not convinced that this is a real issue.
    
    > I guess I don't consider table schema commands inside transactions and
    > such to be as big an item as the utility features we will need to
    > build.
    
    You've *got* to be kidding.  We're constantly seeing complaints about
    the fact that rolling back DROP or RENAME TABLE fails --- and worse,
    leaves the table in a corrupted/inconsistent state.  As far as I can
    tell, that's one of the worst robustness problems we've got left to
    fix.  This is a big deal IMHO, and I want it to be fixed and fixed
    right.  I don't see how to fix it right if we try to keep physical
    filenames tied to logical tablenames.
    
    Moreover, that restriction will continue to hurt us if we try to
    preserve it while implementing tablespaces, ANSI schemas, etc.
    
    			regards, tom lane
    
    
  40. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-15T03:21:15Z

    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > That was my point --- that in doing this change, we are taking on more
    > > TODO items, that may detract from our main TODO items.
    > 
    > True, but they are also TODO items that could be handled by people other
    > than the inner circle of key developers.  The actual rejiggering of
    > table-to-filename mapping is going to have to be done by one of the
    > small number of people who are fully up to speed on backend internals.
    > But we've got a lot more folks who would be able (and, hopefully,
    > willing) to design and code whatever tools are needed to make the
    > dbadmin's job easier in the face of the new filesystem layout.  I'd
    > rather not expend a lot of core time to avoid needing those tools,
    > especially when I feel the old approach is fatally flawed anyway.
    
    Yes, it is clearly fatally flawed.  I agree.
    
    > > Even gdb shows us the filename/tablename in backtraces.  We are never
    > > going to be able to reproduce that.
    > 
    > Backtraces from *what*, exactly?  99% of the backend is still going
    > to be dealing with the same data as ever.  It might be that poking
    > around in fd.c will be a little harder, but considering that fd.c
    > doesn't really know or care what the files it's manipulating are
    > anyway, I'm not convinced that this is a real issue.
    
    I was just throwing gdb out as an example.  The bigger ones are ls,
    lsof/fstat, and tar.
    
    > > I guess I don't consider table schema commands inside transactions and
    > > such to be as big an item as the utility features we will need to
    > > build.
    > 
    > You've *got* to be kidding.  We're constantly seeing complaints about
    > the fact that rolling back DROP or RENAME TABLE fails --- and worse,
    > leaves the table in a corrupted/inconsistent state.  As far as I can
    > tell, that's one of the worst robustness problems we've got left to
    > fix.  This is a big deal IMHO, and I want it to be fixed and fixed
    > right.  I don't see how to fix it right if we try to keep physical
    > filenames tied to logical tablenames.
    > 
    > Moreover, that restriction will continue to hurt us if we try to
    > preserve it while implementing tablespaces, ANSI schemas, etc.
    > 
    
    Well, we did have someone do a test implementation of oid file names,
    and their report was that it looked pretty ugly.  However, if people are
    convinced it has to be done, we can get started.  I guess I was waiting
    for Vadim's storage manager, where the whole idea of separate files is
    going to go away anyway, I suspect.  We would then have to re-write all
    our admin tools for the new format.
    
    
    
  41. Re: Big 7.1 open items

    Jan Wieck <janwieck@t-online.de> — 2000-06-15T04:15:22Z

    Bruce Momjian wrote:
    > >
    > >         DESCRIBE TABLE/VIEW/whatnot <object-name>
    > >
    > >     that returns the physical location and storage details of the
    > >     object. And psql could use it to print this  info  additional
    > >     on  the  \d commands. Would give unprivileged users access to
    > >     this info, so be it, it's not a security issue IMHO.
    >
    > You need something that works from the command line, and something that
    > works if PostgreSQL is not running.  How would you restore one file from
    > a tape.  I guess you could bring back the whole thing, then do the
    > query, and move the proper table file back in, but that is a pain.
    
        Think you messed up some basics of PG here.
    
        It's  totally useless to restore single files of a PostgreSQL
        database. You could either put back anything below ./data, or
        nothing - the reason is pg_log.
    
        You  don't  need  something  that works if PostgreSQL is not
        running.  You cannot restore ONE file from a  tape!  You  can
        restore  a  PostgreSQL  instance (only a complete one - not a
        single DB, nor a single table or any  other  object).   While
        your  backup  is  writing to the tape, any number of backends
        could concurrently modify single blocks  of  the  heap,  its
        indices and pg_log. So what does the tape contain then?
    
        I'd  like  to ask you, are you sure the backups you're making
        are worth the power consumption of  the  tape  drive?  You're
        talking  about  restoring a file - and should be aware of the
        fact, that any file based backup would never be able  to  get
        consistent snapshot of the database, like pg_dump is able to.
    
        As long as you don't take  the  postmaster  down  during  the
        entire  saving  of ./data, you aren't in a safe position. And
        the only safe RESTORE is  to  restore  ./data  completely  or
        nothing.  It's  not  even  (easily)  possible  to  initdb and
        restore a single DB from tape (it is, but requires some  deep
        knowledge and more than just restoring some files from tape).
    
        YOU REALLY DON'T NEED ANY FILENAMES IN THERE!
    
        The more I think about it, the more I feel these file  names,
        easily associatable with the objects they represent, are more
        dangerous than useful in practice. Maybe we should  obfuscate
        the  entire  ./data  like  Oracle  does  with its  tablespace
        files.  Just  that  our  tablespaces  will  be   directories,
        containing totally cryptic named files.
    
    
    Jan
    
    --
    
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me.                                  #
    #================================================== JanWieck@Yahoo.com #
    
    
    
    
  42. Re: Big 7.1 open items

    Jan Wieck <janwieck@t-online.de> — 2000-06-15T04:20:21Z

    Tom Lane wrote:
    > JanWieck@t-online.de (Jan Wieck) writes:
    > >     Why not changing the naming to be something like this:
    > 
    > >         <dbroot>/catalog_tables/pg_...
    > >         <dbroot>/catalog_index/pg_...
    > >         <dbroot>/user_tables/oid_...
    > >         <dbroot>/user_index/oid_...
    > >         <dbroot>/temp_tables/oid_...
    > >         <dbroot>/temp_index/oid_...
    > >         <dbroot>/toast_tables/oid_...
    > >         <dbroot>/toast_index/oid_...
    > >         <dbroot>/whatnot_???/...
    > 
    > I don't see a lot of value in that.  Better to do something like
    > tablespaces:
    > 
    > 	<dbroot>/<oidoftablespace>/<oidofobject>
    
        *Slap* - yes!
    
    
    Jan
    
    -- 
    
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me.                                  #
    #================================================== JanWieck@Yahoo.com #
    
    
    
  43. Re: Big 7.1 open items

    Ross Reedstrom <reedstrm@rice.edu> — 2000-06-15T06:03:12Z

    On Wed, Jun 14, 2000 at 11:21:15PM -0400, Bruce Momjian wrote:
    > 
    > Well, we did have someone do a test implementation of oid file names,
    > and their report was that it looked pretty ugly.  
    
    That someone would be me. Did my mails from this morning fall into a black
    hole? I've got a patch that does either oid filenames or relname_<oid>,
    take your pick.  It doesn't do tablespaces, just leaves the files where
    they are. To do relname_<oid>, I add a relphysname field to pg_class.
    
    I'll update it to current and throw it at the PATCHES list this weekend,
    unless someone more central wants to do tablespaces first. I tried
    out rolling back ALTER TABLE RENAME. Works fine. Biggest problem
    with it is that I played silly buggers with the relcache for no good
    reason. Hiroshi stripped that out and said it works fine, otherwise. I
    also haven't touched DROP TABLE yet. The physical file would be deleted at
    transaction commit time, then?  Hmm, where's the 'things to do at commit'
    queue?
    
    > convinced it has to be done, we can get started.  I guess I was waiting
    > for Vadim's storage manager, where the whole idea of separate files is
    > going to go away anyway, I suspect.  We would then have to re-write all
    > our admin tools for the new format.
    
    Any strong objections to the mixed relname_oid solution? It gets us
    everything oids does, and still lets Bruce use 'ls -l' to find the big
    tables, putting off writing any admin tools that'll need to be rewritten,
    anyway.
    
    Ross
    -- 
    Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
    NSBRI Research Scientist/Programmer
    Computer and Information Technology Institute
    Rice University, 6100 S. Main St.,  Houston, TX 77005
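
The "delete the physical file at commit time" idea raised above can be sketched as a pending-unlink list. The `Transaction` class and its method names are hypothetical and stand in for whatever commit-time hook the backend would actually provide.

```python
import os
import tempfile

class Transaction:
    """Toy transaction that defers file removal until commit (hypothetical)."""

    def __init__(self):
        self.pending_unlinks = []

    def drop_table_file(self, path):
        # Only remember the file; the disk is not touched yet.
        self.pending_unlinks.append(path)

    def commit(self):
        for path in self.pending_unlinks:
            os.unlink(path)
        self.pending_unlinks.clear()

    def rollback(self):
        # The file was never removed, so there is nothing to undo.
        self.pending_unlinks.clear()

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "18234")            # made-up OID file name
    open(path, "w").close()

    txn = Transaction()
    txn.drop_table_file(path)
    txn.rollback()
    survived_rollback = os.path.exists(path)

    txn = Transaction()
    txn.drop_table_file(path)
    txn.commit()
    gone_after_commit = not os.path.exists(path)
```

With the unlink deferred this way, a rolled-back DROP TABLE leaves the file in place and only a committed one removes it.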
    
    
  44. Re: Big 7.1 open items

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 2000-06-15T06:29:29Z

    > But seriously, let me give some background.  I used Ingres, that used
    > the VMS file system, but used strange sequential AAAF324 numbers for
    > tables.  When someone deleted a table, or we were looking at what 
    > tables were using disk space, it was impossible to find the Ingres 
    > table names that went with the file.  There was a system table that 
    > showed it, but it was poorly documented, and if you deleted the table, 
    > there was no way to look on the tape to find out which file to 
    > restore.
    
    I had the same experience, but let's put the blame where it belongs: it
    wasn't the filename's fault, it was poor design and support from the
    Ingres company.
    
                      - Thomas
    
    
  45. Re: Big 7.1 open items

    Chris <chrisb@nimrod.itg.telstra.com.au> — 2000-06-15T06:56:12Z

    "Ross J. Reedstrom" wrote:
    
    > Any strong objections to the mixed relname_oid solution? It gets us
    > everything oids does, and still lets Bruce use 'ls -l' to find the big
    > tables, putting off writing any admin tools that'll need to be rewritten,
    > anyway.
    
    Doesn't  relname_oid defeat the purpose of oid file names, which is that
    they don't change when the table is renamed? Wasn't it going to be oids
    with a tool to create a symlink of relname -> oid ?
    
    
  46. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-15T07:11:52Z

    "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
    > Any strong objections to the mixed relname_oid solution?
    
    Yes!
    
    You cannot make it work reliably unless the relname part is the original
    relname and does not track ALTER TABLE RENAME.  IMHO having an obsolete
    relname in the filename is worse than not having the relname at all;
    it's a recipe for confusion, it means you still need admin tools to tell
    which end is really up, and what's worst is you might think you don't.
    
    Furthermore it requires an additional column in pg_class to keep track
    of the original relname, which is a waste of space and effort.
    
    It also creates a portability risk, or at least fails to remove one,
    since you are critically dependent on the assumption that the OS
    supports long filenames --- on a filesystem that truncates names to less
    than about 45 characters you're in very deep trouble.  An OID-only
    approach still works on traditional 14-char-filename Unix filesystems
    (it'd mostly even work on DOS 8+3, though I doubt we care about that).
    
    Finally, one of the reasons I want to go to filenames based only on OID
    is that that'll make life easier for mdblindwrt.  Original relname + OID
    doesn't help, in fact it makes life harder (more shmem space needed to
    keep track of the filename for each buffer).
    
    Can we *PLEASE JUST LET GO* of this bad idea?  No relname in the
    filename.  Period.
    
    			regards, tom lane
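
The filename-length argument can be checked with a little arithmetic. Two figures here are assumptions not stated in the thread: relation names of at most 31 characters, and OIDs that are unsigned 32-bit integers printed in decimal.

```python
MAX_RELNAME_LEN = 31        # assumption: name limit of 32 bytes minus terminator
MAX_OID = 2**32 - 1         # assumption: OIDs are unsigned 32-bit integers

oid_only_len = len(str(MAX_OID))                        # decimal digits in the largest OID
mixed_len = MAX_RELNAME_LEN + len("_") + oid_only_len   # relname_<oid>, before any suffix

def fits(limit: int, length: int) -> bool:
    """True if a filename of the given length fits the filesystem limit."""
    return length <= limit

oid_fits_14 = fits(14, oid_only_len)            # traditional 14-char Unix names
mixed_fits_14 = fits(14, mixed_len)
small_oid_fits_8 = fits(8, len(str(99_999_999)))  # DOS 8+3: OIDs below 10**8
large_oid_fits_8 = fits(8, oid_only_len)
```

Under these assumptions an OID-only name needs 10 characters and a mixed name needs 42 before any suffix, which is consistent with the "about 45" and "mostly even work on DOS 8+3" remarks above.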
    
    
  47. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-15T07:14:30Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > Well, we did have someone do a test implementation of oid file names,
    > and their report was that it looked pretty ugly.  However, if people are
    > convinced it has to be done, we can get started.  I guess I was waiting
    > for Vadim's storage manager, where the whole idea of separate files is
    > going to go away anyway, I suspect.  We would then have to re-write all
    > our admin tools for the new format.
    
    I seem to recall him saying that he wanted to go to filename == OID
    just like I'm suggesting.  But I agree we probably ought to hold off
    doing anything until he gets back from Russia and can let us know
    whether that's still his plan.  If he is planning one-huge-file or
    something like that, we might as well let these issues go unfixed
    for one more release cycle.
    
    			regards, tom lane
    
    
  48. Re: Big 7.1 open items

    Marc G. Fournier <scrappy@hub.org> — 2000-06-15T12:03:31Z

    On Wed, 14 Jun 2000, Jan Wieck wrote:
    
    >     Why not changing the naming to be something like this:
    > 
    >         <dbroot>/catalog_tables/pg_...
    >         <dbroot>/catalog_index/pg_...
    >         <dbroot>/user_tables/oid_...
    >         <dbroot>/user_index/oid_...
    >         <dbroot>/temp_tables/oid_...
    >         <dbroot>/temp_index/oid_...
    >         <dbroot>/toast_tables/oid_...
    >         <dbroot>/toast_index/oid_...
    >         <dbroot>/whatnot_???/...
    > 
    >     This way, it  would  be  much  easier  to  separate  all  the
    >     different  object types to different physical media. We would
    >     lose some  transparency,  but  I've  always  wondered  what
    >     people  USE  that  for  (except  for  just  wanna  know). For
    >     convenience we could implement another  little  utility  that
    >     tells the object size like
    
    Wow, I've been advocating this one for how many months now? :)  You won't
    get any arguments from me ... 
    
    
    
    
  49. Re: Big 7.1 open items

    Marc G. Fournier <scrappy@hub.org> — 2000-06-15T12:14:29Z

    On Wed, 14 Jun 2000, Bruce Momjian wrote:
    
    > > Backtraces from *what*, exactly?  99% of the backend is still going
    > > to be dealing with the same data as ever.  It might be that poking
    > > around in fd.c will be a little harder, but considering that fd.c
    > > doesn't really know or care what the files it's manipulating are
    > > anyway, I'm not convinced that this is a real issue.
    > 
    > I was just throwing gdb out as an example.  The bigger ones are ls,
    > lsof/fstat, and tar.
    
    You've lost me on this one ... if someone does an lsof of the process, it
    will still provide them a list of open files ... are you complaining about
    the extra step required to translate the file name to a "valid table"?  
    
    Oh, one point here ... this whole 'filenaming issue' ... as far as ls is
    concerned, at least, only affects the superuser, since he's the only one
    that can go 'ls'ing around in the directories ...
    
    And, ummm, how hard would it be to have \d in psql display the "physical
    table name" as part of its output?
    
    Slight tangent here:
    
    One thing that I think would be great if we could add is some sort of:
    
    SELECT db_name, disk_space;
    
    query where a database owner, not the superuser, could see how much disk
    space their tables are using up ... possible?
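
One possible shape for a disk-usage report that would stand in for 'ls -l' once files are named by OID. This is only a sketch: `table_sizes` is a hypothetical helper, the OID-to-name mapping is passed in by hand (in practice it would have to come from the system catalogs), and the `.1` segment suffix is an illustrative assumption.

```python
import os
import tempfile

def table_sizes(directory, oid_to_name):
    """Return {table name: bytes on disk} for OID-named files (hypothetical helper)."""
    sizes = {}
    for entry in os.scandir(directory):
        base = entry.name.split(".")[0]        # fold segment files such as 18234.1
        if entry.is_file() and base.isdigit():
            name = oid_to_name.get(int(base), f"<unknown {base}>")
            sizes[name] = sizes.get(name, 0) + entry.stat().st_size
    return sizes

with tempfile.TemporaryDirectory() as d:
    # Made-up files: one table in two segments, one table in a single file.
    for fname, nbytes in (("18234", 100), ("18234.1", 50), ("18235", 10)):
        with open(os.path.join(d, fname), "wb") as f:
            f.write(b"\0" * nbytes)
    report = table_sizes(d, {18234: "orders", 18235: "customers"})
```

A tool along these lines would still need filesystem access, so it does not by itself answer the question of letting a non-superuser database owner see the numbers from SQL.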
    
    
    
  50. Re: Big 7.1 open items

    Mark Hollomon <mhh@nortelnetworks.com> — 2000-06-15T12:28:12Z

    Ross J. Reedstrom wrote:
    > 
    > Any strong objections to the mixed relname_oid solution? It gets us
    > everything oids does, and still lets Bruce use 'ls -l' to find the big
    > tables, putting off writing any admin tools that'll need to be rewritten,
    > anyway.
    
    I would object to the mixed name.
    
    Consider:
    
    CREATE TABLE FOO ....
    ALTER TABLE FOO RENAME FOO_OLD;
    CREATE TABLE FOO ....
    
    For the same atomicity reason, rename can't change the
    name of the files. So, which foo_<oid> is the FOO_OLD
    and which is FOO?
    
    In other words, in the presence of rename, putting
    relname in the filename is misleading at best.
    
    -- 
    
    Mark Hollomon
    mhh@nortelnetworks.com
    ESN 451-9008 (302)454-9008
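
The CREATE / RENAME / CREATE sequence above can be played through in a toy catalog. The starting OID and the helper names are made up; the only rule taken from the thread is that a rename must not change the file's name.

```python
import itertools

next_oid = itertools.count(18234)   # made-up starting OID
catalog = {}                        # current logical name -> (oid, filename)

def create_table(name):
    oid = next(next_oid)
    catalog[name] = (oid, f"{name}_{oid}")   # filename is fixed at creation time

def rename_table(old, new):
    catalog[new] = catalog.pop(old)          # catalog-only; the file keeps its name

create_table("foo")
rename_table("foo", "foo_old")
create_table("foo")

files = {name: filename for name, (oid, filename) in catalog.items()}
```

Both files end up with a `foo_` prefix, and the one belonging to `foo_old` gives no hint of its current name, which is the confusion described above.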
    
    
  51. Re: Big 7.1 open items

    Brian E Gallew <geek+@cmu.edu> — 2000-06-15T12:29:02Z

    Then <tgl@sss.pgh.pa.us> spoke up and said:
    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > But seriously, let me give some background.  I used Ingres, that used
    > > the VMS file system, but used strange sequential AAAF324 numbers for
    > > tables.  When someone deleted a table, or we were looking at what tables
    > > were using disk space, it was impossible to find the Ingres table names
    > > that went with the file.  There was a system table that showed it, but
    > > it was poorly documented, and if you deleted the table, there was no way
    > > to look on the tape to find out which file to restore.
    > 
    > Fair enough, but it seems to me that the answer is to expend some effort
    > on system admin support tools.  We could do a lot in that line with less
    > effort than trying to make a fundamentally mismatched filesystem
    > representation do what we need.
    
    We've been an Ingres shop as long as there's been an Ingres.  While
    we've also had the problem Bruce noticed with table names, we've
    *also* used the trivial fix of running a (simple) Report Writer job
    each night, immediately before the backup, that lists all of the
    database tables/indices and the underlying files.
    
    True, if someone drops/recreates a table twice between backups we
    can't find the intermediate file name, but since we also haven't
    backed up that filename, this isn't an issue.
    
    Also, the consistency issue is really not as important as you would
    think.  If you are restoring a table, you want the information in it,
    whether or not it's consistent with anything else.  I've done hundreds
    of table restores (can you say "modify table to heap"?) and never once
    has inconsistency been an issue.  Oh, yeah, and we don't shut the
    database down for this, either.  (That last isn't my choice, BTW.)
    
    -- 
    =====================================================================
    | JAVA must have been developed in the wilds of West Virginia.      |
    | After all, why else would it support only single inheritance??    |
    =====================================================================
    | Finger geek@cmu.edu for my public key.                            |
    =====================================================================
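    
    A minimal sketch of the nightly listing idea described above, not the
    actual Report Writer job. It assumes the catalog query has already
    produced (relname, kind, filename) tuples; the function names
    write_manifest and lookup are hypothetical.

```python
import csv
import io


def write_manifest(catalog_rows, out):
    """Write one 'relname, kind, filename' line per catalog row.

    catalog_rows is a hypothetical iterable of (relname, kind, filename)
    tuples, standing in for whatever the DBMS catalog query returns.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["relname", "kind", "filename"])
    for relname, kind, filename in sorted(catalog_rows):
        writer.writerow([relname, kind, filename])


def lookup(manifest_text, filename):
    """Return the relation name recorded for a file, or None if absent."""
    reader = csv.DictReader(io.StringIO(manifest_text))
    for row in reader:
        if row["filename"] == filename:
            return row["relname"]
    return None
```

    Saving such a listing next to each backup lets a file on tape be traced
    back to its table name without consulting the live catalog.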
    
  52. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-15T13:38:53Z

    > > But seriously, let me give some background.  I used Ingres, that used
    > > the VMS file system, but used strange sequential AAAF324 numbers for
    > > tables.  When someone deleted a table, or we were looking at what 
    > > tables were using disk space, it was impossible to find the Ingres 
    > > table names that went with the file.  There was a system table that 
    > > showed it, but it was poorly documented, and if you deleted the table, 
    > > there was no way to look on the tape to find out which file to 
    > > restore.
    > 
    > I had the same experience, but let's put the blame where it belongs: it
    > wasn't the filename's fault, it was poor design and support from the
    > Ingres company.
    
    Yes, that certainly was part of the cause.  Also, if the PostgreSQL
    files are backed up using tar while no database activity is happening,
    there is no reason the tar restore will not work.  You just create a
    table with the same schema, stop the postmaster, have the backup file
    replace the newly created table file, and restart the postmaster.
    
    I can't tell you how many times I have said, "Man, whoever did this
    Ingres naming scheme was an idiot.  Do they know how many problems they
    caused for us?"
    
    Also, Informix standard engine uses the tablename_oid setup for its
    table names, and it works fine.  It grabs the first 8 characters of the
    table, and plops some unique number on the end of it.  Works fine for
    administrators.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  53. Re: Big 7.1 open items

    Ross Reedstrom <reedstrm@rice.edu> — 2000-06-15T16:45:19Z

    On Thu, Jun 15, 2000 at 03:11:52AM -0400, Tom Lane wrote:
    > "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
    > > Any strong objections to the mixed relname_oid solution?
    > 
    > Yes!
    > 
    > You cannot make it work reliably unless the relname part is the original
    > relname and does not track ALTER TABLE RENAME.  IMHO having an obsolete
    > relname in the filename is worse than not having the relname at all;
    > it's a recipe for confusion, it means you still need admin tools to tell
    > which end is really up, and what's worst is you might think you don't.
    
    The plan here was to let VACUUM handle renaming the file, since it
    will already have all the necessary locks. This shortens the window
    of confusion.  ALTER TABLE RENAME doesn't happen that often, really - 
    the relname is there just for human consumption, then.
    
    > 
    > Furthermore it requires an additional column in pg_class to keep track
    > of the original relname, which is a waste of space and effort.
    > 
    
    I actually started down this path thinking about implementing SCHEMA,
    since tables in the same DB but in different schema can have the same
    relname, I figured I needed to change that. We'll need something in
    pg_class to keep track of what schema a relation is in, instead.
    
    > It also creates a portability risk, or at least fails to remove one,
    > since you are critically dependent on the assumption that the OS
    > supports long filenames --- on a filesystem that truncates names to less
    > than about 45 characters you're in very deep trouble.  An OID-only
    > approach still works on traditional 14-char-filename Unix filesystems
    > (it'd mostly even work on DOS 8+3, though I doubt we care about that).
    
    Actually, no. Since I store the filename in a name attribute, I used this
    nifty function somebody wrote, makeObjectName, to trim the relname part,
    but leave the oid. (Yes, I know it's yours ;-)
    
    > 
    > Finally, one of the reasons I want to go to filenames based only on OID
    > is that that'll make life easier for mdblindwrt.  Original relname + OID
    > doesn't help, in fact it makes life harder (more shmem space needed to
    > keep track of the filename for each buffer).
    
    Can you explain in more detail how this helps? Not by letting the bufmgr
    know that oid == filename, I hope. We need to improve the abstraction
    of the smgr, not add another violation. Ah, sorry, mdblindwrt _is_
    in the smgr. 
    
    Hmm, grovelling through that code, I see how it could be simpler if reloid
    == filename. Heck, we even get to save shmem in the buffdesc.blind part,
    since we only need the dbname in there, now.
    
    Hmm, I see I missed the relpath_blind() in my patch - oops.  (relpath()
    is always called with RelationGetPhysicalRelationName(), and that's
    where I was putting in the relphysname)
    
    Hmm, what's all this with functions in catalog.c that are only called by
    smgr/md.c? seems to me that anything having to do with physical storage
    (like the path!) belongs in the smgr abstraction.
    
    > 
    > Can we *PLEASE JUST LET GO* of this bad idea?  No relname in the
    > filename.  Period.
    > 
    
    Gee, so dogmatic. No one besides Bruce and Hiroshi discussed this _at
    all_ when I first put up patches two months ago. O.K., I'll do the oids
    only version (and fix up relpath_blind)
    
    Ross
    
    -- 
    Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
    NSBRI Research Scientist/Programmer
    Computer and Information Technology Institute
    Rice University, 6100 S. Main St.,  Houston, TX 77005
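    
    A rough illustration of trimming only the relname part so that the OID
    always survives. This is not the makeObjectName code itself; the name
    physical_name and the 31-character default limit are assumptions made
    for the example.

```python
def physical_name(relname: str, oid: int, max_len: int = 31) -> str:
    """Build '<relname>_<oid>', trimming only the relname part to fit.

    max_len is an assumed limit on the stored name; the OID suffix is
    never truncated, so the result stays unique per relation.
    """
    suffix = f"_{oid}"
    room = max_len - len(suffix)
    if room < 1:
        raise ValueError("max_len too small to hold the OID suffix")
    return relname[:room] + suffix
```

    With max_len=14 the same function still yields a usable name on a
    14-character filesystem, at the cost of a much shorter relname prefix.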
    
    
  54. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-15T19:35:45Z

    > > Can we *PLEASE JUST LET GO* of this bad idea?  No relname in the
    > > filename.  Period.
    > > 
    > 
    > Gee, so dogmatic. No one besides Bruce and Hiroshi discussed this _at
    > all_ when I first put up patches two months ago. O.K., I'll do the oids
    > only version (and fix up relpath_blind)
    
    Hold on.  I don't think we want that work done yet.  Seems even Tom is
    thinking that if Vadim is going to re-do everything later anyway, we may
    be better with a relname/oid solution that does not require additional
    administration apps.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  55. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-15T21:48:21Z

    > -----Original Message-----
    > From: pgsql-hackers-owner@hub.org 
    > [mailto:pgsql-hackers-owner@hub.org]On Behalf Of Bruce Momjian
    > 
    > > > Can we *PLEASE JUST LET GO* of this bad idea?  No relname in the
    > > > filename.  Period.
    > > > 
    > > 
    > > Gee, so dogmatic. No one besides Bruce and Hiroshi discussed this _at
    > > all_ when I first put up patches two months ago. O.K., I'll do the oids
    > > only version (and fix up relpath_blind)
    > 
    > Hold on.  I don't think we want that work done yet.  Seems even Tom is
    > thinking that if Vadim is going to re-do everything later anyway, we may
    > be better with a relname/oid solution that does not require additional
    > administration apps.
    >
    
    Hmm, why is naming rule first?
    
    I've never emphasized naming rule except that it should be unique.
    It has been my main point to reduce the necessity of naming rule
    as much as possible. IIRC, by keeping the stored place in pg_class, Ross's
    trial patch leaves only 2 places where naming rule is required.
    So wouldn't we be free from naming rule (it would not be so difficult
    to change naming rule if the rule is found to be bad)?
    
    I've also mentioned many times neither relname nor oid is sufficient
    for the uniqueness. In addition neither relname nor oid would be
    necessary for the uniqueness.
    IMHO, it's bad to rely on the item which is neither necessary nor
    sufficient.
    I proposed relname+unique_id naming once. The unique_id is
    independent from oid. The relname is only for convenience for
    DBA and so we don't have to change it due to RENAME.
    Db's consistency is much more important than dba's
    satisfaction.
    
    Comments ?
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
    
  56. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-15T21:48:59Z

    > I've also mentioned many times neither relname nor oid is sufficient
    > for the uniqueness. In addition neither relname nor oid would be
    > necessary for the uniqueness.
    > IMHO, it's bad to rely on the item which is neither necessary nor
    > sufficient.
    > I proposed relname+unique_id naming once. The unique_id is
    > independent from oid. The relname is only for convenience for
    > DBA and so we don't have to change it due to RENAME.
    > Db's consistency is much more important than dba's
    > satisfaction.
    > 
    > Comments ?
    
    I am happy not to rename the file on 'RENAME', but seems no one likes
    that.
    
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  57. Re: Big 7.1 open items

    Ross Reedstrom <reedstrm@rice.edu> — 2000-06-15T22:53:59Z

    On Thu, Jun 15, 2000 at 05:48:59PM -0400, Bruce Momjian wrote:
    > > I've also mentioned many times neither relname nor oid is sufficient
    > > for the uniqueness. In addition neither relname nor oid would be
    > > necessary for the uniqueness.
    > > IMHO, it's bad to rely on the item which is neither necessary nor
    > > sufficient.
    > > I proposed relname+unique_id naming once. The unique_id is
    > > independent from oid. The relname is only for convenience for
    > > DBA and so we don't have to change it due to RENAME.
    > > Db's consistency is much more important than dba's
    > > satisfaction.
    > > 
    > > Comments ?
    > 
    > I am happy not to rename the file on 'RENAME', but seems no one likes
    > that.
    
    Good, 'cause that's how I've implemented it so far. Actually, all
    I've done is port my previous patch to current, with one little
    change: I added a macro RelationGetRealRelationName which does what
    RelationGetPhysicalRelationName used to do: i.e. return the relname with
    no temptable funny business, and used that for the relcache macros. It
    passes all the serial regression tests: I haven't run the parallel tests
    yet. ALTER TABLE RENAME rolls back nicely. I'll need to learn some more
    about xacts to get DROP TABLE rolling back.
    
    I'll drop it on PATCHES right now, for comment.
    
    Ross
    -- 
    Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
    NSBRI Research Scientist/Programmer
    Computer and Information Technology Institute
    Rice University, 6100 S. Main St.,  Houston, TX 77005
    
    
  58. filename patch (was Re: [HACKERS] Big 7.1 open items)

    Ross Reedstrom <reedstrm@rice.edu> — 2000-06-15T22:57:38Z

    Here's the patch I promised on HACKERS. Comments anyone?
    
    Ross
    -- 
    Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
    NSBRI Research Scientist/Programmer
    Computer and Information Technology Institute
    Rice University, 6100 S. Main St.,  Houston, TX 77005
    
    
  59. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-15T23:53:52Z

    "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
    > On Thu, Jun 15, 2000 at 03:11:52AM -0400, Tom Lane wrote:
    >> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
    >>>> Any strong objections to the mixed relname_oid solution?
    >> 
    >> Yes!
    
    > The plan here was to let VACUUM handle renaming the file, since it
    > will already have all the necessary locks. This shortens the window
    > of confusion.  ALTER TABLE RENAME doesn't happen that often, really - 
    > the relname is there just for human consumption, then.
    
    Yeah, I've seen tons of discussion of how if we do this, that, and
    the other thing, and be prepared to fix up some other things in case
    of crash recovery, we can make it work with filename == relname + OID
    (where relname tracks logical name, at least at some remove).
    
    Probably.  Assuming nobody forgets anything.
    
    I'm just trying to point out that that's a huge amount of pretty
    delicate mechanism.  The amount of work required to make it trustworthy
    looks to me to dwarf the admin tools that Bruce is complaining about.
    And we only have a few people competent to do the work.  (With all
    due respect, Ross, if you weren't already aware of the implications
    for mdblindwrt, I have to wonder what else you missed.)
    
    Filename == OID is so simple, reliable, and straightforward by
    comparison that I think the decision is a no-brainer.
    
    If we could afford to sink unlimited time into this one issue then
    it might make sense to do it the hard way, but we have enough
    important stuff on our TODO list to keep us all busy for years ---
    I cannot believe that it's an effective use of our time to do this.
    
    
    > Hmm, what's all this with functions in catalog.c that are only called by
    > smgr/md.c? seems to me that anything having to do with physical storage
    > (like the path!) belongs in the smgr abstraction.
    
    Yeah, there's a bunch of stuff that should have been implemented by
    adding new smgr entry points, but wasn't.  It should be pushed down.
    (I can't resist pointing out that one of those things is physical
    relation rename, which will go away and not *need* to be pushed down
    if we do it the way I want.)
    
    			regards, tom lane
    
    
  60. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-15T23:57:05Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    >> Gee, so dogmatic. No one besides Bruce and Hiroshi discussed this _at
    >> all_ when I first put up patches two months ago. O.K., I'll do the oids
    >> only version (and fix up relpath_blind)
    
    > Hold on.  I don't think we want that work done yet.  Seems even Tom is
    > thinking that if Vadim is going to re-do everything later anyway, we may
    > be better with a relname/oid solution that does not require additional
    > administration apps.
    
    Don't put words in my mouth, please.  If we are going to throw the
    work away later, it'd be foolish to do the much greater amount of
    work needed to make filename=relname+OID fly than is needed for
    filename=OID.
    
    However, I'm pretty sure I recall Vadim stating that he thought
    filename=OID would be required for his smgr changes anyway...
    
    			regards, tom lane
    
    
  61. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-16T00:28:14Z

    > -----Original Message-----
    > From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
    > Behalf Of Tom Lane
    > 
    > "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
    > > On Thu, Jun 15, 2000 at 03:11:52AM -0400, Tom Lane wrote:
    > >> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
    > >>>> Any strong objections to the mixed relname_oid solution?
    > >> 
    > >> Yes!
    > 
    > > The plan here was to let VACUUM handle renaming the file, since it
    > > will already have all the necessary locks. This shortens the window
    > > of confusion.  ALTER TABLE RENAME doesn't happen that often, really - 
    > > the relname is there just for human consumption, then.
    > 
    > Yeah, I've seen tons of discussion of how if we do this, that, and
    > the other thing, and be prepared to fix up some other things in case
    > of crash recovery, we can make it work with filename == relname + OID
    > (where relname tracks logical name, at least at some remove).
    >
    
    I've seen little discussion of how to avoid the use of naming rule.
    I've proposed many times that we should keep the information
    where the table is stored in our database itself. I've never seen
    clear objections to it. So can I take it that my proposal is OK?
    Isn't it much more important than naming rule?  Under the
    mechanism, we could easily replace a bad naming rule.
    And I believe that Ross's work is mostly around the mechanism
    not naming rule. 
    
    Now I like neither relname nor oid because it's not sufficient 
    for my purpose.
      
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
  62. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T01:57:27Z

    "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > Now I like neither relname nor oid because it's not sufficient 
    > for my purpose.
    
    We should probably not do much of anything with this issue until
    we have a clearer understanding of what we want to do about
    tablespaces and schemas.
    
    My gut feeling is that we will end up with pathnames that look
    something like
    
    .../data/base/DBNAME/TABLESPACE/OIDOFRELATION
    
    (with .N attached if a segment of a large relation, of course).
    
    The TABLESPACE "name" should likely be an OID itself, but it wouldn't
    have to be if you are willing to say that tablespaces aren't renamable.
    (Come to think of it, does anyone care about being able to rename
    databases?  ;-))  Note that the TABLESPACE will often be a symlink
    to storage on another drive, rather than a plain subdirectory of the
    DBNAME, but that shouldn't be an issue at this level of discussion.
    
    I think that schemas probably don't enter into this.  We should instead
    rely on the uniqueness of OIDs to prevent filename collisions.  However,
    OIDs aren't really unique: different databases in an installation will
    use the same OIDs for their system tables.  My feeling is that we can
    live with a restriction like "you can't store the system tables of
    different databases in the same tablespace".  Alternatively we could
    avoid that issue by inverting the pathname order:
    
    .../data/base/TABLESPACE/DBNAME/OIDOFRELATION
    
    Note that in any case, system tables will have to live in a
    predetermined tablespace, since you can't very well look in pg_class
    to find out which tablespace pg_class lives in.  Perhaps we should
    just reserve a tablespace per database for system tables and forget
    the whole issue.  If we do that, there's not really any need for
    the database in the path!  Just
    
    .../data/base/TABLESPACE/OIDOFRELATION
    
    would do fine and help reduce lookup overhead.
    
    BTW, schemas do make things interesting for the other camp:
    is it possible for the same table to be referenced by different
    names in different schemas?  If so, just how useful is it to pick
    one of those names arbitrarily for the filename?  This is an advanced
    version of the main objection to using the original relname and not
    updating it at RENAME TABLE --- sooner or later, the filenames are
    going to be more confusing than helpful.
    
    Comments?  Have I missed something important about schemas?
    
    			regards, tom lane
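    
    The layouts proposed above can be sketched as a small path builder.
    relation_path and its parameters are hypothetical names: passing dbname
    gives the DBNAME/TABLESPACE/OID form, omitting it gives the shorter
    TABLESPACE/OID form, and segments after the first get a ".N" suffix.

```python
from typing import Optional


def relation_path(data_dir: str, tablespace: str, rel_oid: int,
                  dbname: Optional[str] = None, segment: int = 0) -> str:
    """Return the file path for one segment of a relation.

    Only the tablespace and the relation OID are required; the database
    name is an optional extra level, as in the first layout discussed.
    """
    parts = [data_dir, "base"]
    if dbname is not None:
        parts.append(dbname)          # .../base/DBNAME/TABLESPACE/OID
    parts += [tablespace, str(rel_oid)]
    path = "/".join(parts)
    if segment > 0:
        path += f".{segment}"         # later segments of a large relation
    return path
```

    Nothing in the path depends on the relation's name, so a rename never
    has to touch the filesystem.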
    
    
  63. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-16T02:24:52Z

    > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > > Now I like neither relname nor oid because it's not sufficient 
    > > for my purpose.
    > 
    > We should probably not do much of anything with this issue until
    > we have a clearer understanding of what we want to do about
    > tablespaces and schemas.
    
    Here is an analysis of our options:
    
                              Work required             Disadvantages
    ----------------------------------------------------------------------------
    
    Keep current system       no work                   rename/create no rollback
    
    relname/oid but           less work                 new pg_class column,
    no rename change                                    filename not accurate on
                                                        rename
    
    relname/oid with          more work                 complex code
    rename change during      
    vacuum
    
    oid filename              less work, but            confusing to admins
                              need admin tools          
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  64. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-16T02:43:52Z

    Sorry for my previous mail. It was posted by mistake.
    
    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    > 
    > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > > Now I like neither relname nor oid because it's not sufficient 
    > > for my purpose.
    > 
    > We should probably not do much of anything with this issue until
    > we have a clearer understanding of what we want to do about
    > tablespaces and schemas.
    > 
    > My gut feeling is that we will end up with pathnames that look
    > something like
    >
    > .../data/base/DBNAME/TABLESPACE/OIDOFRELATION
    >
    
    Schema is a logical concept and irrelevant to physical location.
    I strongly object to your suggestion unless above means *default*
    location.
    Tablespace is an encapsulation of table allocation and the 
    name should be irrelevant to the location basically. So above
    seems very bad for me.
    
    Anyway I don't see any advantage in fixed mapping
    implementation. After renewal, we should at least have a possibility to
    allocate a specific table in arbitrary separate directory.
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
  65. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-16T03:20:16Z

    > -----Original Message-----
    > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
    > 
    > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > > > Now I like neither relname nor oid because it's not sufficient 
    > > > for my purpose.
    > > 
    > > We should probably not do much of anything with this issue until
    > > we have a clearer understanding of what we want to do about
    > > tablespaces and schemas.
    > 
    > Here is an analysis of our options:
    > 
    >                           Work required             Disadvantages
    > ----------------------------------------------------------------------------
    > 
    > Keep current system       no work                   rename/create no rollback
    > 
    > relname/oid but           less work                 new pg_class column,
    > no rename change                                    filename not accurate on
    >                                                     rename
    > 
    > relname/oid with          more work                 complex code
    > rename change during      
    > vacuum
    > 
    > oid filename              less work, but            confusing to admins
    >                           need admin tools          
    >
    
    Please add my opinion for naming rule.
    
    relname/unique_id but     need some work            new pg_class column,
    no relname change.        for unique-id generation  filename not relname
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
  66. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T03:35:21Z

    "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > Tablespace is an encapsulation of table allocation and the 
    > name should be irrelevant to the location basically. So above
    > seems very bad for me.
    > Anyway I don't see any advantage in fixed mapping
    > implementation. After renewal, we should at least have a possibility to
    > allocate a specific table in arbitrary separate directory.
    
    Call a "directory" a "tablespace" and we're on the same page,
    aren't we?  Actually I'd envision some kind of admin command
    "CREATE TABLESPACE foo AS /path/to/wherever". That would make
    appropriate system catalog entries and also create a symlink
    from ".../data/base/foo" (or some such place) to the target
    directory.  Then when we make a table in that tablespace,
    it's in the right place.  Problem solved, no?
    
    It gets a little trickier if you want to be able to split
    multi-gig tables across several tablespaces, though, since
    you couldn't just append ".N" to the base table path in that
    scenario.
    
    I'd be interested to know what sort of facilities Oracle
    provides for managing huge tables...
    
    			regards, tom lane
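    
    A sketch of the symlink approach, under the assumption that a tablespace
    is nothing more than a directory linked into the base directory.
    create_tablespace, table_file, and the dict standing in for the system
    catalog are all hypothetical; no such command exists in the thread beyond
    the proposed "CREATE TABLESPACE foo AS /path/to/wherever".

```python
import os
import tempfile


def create_tablespace(base_dir: str, name: str, target_dir: str,
                      catalog: dict) -> str:
    """Record a tablespace and link base_dir/name -> target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    link = os.path.join(base_dir, name)
    os.symlink(target_dir, link, target_is_directory=True)
    catalog[name] = target_dir        # stand-in for a system catalog entry
    return link


def table_file(base_dir: str, tablespace: str, rel_oid: int) -> str:
    """Path used by the storage layer; it never sees the real target."""
    return os.path.join(base_dir, tablespace, str(rel_oid))
```

    A file created through table_file() lands in the target directory, so
    the lower layers only ever deal with paths under the base directory.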
    
    
  67. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T03:43:41Z

    "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > Please add my opinion for naming rule.
    
    > relname/unique_id but     need some work            new pg_class column,
    > no relname change.        for unique-id generation  filename not relname
    
    Why is a unique ID better than --- or even different from ---
    using the relation's OID?  It seems pointless to me...
    
    			regards, tom lane
    
    
  68. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-16T03:57:44Z

    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    > 
    > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > > Please add my opinion for naming rule.
    > 
    > > relname/unique_id but     need some work            new pg_class column,
    > > no relname change.        for unique-id generation  filename not relname
    > 
    > Why is a unique ID better than --- or even different from ---
    > using the relation's OID?  It seems pointless to me...
    >
    
    For example, in the implementation of CLUSTER command,
    we would need another new file for the target relation in
    order to put sorted rows but don't we want to change the
    OID ? It would be needed for table re-construction generally.
    If I remember correctly, you once proposed OID+version
    naming for the cases.
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
  69. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-16T05:35:21Z

    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    > 
    > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > > Tablespace is an encapsulation of table allocation and the 
    > > name should be irrelevant to the location basically. So above
    > > seems very bad for me.
    > > Anyway I don't see any advantage in fixed mapping
    > > implementation. After renewal, we should at least have a possibility to
    > > allocate a specific table in arbitrary separate directory.
    > 
    > Call a "directory" a "tablespace" and we're on the same page,
    > aren't we?  Actually I'd envision some kind of admin command
    > "CREATE TABLESPACE foo AS /path/to/wherever". 
    
    Yes, I think 'tablespace -> directory' is the most natural
    extension under current file_per_table storage manager.
    If many_tables_in_a_file storage manager is introduced, we
    may be able to change the definition of TABLESPACE
    to 'tablespace -> files'  like Oracle.
    
    > That would make
    > appropriate system catalog entries and also create a symlink
    > from ".../data/base/foo" (or some such place) to the target
    > directory.
    > Then when we make a table in that tablespace,
    > it's in the right place.  Problem solved, no?
    > 
    
    I don't like symlinks for dbms data files. However it may
    be OK, if symlinks are limited to 'tablespace->directory'
    correspondence and all tablespaces (including default
    etc) are symlinks.  It is simple and all debugging would
    be processed under tablespace_is_symlink environment.
    
    > It gets a little trickier if you want to be able to split
    > multi-gig tables across several tablespaces, though, since
    > you couldn't just append ".N" to the base table path in that
    > scenario.
    >
    
    This seems to be not that easy to solve now.
    Ross doesn't change this naming rule for multi-gig
    tables either in his trial.
     
    > I'd be interested to know what sort of facilities Oracle
    > provides for managing huge tables...
    >
    
    As far as I know about old Oracle, one TABLESPACE
    could have many DATAFILEs which could contain
    many tables.
     
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp 
    
    
  70. Re: Big 7.1 open items

    Chris <chrisb@nimrod.itg.telstra.com.au> — 2000-06-16T05:36:04Z

    Tom Lane wrote:
    
    > >         <dbroot>/catalog_tables/pg_...
    > >         <dbroot>/catalog_index/pg_...
    > >         <dbroot>/user_tables/oid_...
    > >         <dbroot>/user_index/oid_...
    > >         <dbroot>/temp_tables/oid_...
    > >         <dbroot>/temp_index/oid_...
    > >         <dbroot>/toast_tables/oid_...
    > >         <dbroot>/toast_index/oid_...
    > >         <dbroot>/whatnot_???/...
    > 
    > I don't see a lot of value in that.  Better to do something like
    > tablespaces:
    >
    >        <dbroot>/<oidoftablespace>/<oidofobject>
    
    What is the benefit of having oidoftablespace in the directory path?
    Isn't tablespace an idea so you can store it somewhere completely
    different?
    Or is there some symlink idea or something?
    
    
  71. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T05:54:46Z

    "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    >> Why is a unique ID better than --- or even different from ---
    >> using the relation's OID?  It seems pointless to me...
    
    > For example, in the implementation of CLUSTER command,
    > we would need another new file for the target relation in
    > order to put sorted rows but don't we want to change the
    > OID ? It would be needed for table re-construction generally.
    > If I remember correctly, you once proposed OID+version
    > naming for the cases.
    
    Hmm, so you are thinking that the pg_class row for the table would
    include this uniqueID, and then committing the pg_class update would
    be the atomic action that replaces the old table contents with the
    new?  It does have some attraction now that I think about it.
    
    But there are other ways we could do the same thing.  If we want to
    have tablespaces, there will need to be a tablespace identifier in
    each pg_class row.  So we could do CLUSTER in the same way as we'd
    move a table from one tablespace to another: create the new files in
    the new tablespace directory, and the commit of the new pg_class row
    with the new tablespace value is the atomic action that makes the new
    files valid and the old files not.
    
    You will probably say "but I didn't want to move my table to a new
    tablespace just to cluster it!"  I think we could live with that,
    though.  A tablespace doesn't need to have any existence more concrete
    than a subdirectory, in my vision of the way things would work.  We 
    could do something like making two subdirectories of each place that
    the dbadmin designates as a "tablespace", so that we make two logical
    tablespaces out of what the dbadmin thinks of as one.  Then we can
    ping-pong between those directories to do things like clustering "in
    place".
    
    Basically I want to keep the bottom-level mechanisms as simple and
    reliable as we possibly can.  The fewer concepts are known down at
    the bottom, the better.  If we can keep the pathname constituents
    to just "tablespace" and "relation OID" we'll be in great shape ---
    but each additional concept that has to be known down there is
    another potential problem.
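
    A minimal sketch of the ping-pong idea above, in Python purely for
    illustration. It assumes a hypothetical pg_class row (a plain dict)
    with an "oid" and a "half" field naming which of the two
    subdirectories holds the live file; neither the field nor the helper
    exists in the backend. Flipping "half" stands in for the atomic
    pg_class commit.

```python
import os
import tempfile


def cluster_in_place(pg_class_row, tablespace_root, sorted_rows):
    """Rewrite a relation into the other half of its tablespace, then flip.

    pg_class_row is a hypothetical stand-in for the catalog row:
    {"oid": <int>, "half": 0 or 1}.
    """
    old_half = pg_class_row["half"]
    new_half = 1 - old_half
    name = str(pg_class_row["oid"])

    # Write the sorted copy into the *other* subdirectory.
    new_dir = os.path.join(tablespace_root, str(new_half))
    os.makedirs(new_dir, exist_ok=True)
    new_path = os.path.join(new_dir, name)
    with open(new_path, "w") as f:
        for row in sorted_rows:
            f.write(row + "\n")

    # Stand-in for committing the updated pg_class row: from here on
    # the new file is the valid one and the old file is garbage.
    pg_class_row["half"] = new_half

    old_path = os.path.join(tablespace_root, str(old_half), name)
    if os.path.exists(old_path):
        os.remove(old_path)
    return new_path


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    row = {"oid": 16384, "half": 0}
    print(cluster_in_place(row, root, ["a", "b", "c"]), row)
```

    Note that the path is still formed from only two constituents, the
    (logical) tablespace and the relation OID.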
    
    			regards, tom lane
    
    
  72. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-16T07:03:06Z

    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    > 
    > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > >> Why is a unique ID better than --- or even different from ---
    > >> using the relation's OID?  It seems pointless to me...
    > 
    > > For example, in the implementation of the CLUSTER command,
    > > we would need another new file for the target relation in
    > > order to put sorted rows but don't we want to change the
    > > OID? It would be needed for table reconstruction generally.
    > > If I remember correctly, you once proposed OID+version
    > > naming for these cases.
    > 
    > Hmm, so you are thinking that the pg_class row for the table would
    > include this uniqueID, 
    
    No, I just include the place where the table is stored (the pathname
    under the current file_per_table storage manager) in the pg_class row,
    because I don't want to rely on a table allocation rule (currently the
    naming rule) to access existing relation files. This has always been
    my main point.
    A many_tables_in_a_file storage manager wouldn't be able to live without
    keeping this kind of information.
    This information (where it is stored) is different from tablespace (where
    to store) information. There was an idea to keep the information in an
    opaque entry in pg_class which only a specific storage manager
    could handle. There was also an idea to have a new system table which
    keeps the information, and so on...
    
    > and then committing the pg_class update would
    > be the atomic action that replaces the old table contents with the
    > new?  It does have some attraction now that I think about it.
    > 
    > But there are other ways we could do the same thing.  If we want to
    > have tablespaces, there will need to be a tablespace identifier in
    > each pg_class row.  So we could do CLUSTER in the same way as we'd
    > move a table from one tablespace to another: create the new files in
    > the new tablespace directory, and the commit of the new pg_class row
    > with the new tablespace value is the atomic action that makes the new
    > files valid and the old files not.
    > 
    > You will probably say "but I didn't want to move my table to a new
    > tablespace just to cluster it!" 
    
    Yes.
    
    > I think we could live with that,
    > though.  A tablespace doesn't need to have any existence more concrete
    > than a subdirectory, in my vision of the way things would work.  We 
    > could do something like making two subdirectories of each place that
    > the dbadmin designates as a "tablespace", so that we make two logical
    > tablespaces out of what the dbadmin thinks of as one. 
    
    Certainly we could design TABLESPACE (where to store) as above.
    
    > Then we can
    > ping-pong between those directories to do things like clustering "in
    > place".
    >
    
    But maybe we must keep the directory information where the table was
    *ping-ponged* in (e.g.) pg_class. Is such an implementation cleaner or
    more extensible than mine (keeping the stored place exactly)?
     
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
  73. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T07:34:47Z

    Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
    > Tom Lane wrote:
    >> I don't see a lot of value in that.  Better to do something like
    >> tablespaces:
    >> 
    >> <dbroot>/<oidoftablespace>/<oidofobject>
    
    > What is the benefit of having oidoftablespace in the directory path?
    > Isn't tablespace an idea so you can store it somewhere completely
    > different?
    > Or is there some symlink idea or something?
    
    Exactly --- I'm assuming that the tablespace "directory" is likely
    to be a symlink to some other mounted volume.  The point here is
    to keep the low-level file access routines from having to know very
    much about tablespaces or file organization.  In the above proposal,
    all they need to know is the relation's OID and the name (or OID)
    of the tablespace the relation's assigned to; then they can form
    a valid path using a hardwired rule.  There's still plenty of
    flexibility of organization, but it's not necessary to know that
    where the rubber meets the road (eg, when you're down inside mdblindwrt
    trying to dump a dirty buffer to disk with no spare resources to find
    out anything about the relation the page belongs to...)
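
    The hardwired rule above can be sketched as follows. This is an
    illustration in Python, not backend code; the function name and the
    sample values are made up, and only the
    <dbroot>/<oidoftablespace>/<oidofobject> layout comes from the
    proposal.

```python
import posixpath


def relation_path(dbroot, tablespace_oid, relation_oid):
    """Form <dbroot>/<oidoftablespace>/<oidofobject> with no catalog lookup.

    Whether the tablespace directory is a real directory or a symlink to
    another mounted volume is invisible at this level.
    """
    return posixpath.join(dbroot, str(tablespace_oid), str(relation_oid))


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(relation_path("/usr/local/pgsql/data/base/mydb", 1234, 5678))
```

    Everything the caller needs is two integers, which is the point of
    the proposal for code like mdblindwrt.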
    
    			regards, tom lane
    
    
  74. Re: Big 7.1 open items

    Jan Wieck <janwieck@t-online.de> — 2000-06-16T12:42:12Z

    Tom Lane wrote:
    >
    > It gets a little trickier if you want to be able to split
    > multi-gig tables across several tablespaces, though, since
    > you couldn't just append ".N" to the base table path in that
    > scenario.
    >
    > I'd be interested to know what sort of facilities Oracle
    > provides for managing huge tables...
    
        Oracle  tablespaces  are  a  collection of 1...n preallocated
        files.   Each  table  then  is  bound  to  a  tablespace  and
        allocates extents (chunks) from those files.
    
        There  are  some per table attributes that control the extent
        sizes with default values coming from  the  tablespace.   The
        initial  extent  size,  the  nextextent  and the pctincrease.
        There is a hardcoded limit for the number of extents a  table
        can  have at all.  In Oracle7 it was 512 (or somewhat below -
        I don't recall exactly). Maybe that's gone with Oracle8,  I
        don't know.
    
        This  storage  concept  has  IMHO  a couple of advantages over
        ours.
    
            The tablespace files  are  preallocated,  so  there  will
            never  be a change in block allocation during runtime and
            that's the basis for  fdatasync()  being  sufficient  at
            syncpoints. All that might be inaccurate after a crash is
            the last modified time in the inode, and  that's  totally
            irrelevant  for  Oracle.  The  fsck  will never fail, and
            everything else is up to Oracle's recovery.
    
            The number of total tablespace  files  is  limited  to  a
            value  that  ensures, that the backends can keep them all
            open all the time. It's hard  to  exceed  that  limit.  A
            typical   SAP   installation   with   more   than  20,000
            tables/indices doesn't need more than 30 or 40 of them.
    
            It  is  perfectly  prepared  for  raw  devices,  since  a
            tablespace in a raw device installation is simply an area
            of blocks on a disk.
    
        There are also disadvantages.
    
            You can run out of space even if there  are  plenty  GB's
            free  on  your  disks.   You  have  to create tablespaces
            explicitly.
    
            If you've chosen inadequate extent size  parameters,  you
            end  up  with  highly fragmented tables (slowing down) or
            get stuck running up against maxextents,  where  only  a
            reorg (export/import) helps.
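
    To make the extent parameters concrete, here is a rough Python
    sketch. The growth rule used (first extent = initial, second =
    nextextent, each later one grown by pctincrease percent) is an
    assumption based on how Oracle's storage parameters are commonly
    described, not something stated in this thread, and it ignores any
    rounding to block multiples. The function name and the sample
    numbers are hypothetical.

```python
def extent_sizes(initial, next_extent, pctincrease, maxextents):
    """Return the sizes of successive extents until maxextents is reached.

    Assumed rule: extent 1 = initial, extent 2 = next_extent, and every
    later extent is the previous one grown by pctincrease percent.
    """
    sizes = [initial]
    size = next_extent
    while len(sizes) < maxextents:
        sizes.append(size)
        size = size * (100 + pctincrease) // 100
    return sizes


if __name__ == "__main__":
    sizes = extent_sizes(initial=100, next_extent=100, pctincrease=50,
                         maxextents=4)
    # The sum is the largest the table can grow before hitting maxextents.
    print(sizes, sum(sizes))
```

    With small extents and pctincrease = 0 the total stays small no
    matter how much disk is free, which is the "stuck against
    maxextents" case described above.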
    
    
    Jan
    
    --
    
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me.                                  #
    #================================================== JanWieck@Yahoo.com #
    
    
    
    
  75. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T15:00:35Z

    JanWieck@t-online.de (Jan Wieck) writes:
    >     There are also disadvantages.
    
    >         You can run out of space even if there  are  plenty  GB's
    >         free  on  your  disks.   You  have  to create tablespaces
    >         explicitly.
    
    Not to mention the reverse: if I read this right, you have to suck
    up your GB's long in advance of actually needing them.  That's OK
    for a machine that's dedicated to Oracle ... not so OK for smaller
    installations, playpens, etc.
    
    I'm not convinced that there's anything fundamentally wrong with
    doing storage allocation in Unix files the way we have been.
    
    (At least not when we're sitting atop a well-done filesystem,
    which may leave the Linux folk out in the cold ;-).)
    
    			regards, tom lane
    
    
  76. Re: Big 7.1 open items

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 2000-06-16T15:11:27Z

    > (At least not when we're sitting atop a well-done filesystem,
    > which may leave the Linux folk out in the cold ;-).)
    
    Those who live in HP houses should not throw stones :))
    
                   - Thomas
    
    
  77. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T15:46:41Z

    JanWieck@t-online.de (Jan Wieck) writes:
    > Tom Lane wrote:
    >> It gets a little trickier if you want to be able to split
    >> multi-gig tables across several tablespaces, though, since
    >> you couldn't just append ".N" to the base table path in that
    >> scenario.
    >> 
    >> I'd be interested to know what sort of facilities Oracle
    >> provides for managing huge tables...
    
    >     Oracle  tablespaces  are  a  collection of 1...n preallocated
    >     files.   Each  table  then  is  bound  to  a  tablespace  and
    >     allocates extents (chunks) from those files.
    
    OK, to get back to the point here: so in Oracle, tables can't cross
    tablespace boundaries, but a tablespace itself could span multiple
    disks?
    
    Not sure if I like that better or worse than equating a tablespace
    with a directory (so, presumably, all the files within it live on
    one filesystem) and then trying to make tables able to span
    tablespaces.  We will need to do one or the other though, if we want
    to have any significant improvement over the current state of affairs
    for large tables.
    
    One way is to play the flip-the-path-ordering game some more,
    and access multiple-segment tables with pathnames like this:
    
    	.../TABLESPACE/RELATION		-- first or only segment
    	.../TABLESPACE/N/RELATION	-- N'th extension segment
    
    This isn't any harder for md.c to deal with than what we do now,
    but by making the /N subdirectories be symlinks, the dbadmin could
    easily arrange for extension segments to go on different filesystems.
    Also, since /N subdirectory symlinks can be added as needed,
    expanding available space by attaching more disks isn't hard.
    (If the admin hasn't pre-made a /N symlink when it's needed,
    I'd envision the backend just automatically creating a plain
    subdirectory so that it can extend the table.)
    
    A limitation is that the N'th extension segments of all the relations
    in a given tablespace have to be in the same place, but I don't see
    that as a major objection.  Worst case is you make a separate tablespace
    for each of your multi-gig relations ... you're probably not going to
    have a very large number of such relations, so this doesn't seem like
    unmanageable admin complexity.
    
    We'd still want to create some tools to help the dbadmin with slinging
    all these symlinks around, of course.  But I think it's critical to keep
    the low-level file access protocol simple and reliable, which really
    means minimizing the amount of information the backend needs to know to
    figure out which file to write a page in.  With something like the above
    you only need to know the tablespace name (or more likely OID), the
    relation OID (+name or not, depending on outcome of other argument),
    and the offset in the table.  No worse than now from the software's
    point of view.
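
    A small Python sketch of that lookup, for illustration only. The
    1 GB segment size (8 kB blocks times 131072 blocks) and the function
    name are assumptions; the two path shapes are the ones given above.

```python
import posixpath

BLOCK_SIZE = 8192          # assumed page size in bytes
SEGMENT_BLOCKS = 131072    # assumed blocks per segment (1 GB segments)


def segment_path(tablespace_dir, relation_oid, byte_offset):
    """Map (tablespace, relation OID, offset in table) to a file path.

    Segment 0 lives at .../TABLESPACE/RELATION, the N'th extension
    segment at .../TABLESPACE/N/RELATION, where N may be a symlink to
    another filesystem.
    """
    segno = byte_offset // (BLOCK_SIZE * SEGMENT_BLOCKS)
    if segno == 0:
        return posixpath.join(tablespace_dir, str(relation_oid))
    return posixpath.join(tablespace_dir, str(segno), str(relation_oid))


if __name__ == "__main__":
    one_gb = BLOCK_SIZE * SEGMENT_BLOCKS
    print(segment_path("/pgdata/1234", 5678, 0))
    print(segment_path("/pgdata/1234", 5678, 3 * one_gb + 5))
```

    Nothing beyond the three inputs is consulted, so the rule can be
    applied even when no catalog access is possible.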
    
    Comments?
    
    			regards, tom lane
    
    
  78. Re: Big 7.1 open items

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 2000-06-16T16:27:22Z

    > ...  But I think it's critical to keep
    > the low-level file access protocol simple and reliable, which really
    > means minimizing the amount of information the backend needs to know 
    > to figure out which file to write a page in.  With something like the 
    > above you only need to know the tablespace name (or more likely OID), 
    > the relation OID (+name or not, depending on outcome of other 
    > argument), and the offset in the table.  No worse than now from the 
    > software's point of view.
    > Comments?
    
    I'm probably missing the context a bit, but imho we should try hard to
    stay away from symlinks as the general solution for anything.
    
    Sorry for being behind here, but to make sure I'm on the right page:
    o tablespaces decouple storage from logical tables
    o a database lives in a default tablespace, unless specified
    o by default, a table will live in the default tablespace
    o (eventually) a table can be split across tablespaces
    
    Some thoughts:
    o the ability to split single tables across disks was essential for
    scalability when disks were small. But with RAID, NAS, etc etc isn't
    that a smaller issue now?
    o "tablespaces" would implement our less-developed "with location"
    feature, right? Splitting databases, whole indices and whole tables
    across storage is the biggest win for this work since more users will
    use the feature.
    o location information needs to travel with individual tables anyway.
    
    
  79. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-16T16:35:07Z

    >     There are also disadvantages.
    > 
    >         You can run out of space even if there  are  plenty  GB's
    >         free  on  your  disks.   You  have  to create tablespaces
    >         explicitly.
    > 
    >         If you've chosen inadequate extent size  parameters,  you
    >         end  up  with  highly fragmented tables (slowing down) or
    >         get stuck running up against maxextents,  where  only  a
    >         reorg (export/import) helps.
    
    Also, Tom Lane pointed out to me that file system read-ahead does not
    help if your table is spread around in tablespaces.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  80. Re: Big 7.1 open items

    Marc G. Fournier <scrappy@hub.org> — 2000-06-16T16:50:37Z

    On Thu, 15 Jun 2000, Bruce Momjian wrote:
    
    > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > > > Now I like neither relname nor oid because it's not sufficient 
    > > > for my purpose.
    > > 
    > > We should probably not do much of anything with this issue until
    > > we have a clearer understanding of what we want to do about
    > > tablespaces and schemas.
    > 
    > Here is an analysis of our options:
    > 
    >                           Work required             Disadvantages
    > ----------------------------------------------------------------------------
    > 
    > Keep current system       no work                   rename/create no rollback
    > 
    > relname/oid but           less work                 new pg_class column,
    > no rename change                                    filename not accurate on
    >                                                     rename
    > 
    > relname/oid with          more work                 complex code
    > rename change during      
    > vacuum
    > 
    > oid filename              less work, but            confusing to admins
    >                           need admin tools          
    
    My vote is with Tom on this one ... oid only ... the admin should be able
    to do a quick SELECT on a table to find out the OID->table mapping, and I
    believe it's already been pointed out that you can't just restore one file
    anyway, so it kinda negates the "server isn't running" problem ...
    
    
    
    
    
  81. Re: Big 7.1 open items

    Marc G. Fournier <scrappy@hub.org> — 2000-06-16T16:52:27Z

    On Thu, 15 Jun 2000, Tom Lane wrote:
    
    > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > > Please add my opinion for naming rule.
    > 
    > > relname/unique_id but	need some work		new pg_class column,	
    > > no relname change.	for unique-id generation	filename not relname
    > 
    > Why is a unique ID better than --- or even different from ---
    > using the relation's OID?  It seems pointless to me...
    
    just to open up a whole new bucket of worms here, but ... if we do use OID
    (which up until this thought I endorse 100%) ... do we not run a risk if
    we run out of OIDs?  As far as I know, those are still a finite resource,
    no? 
    
    or, do we just assume that by the time that comes, everyone will be pretty
    much using 64bit machines? :)
    
    
    
    
  82. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T16:54:00Z

    Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
    >> ...  But I think it's critical to keep
    >> the low-level file access protocol simple and reliable, which really
    >> means minimizing the amount of information the backend needs to know 
    >> to figure out which file to write a page in.  With something like the 
    >> above you only need to know the tablespace name (or more likely OID), 
    >> the relation OID (+name or not, depending on outcome of other 
    >> argument), and the offset in the table.  No worse than now from the 
    >> software's point of view.
    >> Comments?
    
    > I'm probably missing the context a bit, but imho we should try hard to
    > stay away from symlinks as the general solution for anything.
    
    Why?
    
    			regards, tom lane
    
    
  83. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T17:08:38Z

    The Hermit Hacker <scrappy@hub.org> writes:
    > just to open up a whole new bucket of worms here, but ... if we do use OID
    > (which up until this thought I endorse 100%) ... do we not run a risk if
    > we run out of OIDs?  As far as I know, those are still a finite resource,
    > no? 
    
    They are, and there is some risk involved, but OID collisions in the
    system tables will cause you just as much headache.  There's not only
    the pg_class row to think of, but the pg_attribute rows, etc etc.
    
    If you did have an OID collision with an existing table you'd have to
    keep trying until you got a set of OID assignments with no conflicts.
    (Now that we have unique indexes on the system tables, this should
    work properly, ie, you will hear about it if you have a conflict.)
    I don't think the physical table names make this noticeably worse.
    Of course we'd better be careful to create table files with O_EXCL,
    so as not to tromp on existing files, but we do that already IIRC.
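
    As an illustration of the O_EXCL point, here is a Python sketch
    using the same open(2) flags. The helper name is made up, and
    "oid += 1" is only a stand-in for drawing the next value from the
    OID counter.

```python
import os
import tempfile


def create_relation_file(directory, first_oid, max_tries=100):
    """Create a new file named after an OID without clobbering anything.

    O_CREAT | O_EXCL makes the open fail if the name is already taken,
    so a collision is reported instead of silently reusing the file.
    """
    oid = first_oid
    for _ in range(max_tries):
        path = os.path.join(directory, str(oid))
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            oid += 1  # stand-in for taking the next OID and retrying
            continue
        os.close(fd)
        return oid, path
    raise RuntimeError("no free OID found")


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    print(create_relation_file(d, 16384))
    print(create_relation_file(d, 16384))  # collides once, then succeeds
```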
    
    > or, do we just assume that by the time that comes, everyone will be pretty
    > much using 64bit machines? :)
    
    I think we are not too far away from being able to offer 64-bit OID as
    a compile-time option (on machines where there is a 64-bit integer type
    that is).  It's just a matter of someone putting it at the head of their
    todo list.
    
    Bottom line is I'm not real worried about this issue.
    
    But having said all that, I am coming round to agree with Hiroshi's idea
    anyway.  See upcoming message.
    
    			regards, tom lane
    
    
  84. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-16T17:50:23Z

    At 11:46 AM 6/16/00 -0400, Tom Lane wrote:
    
    >OK, to get back to the point here: so in Oracle, tables can't cross
    >tablespace boundaries,
    
    Right, the construct AFAIK is "create table/index foo on tablespace ..."
    
    > but a tablespace itself could span multiple
    >disks?
    
    Right.
    
    >Not sure if I like that better or worse than equating a tablespace
    >with a directory (so, presumably, all the files within it live on
    >one filesystem) and then trying to make tables able to span
    >tablespaces.  We will need to do one or the other though, if we want
    >to have any significant improvement over the current state of affairs
    >for large tables.
    
    Oracle's way does a reasonable job of isolating the datamodel
    from the details of the physical layout.
    
    Take the OpenACS web toolkit, for instance.  We could take
    each module's tables and indices and assign them appropriately
    to various tablespaces, then provide a separate .sql file with
    only "create tablespace" statements in there.
    
    By modifying that one central file, the toolkit installation
    could be customized to run anything from a small site (one
    disk with everything on it, ala my own personal webserver at
    birdnotes.net) or a very large site with many spindles, with
    various index and table structures spread out widely hither
    and thither.
    
    Given that the OpenACS datamodel is nearly 10K lines long (including
    many comments, of course), being able to customize an installation
    to such a degree by modifying a single file filled with "create
    tablespaces" would be very attractive.
    
    >One way is to play the flip-the-path-ordering game some more,
    >and access multiple-segment tables with pathnames like this:
    >
    >	.../TABLESPACE/RELATION		-- first or only segment
    >	.../TABLESPACE/N/RELATION	-- N'th extension segment
    >
    >This isn't any harder for md.c to deal with than what we do now,
    >but by making the /N subdirectories be symlinks, the dbadmin could
    >easily arrange for extension segments to go on different filesystems.
    
    I personally dislike depending on symlinks to move stuff around.
    Among other things, a pg_dump/restore (and presumably future 
    backup tools?) can't recreate the disk layout automatically.
    
    >We'd still want to create some tools to help the dbadmin with slinging
    >all these symlinks around, of course.
    
    OK, if symlinks are simply an implementation detail hidden from the
    dbadmin, and if the physical structure is kept in the db so it can
    be rebuilt if necessary automatically, then I don't mind symlinks.
    
    > But I think it's critical to keep
    >the low-level file access protocol simple and reliable, which really
    >means minimizing the amount of information the backend needs to know to
    >figure out which file to write a page in.  With something like the above
    >you only need to know the tablespace name (or more likely OID), the
    >relation OID (+name or not, depending on outcome of other argument),
    >and the offset in the table.  No worse than now from the software's
    >point of view.
    
    Make the code that creates and otherwise manipulates tablespaces
    do the work, while keeping the low-level file access protocol simple.
    
    Yes, this approach sounds very good to me.
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  85. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-16T18:14:35Z

    At 04:27 PM 6/16/00 +0000, Thomas Lockhart wrote:
    
    >Sorry for being behind here, but to make sure I'm on the right page:
    >o tablespaces decouple storage from logical tables
    >o a database lives in a default tablespace, unless specified
    >o by default, a table will live in the default tablespace
    >o (eventually) a table can be split across tablespaces
    
    Or tablespaces across filesystems/mountpoints whatever.
    
    >Some thoughts:
    >o the ability to split single tables across disks was essential for
    >scalability when disks were small. But with RAID, NAS, etc etc isn't
    >that a smaller issue now?
    
    Yes for size issues, I should think, especially if you have the 
    money for a large RAID subsystem.  But for throughput performance,
    control over which spindles particularly busy tables and indices
    go on would still seem to be pretty relevant, when they're being
    updated a lot.  In order to minimize seek times.
    
    I really can't say how important this is in reality.  Oracle-world
    folks still talk about this kind of optimization being important,
    but I'm not personally running any kind of database-backed website
    that's busy enough or contains enough storage to worry about it.
    
    >o "tablespaces" would implement our less-developed "with location"
    >feature, right? Splitting databases, whole indices and whole tables
    >across storage is the biggest win for this work since more users will
    >use the feature.
    >o location information needs to travel with individual tables anyway.
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  86. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T19:00:10Z

    Don Baccus <dhogaza@pacifier.com> writes:
    >> This isn't any harder for md.c to deal with than what we do now,
    >> but by making the /N subdirectories be symlinks, the dbadmin could
    >> easily arrange for extension segments to go on different filesystems.
    
    > I personally dislike depending on symlinks to move stuff around.
    > Among other things, a pg_dump/restore (and presumably future 
    > backup tools?) can't recreate the disk layout automatically.
    
    Good point, we'd need some way of saving/restoring the tablespace
    structures.
    
    >> We'd still want to create some tools to help the dbadmin with slinging
    >> all these symlinks around, of course.
    
    > OK, if symlinks are simply an implementation detail hidden from the
    > dbadmin, and if the physical structure is kept in the db so it can
    > be rebuilt if necessary automatically, then I don't mind symlinks.
    
    I'm not sure about keeping it in the db --- creates a bit of a
    chicken-and-egg problem doesn't it?  Maybe there needs to be a
    "system database" that has nailed-down pathnames (no tablespaces
    for you baby) and contains the critical installation-wide tables
    like pg_database, pg_user, pg_tablespace.  A restore would have
    to restore these tables first anyway.
    
    > Make the code that creates and otherwise manipulates tablespaces
    > do the work, while keeping the low-level file access protocol simple.
    
    Right, that's the bottom line for me.
    
    			regards, tom lane
    
    
  87. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-16T19:06:17Z

    > >Some thoughts:
    > >o the ability to split single tables across disks was essential for
    > >scalability when disks were small. But with RAID, NAS, etc etc isn't
    > >that a smaller issue now?
    > 
    > Yes for size issues, I should think, especially if you have the 
    > money for a large RAID subsystem.  But for throughput performance,
    > control over which spindles particularly busy tables and indices
    > go on would still seem to be pretty relevant, when they're being
    > updated a lot.  In order to minimize seek times.
    > 
    > I really can't say how important this is in reality.  Oracle-world
    > folks still talk about this kind of optimization being important,
    > but I'm not personally running any kind of database-backed website
    > that's busy enough or contains enough storage to worry about it.
    
    It is important when you have a few big tables that must be fast.  One
    objection I have always had to the HP logical volume manager is that it
    is difficult to know what drives are being assigned to each logical
    volume.
    
    Seems if they don't have RAID, we should allow such drive partitioning.
    
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  88. Re: Big 7.1 open items

    Ross Reedstrom <reedstrm@rice.edu> — 2000-06-16T19:35:28Z

    On Fri, Jun 16, 2000 at 04:27:22PM +0000, Thomas Lockhart wrote:
    > > ...  But I think it's critical to keep
    > > the low-level file access protocol simple and reliable, which really
    > > means minimizing the amount of information the backend needs to know 
    > > to figure out which file to write a page in.  With something like the 
    > > above you only need to know the tablespace name (or more likely OID), 
    > > the relation OID (+name or not, depending on outcome of other 
    > > argument), and the offset in the table.  No worse than now from the 
    > > software's point of view.
    > > Comments?
    
    I think the backend needs a per table token that indicates how
    to get at the physical bits of the file. Whether that's a filename
    alone, filename with path, oid, key to a smgr hash table or something
    else, it's opaque above the smgr routines.
    
    Hmm, now I'm thinking, since the tablespace discussion has been reopened,
    the way to go about coding all this is to reactivate the smgr code: how
    about I leave the existing md smgr as is, and clone it, call it md2 or
    something, and start messing with adding features there?
    
    
    > 
    > I'm probably missing the context a bit, but imho we should try hard to
    > stay away from symlinks as the general solution for anything.
    > 
    > Sorry for being behind here, but to make sure I'm on the right page:
    > o tablespaces decouple storage from logical tables
    > o a database lives in a default tablespace, unless specified
    > o by default, a table will live in the default tablespace
    > o (eventually) a table can be split across tablespaces
    > 
    > Some thoughts:
    > o the ability to split single tables across disks was essential for
    > scalability when disks were small. But with RAID, NAS, etc etc isn't
    > that a smaller issue now?
    > o "tablespaces" would implement our less-developed "with location"
    > feature, right? Splitting databases, whole indices and whole tables
    > across storage is the biggest win for this work since more users will
    > use the feature.
    > o location information needs to travel with individual tables anyway.
    
    I was just thinking that that discussion needed some summation.
    
    Some links to historic discussion: 
    
    This one is Vadim saying WAL will need oids names:
    http://www.postgresql.org/mhonarc/pgsql-hackers/1999-11/msg00809.html
    
    A longer discussion kicked off by Don Baccus:
    http://www.postgresql.org/mhonarc/pgsql-hackers/2000-01/msg00510.html
    
    Tom suggesting OIDs to allow rollback:
    http://www.postgresql.org/mhonarc/pgsql-hackers/2000-03/msg00119.html
    
    
    Martin Neumann posted a question on dataspaces:
    
    (can't find it in the official archives: it looks like March 2000, 10-29
    is missing. Here's my copy: don't beat on it!  In particular, since I
    threw it together for local access, it's one _big_ index page)
    
    http://cooker.ir.rice.edu/postgresql/msg20257.html
    (in that thread is a post where I mention blindwrites and getting rid
    of GetRawDatabaseInfo)
    
    Martin later posted an RFD on tablespaces:
    
    http://cooker.ir.rice.edu/postgresql/msg20490.html
    
    Here's Horák Daniel with a patch for discussion, implementing dataspaces
    on a per database level:
    
    http://cooker.ir.rice.edu/postgresql/msg20498.html
    
    Ross
    -- 
    Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
    NSBRI Research Scientist/Programmer
    Computer and Information Technology Institute
    Rice University, 6100 S. Main St.,  Houston, TX 77005
    
    
  89. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-16T19:37:36Z

    At 03:00 PM 6/16/00 -0400, Tom Lane wrote:
    
    >> OK, if symlinks are simply an implementation detail hidden from the
    >> dbadmin, and if the physical structure is kept in the db so it can
    >> be rebuilt if necessary automatically, then I don't mind symlinks.
    >
    >I'm not sure about keeping it in the db --- creates a bit of a
    >chicken-and-egg problem doesn't it? 
    
    Not if tablespace creation precedes the tables stored in them.
    
    > Maybe there needs to be a
    >"system database" that has nailed-down pathnames (no tablespaces
    >for you baby) and contains the critical installation-wide tables
    >like pg_database, pg_user, pg_tablespace.  A restore would have
    >to restore these tables first anyway.
    
    Oh, I see.  Yes, when I've looked into this and have thought about
    it I've assumed that there would always be a known starting point
    which would contain the installation-wide tables.
    
    From a practical point of view, I don't think that's really a
    problem.
    
    I've not looked into how Oracle does this, I assume it builds 
    a system tablespace on one of the initial mount points you give
    it when you install the thing.  The paths to the mount points
    are stored in specific files known to Oracle, I think.  It's 
    been over a year (not long enough!) since I've set up Oracle...
    
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  90. Re: Big 7.1 open items

    Ross Reedstrom <reedstrm@rice.edu> — 2000-06-16T21:07:13Z

    On Thu, Jun 15, 2000 at 07:53:52PM -0400, Tom Lane wrote:
    > "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
    > > On Thu, Jun 15, 2000 at 03:11:52AM -0400, Tom Lane wrote:
    > >> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
    > >>>> Any strong objections to the mixed relname_oid solution?
    > >> 
    > >> Yes!
    > 
    > > The plan here was to let VACUUM handle renaming the file, since it
    > > will already have all the necessary locks. This shortens the window
    > > of confusion.  ALTER TABLE RENAME doesn't happen that often, really - 
    > > the relname is there just for human consumption, then.
    > 
    > Yeah, I've seen tons of discussion of how if we do this, that, and
    > the other thing, and be prepared to fix up some other things in case
    > of crash recovery, we can make it work with filename == relname + OID
    > (where relname tracks logical name, at least at some remove).
    > 
    > Probably.  Assuming nobody forgets anything.
    
    I agree, it seems a major undertaking, at first glance. And second. Even
    third. Especially for someone who hasn't 'earned his spurs' yet, as
    it were.
    
    > I'm just trying to point out that that's a huge amount of pretty
    > delicate mechanism.  The amount of work required to make it trustworthy
    > looks to me to dwarf the admin tools that Bruce is complaining about.
    > And we only have a few people competent to do the work.  (With all
    > due respect, Ross, if you weren't already aware of the implications
    > for mdblindwrt, I have to wonder what else you missed.)
    
    Ah, you knew that comment would come back to haunt me (I have a
    tendency to think out loud, even if checking and coming back later
    would be better;-) In fact, there's no problem, and never was, since the
    buffer->blind.relname is filled in via RelationGetPhysicalRelationName,
    just like every other path that requires direct file access. I just
    didn't remember that I had in fact checked it (it's been a couple months,
    and I just got back from vacation ;-)
    
    Actually, once I re-checked it, the code looked very familiar. I had
    spent time looking at the blind write code in the context of getting
    rid of the only non-startup use of GetRawDatabaseInfo.
    
    As to missing things: I'm leaning heavily on Bruce's previous
    work for temp tables, to separate the two uses of relname, via the
    RelationGetRelationName and RelationGetPhysicalRelationName. There are
    102 uses of the first in the current code (many in elog messages), and
    only 11 of the second. If I'd had to do the original work of finding
    every use of relname, and categorizing it, I agree I'm not (yet) up to
    it, but I have more confidence in Bruce's  (already tested) work.
    
    > 
    > Filename == OID is so simple, reliable, and straightforward by
    > comparison that I think the decision is a no-brainer.
    > 
    
    Perhaps. Changing the label of the file on disk still requires finding
    all the code that assumes it knows what that name is, and changing it.
    Same work.
    
    > If we could afford to sink unlimited time into this one issue then
    > it might make sense to do it the hard way, but we have enough
    > important stuff on our TODO list to keep us all busy for years ---
    > I cannot believe that it's an effective use of our time to do this.
    > 
    
    The joys of Open Development. You've spent a fair amount of time trying
    to convince _me_ not to waste my time. Thanks, but I'm pretty bull headed
    sometimes. Since I've already done some of the work, take a look
    at what I've got, and then tell me I'm wasting my time, o.k.?
    
    > 
    > > Hmm, what's all this with functions in catalog.c that are only called by
    > > smgr/md.c? seems to me that anything having to do with physical storage
    > > (like the path!) belongs in the smgr abstraction.
    > 
    > Yeah, there's a bunch of stuff that should have been implemented by
    > adding new smgr entry points, but wasn't.  It should be pushed down.
    > (I can't resist pointing out that one of those things is physical
    > relation rename, which will go away and not *need* to be pushed down
    > if we do it the way I want.)
    > 
    
    Oh, I agree completely. In fact, as I said to Hiroshi last time this came
    up, I think of the field in pg_class as an opaque token, to be filled in
    by the smgr, and only used by code further up to hand back to the smgr
    routines. Same should be true of the buffer->blind struct.
    
    Ross
    -- 
    Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
    NSBRI Research Scientist/Programmer
    Computer and Information Technology Institute
    Rice University, 6100 S. Main St.,  Houston, TX 77005
    
    
    
  91. Re: Big 7.1 open items

    Kaare Rasmussen <kar@webline.dk> — 2000-06-16T23:02:49Z

    > (At least not when we're sitting atop a well-done filesystem,
    > which may leave the Linux folk out in the cold ;-).)
    
    Exactly what fs of Linux are you talking about? I believe that for a database
    server, ReiserFS would be a natural choice.
    
    -- 
    Kaare Rasmussen            --Linux, spil,--        Tlf:        3816 2582
    Kaki Data                tshirts, merchandize      Fax:        3816 2582
    Howitzvej 75               Åben 14.00-18.00       Email: kar@webline.dk
    2000 Frederiksberg        Lørdag 11.00-17.00      Web:      www.suse.dk
    
    
  92. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-16T23:11:08Z

    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    > 
    > JanWieck@t-online.de (Jan Wieck) writes:
    > >     There are also disadvantages.
    > 
    > >         You can run out of space even if there  are  plenty  GB's
    > >         free  on  your  disks.   You  have  to create tablespaces
    > >         explicitly.
    > 
    > Not to mention the reverse: if I read this right, you have to suck
    > up your GB's long in advance of actually needing them.  That's OK
    > for a machine that's dedicated to Oracle ... not so OK for smaller
    > installations, playpens, etc.
    >
    
    I've been uneasy about preallocation the way Oracle does it.
    It was never easy for me to estimate the extent size in
    Oracle.  Adopting it might cost us the simplicity of environment
    setup, which is one of the biggest advantages of PostgreSQL.
    It seems that we should also provide not_preallocated DATAFILE
    when many_tables_in_a_file storage manager is introduced.
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
      
     
    
    
  93. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T23:16:37Z

    "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
    > I think the backend needs a per table token that indicates how
    > to get at the physical bits of the file. Whether that's a filename
    > alone, filename with path, oid, key to a smgr hash table or something
    > else, it's opaque above the smgr routines.
    
    Except to the commands that provide the user interface for tablespaces
    and so forth.  And there aren't all that many places that deal with
    physical filenames anyway.  It would be a good idea to try to be a
    little stricter about this, but I'm not sure you can make the separation
    a whole lot cleaner than it is now ... with the exception of the obvious
    bogosities like "rename table" being done above the smgr level.  (But,
    as I said, I want to see that code go away, not just get moved into
    smgr...)
    
    > Hmm, now I'm thinking, since the tablespace discussion has been reopened,
    > the way to go about coding all this is to reactivate the smgr code: how
    > about I leave the existing md smgr as is, and clone it, call it md2 or
    > something, and start messing with adding features there?
    
    Um, well, you can't have it both ways.  If you're going to change/fix
    the assumptions of code above the smgr, then you've got to update md
    at the same time to match your new definition of the smgr interface.
    Won't do much good to have a playpen smgr if the "standard" one is
    broken.
    
    One thing I have been thinking would be a good idea is to take the
    relcache out of the bufmgr/smgr interfaces.  The relcache is a
    higher-level concept and ought not be known to bufmgr or smgr; they
    ought to work with some low-level data structure or token for relations.
    We might be able to eliminate the whole concept of "blind write" if we
    do that.  There are other problems with the relcache dependency: entries
    in relcache can get blown away at inopportune times due to shared cache
    inval, and it doesn't provide a good home for tokens for multiple
    "versions" of a relation if we go with the fill-a-new-physical-file
    approach to CLUSTER and so on.
    
    Hmm, if you replace relcache in the smgr interfaces with pointers to
    an smgr-maintained data structure, that might be the same thing that
    you are alluding to above about an smgr hash table.
    
    One thing *not* to do is add yet a third layer of data structure on
    top of the ones already maintained in fd.c and md.c.  Whatever extra
    data might be needed here should be added to md.c's tables, I think,
    and then the tokens used in the smgr interface would be pointers into
    that table.
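
    A minimal sketch of the "opaque token" idea discussed above, written
    in Python for brevity rather than in the backend's C.  The class name,
    the integer tokens and the file-name-equals-OID rule are all
    assumptions made for illustration; nothing here is the actual md.c
    interface.  The point is only that callers hold a token and never see
    a path.

```python
import os
import tempfile


class StorageManager:
    """Toy smgr: it alone owns the token -> physical file mapping."""

    def __init__(self, base_dir):
        self._base_dir = base_dir
        self._table = {}        # token -> path, private to the smgr
        self._next_token = 1

    def create(self, rel_oid):
        # Hypothetical naming rule: file name == relation OID.
        path = os.path.join(self._base_dir, str(rel_oid))
        open(path, "wb").close()
        token = self._next_token
        self._next_token += 1
        self._table[token] = path
        return token            # callers keep this, never the path

    def write(self, token, offset, data):
        with open(self._table[token], "r+b") as f:
            f.seek(offset)
            f.write(data)

    def read(self, token, offset, length):
        with open(self._table[token], "rb") as f:
            f.seek(offset)
            return f.read(length)


def demo():
    with tempfile.TemporaryDirectory() as d:
        smgr = StorageManager(d)
        tok = smgr.create(16384)
        smgr.write(tok, 0, b"page0")
        return smgr.read(tok, 0, 5)
```

    Because the mapping lives in one table, changing how files are named
    or located would touch only the manager, not the code that holds
    tokens.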
    
    			regards, tom lane
    
    
  94. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-16T23:30:25Z

    "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > It seems that we should also provide not_preallocated DATAFILE
    > when many_tables_in_a_file storage manager is introduced.
    
    Several people in this thread have been talking like a
    single-physical-file storage manager is in our future, but I can't
    recall anyone saying that they were going to do such a thing or even
    presenting reasons why it'd be a good idea.
    
    Seems to me that physical file per relation is considerably better for
    our purposes.  It's easier to figure out what's going on for admin and
    debug work, it means less lock contention among different backends
    appending concurrently to different relations, and it gives the OS a
    better shot at doing effective read-ahead on sequential scans.
    
    So why all the enthusiasm for multi-tables-per-file?
    
    			regards, tom lane
    
    
  95. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-17T00:08:21Z

    > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > > It seems that we should also provide not_preallocated DATAFILE
    > > when many_tables_in_a_file storage manager is introduced.
    > 
    > Several people in this thread have been talking like a
    > single-physical-file storage manager is in our future, but I can't
    > recall anyone saying that they were going to do such a thing or even
    > presenting reasons why it'd be a good idea.
    > 
    > Seems to me that physical file per relation is considerably better for
    > our purposes.  It's easier to figure out what's going on for admin and
    > debug work, it means less lock contention among different backends
    > appending concurrently to different relations, and it gives the OS a
    > better shot at doing effective read-ahead on sequential scans.
    > 
    > So why all the enthusiasm for multi-tables-per-file?
    
    No idea.  I thought Vadim mentioned it, but I am not sure anymore.  I
    certainly like our current system.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  96. Re: Big 7.1 open items

    Chris Bitmead <chris@bitmead.com> — 2000-06-17T00:39:16Z

    > > So why all the enthusiasm for multi-tables-per-file?
    
    It allows you to use raw partitions which stop the OS double buffering
    and wasting half of memory, as well as removing the overhead of indirect
    blocks in the file system.
    
    
  97. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-17T09:38:29Z

    > -----Original Message-----
    > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
    > > 
    > > So why all the enthusiasm for multi-tables-per-file?
    > 
    > No idea.  I thought Vadim mentioned it, but I am not sure anymore.  I
    > certainly like our current system.
    > 
    
    Oops, I'm not so enthusiastic about a multi_tables_per_file smgr.
    I believe that Ross and I have taken a practical approach that doesn't
    break the current file_per_table smgr.
    
    However it seems very natural to take multi_tables_per_file
    smgr into account when we consider TABLESPACE concept.
    Because TABLESPACE is an encapsulation,it should have
    a possibility to handle multi_tables_per_file smgr IMHO.
    
    Regards. 
    
    Hiroshi Inoue
    Inoue@tpf.co.jp 
    
    
  98. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-17T16:11:18Z

    "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > However it seems very natural to take multi_tables_per_file
    > smgr into account when we consider TABLESPACE concept.
    > Because TABLESPACE is an encapsulation,it should have
    > a possibility to handle multi_tables_per_file smgr IMHO.
    
    OK, I see: you're just saying that the tablespace stuff should be
    designed in such a way that it would work with a non-file-per-table
    smgr.  Agreed, that'd be a good check of a clean design, and someday
    we might need it...
    
    			regards, tom lane
    
    
  99. RE: Big 7.1 open items

    Kaare Rasmussen <kar@webline.dk> — 2000-06-17T16:32:06Z

    > Not to mention the reverse: if I read this right, you have to suck
    > up your GB's long in advance of actually needing them.  That's OK
    > for a machine that's dedicated to Oracle ... not so OK for smaller
    > installations, playpens, etc.
    
    To me it looks like a way to make Oracle work on VMS machines. This is the way
    files are allocated on Digital hardware.
    
    -- 
    Kaare Rasmussen            --Linux, spil,--        Tlf:        3816 2582
    Kaki Data                tshirts, merchandize      Fax:        3816 2582
    Howitzvej 75               Åben 14.00-18.00       Email: kar@webline.dk
    2000 Frederiksberg        Lørdag 11.00-17.00      Web:      www.suse.dk
    
    
  100. Re: Big 7.1 open items

    Randall Parker <rgparker@west.net> — 2000-06-17T21:52:37Z

    [This followup was posted to comp.databases.postgresql.hackers and a copy 
    was sent to the cited author.]
    
    A few thoughts:
    
    1) There may be reasons why someone might not want to use RAID. 
       For instance, suppose one wants to put different tables on different 
    drives so that the seeks for one table don't move the drive heads away 
    from the disk area for another table.
       Also, suppose someone wants to use a particular drive for a particular 
    purpose (eg certain indexes) because it is faster at seeking vs another 
    drive that is faster at sustained transfer rates.
       Also, someone may want to span a drive across multiple SCSI 
    controllers. Most RAID arrays I'm aware of are per SCSI controller. 
       I think it is fair to say that there will always be instances where 
    people want to have more control over where stuff goes because they are 
    willing to put the effort into more subtle tuning games. Well, there 
    ought to be a way.
    
    2) Some OSs do not support symlinks. The ability to list a bunch of 
    devices for where things will go would be of value.
       Also, if you aren't putting your data on a real file system (say on 
    raw partitions instead) you are going to need a way to specify that 
    anyway.
    
    In news:<394A556A.4EAC8B9A@alumni.caltech.edu>, 
    lockhart@alumni.caltech.edu says...
    > o the ability to split single tables across disks was essential for
    > scalability when disks were small. But with RAID, NAS, etc etc isn't
    > that a smaller issue now?
    > o "tablespaces" would implement our less-developed "with location"
    > feature, right? Splitting databases, whole indices and whole tables
    > across storage is the biggest win for this work since more users will
    > use the feature.
    > o location information needs to travel with individual tables anyway.
    
    
     
    
    
  101. Re: Big 7.1 open items

    Jan Wieck <janwieck@t-online.de> — 2000-06-17T23:23:59Z

    Tom Lane wrote:
    > JanWieck@t-online.de (Jan Wieck) writes:
    > >     There are also disadvantages.
    >
    > >         You can run out of space even if there  are  plenty  GB's
    > >         free  on  your  disks.   You  have  to create tablespaces
    > >         explicitly.
    >
    > Not to mention the reverse: if I read this right, you have to suck
    > up your GB's long in advance of actually needing them.  That's OK
    > for a machine that's dedicated to Oracle ... not so OK for smaller
    > installations, playpens, etc.
    
        Right,  the design is perfect for a few databases with a more
        or less stable size and schema (slow to medium  growth).  The
        problem  is, that production databases tend to fall into that
        behaviour and that might be  a  reason  for  so  many  people
        asking for Oracle compatibility - they want to do development
        in the highly flexible  Postgres  environment,  while  running
        their production server under Oracle :-(.
    
    > I'm not convinced that there's anything fundamentally wrong with
    > doing storage allocation in Unix files the way we have been.
    >
    > (At least not when we're sitting atop a well-done filesystem,
    > which may leave the Linux folk out in the cold ;-).)
    
        I'm  with  you on that, even if I'm one of the Linux losers.
        The only point that really strikes me is that in  our  system
        you  might  end  up with a corrupted file system because some
        inode changes didn't make it to the disk before a crash. Even
        if  using  fsync() instead of fdatasync() (which we cannot use
        at all and that's a pain from the performance PoV).   In  the
        Oracle world, that could only happen during
    
            ALTER TABLESPACE <tsname> ADD DATAFILE ...
    
        which  is  a  fairly rare command, issued usually by the DB
        admin (at  least  it  requires  admin  privileges)  and  thus
        ensures  the "admin is there and already paying attention". A
        little detail not to underestimate IMHO.
    
    
    Jan
    
    --
    
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me.                                  #
    #================================================== JanWieck@Yahoo.com #
    
    
    
    
  102. Re: Big 7.1 open items

    Jan Wieck <janwieck@t-online.de> — 2000-06-17T23:27:09Z

    Thomas Lockhart wrote:
    > > (At least not when we're sitting atop a well-done filesystem,
    > > which may leave the Linux folk out in the cold ;-).)
    >
    > Those who live in HP houses should not throw stones :))
    
        Huh?  Up to HPUX-9 they used to have BSD-FFS - even if it was
        a 4.2 BSD one - no?
    
    
    Jan
    
    --
    
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me.                                  #
    #================================================== JanWieck@Yahoo.com #
    
    
    
    
  103. Re: Big 7.1 open items

    Jan Wieck <janwieck@t-online.de> — 2000-06-18T00:10:09Z

    Tom Lane wrote:
    > JanWieck@t-online.de (Jan Wieck) writes:
    > > Tom Lane wrote:
    > >> It gets a little trickier if you want to be able to split
    > >> multi-gig tables across several tablespaces, though, since
    > >> you couldn't just append ".N" to the base table path in that
    > >> scenario.
    > >>
    > >> I'd be interested to know what sort of facilities Oracle
    > >> provides for managing huge tables...
    >
    > >     Oracle  tablespaces  are  a  collection of 1...n preallocated
    > >     files.   Each  table  then  is  bound  to  a  tablespace  and
    > >     allocates extents (chunks) from those files.
    >
    > OK, to get back to the point here: so in Oracle, tables can't cross
    > tablespace boundaries, but a tablespace itself could span multiple
    > disks?
    
        They can. The path in
    
            ALTER TABLESPACE <tsname> ADD DATAFILE ...
    
        can point to any location the db system has access to.
    
    >
    > Not sure if I like that better or worse than equating a tablespace
    > with a directory (so, presumably, all the files within it live on
    > one filesystem) and then trying to make tables able to span
    > tablespaces.  We will need to do one or the other though, if we want
    > to have any significant improvement over the current state of affairs
    > for large tables.
    >
    > One way is to play the flip-the-path-ordering game some more,
    > and access multiple-segment tables with pathnames like this:
    >
    >    .../TABLESPACE/RELATION       -- first or only segment
    >    .../TABLESPACE/N/RELATION     -- N'th extension segment
    >
    > [...]
    
        In most cases all objects in one database are bound to one or
        two tablespaces (data and indices). So you do  an  estimation
        of  the  size  required, create the tablespaces (and probably
        all their extension files), then create the schema  and  load
        it.  The  only reason not to do so is if your DB exceeds some
        size where you have to fear not being  able to finish  online
        backups  before  getting  Online-Redolog stuck.  This has  to
        do with the online backup procedure of Oracle.
    
    > This isn't any harder for md.c to deal with than what we do now,
    > but by making the /N subdirectories be symlinks, the dbadmin could
    > easily arrange for extension segments to go on different filesystems.
    > Also, since /N subdirectory symlinks can be added as needed,
    > expanding available space by attaching more disks isn't hard.
    > (If the admin hasn't pre-made a /N symlink when it's needed,
    > I'd envision the backend just automatically creating a plain
    > subdirectory so that it can extend the table.)
    
        So the admin always has to leave  enough  free space  in  the
        default  location to keep the DB running until he can take it
        offline, move the autocreated files and create the  symlinks.
        What a pain for 24/7 systems.
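
        The pathname scheme and the fallback under discussion here can be
        sketched roughly as follows.  This is Python for brevity, not
        backend code, and the function names are made up for the example.
        It shows why an auto-created /N directory ends up on the same
        filesystem as the tablespace unless the admin steps in first.

```python
import os
import tempfile


def segment_path(tablespace_dir, relation, segno):
    """Segment 0 sits directly in the tablespace; segment N in subdir N."""
    if segno == 0:
        return os.path.join(tablespace_dir, relation)
    return os.path.join(tablespace_dir, str(segno), relation)


def open_segment_for_extend(tablespace_dir, relation, segno):
    path = segment_path(tablespace_dir, relation, segno)
    parent = os.path.dirname(path)
    # If the admin has not pre-made a /N symlink, fall back to a plain
    # subdirectory, which lives on the tablespace's own filesystem.
    if not os.path.exists(parent):
        os.mkdir(parent)
    return open(path, "ab")


def demo():
    with tempfile.TemporaryDirectory() as ts:
        open_segment_for_extend(ts, "16384", 0).close()
        open_segment_for_extend(ts, "16384", 2).close()
        # List every file relative to the tablespace directory.
        return sorted(os.path.relpath(os.path.join(root, name), ts)
                      for root, _, names in os.walk(ts) for name in names)
```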
    
    > We'd still want to create some tools to help the dbadmin with slinging
    > all these symlinks around, of course.  But I think it's critical to keep
    > the low-level file access protocol simple and reliable, which really
    > means minimizing the amount of information the backend needs to know to
    > figure out which file to write a page in.  With something like the above
    > you only need to know the tablespace name (or more likely OID), the
    > relation OID (+name or not, depending on outcome of other argument),
    > and the offset in the table.  No worse than now from the software's
    > point of view.
    
        Exactly  the  "low-level  file  access"  protocol  is  highly
        complicated in Postgres. Because nearly  every  object  needs
        its  own file, we need to deal with virtual file descriptors.
        With an Oracle-like tablespace concept and a fixed  limit  of
        total   tablespace   files  (this  time  OS  or  installation
        specific), we could keep them all open all the time.  IMHO  a
        big win.
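
        To make the cost concrete, here is a toy model of virtual file
        descriptors: a small LRU cache that closes the least recently
        used file when the limit is hit and reopens it on demand.  It is
        a Python sketch with made-up names, not the fd.c implementation,
        and the file counts are arbitrary.

```python
import os
import tempfile
from collections import OrderedDict


class VfdCache:
    """Keep at most max_open real files open; reopen the rest on demand."""

    def __init__(self, max_open):
        self.max_open = max_open
        self._open = OrderedDict()    # path -> file object, in LRU order
        self.opens = 0                # number of real open() calls made

    def get(self, path):
        f = self._open.pop(path, None)
        if f is None:
            if len(self._open) >= self.max_open:
                _, victim = self._open.popitem(last=False)  # evict LRU
                victim.close()
            f = open(path, "r+b")
            self.opens += 1
        self._open[path] = f          # mark as most recently used
        return f

    def close_all(self):
        for f in self._open.values():
            f.close()
        self._open.clear()


def demo(n_files, max_open):
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(n_files):
            p = os.path.join(d, str(i))
            open(p, "wb").close()
            paths.append(p)
        cache = VfdCache(max_open)
        for _ in range(2):            # touch every file twice, in order
            for p in paths:
                cache.get(p)
        opens = cache.opens
        cache.close_all()
        return opens
```

        With four files and room for only two, every access in the second
        pass has to reopen a file; with room for all four, nothing is
        reopened.  A fixed, small set of tablespace files would always be
        in the second situation.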
    
    
    Jan
    
    --
    
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me.                                  #
    #================================================== JanWieck@Yahoo.com #
    
    
    
    
  104. Re: Big 7.1 open items

    Jan Wieck <janwieck@t-online.de> — 2000-06-18T00:20:15Z

    Bruce Momjian wrote:
    > >     There are also disadvantages.
    > >
    > >         You can run out of space even if there  are  plenty  GB's
    > >         free  on  your  disks.   You  have  to create tablespaces
    > >         explicitly.
    > >
    > >         If you've choosen inadequate extent size parameters,  you
    > >         end  up with high fragmented tables (slowing down) or get
    > >         stuck with running against maxextents, where only a reorg
    > >         (export/import) helps.
    >
    > Also, Tom Lane pointed out to me that file system read-ahead does not
    > help if your table is spread around in tablespaces.
    
        Not  with our HEAP concept. With the Oracle EXTENT concept it
        does pretty well, because they  have  different  block/extent
        sizes.   Usually  an  extent spans multiple blocks, so in the
        case of sequential reads they read each  extent  of  probably
        hundreds  of  K sequentially. And in the case of indexed reads,
        they know the extent and offset of the tuple  inside  of  the
        extent,  so they know the exact location of the record inside
        the tablespace to read.
    
        The big problem we always had  (why we need TOAST at all)  is
        that  the logical blocksize (extent size) of a table is bound
        to your physical blocksize used in the shared cache. This  is
        fixed  so  deeply  in the heap storage architecture, that I'm
        scared about it.
    
    
    Jan
    
    --
    
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me.                                  #
    #================================================== JanWieck@Yahoo.com #
    
    
    
    
  105. Re: Big 7.1 open items

    Jan Wieck <janwieck@t-online.de> — 2000-06-18T00:36:01Z

    Don Baccus wrote:
    > At 11:46 AM 6/16/00 -0400, Tom Lane wrote:
    >
    > I personally dislike depending on symlinks to move stuff around.
    > Among other things, a pg_dump/restore (and presumably future
    > backup tools?) can't recreate the disk layout automatically.
    >
    
        Most impact from this one, IMHO.
    
        Not  that  Oracle tools are able to do it either. But I think
        it's more trivial to recreate a 30+ tablespace layout on  the
        disks   than   to   recreate   all  symlinks  for  a  20,000+
        tables/indices database like an SAP R/3 one.
    
    
    Jan
    
    --
    
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me.                                  #
    #================================================== JanWieck@Yahoo.com #
    
    
    
    
  106. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-18T03:16:22Z

    OK, I have thought about tablespaces, and here is my proposal.  Maybe
    there will be some good ideas in my design.
    
    My feeling is that intelligent use of directories and symlinks can allow
    PostgreSQL to handle tablespaces and allow administrators to use
    symlinks outside of PostgreSQL and have PostgreSQL honor those changes
    in a reload.
    
    Seems we have three tablespace needs:
    
    	locate database in separate disk
    	locate tables in separate directory/symlink
    	locate secondary extents on different drives
    
    If we have a new CREATE DATABASE LOCATION command, we can say:
    
    	CREATE DATABASE LOCATION dbloc IN '/var/private/pgsql';
    	CREATE DATABASE newdb IN dbloc;
    
    The first command makes sure /var/private/pgsql exists and is write-able
    by postgres.  It then creates a dbloc directory and a symlink:
    
    	mkdir /var/private/pgsql/dbloc
    	ln -s /var/private/pgsql/dbloc data/base/dbloc
    
    The CREATE DATABASE command creates data/base/dbloc/newdb and creates
    the database there.  We would have to store the dbloc location in
    pg_database.
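
    As a rough illustration only (this is not PostgreSQL code), the two
    steps above could be sketched in Python as follows.  The helper names
    create_database_location and create_database, and the data_dir
    argument, are hypothetical; the sketch assumes a POSIX file system
    with symlink support.

```python
import os


def create_database_location(data_dir, name, target_root):
    """Create target_root/name and symlink it as data_dir/base/name.

    Hypothetical helper mirroring CREATE DATABASE LOCATION name IN 'target_root'.
    """
    # The proposal requires the target to exist and be writable by postgres.
    if not os.path.isdir(target_root) or not os.access(target_root, os.W_OK):
        raise OSError(f"{target_root} must exist and be writable")
    real_dir = os.path.join(target_root, name)
    os.mkdir(real_dir)                    # mkdir /var/private/pgsql/dbloc
    link = os.path.join(data_dir, "base", name)
    os.symlink(real_dir, link)            # ln -s .../dbloc data/base/dbloc
    return link


def create_database(data_dir, location, dbname):
    """Create the database directory inside a location, through the symlink.

    Hypothetical helper mirroring CREATE DATABASE dbname IN location.
    """
    db_dir = os.path.join(data_dir, "base", location, dbname)
    os.mkdir(db_dir)                      # lands in target_root/location/dbname
    return db_dir
```

    Because the database directory is created through the symlink, the
    files physically live under the target directory while the server
    still addresses them as data/base/dbloc/newdb.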
    
    To handle placing tables, we can use:
    
    	CREATE LOCATION tabloc IN '/var/private/pgsql';
    	CREATE TABLE newtab ... IN tabloc;
    
    The first command makes sure /var/private/pgsql exists and is write-able
    by postgres.  It then creates a directory tabloc in /var/private/pgsql,
    and does a symlink:
    
    	ln -s /var/private/pgsql/tabloc data/base/dbloc/newdb/tabloc
    
    and creates the table in there.  These location names have to be stored
    in pg_class.
    
    The difference between CREATE LOCATION and CREATE DATABASE LOCATION is
    that the first one puts it in the current database, while the latter
    puts the symlinks in data/base.  
    
    (Can we remove data/base and just make it data/?)
    
    I would also allow a simpler CREATE LOCATION tabloc2 which just creates
    a directory in the database directory.  These can be moved later using
    symlinks.  Of course, CREATE DATABASE LOCATION too.
    
    I haven't figured out extent locations yet.  One idea is to allow
    administrators to create symlinks for tables >1 gig, and to not remove
    the symlinks when a table shrinks.   Only remove the file pointed to by
    the table, but leave the symlink there so if the table grows again, it
    can use the symlink.  lstat() would allow this.
    
    Now on to preserving this information.  My idea is that PostgreSQL
    should never remove a directory or symlink in the data/base directory. 
    Those represent locations made by the administrator.  So, pg_dump with a
    -l option can go through the db directory and output CREATE LOCATION
    commands for every database, so when reloaded, the locations will be
    preserved, assuming the symlinks point to still-valid directories.
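
    A sketch of what such a pg_dump -l pass might do, written in Python
    for brevity.  The function name dump_locations and the exact
    statement text are assumptions, not an existing pg_dump feature.  It
    relies on lstat() not following symlinks, and it assumes each symlink
    target has the form <path>/<location name>, as in the CREATE LOCATION
    example above.

```python
import os
import stat


def dump_locations(db_dir):
    """Return CREATE LOCATION statements for the entries found in db_dir.

    Symlinks become "CREATE LOCATION name IN 'parent-of-target'";
    plain subdirectories become the simpler "CREATE LOCATION name".
    Regular files (table files) are ignored.
    """
    statements = []
    for name in sorted(os.listdir(db_dir)):
        path = os.path.join(db_dir, name)
        mode = os.lstat(path).st_mode        # lstat: inspect the link itself
        if stat.S_ISLNK(mode):
            target = os.readlink(path)
            parent = os.path.dirname(target)
            statements.append(f"CREATE LOCATION {name} IN '{parent}';")
        elif stat.S_ISDIR(mode):
            statements.append(f"CREATE LOCATION {name};")
    return statements
```

    Nothing here reads a system table: the layout is recovered purely
    from the file system, which is the point of the proposal.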
    
    What this does allow is someone to create locations during table
    population, but to keep them all on the same drive.  If they later move
    things around on the disk using cp and symlinks, this will be preserved
    by pg_dump.
    
    My problem with many of the tablespace systems is that it requires two
    changes.  One in the file system using symlinks, and another in the
    database to point to the new entries, or it does not preserve them
    across backups.
    
    If someone does want to remove a location, they would have to remove all
    tables in the directory, and the base directory and symlink can be
    removed with DROP LOCATION.
    
    My solution basically stores locations for databases and tables in the
    database, but does _not_ store information about what locations exist or
    if they are symlinks.  However, it does allow for preserving of this
    information in dumps.
    
    I feel this solution is very flexible. 
    
    Comments?
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  107. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-18T05:21:47Z

    JanWieck@t-online.de (Jan Wieck) writes:
    > Thomas Lockhart wrote:
    >> Those who live in HP houses should not throw stones :))
    
    >     Huh?  Up to HPUX-9 they used to have BSD-FFS - even if it was
    >     a 4.2 BSD one - no?
    
    Yeah, the standard HPUX filesystem is still BSD ... and it still runs
    rings around Linux extfs2 in my experience.  (I've been informed that
    Linux has better filesystems than extfs2, but that seems to be what
    the average Linux user is running.)  I have a realtime data collection
    program that usually wants to write several thousand small files during
    shutdown.  The shutdown typically takes about 3 minutes on an HP 715/75,
    upwards of 10 minutes on a Linux box with nominally-faster hardware.
    
    BTW, HP is trying to sell people on using a new journaling filesystem
    that they claim outperforms BSD, but my few experiments with it
    haven't encouraged me to pursue it.
    
    			regards, tom lane
    
    
  108. Re: Big 7.1 open items

    Michael Reifenberger <root@nihil.plaut.de> — 2000-06-18T10:38:39Z

    On Sun, 18 Jun 2000, Jan Wieck wrote:
    ...
    >         ALTER TABLESPACE <tsname> ADD DATAFILE ...
    > 
    >     which  is  a  fairly seldom command, issued usually by the DB
    >     admin (at  least  it  requires  admin  privileges)  and  thus
    >     ensures  the "admin is there and already paying attention". A
    >     little detail not to underestimate IMHO.
    ...
    Esp. in the R/3 area this will no longer be true the more commonly
    commands like "AUTOEXTEND" and "RESIZE" are used (automated at worst).
    
    Bye!
    ----
    Michael Reifenberger
    ^.*Plaut.*$, IT, R/3 Basis, GPS
    
    
    
  109. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-18T13:33:44Z

    > I haven't figured out extent locations yet.  One idea is to allow
    > administrators to create symlinks for tables >1 gig, and to not remove
    > the symlinks when a table shrinks.   Only remove the file pointed to by
    > the table, but leave the symlink there so if the table grows again, it
    > can use the symlink.  lstat() would allow this.
    
    OK, I have an extent idea.  It is:
    
    	CREATE LOCATION tabloc IN '/var/private/pgsql' EXTENT2 '/usr/pg'.
    
    This creates an /extents directory in the location, with extents/2
    symlinked to /usr/pg:
    
    	data/base/mydb/tabloc
    	data/base/mydb/tabloc/extents/2
    
    When extending a table, it looks for an extents/2 directory and uses
    that if it exists. Same for extents3.  We could even get fancy and
    round-robin through all the extents directories, looping around to the
    beginning when we run out of them.  That sounds nice.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  110. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-18T14:35:54Z

    > > I haven't figured out extent locations yet.  One idea is to allow
    > > administrators to create symlinks for tables >1 gig, and to not remove
    > > the symlinks when a table shrinks.   Only remove the file pointed to by
    > > the table, but leave the symlink there so if the table grows again, it
    > > can use the symlink.  lstat() would allow this.
    > 
    > OK, I have an extent idea.  It is:
    > 
    > 	CREATE LOCATION tabloc IN '/var/private/pgsql' EXTENT2
    > '/usr/pg'.
    
    Even better:
    
    	CREATE LOCATION tabloc IN '/var/private/pgsql' 
    		EXTENT '/usr/pg', '/usr1/pg'
    
    This will create extent/2 and extent/3, and the system can rotate
    extents between the primary storage area, and 2 and 3.
    
    Also, CREATE INDEX will need a location specification added.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  111. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-18T16:06:29Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > ...  We could even get fancy and
    > round-robin through all the extents directories, looping around to the
    > beginning when we run out of them.  That sounds nice.
    
    That sounds horrible.  There's no way to tell which extent directory
    extent N goes into except by scanning the location directory to find
    out how many extent subdirectories there are (so that you can compute
    N modulo number-of-directories).  Do you want to pay that price on every
    file open?
    
    Worse, what happens when you add another extent directory?  You can't
    find your old extents anymore, that's what, because they're not in the
    right place (N modulo number-of-directories just changed).  Since the
    extents are presumably on different volumes, you're talking about
    physical file moves to get them where they should be.  You probably
    can't add a new extent without shutting down the entire database while
    you reshuffle files --- at the very least you'd need to get exclusive
    locks on all the tables in that tablespace.
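
    To make the relocation problem concrete, here is a small Python
    sketch under a simplified model (an assumption for illustration, not
    anything in the PostgreSQL source): extent N is placed in directory
    N modulo the number of extent directories.

```python
def round_robin_dir(extent_no, n_dirs):
    """Directory index for an extent under round-robin placement (hypothetical)."""
    return extent_no % n_dirs


def misplaced_extents(n_extents, old_dirs, new_dirs):
    """List the extents whose computed directory changes with the directory count."""
    return [n for n in range(n_extents)
            if round_robin_dir(n, old_dirs) != round_robin_dir(n, new_dirs)]


# Going from 3 to 4 directories with 12 extents: only extents 0..2 stay put.
print(misplaced_extents(12, 3, 4))  # [3, 4, 5, 6, 7, 8, 9, 10, 11]
```

    In this model, nine of twelve extents would have to be physically
    moved after adding a single directory.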
    
    Also, you'll get filename conflicts from multiple extents of a single
    table appearing in one of the recycled extent dirs.  You could work
    around it by using the non-modulo'd N as part of the final file name,
    but that just adds more complexity and makes the filename-generation
    machinery that much more closely tied to this specific way of doing
    things.
    
    The right way to do this is that extent N goes into extents subdirectory
    N, period.  If there's no such subdirectory, create one on-the-fly as a
    plain subdirectory of the location directory.  The dbadmin can easily
    create secondary extent symlinks *in advance of their being needed*.
    Reorganizing later is much more painful since it requires moving
    physical files, but I think that'd be true no matter what.  At least
    we should see to it that adding more space in advance of needing it is
    painless.
    
    It's possible to do it that way (auto-create extent subdir if needed)
    without tying the md.c machinery real closely to a specific filename
    creation procedure: it's just the same sort of thing as install programs
    customarily do.  "If you fail to create a file, try creating its
    ancestor directory."  We'd have to think about whether it'd be a good
    idea to allow auto-creation of more than one level of directory; offhand
    it seems that needing to make more than one level is probably a sign of
    an erroneous path, not need for another extent subdirectory.
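
    The "try creating its ancestor directory" rule can be sketched in
    Python as below.  This is an illustration, not md.c: the function
    name open_extent and the extentN directory naming are assumptions.
    Only one directory level is created on demand, so a missing location
    directory still surfaces as an error.

```python
import os


def open_extent(location_dir, relname, extent_no):
    """Open (creating if needed) the file for extent N of a relation.

    Extent N always lives in location_dir/extentN (naming is hypothetical).
    If that directory is a symlink made in advance by the admin, the
    file is created wherever the symlink points.
    """
    path = os.path.join(location_dir, f"extent{extent_no}", relname)
    flags = os.O_RDWR | os.O_CREAT
    try:
        return os.open(path, flags, 0o600)
    except FileNotFoundError:
        # Create only the immediate parent.  os.mkdir (not os.makedirs) is
        # deliberate: a missing grandparent more likely means a bad path.
        os.mkdir(os.path.dirname(path))
        return os.open(path, flags, 0o600)
```

    The caller never needs to know how many extent directories exist, so
    no directory scan is required on file open.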
    
    			regards, tom lane
    
    
  112. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-18T22:50:17Z

    If we eliminate the round-robin idea, what did people think of the rest
    of the ideas?
    
    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > ...  We could even get fancy and
    > > round-robin through all the extents directories, looping around to the
    > > beginning when we run out of them.  That sounds nice.
    > 
    > That sounds horrible.  There's no way to tell which extent directory
    > extent N goes into except by scanning the location directory to find
    > out how many extent subdirectories there are (so that you can compute
    > N modulo number-of-directories).  Do you want to pay that price on every
    > file open?
    
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  113. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-18T23:36:59Z

    > > Not to mention the reverse: if I read this right, you have to suck
    > > up your GB's long in advance of actually needing them.  That's OK
    > > for a machine that's dedicated to Oracle ... not so OK for smaller
    > > installations, playpens, etc.
    > 
    > To me it looks like a way to make Oracle work on VMS machines. This is the way
    > files are allocated on Digital hardware.
    
    Agreed.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  114. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-18T23:43:42Z

    At 06:50 PM 6/18/00 -0400, Bruce Momjian wrote:
    >If we eliminate the round-robin idea, what did people think of the rest
    >of the ideas?
    
    Why invent new syntax when "create tablespace" is something a lot
    of folks will recognize?
    
    And why not use "create table ... using ... "?  In other words, 
    Oracle-compatible for this construct?  Sure, Postgres doesn't
    have to follow Oraclisms but picking an existing construct means
    at least SOME folks can import a datamodel without having to
    edit it.
    
    Does your proposal break the smgr abstraction, i.e. does it
    preclude later efforts to (say) implement an (optional) 
    raw-device storage manager?
    
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  115. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-19T00:08:07Z

    > At 06:50 PM 6/18/00 -0400, Bruce Momjian wrote:
    > >If we eliminate the round-robin idea, what did people think of the rest
    > >of the ideas?
    > 
    > Why invent new syntax when "create tablespace" is something a lot
    > of folks will recognize?
    > 
    > And why not use "create table ... using ... "?  In other words, 
    > Oracle-compatible for this construct?  Sure, Postgres doesn't
    > have to follow Oraclisms but picking an existing construct means
    > at least SOME folks can import a datamodel without having to
    > edit it.
    
    Sure, use another syntax.  My idea was to use symlinks, allow locations
    to be moved using symlinks, and preserve them during dump.
    
    > 
    > Does your proposal break the smgr abstraction, i.e. does it
    > preclude later efforts to (say) implement an (optional) 
    > raw-device storage manager?
    
    Seeing very few want that done, I don't see it as an issue at this
    point.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  116. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-19T00:12:22Z

    At 08:08 PM 6/18/00 -0400, Bruce Momjian wrote:
    
    >> Does your proposal break the smgr abstraction, i.e. does it
    >> preclude later efforts to (say) implement an (optional) 
    >> raw-device storage manager?
    >
    >Seeing very few want that done, I don't see it as an issue at this
    >point.
    
    Sorry, I disagree.  There's no excuse for breaking existing abstractions
    unless there's a compelling reason to do so.
    
    My question should make it clear I was using a raw-device storage
    manager as an example.  There are other possibilities, like a
    many-tables-per-file storage manager.
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  117. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-19T00:24:25Z

    > JanWieck@t-online.de (Jan Wieck) writes:
    > > Thomas Lockhart wrote:
    > >> Those who live in HP houses should not throw stones :))
    > 
    > >     Huh?  Up to HPUX-9 they used to have BSD-FFS - even if it was
    > >     a 4.2 BSD one - no?
    > 
    > Yeah, the standard HPUX filesystem is still BSD ... and it still runs
    > rings around Linux extfs2 in my experience.  (I've been informed that
    > Linux has better filesystems than extfs2, but that seems to be what
    > the average Linux user is running.)  I have a realtime data collection
    > program that usually wants to write several thousand small files during
    > shutdown.  The shutdown typically takes about 3 minutes on an HP 715/75,
    > upwards of 10 minutes on a Linux box with nominally-faster hardware.
    > 
    > BTW, HP is trying to sell people on using a new journaling filesystem
    > that they claim outperforms BSD, but my few experiments with it
    > haven't encouraged me to pursue it.
    
    You should really try the BSD4.4 FFS with soft updates.  It re-orders
    disk flushes to greatly improve performance.  It really is great.  
    
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  118. Re: Big 7.1 open items

    Ross Reedstrom <reedstrm@rice.edu> — 2000-06-19T00:47:04Z

    On Sun, Jun 18, 2000 at 05:12:22PM -0700, Don Baccus wrote:
    > At 08:08 PM 6/18/00 -0400, Bruce Momjian wrote:
    > 
    > >> Does your proposal break the smgr abstraction, i.e. does it
    > >> preclude later efforts to (say) implement an (optional) 
    > >> raw-device storage manager?
    > >
    > >Seeing very few want that done, I don't see it as an issue at this
    > >point.
    > 
    > Sorry, I disagree.  There's no excuse for breaking existing abstractions
    > unless there's a compelling reason to do so.
    > 
    > My question should make it clear I was using a raw-device storage
    > manager as an example.  There are other possibilities, like a
    > many-tables-per-file storage manager.
    > 
    
    Don, I see Bruce's proposal as implementation details within the storage
    manager. In fact, we should probably implement the tablespace commands
    with an extension of the smgr api. One different smgr I've been thinking
    a little about is the persistent RAM smgr: I've heard there's some
    new technologies coming up that may make large amounts cheaper, soon.
    And there's always PostgreSQL for PalmOS, right? (Hey, IBM's got a Pocket
    DB2, why shouldn't we?)
    
    Ross
    -- 
    Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
    NSBRI Research Scientist/Programmer
    Computer and Information Technology Institute
    Rice University, 6100 S. Main St.,  Houston, TX 77005
    
    
    
  119. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-19T00:54:00Z

    > At 08:08 PM 6/18/00 -0400, Bruce Momjian wrote:
    > 
    > >> Does your proposal break the smgr abstraction, i.e. does it
    > >> preclude later efforts to (say) implement an (optional) 
    > >> raw-device storage manager?
    > >
    > >Seeing very few want that done, I don't see it as an issue at this
    > >point.
    > 
    > Sorry, I disagree.  There's no excuse for breaking existing abstractions
    > unless there's a compelling reason to do so.
    > 
    > My question should make it clear I was using a raw-device storage
    > manager as an example.  There are other possibilities, like a
    > many-tables-per-file storage manager.
    
    I agree it is nice to keep things as abstract as possible.  I just don't
    know if the abstraction will cause added complexity.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  120. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-19T03:13:44Z

    My basic proposal is that we optionally allow symlinks when creating
    tablespace directories, and that we interrogate those symlinks during a
    dump so administrators can move tablespaces around without having to
    modify environment variables or system tables.
    
    I also suggested creating an extent directory to hold extents, like
    extent/2 and extent/3.  This will allow administration for smaller sites
    to be simpler.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  121. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-19T04:07:48Z

    At 11:13 PM 6/18/00 -0400, Bruce Momjian wrote:
    >My basic proposal is that we optionally allow symlinks when creating
    >tablespace directories, and that we interrogate those symlinks during a
    >dump so administrators can move tablespaces around without having to
    >modify environment variables or system tables.
    
    If they can move them around from within the db, they'll have no need to
    move them around from outside the db. 
    
    I don't quite understand your devotion to using filesystem commands
    outside the database to do database administration.
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  122. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-19T04:25:52Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > I also suggested creating an extent directory to hold extents, like
    > extent/2 and extent/3.  This will allow administration for smaller sites
    > to be simpler.
    
    I don't see the value in creating an extra level of directory --- seems
    that just adds one more Unix directory-lookup cycle to each file open,
    without any apparent return.  What's wrong with extent directory names
    like extent2, extent3, etc?
    
    Obviously the extent dirnames must be chosen so they can't conflict
    with table filenames, but that's easily done.  For example, if table
    files are named like 'OID_xxx' then 'extentN' will never conflict.
    
    			regards, tom lane
    
    
  123. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-19T04:28:14Z

    Don Baccus <dhogaza@pacifier.com> writes:
    > If they can move them around from within the db, they'll have no need to
    > move them around from outside the db. 
    > I don't quite understand your devotion to using filesystem commands
    > outside the database to do database administration.
    
    Being *able* to use filesystem commands to see/fix what's going on is a
    good thing, particularly from a development/debugging standpoint.  But
    I agree we want to have within-the-system admin commands to do the same
    things.
    
    			regards, tom lane
    
    
  124. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-19T04:33:19Z

    At 12:28 AM 6/19/00 -0400, Tom Lane wrote:
    
    >Being *able* to use filesystem commands to see/fix what's going on is a
    >good thing, particularly from a development/debugging standpoint. 
    
    Of course it's a crutch for development, but outside of development
    circles few users will know how to use the OS in regard to the
    database.
    
    Assuming PG takes off.  Of course, if it remains the realm of the
    dedicated hard-core hacker, I'm wrong.  
    
    I have nothing against preserving the ability to use filesystem
    commands if there's no significant costs inherent with this approach.
    I'd view the breaking of smgr abstraction as a significant cost (though
    I agree with Ross that Bruce's proposal shouldn't require that, I
    asked my question to flush Bruce out, if you will, because he's 
    devoted to a particular outside-the-db management model).
    
    > But
    >I agree we want to have within-the-system admin commands to do the same
    >things.
    
    MUST have, I should think.
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  125. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-19T04:53:49Z

    > Don Baccus <dhogaza@pacifier.com> writes:
    > > If they can move them around from within the db, they'll have no need to
    > > move them around from outside the db. 
    > > I don't quite understand your devotion to using filesystem commands
    > > outside the database to do database administration.
    > 
    > Being *able* to use filesystem commands to see/fix what's going on is a
    > good thing, particularly from a development/debugging standpoint.  But
    > I agree we want to have within-the-system admin commands to do the same
    > things.
    
    Yes, I like to have db commands to do it.  I just like to allow things
    outside too, if possible.  It also prevents things from getting out of
    sync because the database doesn't need to store the symlink location.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  126. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-19T05:49:32Z

    Don Baccus <dhogaza@pacifier.com> writes:
    > I'd view the breaking of smgr abstraction as a significant cost
    
    Actually, the "smgr abstraction" has *been* broken for a long time,
    due to sloppy implementation of features like relation rename.
    But I agree we should try to re-establish a clean separation.
    
    >> But
    >> I agree we want to have within-the-system admin commands to do the same
    >> things.
    
    > MUST have, I should think.
    
    No argument from this quarter.  It seems to me that once a PG
    installation has been set up, it ought to be possible to do routine
    admin tasks remotely --- and that means no direct access to the
    server's filesystem.
    
    			regards, tom lane
    
    
  127. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-19T13:28:59Z

    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > I also suggested creating an extent directory to hold extents, like
    > > extent/2 and extent/3.  This will allow administration for smaller sites
    > > to be simpler.
    > 
    > I don't see the value in creating an extra level of directory --- seems
    > that just adds one more Unix directory-lookup cycle to each file open,
    > without any apparent return.  What's wrong with extent directory names
    > like extent2, extent3, etc?
    > 
    > Obviously the extent dirnames must be chosen so they can't conflict
    > with table filenames, but that's easily done.  For example, if table
    > files are named like 'OID_xxx' then 'extentN' will never conflict.
    
    We could call them extent.2, extent-2, or Extent-2.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  128. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-19T13:30:56Z

    > At 12:28 AM 6/19/00 -0400, Tom Lane wrote:
    > 
    > >Being *able* to use filesystem commands to see/fix what's going on is a
    > >good thing, particularly from a development/debugging standpoint. 
    > 
    > Of course it's a crutch for development, but outside of development
    > circles few users will know how to use the OS in regard to the
    > database.
    > 
    > Assuming PG takes off.  Of course, if it remains the realm of the
    > dedicated hard-core hacker, I'm wrong.  
    > 
    > I have nothing against preserving the ability to use filesystem
    > commands if there's no significant costs inherent with this approach.
    > I'd view the breaking of smgr abstraction as a significant cost (though
    > I agree with Ross that Bruce's proposal shouldn't require that, I
    > asked my question to flush Bruce out, if you will, because he's 
    > devoted to a particular outside-the-db management model).
    
    The fact is that symlink information is already stored in the file
    system.  If we store symlink information in the database too, there
    exists the ability for the two to get out of sync.  My point is that I
    think we can _not_ store symlink information in the database, and query
    the file system using lstat when required.
    
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  129. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-19T16:17:14Z

    > -----Original Message-----
    > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
    > 
    > The fact is that symlink information is already stored in the file
    > system.  If we store symlink information in the database too, there
    > exists the ability for the two to get out of sync.  My point is that I
    > think we can _not_ store symlink information in the database, and query
    > the file system using lstat when required.
    >
    
    Hmm,this seems pretty confusing to me.
    I don't understand the necessity of symlink.
    Directory tree,symlink,hard link ... are OS's standard.
    But I don't think they are fit for dbms management.
    
    PostgreSQL is a database system of course. So
    couldn't it handle more flexible structure than OS's
    directory tree for itself ?
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp 
     
    
    
    
  130. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-19T17:35:59Z

    > > -----Original Message-----
    > > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
    > > 
    > > The fact is that symlink information is already stored in the file
    > > system.  If we store symlink information in the database too, there
    > > exists the ability for the two to get out of sync.  My point is that I
    > > think we can _not_ store symlink information in the database, and query
    > > the file system using lstat when required.
    > >
    > 
    > Hmm,this seems pretty confusing to me.
    > I don't understand the necessity of symlink.
    > Directory tree,symlink,hard link ... are OS's standard.
    > But I don't think they are fit for dbms management.
    > 
    > PostgreSQL is a database system of course. So
    > couldn't it handle more flexible structure than OS's
    > directory tree for itself ?
    
    Yes, but is anyone suggesting a solution that does not work with
    symlinks?  If not, why not do it that way?
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  131. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-20T05:52:17Z

    > -----Original Message-----
    > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
    >
    > > > -----Original Message-----
    > > > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
    > > >
    > > > The fact is that symlink information is already stored in the file
    > > > system.  If we store symlink information in the database too, there
    > > > exists the ability for the two to get out of sync.  My point is that I
    > > > think we can _not_ store symlink information in the database, and query
    > > > the file system using lstat when required.
    > > >
    > > Hmm,this seems pretty confusing to me.
    > > I don't understand the necessity of symlink.
    > > Directory tree,symlink,hard link ... are OS's standard.
    > > But I don't think they are fit for dbms management.
    > >
    > > PostgreSQL is a database system of course. So
    > > couldn't it handle more flexible structure than OS's
    > > directory tree for itself ?
    >
    > Yes, but is anyone suggesting a solution that does not work with
    > symlinks?  If not, why not do it that way?
    >
    
    Maybe other solutions have been proposed already because
    there have been so many opinions and proposals.
    
    I've felt TABLE(DATA)SPACE discussion has always been
    divergent.  IMHO, one of the main causes is that various factors
    have been discussed at once.  Shouldn't we make step by step
    consensus in TABLE(DATA)SPACE discussion ?
    
    IMHO,the first step is to decide the syntax of CREATE TABLE
    command not to define TABLE(DATA)SPACE.
    
    Comments ?
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
    
  132. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-20T13:40:03Z

    > > Yes, but is anyone suggesting a solution that does not work with
    > > symlinks?  If not, why not do it that way?
    > >
    > 
    > Maybe other solutions have been proposed already because
    > there have been so many opinions and proposals.
    > 
    > I've felt TABLE(DATA)SPACE discussion has always been
    > divergent.  IMHO, one of the main causes is that various factors
    > have been discussed at once.  Shouldn't we make step by step
    > consensus in TABLE(DATA)SPACE discussion ?
    > 
    > IMHO,the first step is to decide the syntax of CREATE TABLE
    > command not to define TABLE(DATA)SPACE.
    > 
    > Comments ?
    
    Agreed.  Seems we have several issues:
    
    	filename contents
    	tablespace implementation
    	tablespace directory layout
    	tablespace commands and syntax
    
    Filename syntax seems to have resolved to
    tablespace/tablename_oid_version or something like that.  I think a
    clean solution to keep symlink names in sync with rename is to use hard
    links during rename, and during vacuum, if the link count is greater
    than one, we can scan the directory and remove old files matching the
    oid.
    
    I hope we can implement tablespaces using symlinks that can be dumped, but
    the symlink location does not have to be stored in the database.
    
    Seems we are going to use Extent-2/Extent-3 to store extents under each
    tablespace.
    
    It also seems we will be using the Oracle tablespace syntax where
    appropriate.
    
    Comments?
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
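
    The hard-link idea in the message above can be sketched roughly as
    follows.  This is an illustration in Python, not backend code: the
    name_oid_version file naming and the helper names rename_table_file
    and vacuum_cleanup are assumptions made for the example, since the
    thread had not settled on an exact filename format.

```python
import os
import tempfile


def rename_table_file(directory, old_name, new_name, oid, version):
    """Give the table file a second name via a hard link (hypothetical naming)."""
    old_path = os.path.join(directory, f"{old_name}_{oid}_{version}")
    new_path = os.path.join(directory, f"{new_name}_{oid}_{version}")
    os.link(old_path, new_path)  # both names now refer to the same file
    return new_path


def vacuum_cleanup(directory, current_path, oid):
    """If the file has more than one link, remove other names carrying the same oid."""
    removed = []
    if os.stat(current_path).st_nlink > 1:
        keep = os.path.basename(current_path)
        for entry in sorted(os.listdir(directory)):
            parts = entry.rsplit("_", 2)  # [name, oid, version]
            if entry != keep and len(parts) == 3 and parts[1] == str(oid):
                os.unlink(os.path.join(directory, entry))
                removed.append(entry)
    return removed


# Small demonstration in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    open(os.path.join(scratch, "mytab_32332_1"), "w").close()
    renamed = rename_table_file(scratch, "mytab", "newtab", 32332, 1)
    stale = vacuum_cleanup(scratch, renamed, 32332)
```

    The link count acts as a cheap "is there cleanup to do" flag, so the
    directory scan only happens for files that were actually renamed.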
    
    
  133. Re: Big 7.1 open items

    Philip Warner <pjw@rhyme.com.au> — 2000-06-20T14:20:07Z

    At 09:40 20/06/00 -0400, Bruce Momjian wrote:
    >
    > [lots of stuff about symlinks]
    >
    
    It just occurred to me that the symlinks concerns may be short-circuitable,
    if the following are true:
    
    1. most of the desirability is for external 'management' and debugging etc
    on 'reasonably' static database designs.
    
    2. metadata changes (specifically renaming tables) occur infrequently.
    
    3. there is no reason why they are desirable *technically* within the
    implementations being discussed.
    
    If these are true, then why not create a utility (eg. pg_update_symlinks)
    that creates the relevant symlinks. It does not matter if they are
    outdated, from an integrity point of view, and for the most part they can
    be automatically maintained. Internally, postgresql can totally ignore them.
    
    Have I missed something?
    
    
    ----------------------------------------------------------------
    Philip Warner                    |     __---_____
    Albatross Consulting Pty. Ltd.   |----/       -  \
    (A.C.N. 008 659 498)             |          /(@)   ______---_
    Tel: (+61) 0500 83 82 81         |                 _________  \
    Fax: (+61) 0500 83 82 82         |                 ___________ |
    Http://www.rhyme.com.au          |                /           \|
                                     |    --________--
    PGP key available upon request,  |  /
    and from pgp5.ai.mit.edu:11371   |/
    
    
  134. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-20T14:35:47Z

    > At 09:40 20/06/00 -0400, Bruce Momjian wrote:
    > >
    > > [lots of stuff about symlinks]
    > >
    > 
    > It just occurred to me that the symlinks concerns may be short-circuitable,
    > if the following are true:
    > 
    > 1. most of the desirability is for external 'management' and debugging etc
    > on 'reasonably' static database designs.
    > 
    > 2. metadata changes (specifically renaming tables) occur infrequently.
    > 
    > 3. there is no reason why they are desirable *technically* within the
    > implementations being discussed.
    > 
    > If these are true, then why not create a utility (eg. pg_update_symlinks)
    > that creates the relevant symlinks. It does not matter if they are
    > outdated, from an integrity point of view, and for the most part they can
    > be automatically maintained. Internally, postgresql can totally ignore them.
    > 
    > Have I missed something?
    
    I am a little confused.  Are you suggesting that the entire symlink
    thing can be done outside the database?  Yes, that is true if we don't
    store the symlink locations in the database.  Of course, the database
    has to be down to do this.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  135. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-20T14:36:04Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > Agreed.  Seems we have several issues:
    
    > 	filename contents
    > 	tablespace implementation
    > 	tablespace directory layout
    > 	tablespace commands and syntax
    
    I think we've agreed that the filename must depend on tablespace,
    file version, and file segment number in some fashion --- plus
    the table name/OID of course.  Although there's no real consensus
    about exactly how to construct the name, agreeing on the components
    is still a positive step.
    
    A couple of other areas of contention were:
    
    	revising smgr interface to be cleaner
    	exactly what to store in pg_class
    
    I don't think there's any quibble about the idea of cleaning up smgr,
    but we don't have a complete proposal on the table yet either.
    
    As for the pg_class issue, I still favor storing
    	(a) OID of tablespace --- not for file access, but so that
    	    associated tablespace-table entry can be looked up
    	    by tablespace management operations
    	(b) pathname of file as a column of type "name", including
                a %d to be replaced by segment #
    
    I think Peter was holding out for storing purely numeric tablespace OID
    and table version in pg_class and having a hardwired mapping to pathname
    somewhere in smgr.  However, I think that doing it that way gains only
    micro-efficiency compared to passing a "name" around, while using the
    name approach buys us flexibility that's needed for at least some of
    the variants under discussion.  Given that the exact filename contents
    are still so contentious, I think it'd be a bad idea to pick an
    implementation that doesn't allow some leeway as to what the filename
    will be.  A name also has the advantage that it is a single item that
    can be used to identify the table to smgr, which will help in cleaning
    up the smgr interface.
    
    As for tablespace layout/implementation, the only real proposal I've
    heard is that there be a subdirectory of the database directory for each
    tablespace, and that that have a subdirectory for each segment (extent)
    of its tables --- where any of these subdirectories could be symlinks
    off to a different filesystem.  Some unhappiness was raised about
    depending on symlinks for this function, but I didn't hear one single
    concrete reason not to do it, nor an alternative design.  Unless someone
    comes up with a counterproposal, I think that that's what the actual
    access mechanism will look like.  We still need to talk about what we
    want to store in the SQL-level representation of a tablespace, and what
    sort of tablespace management tools/commands are needed.  (Although
    "try to make it look like Oracle" seems to be pretty much the consensus
    for the command level, not all of us know exactly what that means...)
    
    Comments?  Anything else that we do have consensus on?
    
    			regards, tom lane
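
    The "pathname with a %d" idea from point (b) above might look
    something like this.  It is only a sketch: the template strings and
    the helper name segment_path are made up for illustration, and the
    thread had not agreed on real filename contents.

```python
def segment_path(path_template, segment_no):
    """Expand a stored path template containing one %d into a segment file path."""
    return path_template % segment_no


# Two hypothetical templates: segment number as a suffix, or as a subdirectory.
suffix_style = "tabloc/mytab_32332_1.%d"
subdir_style = "tabloc/%d/mytab_32332_1"

suffix_paths = [segment_path(suffix_style, n) for n in range(3)]
subdir_paths = [segment_path(subdir_style, n) for n in range(3)]
```

    Because the caller only substitutes the segment number, either
    layout can be chosen by changing the stored template, which is the
    flexibility argued for above.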
    
    
  136. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-20T14:45:38Z

    "Philip J. Warner" <pjw@rhyme.com.au> writes:
    > If these are true, then why not create a utility (eg. pg_update_symlinks)
    > that creates the relevant symlinks. It does not matter if they are
    > outdated, from an integrity point of view, and for the most part they can
    > be automatically maintained. Internally, postgresql can totally ignore them.
    
    What?
    
    I think you are confusing a couple of different things.  IIRC, at one
    time when we were just thinking about ALTER TABLE RENAME, there was
    a suggestion that the "real" table files be named by table OID, and
    that there be symlinks to those files named by logical table name as
    a crutch (:-)) for admins who wanted to know which table file was which.
    That could be handled as you've sketched above, but I think the whole
    proposal has fallen by the wayside anyway.
    
    The current discussion of symlinks is focusing on using directory
    symlinks, not file symlinks, to represent/implement tablespace layout.
    
    			regards, tom lane
    
    
  137. Re: Big 7.1 open items

    Philip Warner <pjw@rhyme.com.au> — 2000-06-20T14:49:59Z

    At 10:35 20/06/00 -0400, Bruce Momjian wrote:
    >> 
    >> If these are true, then why not create a utility (eg. pg_update_symlinks)
    >> that creates the relevant symlinks. It does not matter if they are
    >> outdated, from an integrity point of view, and for the most part they can
    >> be automatically maintained. Internally, postgresql can totally ignore them.
    >>
    >I am a little confused.  Are you suggesting that the entire symlink
    >thing can be done outside the database?  Yes, that is true if we don't
    >store the symlink locations in the database.  Of course, the database
    >has to be down to do this.
    
    The idea was to have postgresql, internally, totally ignore symlinks - use
    OID or whatever is technically best for file names. Then create a
    utility/command to make human-centric symlinks in a known location. The
    symlinks *could* be updated automatically by postgres, if possible, but
    would never be used internally. Things like vacuum could report out of date
    symlinks, and maybe fix them (but probably not).
    
    It may sound crude, but the only reason for the symlinks is for humans to
    'see what is going on', and in most cases they won't be very volatile.
    
    
    ----------------------------------------------------------------
    Philip Warner                    |     __---_____
    Albatross Consulting Pty. Ltd.   |----/       -  \
    (A.C.N. 008 659 498)             |          /(@)   ______---_
    Tel: (+61) 0500 83 82 81         |                 _________  \
    Fax: (+61) 0500 83 82 82         |                 ___________ |
    Http://www.rhyme.com.au          |                /           \|
                                     |    --________--
    PGP key available upon request,  |  /
    and from pgp5.ai.mit.edu:11371   |/
    
    
  138. Re: Big 7.1 open items

    Philip Warner <pjw@rhyme.com.au> — 2000-06-20T14:53:54Z

    At 10:45 20/06/00 -0400, Tom Lane wrote:
    >
    >What?
    >
    ...
    >
    >The current discussion of symlinks is focusing on using directory
    >symlinks, not file symlinks, to represent/implement tablespace layout.
    >
    
    Ooops. I'll pull my head in again.
    
    
    ----------------------------------------------------------------
    Philip Warner                    |     __---_____
    Albatross Consulting Pty. Ltd.   |----/       -  \
    (A.C.N. 008 659 498)             |          /(@)   ______---_
    Tel: (+61) 0500 83 82 81         |                 _________  \
    Fax: (+61) 0500 83 82 82         |                 ___________ |
    Http://www.rhyme.com.au          |                /           \|
                                     |    --________--
    PGP key available upon request,  |  /
    and from pgp5.ai.mit.edu:11371   |/
    
    
  139. Re: Big 7.1 open items

    Peter Eisentraut <peter_e@gmx.net> — 2000-06-20T16:43:35Z

    Bruce Momjian writes:
    
    > If we have a new CREATE DATABASE LOCATION command, we can say:
    > 
    > 	CREATE DATABASE LOCATION dbloc IN '/var/private/pgsql';
    > 	CREATE DATABASE newdb IN dbloc;
    
    We kind of have this already, with CREATE DATABASE foo WITH LOCATION =
    'bar'; but of course with environment variable kludgery. But it's a start.
    
    > 	mkdir /var/private/pgsql/dbloc
    > 	ln -s /var/private/pgsql/dbloc data/base/dbloc
    
    I think the problem with this was that you'd have to do an extra lookup
    into, say, pg_location to resolve this. Some people are talking about
    blind writes, this is not really blind.
    
    > 	CREATE LOCATION tabloc IN '/var/private/pgsql';
    > 	CREATE TABLE newtab ... IN tabloc;
    
    Okay, so we'd have "table spaces" and "database spaces". Seems like one
    "space" ought to be enough. I was thinking that the database "space" would
    serve as a default "space" for tables created within it but you could
    still create tables in other "spaces" than where the database really is. In
    fact, the database wouldn't show up at all in the file names anymore,
    which may or may not be a good thing.
    
    I think Tom suggested something more or less like this:
    
    $PGDATA/base/tablespace/segment/table
    
    (leaving the details of "table" aside for now). pg_class would get a
    column storing the table space somehow, say an oid reference to
    pg_location. There would have to be a default tablespace that's created by
    initdb and it's indicated by oid 0. So if you create a simple little table
    "foo" it ends up in
    
    $PGDATA/base/0/0/foo
    
    That is pretty manageable. Now to create a table space you do
    
    CREATE LOCATION "name" AT '/some/where';
    
    which would make an entry in pg_location and, similar to how you
    suggested, create a symlink from
    
    $PGDATA/base/newoid -> /some/where
    
    Then when you create a new table at that new location this gets simply
    noted in pg_class with an oid reference, the rest works completely
    transparently and no lookup outside of pg_class required. The system would
    create the segment 0 subdirectory automatically.
    
    When tables get segmented the system would simply create subdirectories 1,
    2, 3, etc. as needed, just as it created the 0 as needed, no extra code.
    
    pg_dump doesn't need to use lstat or whatever at all because the locations
    are catalogued. Administrators don't even need to know about the linking
    business, they just make sure the target directory exists.
    
    Two more items to ponder:
    
    * per-location transaction logs
    
    * pg_upgrade
    
    
    -- 
    Peter Eisentraut                  Sernanders väg 10:115
    peter_e@gmx.net                   75262 Uppsala
    http://yi.org/peter-e/            Sweden
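
    The $PGDATA/base/tablespace/segment/table layout described above can
    be mocked up outside the server as a rough sketch.  The function
    names create_location and table_path, and the oid 18001, are
    hypothetical; only the directory shape and the symlink step come
    from the message.

```python
import os
import tempfile


def create_location(pgdata, oid, target):
    """Mimic CREATE LOCATION: symlink $PGDATA/base/<oid> to an existing directory."""
    link = os.path.join(pgdata, "base", str(oid))
    os.symlink(target, link, target_is_directory=True)
    return link


def table_path(pgdata, tablespace_oid, segment, table):
    """Build base/<tablespace>/<segment>/<table>, creating the segment dir on demand."""
    seg_dir = os.path.join(pgdata, "base", str(tablespace_oid), str(segment))
    os.makedirs(seg_dir, exist_ok=True)
    return os.path.join(seg_dir, table)


with tempfile.TemporaryDirectory() as root:
    pgdata = os.path.join(root, "pgdata")
    elsewhere = os.path.join(root, "some", "where")
    os.makedirs(os.path.join(pgdata, "base"))
    os.makedirs(elsewhere)

    # Default tablespace 0, segment 0.
    default_file = table_path(pgdata, 0, 0, "foo")

    # A new location; the path is built the same way and follows the symlink.
    create_location(pgdata, 18001, elsewhere)
    moved_file = table_path(pgdata, 18001, 1, "bar")
    open(moved_file, "w").close()
    landed_elsewhere = os.path.exists(os.path.join(elsewhere, "1", "bar"))
```

    Path construction never consults anything but the tablespace oid
    and segment number, which matches the "no lookup outside of
    pg_class" point.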
    
    
    
  140. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-20T17:53:26Z

    > Bruce Momjian writes:
    > 
    > > If we have a new CREATE DATABASE LOCATION command, we can say:
    > > 
    > > 	CREATE DATABASE LOCATION dbloc IN '/var/private/pgsql';
    > > 	CREATE DATABASE newdb IN dbloc;
    > 
    > We kind of have this already, with CREATE DATABASE foo WITH LOCATION =
    > 'bar'; but of course with environment variable kludgery. But it's a start.
    
    Yes, I didn't like the environment variable stuff.  In fact, I would
    like to not mention the symlink location anywhere in the database, so it
    can be changed without changing it in the database.
    
    > 
    > > 	mkdir /var/private/pgsql/dbloc
    > > 	ln -s /var/private/pgsql/dbloc data/base/dbloc
    > 
    > I think the problem with this was that you'd have to do an extra lookup
    > into, say, pg_location to resolve this. Some people are talking about
    > blind writes, this is not really blind.
    
    I was thinking of storing the relfilename as dbloc/mytab32332.
    
    
    > 
    > > 	CREATE LOCATION tabloc IN '/var/private/pgsql';
    > > 	CREATE TABLE newtab ... IN tabloc;
    > 
    > Okay, so we'd have "table spaces" and "database spaces". Seems like one
    > "space" ought to be enough. I was thinking that the database "space" would
    > serve as a default "space" for tables created within it but you could
    > still create tables in other "spaces" than where the database really is. In
    > fact, the database wouldn't show up at all in the file names anymore,
    > which may or may not be a good thing.
    > 
    > I think Tom suggested something more or less like this:
    > 
    > $PGDATA/base/tablespace/segment/table
    
    So you mix tables from different databases in the same tablespace?  Seems
    better to keep them in separate directories for efficiency and clarity.
    
    We could use tablespace/dbname/table so that a tablespace would have
    a directory for each database that uses the tablespace.
    > 
    > (leaving the details of "table" aside for now). pg_class would get a
    > column storing the table space somehow, say an oid reference to
    > pg_location. There would have to be a default tablespace that's created by
    > initdb and it's indicated by oid 0. So if you create a simple little table
    > "foo" it ends up in
    > 
    > $PGDATA/base/0/0/foo
    > 
    
    Seems better to use the top directory for 0, and have extents in
    subdirectories like Extent-2, etc.  Easier for administrators and new
    people.
    
    However, one problem is that tables created in a database without a
    location are put under pgsql directory.  You would have to symlink the
    actual database directory.  Maybe that is why I had separate database
    locations.  I realize that is bad.
    
    > That is pretty manageable. Now to create a table space you do
    > 
    > CREATE LOCATION "name" AT '/some/where';
    > 
    > which would make an entry in pg_location and, similar to how you
    > suggested, create a symlink from
    > 
    > $PGDATA/base/newoid -> /some/where
    > 
    > Then when you create a new table at that new location this gets simply
    > noted in pg_class with an oid reference, the rest works completely
    > transparently and no lookup outside of pg_class required. The system would
    > create the segment 0 subdirectory automatically.
    
    > 
    > When tables get segmented the system would simply create subdirectories 1,
    > 2, 3, etc. as needed, just as it created the 0 as needed, no extra code.
    > 
    > pg_dump doesn't need to use lstat or whatever at all because the locations
    > are catalogued. Administrators don't even need to know about the linking
    > business, they just make sure the target directory exists.
    
    What I was suggesting is not to catalog the symlink locations, but to
    use lstat when dumping, so that admins can move files around using
    symlinks and not have to update the database.
    
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
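
    Recovering symlink locations with lstat at dump time, instead of
    cataloguing them, could look roughly like this.  A sketch only: the
    function name tablespace_locations and the directory names are
    invented for the example.

```python
import os
import stat
import tempfile


def tablespace_locations(db_dir):
    """Map each symlinked entry of db_dir to the target the link points at."""
    locations = {}
    for entry in sorted(os.listdir(db_dir)):
        path = os.path.join(db_dir, entry)
        # lstat inspects the link itself instead of following it.
        if stat.S_ISLNK(os.lstat(path).st_mode):
            locations[entry] = os.readlink(path)
    return locations


with tempfile.TemporaryDirectory() as root:
    db_dir = os.path.join(root, "base", "mydb")
    other_disk = os.path.join(root, "disk2")
    os.makedirs(db_dir)
    os.mkdir(other_disk)
    os.mkdir(os.path.join(db_dir, "plain"))             # ordinary subdirectory
    os.symlink(other_disk, os.path.join(db_dir, "tabloc"))
    found = tablespace_locations(db_dir)                 # only "tabloc" is reported
```

    An administrator who re-points the link is picked up on the next
    scan without any catalog update, which is the property being asked
    for; the cost is that the file system becomes the only record of
    where a tablespace lives.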
    
    
  141. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-20T20:59:41Z

    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    > 
    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > Agreed.  Seems we have several issues:
    > 
    > > 	filename contents
    > > 	tablespace implementation
    > > 	tablespace directory layout
    > > 	tablespace commands and syntax
    >
    
    [snip]
     
    > 
    > Comments?  Anything else that we do have consensus on?
    >
    
    Before the details of tablespace implementation,
    
    1) How to change(extend) the syntax of CREATE TABLE
        We only add table(data)space name with some
        keyword ? i.e Do we consider tablespace as an
       abstraction ? 
    
    To confirm our mutual understanding.
    
    2) Is tablespace defined per PostgreSQL's database ?
    3) Is default tablespace defined per database/user or 
        for all ?
    
    AFAIK in Oracle,2) global, 3) per user. 
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp 
    
    
  142. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-20T23:54:51Z

    > -----Original Message-----
    > From: Peter Eisentraut
    >
    > Bruce Momjian writes:
    >
    > > If we have a new CREATE DATABASE LOCATION command, we can say:
    > >
    > > 	CREATE DATABASE LOCATION dbloc IN '/var/private/pgsql';
    > > 	CREATE DATABASE newdb IN dbloc;
    >
    > We kind of have this already, with CREATE DATABASE foo WITH LOCATION =
    > 'bar'; but of course with environment variable kludgery. But it's a start.
    >
    > > 	mkdir /var/private/pgsql/dbloc
    > > 	ln -s /var/private/pgsql/dbloc data/base/dbloc
    >
    > I think the problem with this was that you'd have to do an extra lookup
    > into, say, pg_location to resolve this. Some people are talking about
    > blind writes, this is not really blind.
    >
    > > 	CREATE LOCATION tabloc IN '/var/private/pgsql';
    > > 	CREATE TABLE newtab ... IN tabloc;
    >
    > Okay, so we'd have "table spaces" and "database spaces". Seems like one
    > "space" ought to be enough.
    
    Does your "database space" correspond to current PostgreSQL's database ?
    And is it different from SCHEMA ?
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
    
  143. RE: Big 7.1 open items

    Philip Warner <pjw@rhyme.com.au> — 2000-06-21T01:22:10Z

    At 05:59 21/06/00 +0900, Hiroshi Inoue wrote:
    >
    >Before the details of tablespace implementation,
    >
    >1) How to change(extend) the syntax of CREATE TABLE
    >    We only add table(data)space name with some
    >    keyword ? i.e Do we consider tablespace as an
    >   abstraction ? 
    >
    
    It may be worth considering leaving the CREATE TABLE statement alone.
    Dec/RDB uses a new statement entirely to define where a table goes. It's
    actually a *very* complex statement, but the key syntax is:
    
    CREATE STORAGE MAP <map-name> FOR <table-name>
        [PLACEMENT VIA INDEX <index-name>]
        STORE [COLUMNS ([col-name,])]
        [IN <area-name>
         | RANDOMLY ACROSS <area-list>]
    ;
    
    where <area-name> is the name of a Dec/RDB STORAGE AREA, which is basically
    a file that contains one or more tables/indices etc. There are options to
    specify area choice by column value, fullness, how to store BLOBs etc etc.
    
    I realize that this is way too complex for a first pass, but it gives an
    idea of where you *might* want to go, and hence, possibly, a reason for
    starting out with something like:
    
    CREATE STORAGE MAP <map-name> for <table-name> STORE IN <area-name>;
    
    
    P.S. I really hope this is more cogent than my last message.
    
    ----------------------------------------------------------------
    Philip Warner                    |     __---_____
    Albatross Consulting Pty. Ltd.   |----/       -  \
    (A.C.N. 008 659 498)             |          /(@)   ______---_
    Tel: (+61) 0500 83 82 81         |                 _________  \
    Fax: (+61) 0500 83 82 82         |                 ___________ |
    Http://www.rhyme.com.au          |                /           \|
                                     |    --________--
    PGP key available upon request,  |  /
    and from pgp5.ai.mit.edu:11371   |/
    
    
  144. Re: Big 7.1 open items

    Chris <chrisb@nimrod.itg.telstra.com.au> — 2000-06-21T02:27:45Z

    Tom Lane wrote:
    > Some unhappiness was raised about
    > depending on symlinks for this function, but I didn't hear one single
    > concrete reason not to do it, nor an alternative design. 
    
    Are symlinks portable?
    
    
  145. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T03:45:13Z

    > > -----Original Message-----
    > > From: Peter Eisentraut
    > >
    > > Bruce Momjian writes:
    > >
    > > > If we have a new CREATE DATABASE LOCATION command, we can say:
    > > >
    > > > 	CREATE DATABASE LOCATION dbloc IN '/var/private/pgsql';
    > > > 	CREATE DATABASE newdb IN dbloc;
    > >
    > > We kind of have this already, with CREATE DATABASE foo WITH LOCATION =
    > > 'bar'; but of course with environment variable kludgery. But it's a start.
    > >
    > > > 	mkdir /var/private/pgsql/dbloc
    > > > 	ln -s /var/private/pgsql/dbloc data/base/dbloc
    > >
    > > I think the problem with this was that you'd have to do an extra lookup
    > > into, say, pg_location to resolve this. Some people are talking about
    > > blind writes, this is not really blind.
    > >
    > > > 	CREATE LOCATION tabloc IN '/var/private/pgsql';
    > > > 	CREATE TABLE newtab ... IN tabloc;
    > >
    > > Okay, so we'd have "table spaces" and "database spaces". Seems like one
    > > "space" ought to be enough.
    > 
    > Does your "database space" correspond to current PostgreSQL's database ?
    > And is it different from SCHEMA ?
    
    OK, seems I have things a little confused.  My whole idea of database
    locations vs. normal locations is flawed.  Here is my new proposal.
    
    First, I believe there should be locations defined per database, not
    global locations.
    
    I recommend 
    
    	CREATE TABLESPACE tabloc USING '/var/private/pgsql';
    	CREATE TABLE newtab ... IN tabloc;
    
    and this does:
    
    	mkdir /var/private/pgsql/dbname
    	mkdir /var/private/pgsql/dbname/tabloc
     	ln -s /var/private/pgsql/dbname/tabloc data/base/tabloc
    
    I recommend making a dbname in each directory, then putting the
    location inside there.
    
    This allows the same directory to be used for tablespaces by several
    databases, and allows databases to be created in locations without making
    special per-database locations.
    
    I can give a more specific proposal if people wish.
    
    Comments?
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  146. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T03:46:23Z

    > Tom Lane wrote:
    > > Some unhappiness was raised about
    > > depending on symlinks for this function, but I didn't hear one single
    > > concrete reason not to do it, nor an alternative design. 
    > 
    > Are symlinks portable?
    
    Sure, and if the system loading it can not create the required symlinks
    because the directories don't exist, it can just skip the symlink step.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  147. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-21T04:06:42Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > I recommend making a dbname in each directory, then putting the
    > location inside there.
    
    This still seems backwards to me.  Why is it better than tablespace
    directory inside database directory?
    
    One significant problem with it is that there's no longer (AFAICS)
    a "default" per-database directory that corresponds to the current
    working directory of backends running in that database.  Thus,
    for example, it's not immediately clear where temporary files and
    backend core-dump files will end up.  Also, you've just added an
    essential extra level (if not two) to the pathnames that backends will
    use to address files.
    
    There is a great deal to be said for
    	..../database/tablespace/filename
    where .../database/ is the working directory of a backend running in
    that database, so that the relative pathname used by that backend to
    get to a table is just tablespace/filename.  I fail to see any advantage
    in reversing the pathname order.  If you see one, enlighten me.
    
    			regards, tom lane
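
    The point about the backend's working directory can be shown with a
    small sketch.  The helper open_relation and the directory names are
    hypothetical; the only thing taken from the message is that a
    process sitting in .../database/ reaches a table as
    tablespace/filename.

```python
import os
import tempfile


def open_relation(tablespace, filename, mode="rb"):
    """Open a table file relative to the current (database) directory."""
    return open(os.path.join(tablespace, filename), mode)


with tempfile.TemporaryDirectory() as root:
    db_dir = os.path.join(root, "mydb")
    os.makedirs(os.path.join(db_dir, "tabloc"))

    previous_cwd = os.getcwd()
    os.chdir(db_dir)  # stand-in for a backend running in this database
    try:
        with open_relation("tabloc", "mytab", "wb") as f:
            f.write(b"page data")
        with open_relation("tabloc", "mytab") as f:
            contents = f.read()
    finally:
        os.chdir(previous_cwd)  # leave the scratch directory before cleanup
```

    With the order reversed (tablespace/database/filename) there is no
    single directory that plays this role for every tablespace, which
    is the objection raised above.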
    
    
  148. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T04:33:01Z

    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > I recommend making a dbname in each directory, then putting the
    > > location inside there.
    > 
    > This still seems backwards to me.  Why is it better than tablespace
    > directory inside database directory?
    
    Yes, that is what I want too.
    
    > 
    > One significant problem with it is that there's no longer (AFAICS)
    > a "default" per-database directory that corresponds to the current
    > working directory of backends running in that database.  Thus,
    > for example, it's not immediately clear where temporary files and
    > backend core-dump files will end up.  Also, you've just added an
    > essential extra level (if not two) to the pathnames that backends will
    > use to address files.
    > 
    > There is a great deal to be said for
    > 	..../database/tablespace/filename
    > where .../database/ is the working directory of a backend running in
    > that database, so that the relative pathname used by that backend to
    > get to a table is just tablespace/filename.  I fail to see any advantage
    > in reversing the pathname order.  If you see one, enlighten me.
    
    Yes, agreed.  I was thinking this:
    
    	CREATE TABLESPACE loc USING '/var/pgsql'
    
    does:
    
    	ln -s /var/pgsql/dbname/loc data/base/dbname/loc 
    
    In this way, the database has a view of its main directory, plus a /loc
    subdirectory for the tablespace.  In the other location, we have
    /var/pgsql/dbname/loc because this allows different databases to use:
    
    	CREATE TABLESPACE loc USING '/var/pgsql'
    
    and they do not collide with each other in /var/pgsql.  It puts /loc
    inside the dbname that created it.  It also allows:
    
    	CREATE DATABASE loc IN '/var/pgsql'
    
    to work because this does:
    
    	ln -s /var/pgsql/dbname data/base/dbname
    
    Seems we should create the dbname and loc directories for the users
    automatically in the symlink target to keep things clean.  It prevents
    them from accidentally having two databases point to the same directory.
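
    A rough sketch of the path arithmetic described above, in Python purely
    for illustration.  The function names are made up; the only things taken
    from the proposal are the data/base/dbname link location and the
    location/dbname[/loc] target layout.

```python
from pathlib import PurePosixPath


def tablespace_link(data_dir, dbname, tablespace, location):
    """Return (link, target) for CREATE TABLESPACE <tablespace> USING <location>.

    The link lives inside the database directory; the target is placed under
    location/dbname so two databases can reuse the same location.
    """
    link = PurePosixPath(data_dir) / "base" / dbname / tablespace
    target = PurePosixPath(location) / dbname / tablespace
    return str(link), str(target)


def database_link(data_dir, dbname, location):
    """Return (link, target) for CREATE DATABASE <dbname> IN <location>."""
    link = PurePosixPath(data_dir) / "base" / dbname
    target = PurePosixPath(location) / dbname
    return str(link), str(target)
```

    Calling tablespace_link() for two different database names with the same
    location gives two different targets, which is the no-collision property
    being argued for.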
    
    Comments?
    
    
    
  149. Re: Big 7.1 open items

    Chris <chrisb@nimrod.itg.telstra.com.au> — 2000-06-21T04:45:01Z

    Bruce Momjian wrote:
    > 
    > > Tom Lane wrote:
    > > > Some unhappiness was raised about
    > > > depending on symlinks for this function, but I didn't hear one single
    > > > concrete reason not to do it, nor an alternative design.
    > >
    > > Are symlinks portable?
    > 
    > Sure, and if the system loading it can not create the required symlinks
    > because the directories don't exist, it can just skip the symlink step.
    
    What I meant is, would you still be able to create tablespaces on
    systems without symlinks? That would seem to be a desirable feature.
    
    
  150. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-21T04:55:01Z

    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    > 
    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > I recommend making a dbname in each directory, then putting the
    > > location inside there.
    > 
    > This still seems backwards to me.  Why is it better than tablespace
    > directory inside database directory?
    > 
    > One significant problem with it is that there's no longer (AFAICS)
    > a "default" per-database directory that corresponds to the current
    > working directory of backends running in that database.  Thus,
    > for example, it's not immediately clear where temporary files and
    > backend core-dump files will end up.  Also, you've just added an
    > essential extra level (if not two) to the pathnames that backends will
    > use to address files.
    > 
    > There is a great deal to be said for
    > 	..../database/tablespace/filename
    
    OK, I seem to have gotten the answer to the question
       Is tablespace defined per PostgreSQL database?
    
    You and Bruce
       1) tablespace is per database
    Peter seems to have the following idea (?? not sure)
       2) database = tablespace
    My opinion
       3) database and tablespace are largely independent of each other.
           I assume PostgreSQL's database would correspond 
           to the concept of SCHEMA.
    
    It seems we differ from the very start.
    Shouldn't we reach an agreement on this first?
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
    
  151. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-21T05:09:52Z

    Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
    > What I meant is, would you still be able to create tablespaces on
    > systems without symlinks? That would seem to be a desirable feature.
    
    All else being equal, it'd be nice.  Since all else is not equal,
    exactly how much sweat are we willing to expend on supporting that
    feature on such systems --- to the exclusion of other features we
    might expend the same sweat on, with more widely useful results?
    
    Bear in mind that everything will still *work* just fine on such a
    platform, you just don't have a way to spread the database across
    multiple filesystems.  That's only an issue if the platform has a
    fairly Unixy notion of filesystems ... but no symlinks.
    
    A few messages back someone was opining that we were wasting our time
    thinking about tablespaces at all, because any modern platform can
    create disk-spanning filesystems for itself, so applications don't have
    to worry.  I don't buy that argument in general, but I'm quite willing
    to quote it for the *very* few systems that are Unixy enough to run
    Postgres in the first place, but not quite Unixy enough to have
    symlinks.
    
    You gotta draw the line somewhere at what you will support, and
    this particular line seems to me to be entirely reasonable and
    justifiable.  YMMV...
    
    			regards, tom lane
    
    
  152. RE: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-21T05:12:48Z

    At 11:22 AM 6/21/00 +1000, Philip J. Warner wrote:
    
    >It may be worth considering leaving the CREATE TABLE statement alone.
    >Dec/RDB uses a new statement entirely to define where a table goes...
    
    It's worth considering, but on the other hand Oracle users greatly
    outnumber Compaq/RDB users these days...
    
    If there's no SQL92 guidance for implementing a feature, I'm pretty much in
    favor of tracking Oracle, whose SQL dialect is rapidly becoming a
    de-facto standard. 
    
    I'm not saying I like the fact, Oracle's a pain in the ass.  But when
    adopting existing syntax, might as well adopt that of the crushing
    borg.
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  153. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-21T05:16:50Z

    At 12:27 PM 6/21/00 +1000, Chris Bitmead wrote:
    >Tom Lane wrote:
    >> Some unhappiness was raised about
    >> depending on symlinks for this function, but I didn't hear one single
    >> concrete reason not to do it, nor an alternative design. 
    >
    >Are symlinks portable?
    
    In today's world?  Yeah, I think so.
    
    My only unhappiness has hinged around the possibility that a new
    storage scheme might tempt folks to toss aside the smgr abstraction,
    or weaken it.
    
    It doesn't appear that this will happen. 
    
    Given an adequate smgr abstraction, it doesn't really matter what
    low-level model is adopted in some sense (i.e. other models might
    become available, the implemented model might get replaced, etc -
    without breaking backends).
    
    Obviously we'll all be using the default model for some time, maybe
    forever, but if mistakes are made maintaining the smgr abstraction
    means that replacements are possible.  Or kinky substitutes like
    working with DAFS.
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  154. Re: Big 7.1 open items

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 2000-06-21T05:19:29Z

    > Yes, I didn't like the environment variable stuff.  In fact, I would
    > like to not mention the symlink location anywhere in the database, so 
    > it can be changed without changing it in the database.
    
    Well, as y'all have noticed, I think there are strong reasons to use
    environment variables to manage locations, and that symlinks are a
    potential portability and robustness problem.
    
    An additional point which has relevance to this whole discussion:
    
    In the future we may allow system resources such as tables to carry names
    which use multi-byte encodings. afaik these encodings are not allowed to
    be used for physical file names, and even if they were, the utility of
    using standard operating system utilities like ls goes way down.
    
    istm that from a portability and evolutionary standpoint OID-only file
    names (or at least file names *not* based on relation/class names) are a
    requirement.
    
    Comments?
    
                       - Thomas
    
    
  155. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-21T05:23:57Z

    "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    >> There is a great deal to be said for
    >> ..../database/tablespace/filename
    
    > OK,I seem to have gotten the answer for the question
    >    Is tablespace defined per PostgreSQL's database ?
    
    Not necessarily --- the tablespace subdirectories could be symlinks
    pointing to the same place (assuming you use OIDs or something to keep
    the table filenames unique even across databases).  This is just an
    implementation mechanism; it doesn't foreclose the policy decision
    whether tablespaces are database-local or installation-wide.
    
    (OTOH, pathnames like tablespace/database would pretty much force
    tablespaces to be installation-wide whether you wanted it that way
    or not.)
    
    > My opinion
    >    3) database and tablespace are relatively irrelevant.
    >        I assume PostgreSQL's database would correspond 
    >        to the concept of SCHEMA.
    
    My inclination is that tablespaces should be installation-wide, but
    I'm not completely sold on it.  In any case I could see wanting a
    permissions mechanism that would only allow some databases to have
    tables in a particular tablespace.
    
    We do need to think more about how traditional Postgres databases
    fit together with SCHEMA.  Maybe we wouldn't even need multiple
    databases per installation if we had SCHEMA done right.
    
    			regards, tom lane
    
    
  156. Re: Big 7.1 open items

    Ross Reedstrom <reedstrm@rice.edu> — 2000-06-21T05:45:02Z

    On Wed, Jun 21, 2000 at 01:23:57AM -0400, Tom Lane wrote:
    > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > 
    > > My opinion
    > >    3) database and tablespace are relatively irrelevant.
    > >        I assume PostgreSQL's database would correspond 
    > >        to the concept of SCHEMA.
    > 
    > My inclination is that tablespaces should be installation-wide, but
    > I'm not completely sold on it.  In any case I could see wanting a
    > permissions mechanism that would only allow some databases to have
    > tables in a particular tablespace.
    > 
    > We do need to think more about how traditional Postgres databases
    > fit together with SCHEMA.  Maybe we wouldn't even need multiple
    > databases per installation if we had SCHEMA done right.
    > 
    
    The important point I think is that tablespaces are about physical
    storage/namespace, and SCHEMA are about logical namespace: it would make
    sense for tables from multiple schema to live in the same tablespace,
    as well as tables from one schema to be stored in multiple tablespaces.
    
    Ross
    -- 
    Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
    NSBRI Research Scientist/Programmer
    Computer and Information Technology Institute
    Rice University, 6100 S. Main St.,  Houston, TX 77005
    
    
  157. Re: Big 7.1 open items

    Chris <chrisb@nimrod.itg.telstra.com.au> — 2000-06-21T06:13:47Z

    "Ross J. Reedstrom" wrote:
    
    > The important point I think is that tablespaces are about physical
    > storage/namespace, and SCHEMA are about logical namespace: it would make
    > sense for tables from multiple schema to live in the same tablespace,
    > as well as tables from one schema to be stored in multiple tablespaces.
    
    If we accept that argument (which sounds good) then wouldn't we have...
    
    data/base/db1/table1 -> ../../tablespace/ts1/db1.table1
    data/base/db1/table2 -> ../../tablespace/ts1/db1.table2
    data/tablespace/ts1/db1.table1
    data/tablespace/ts1/db1.table2
    
    In other words there is a directory for databases, and a directory for
    tablespaces. Database tables are symlinked to the appropriate
    tablespace. So there are multiple databases per tablespace and multiple
    tablespaces per database.
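
    For what it's worth, the relative link targets in a layout like this can
    be computed rather than typed by hand.  A small sketch (table_link is a
    made-up helper name, and the db.table file naming is just the one from
    the example above); the target is worked out relative to the directory
    that holds the link, which is how a relative symlink gets resolved:

```python
import posixpath


def table_link(data_dir, db, table, tablespace):
    """Return (link, relative_target) for one table stored in a tablespace."""
    link = posixpath.join(data_dir, "base", db, table)
    real = posixpath.join(data_dir, "tablespace", tablespace, f"{db}.{table}")
    # A relative symlink is interpreted from the directory containing the link.
    target = posixpath.relpath(real, start=posixpath.dirname(link))
    return link, target
```

    Tables from several databases can share one tablespace directory this
    way, and one database can have links into several tablespaces.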
    
    
  158. RE: Big 7.1 open items

    Philip Warner <pjw@rhyme.com.au> — 2000-06-21T06:55:58Z

    At 22:12 20/06/00 -0700, Don Baccus wrote:
    >At 11:22 AM 6/21/00 +1000, Philip J. Warner wrote:
    >
    >>It may be worth considering leaving the CREATE TABLE statement alone.
    >>Dec/RDB uses a new statement entirely to define where a table goes...
    >
    >It's worth considering, but on the other hand Oracle users greatly
    >outnumber Compaq/RDB users these days...
    
    It's actually Oracle/Rdb, but I call it Dec/Rdb to distinguish it from
    'Oracle/Oracle'. It was acquired by Oracle, supposedly because Oracle
    wanted their optimizer, management and tuning tools (although that was only
    hearsay). They *say* that they plan to merge the two products.
    
    What I was trying to suggest was that the CREATE TABLE statement will get
    very overloaded, and it might be worth avoiding having to support two
    storage management syntaxes if/when it becomes desirable to create a
    'storage' statement of some kind.
    
    
    >
    >I'm not saying I like the fact, Oracle's a pain in the ass.  But when
    >adopting existing syntax, might as well adopt that of the crushing
    >borg.
    >
    
    Only if it is a good thing, or part of a real standard. Philosophically,
    where possible I would prefer to see statements that are *in* the SQL
    standard (i.e. CREATE TABLE) left as unencumbered as possible.
    
    
    ----------------------------------------------------------------
    Philip Warner                    |     __---_____
    Albatross Consulting Pty. Ltd.   |----/       -  \
    (A.C.N. 008 659 498)             |          /(@)   ______---_
    Tel: (+61) 0500 83 82 81         |                 _________  \
    Fax: (+61) 0500 83 82 82         |                 ___________ |
    Http://www.rhyme.com.au          |                /           \|
                                     |    --________--
    PGP key available upon request,  |  /
    and from pgp5.ai.mit.edu:11371   |/
    
    
  159. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-21T09:37:02Z

    > -----Original Message-----
    > From: pgsql-hackers-owner@hub.org
    > [mailto:pgsql-hackers-owner@hub.org]On Behalf Of Chris Bitmead
    >
    > "Ross J. Reedstrom" wrote:
    >
    > > The important point I think is that tablespaces are about physical
    > > storage/namespace, and SCHEMA are about logical namespace: it would make
    > > sense for tables from multiple schema to live in the same tablespace,
    > > as well as tables from one schema to be stored in multiple tablespaces.
    >
    > If we accept that argument (which sounds good) then wouldn't we have...
    >
    > data/base/db1/table1 -> ../../tablespace/ts1/db1.table1
    > data/base/db1/table2 -> ../../tablespace/ts1/db1.table2
    > data/tablespace/ts1/db1.table1
    > data/tablespace/ts1/db1.table2
    >
    
    Hmm, is the above symlinking business really preferable just because
    it is possible?  Why do we have to depend on a directory tree
    representation when we handle db structure?
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
    
  160. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T14:55:39Z

    > > Sure, and if the system loading it can not create the required symlinks
    > > because the directories don't exist, it can just skip the symlink step.
    > 
    > What I meant is, would you still be able to create tablespaces on
    > systems without symlinks? That would seem to be a desirable feature.
    
    You could create tablespaces, but you could not point them at different
    drives.  The issue is that we don't store the symlink location in the
    database, just the tablespace name.
    
    
    
  161. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-21T15:07:20Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > Yes, agreed.  I was thinking this:
    > 	CREATE TABLESPACE loc USING '/var/pgsql'
    > does:
    > 	ln -s /var/pgsql/dbname/loc data/base/dbname/loc 
    > In this way, the database has a view of its main directory, plus a /loc
    > subdirectory for the tablespace.  In the other location, we have
    > /var/pgsql/dbname/loc because this allows different databases to use:
    > 	CREATE TABLESPACE loc USING '/var/pgsql'
    > and they do not collide with each other in /var/pgsql.
    
    But they don't collide anyway, because the dbname is already unique.
    Isn't the extra subdirectory a waste?
    
    Because table files will have installation-wide unique names, there's
    no really good reason to have either level of subdirectory; you could
    just make
    	CREATE TABLESPACE loc USING '/var/pgsql'
    do
    	ln -s /var/pgsql data/base/dbname/loc 
    and it'd still work even if multiple DBs were using the same tablespace.
    
    However, forcing creation of a subdirectory does give you the chance to
    make sure the subdir is owned by postgres and has the right permissions,
    so there's something to be said for that.  It might be reasonable to do
    	mkdir /var/pgsql/dbname
    	chmod 700 /var/pgsql/dbname
    	ln -s /var/pgsql/dbname data/base/dbname/loc 
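
    The same three steps, sketched in Python against a scratch directory so
    it can be run without touching a real installation.  The helper names
    and the temporary layout are assumptions made for the example; only the
    mkdir / chmod 700 / ln -s sequence comes from the commands above.

```python
import os
import stat
import tempfile


def create_tablespace_dir(location, dbname, db_dir, tablespace):
    """Create location/dbname with mode 700 and link it into the database dir."""
    target = os.path.join(location, dbname)
    os.mkdir(target)                 # mkdir /var/pgsql/dbname
    os.chmod(target, 0o700)          # chmod 700 /var/pgsql/dbname
    link = os.path.join(db_dir, tablespace)
    os.symlink(target, link)         # ln -s /var/pgsql/dbname data/base/dbname/loc
    return link


def demo():
    """Run the steps in a temporary tree; return (is_symlink, target_mode)."""
    with tempfile.TemporaryDirectory() as root:
        location = os.path.join(root, "var_pgsql")
        db_dir = os.path.join(root, "data", "base", "dbname")
        os.makedirs(location)
        os.makedirs(db_dir)
        link = create_tablespace_dir(location, "dbname", db_dir, "loc")
        # os.stat follows the link, so this is the mode of the real directory.
        mode = stat.S_IMODE(os.stat(link).st_mode)
        return os.path.islink(link), oct(mode)
```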
    
    			regards, tom lane
    
    
  162. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T15:08:45Z

    > At 12:27 PM 6/21/00 +1000, Chris Bitmead wrote:
    > >Tom Lane wrote:
    > >> Some unhappiness was raised about
    > >> depending on symlinks for this function, but I didn't hear one single
    > >> concrete reason not to do it, nor an alternative design. 
    > >
    > >Are symlinks portable?
    > 
    > In today's world?  Yeah, I think so.
    > 
    > My only unhappiness has hinged around the possibility that a new
    > storage scheme might tempt folks to toss aside the smgr abstraction,
    > or weaken it.
    > 
    > It doesn't appear that this will happen. 
    > 
    > Given an adequate smgr abstraction, it doesn't really matter what
    > low-level model is adopted in some sense (i.e. other models might
    > become available, the implemented model might get replaced, etc -
    > without breaking backends).
    > 
    > Obviously we'll all be using the default model for some time, maybe
    > forever, but if mistakes are made maintaining the smgr abstraction
    > means that replacements are possible.  Or kinky substitutes like
    > working with DAFS.
    
    The symlink solution where the actual symlink location is not stored
    in the database is certainly abstract.  We store that info in the file
    system, which is where it belongs.  We only query the symlink location
    when we need it for database location dumping.
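
    To make "query the symlink location when we need it" concrete, here is a
    sketch of how a dump tool could recover the locations from the file
    system alone.  It is Python for illustration only, tablespace_locations
    is a made-up name, and it assumes every symlink found in the database
    directory is a tablespace.

```python
import os
import tempfile


def tablespace_locations(db_dir):
    """Map tablespace name to symlink target for each symlink in db_dir."""
    found = {}
    for entry in sorted(os.listdir(db_dir)):
        path = os.path.join(db_dir, entry)
        if os.path.islink(path):
            found[entry] = os.readlink(path)
    return found


def demo():
    """Build a scratch database dir with one tablespace link and read it back."""
    with tempfile.TemporaryDirectory() as root:
        db_dir = os.path.join(root, "base", "dbname")
        os.makedirs(db_dir)
        target = os.path.join(root, "disk2")
        os.mkdir(target)
        os.symlink(target, os.path.join(db_dir, "loc"))
        # A regular file in the same directory is ignored.
        open(os.path.join(db_dir, "ordinary_file"), "w").close()
        locs = tablespace_locations(db_dir)
        return list(locs), locs["loc"] == target
```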
    
    
    
  163. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T15:11:40Z

    > > Yes, I didn't like the environment variable stuff.  In fact, I would
    > > like to not mention the symlink location anywhere in the database, so 
    > > it can be changed without changing it in the database.
    > 
    > Well, as y'all have noticed, I think there are strong reasons to use
    > environment variables to manage locations, and that symlinks are a
    > potential portability and robustness problem.
    
    Sorry, disagree.  Environment variables are a pain to administer, and
    quite counter-intuitive.
    
    I also don't see any portability or robustness problems.  Can you be
    more specific?
    
    > An additional point which has relevance to this whole discussion:
    > 
    > In the future we may allow system resource such as tables to carry names
    > which use multi-byte encodings. afaik these encodings are not allowed to
    > be used for physical file names, and even if they were the utility of
    > using standard operating system utilities like ls goes way down.
    
    That is really a separate issue, about file names.  Files for tables
    with multi-byte names can be named with just the oid.  We have complete
    control over that because the file name will be in pg_class.
    
    > istm that from a portability and evolutionary standpoint OID-only file
    > names (or at least file names *not* based on relation/class names) is a
    > requirement.
    
    Maybe a requirement at some point for some installations, but I hope not
    a general requirement.
    
    
    
  164. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T15:19:49Z

    > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > >> There is a great deal to be said for
    > >> ..../database/tablespace/filename
    > 
    > > OK,I seem to have gotten the answer for the question
    > >    Is tablespace defined per PostgreSQL's database ?
    > 
    > Not necessarily --- the tablespace subdirectories could be symlinks
    > pointing to the same place (assuming you use OIDs or something to keep
    > the table filenames unique even across databases).  This is just an
    > implementation mechanism; it doesn't foreclose the policy decision
    > whether tablespaces are database-local or installation-wide.
    
    Seems we are better just auto-creating a directory that matches the
    dbname.
    
    > 
    > (OTOH, pathnames like tablespace/database would pretty much force
    > tablespaces to be installation-wide whether you wanted it that way
    > or not.)
    
    
    > 
    > > My opinion
    > >    3) database and tablespace are relatively irrelevant.
    > >        I assume PostgreSQL's database would correspond 
    > >        to the concept of SCHEMA.
    > 
    > My inclination is that tablespaces should be installation-wide, but
    > I'm not completely sold on it.  In any case I could see wanting a
    > permissions mechanism that would only allow some databases to have
    > tables in a particular tablespace.
    
    One idea is to allow tablespaces defined in template1 to be propagated to
    newly created directories, with the symlinks adjusted so they use the
    proper dbname in the symlink.
    
    
    
    
  165. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T15:21:04Z

    > The important point I think is that tablespaces are about physical
    > storage/namespace, and SCHEMA are about logical namespace: it would make
    > sense for tables from multiple schema to live in the same tablespace,
    > as well as tables from one schema to be stored in multiple tablespaces.
    > 
    
    It seems mixing the physical layout and the logical SCHEMA would have
    problems because people have different reasons for using each feature.
    
    
    
  166. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T15:23:26Z

    > At 22:12 20/06/00 -0700, Don Baccus wrote:
    > >At 11:22 AM 6/21/00 +1000, Philip J. Warner wrote:
    > >
    > >>It may be worth considering leaving the CREATE TABLE statement alone.
    > >>Dec/RDB uses a new statement entirely to define where a table goes...
    > >
    > >It's worth considering, but on the other hand Oracle users greatly
    > >outnumber Compaq/RDB users these days...
    > 
    > It's actually Oracle/Rdb, but I call it Dec/Rdb to distinguish it from
    > 'Oracle/Oracle'. It was acquired by Oracle, supposedly because Oracle
    > wanted their optimizer, management and tuning tools (although that was only
    > hearsay). They *say* that they plan to merge the two products.
    > 
    > What I was trying to suggest was that the CREATE TABLE statement will get
    > very overloaded, and it might be worth avoiding having to support two
    > storage management syntaxes if/when it becomes desirable to create a
    > 'storage' statement of some kind.
    > 
    
    Seems adding tablespace to CREATE TABLE/INDEX/DATABASE is pretty simple.
    Doing it as a separate command seems cumbersome.
    
    
    
  167. Re: Big 7.1 open items

    Thomas Lockhart <lockhart@alumni.caltech.edu> — 2000-06-21T15:27:36Z

    > Sorry, disagree.  Environment variables are a pain to administer, and
    > quite counter-intuitive.
    
    Well, I guess we disagree. But until we have a complete proposed
    solution, we should leave environment variables on the table, since they
    *do* allow some decoupling of logical and physical storage, and *do*
    give the administrator some control over resources *that the admin would
    not otherwise have*.
    
    > > istm that from a portability and evolutionary standpoint OID-only 
    > > file names (or at least file names *not* based on relation/class 
    > > names) is a requirement.
    > Maybe a requirement at some point for some installations, but I hope 
    > not a general requirement.
    
    If a table name can have characters which are not legal for file names,
    then how would you propose to support it? If we are doing a
    restructuring of the storage scheme, this should be taken into account.
    
    lockhart=# create table "one/two" (i int);
    ERROR:  cannot create one/two
    
    Why not? It demonstrates an unfortunate linkage between file systems and
    database resources.
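
    A toy illustration of that linkage (Python, hypothetical function names,
    not backend code).  On a Unix file system the only bytes that cannot
    appear in a single file name are "/" and NUL, so a name-derived file
    name has to reject some perfectly legal table names, while an
    OID-derived one never looks at the table name at all:

```python
def relname_filename(relname):
    """File name derived from the table name: some legal names must fail."""
    if "/" in relname or "\0" in relname or relname in ("", ".", ".."):
        raise ValueError(f"cannot create {relname}")
    return relname


def oid_filename(oid):
    """File name derived from the OID: independent of the table name."""
    return str(oid)
```

    With this, relname_filename("one/two") raises, whereas oid_filename()
    works for whatever OID the table was assigned.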
    
                         - Thomas
    
    
  168. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-21T15:28:09Z

    Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
    > Well, as y'all have noticed, I think there are strong reasons to use
    > environment variables to manage locations, and that symlinks are a
    > potential portability and robustness problem.
    
    Reasons?  Evidence?
    
    > An additional point which has relevance to this whole discussion:
    > In the future we may allow system resource such as tables to carry names
    > which use multi-byte encodings. afaik these encodings are not allowed to
    > be used for physical file names, and even if they were the utility of
    > using standard operating system utilities like ls goes way down.
    
    Good point, although in one sense a string is a string --- as long as
    we don't allow embedded nulls in server-side encodings, we could use
    anything that Postgres thought was a name in a filename, and the OS
    should take it.  But if your local ls doesn't show it the way you see
    in Postgres, the usefulness of having the tablename in the filename
    goes way down.
    
    > istm that from a portability and evolutionary standpoint OID-only file
    > names (or at least file names *not* based on relation/class names) is a
    > requirement.
    
    No argument from me ;-).  I've been looking for compromise positions
    but I still think that pure numeric filenames are the cleanest solution.
    
    There's something else that should be taken into account: for WAL, the
    log will need to record the table file that each insert/delete/update
    operation affects.  To do that with the smgr-token-is-a-pathname
    approach I was suggesting yesterday, I think you have to record the
    database name and pathname in each WAL log entry.  That's 64 bytes/log
    entry which is a *lot*.  If we bit the bullet and restricted ourselves
    to numeric filenames then the log would need just four numeric values:
    	database OID
    	tablespace OID
    	relation OID
    	relation version number
    (this set of 4 values would also be an smgr file reference token).
    16 bytes/log entry looks much better than 64.
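
    The arithmetic, spelled out as a sketch.  This assumes the 64 is two
    32-byte fixed-width name fields and that each OID and the version number
    are 4-byte unsigned integers; the struct layouts and function names are
    illustrative and not a proposed log format.

```python
import struct

NAME_LEN = 32  # assumed fixed-width name field size

# Path-style token: database name plus pathname, both padded to NAME_LEN.
PATH_TOKEN = struct.Struct(f"{NAME_LEN}s{NAME_LEN}s")
# Numeric token: database OID, tablespace OID, relation OID, version number.
OID_TOKEN = struct.Struct("=4I")


def pack_path_token(dbname, pathname):
    """Pack the two names; shorter values are NUL-padded to NAME_LEN bytes."""
    return PATH_TOKEN.pack(dbname.encode(), pathname.encode())


def pack_oid_token(db_oid, ts_oid, rel_oid, version):
    """Pack the four numeric values into a fixed 16-byte token."""
    return OID_TOKEN.pack(db_oid, ts_oid, rel_oid, version)
```

    len(pack_path_token(...)) is 64 and len(pack_oid_token(...)) is 16, so
    the numeric form is a quarter of the size for every log entry.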
    
    At the moment I can recall the following opinions:
    
    Pure OID filenames: Thomas, Tom, Marc, Peter E.
    
    OID+relname filenames: Bruce
    
    Vadim was in the pure-OID camp a few months ago, but I won't presume
    to list him there now since he hasn't been involved in this most
    recent round of discussions.  I'm not sure where anyone else stands...
    but at least in terms of the core group it's pretty clear where the
    majority opinion is.
    
    			regards, tom lane
    
    
  169. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T15:45:12Z

    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > Yes, agreed.  I was thinking this:
    > > 	CREATE TABLESPACE loc USING '/var/pgsql'
    > > does:
    > > 	ln -s /var/pgsql/dbname/loc data/base/dbname/loc 
    > > In this way, the database has a view of its main directory, plus a /loc
    > > subdirectory for the tablespace.  In the other location, we have
    > > /var/pgsql/dbname/loc because this allows different databases to use:
    > > 	CREATE TABLESPACE loc USING '/var/pgsql'
    > > and they do not collide with each other in /var/pgsql.
    > 
    > But they don't collide anyway, because the dbname is already unique.
    > Isn't the extra subdirectory a waste?
    
    Not really.  Yes, we could put them all in the same directory, but why
    bother?  Probably easier to put them in unique directories per database.
    Cuts down on directory searches to open a file, and allows 'du' to return
    meaningful numbers per database.  If you don't do that, you can't really
    tell what files belong to which databases.
    
    > 
    > Because table files will have installation-wide unique names, there's
    > no really good reason to have either level of subdirectory; you could
    > just make
    > 	CREATE TABLESPACE loc USING '/var/pgsql'
    > do
    > 	ln -s /var/pgsql data/base/dbname/loc 
    > and it'd still work even if multiple DBs were using the same tablespace.
    > 
    > However, forcing creation of a subdirectory does give you the chance to
    > make sure the subdir is owned by postgres and has the right permissions,
    > so there's something to be said for that.  It might be reasonable to do
    > 	mkdir /var/pgsql/dbname
    > 	chmod 700 /var/pgsql/dbname
    > 	ln -s /var/pgsql/dbname data/base/dbname/loc 
    
    Yes, that is true.  My idea is that they may want to create loc1 and
    loc2 which initially point to the same location, but later may be moved.
    For example, one tablespace for tables, another for indexes.  They may
    initially point to the same directory, but later be split.  Seems we
    need to keep the actual tablespace information relevant by using
    different directories on the other end too.
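
    A sketch of the "start together, split later" case, in Python for
    illustration.  repoint is a made-up helper; it only swaps the symlink,
    and moving the actual table files (with the server stopped) would be a
    separate step that is not shown here.

```python
import os
import tempfile


def repoint(link, new_target):
    """Replace an existing symlink so that it points at new_target."""
    tmp = link + ".new"
    os.symlink(new_target, tmp)
    # Renaming over a symlink replaces the link itself, not what it points to.
    os.replace(tmp, link)


def demo():
    """loc1 and loc2 start on the same directory; loc2 is later split off."""
    with tempfile.TemporaryDirectory() as root:
        shared = os.path.join(root, "shared")
        split = os.path.join(root, "split")
        db_dir = os.path.join(root, "base", "dbname")
        for d in (shared, split, db_dir):
            os.makedirs(d)
        loc1 = os.path.join(db_dir, "loc1")
        loc2 = os.path.join(db_dir, "loc2")
        os.symlink(shared, loc1)
        os.symlink(shared, loc2)
        repoint(loc2, split)
        return os.readlink(loc1) == shared, os.readlink(loc2) == split
```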
    
    
    
    
  170. Re: Big 7.1 open items

    Lamar Owen <lamar.owen@wgcr.org> — 2000-06-21T15:48:19Z

    Tom Lane wrote:
     
    > Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
    > > Well, as y'all have noticed, I think there are strong reasons to use
    > > environment variables to manage locations, and that symlinks are a
    > > potential portability and robustness problem.
     
    > Reasons?  Evidence?
    
    Does Win32 do symlinks these days?  I know Win32 does envvars, and Win32
    is currently a supported platform.
    
    I'm not thrilled with either solution -- envvars have their problems
    just as surely as symlinks do.
     
    > At the moment I can recall the following opinions:
     
    > Pure OID filenames: Thomas, Tom, Marc, Peter E.
    
    FWIW, count me here.  I have tried administering my system using the
    filenames -- and have been bitten.  Better admin tools in the PostgreSQL
    package beat using standard filesystem tools -- the PostgreSQL tools can
    be WAL-aware, transaction-aware, and can provide consistent results. 
    Filesystem tools never will be able to provide consistent results for a
    database system that must remain up 24x7, as many if not most PostgreSQL
    installations must.
     
    > OID+relname filenames: Bruce
    
    Sorry Bruce -- I understand and am sympathetic to your position, and, at
    one time, I agreed with it.  But not any more.
    
    --
    Lamar Owen
    WGCR Internet Radio
    1 Peter 4:11
    
    
  171. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T16:03:12Z

    > FWIW, count me here.  I have tried administering my system using the
    > filenames -- and have been bitten.  Better admin tools in the PostgreSQL
    > package beat using standard filesystem tools -- the PostgreSQL tools can
    > be WAL-aware, transaction-aware, and can provide consistent results. 
    > Filesystem tools never will be able to provide consistent results for a
    > database system that must remain up 24x7, as many if not most PostgreSQL
    > installations must.
    >  
    > > OID+relname filenames: Bruce
    > 
    > Sorry Bruce -- I understand and am sympathetic to your position, and, at
    > one time, I agreed with it.  But not any more.
    
    I thought the most recent proposal was to just throw ~16 chars of the
    table name on the end of the file name, and that should not be used for
    anything except visibility.  WAL would not need to store that.  It could
    just grab the file name that matches the oid/sequence number.
    
    If people don't want table names in the file name, I totally understand,
    and we can move on without them.  I have made the best case I can for
    their inclusion, but if people are not convinced, then maybe I was
    wrong.
    
    
    
  172. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-21T16:10:15Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > Yes, that is true.  My idea is that they may want to create loc1 and
    > loc2 which initially point to the same location, but later may be moved.
    > For example, one tablespace for tables, another for indexes.  They may
    > initially point to the same directory, but later be split.
    
    Well, that opens up a completely different issue, which is what about
    moving tables from one tablespace to another?
    
    I think the way you appear to be implying above (shut down the server
    so that you can rearrange subdirectories by hand) is the wrong way to
    go about it.  For one thing, lots of people don't want to shut down
    their servers completely for that long, but it's difficult to avoid
    doing so if you want to move files by filesystem commands.  For another
    thing, the above approach requires guessing in advance --- maybe long
    in advance --- how you are going to want to repartition your database
    when it gets too big for your existing storage.
    
    The right way to address this problem is to invent a "move table to
    new tablespace" command.  This'd be pretty trivial to implement based
    on a file-versioning approach: the new version of the pg_class tuple
    has a new tablespace identifier in it.
    
    			regards, tom lane
    
    
  173. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T16:14:59Z

    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > > Yes, that is true.  My idea is that they may want to create loc1 and
    > > loc2 which initially point to the same location, but later may be moved.
    > > For example, one tablespace for tables, another for indexes.  They may
    > > initially point to the same directory, but later be split.
    > 
    > Well, that opens up a completely different issue, which is what about
    > moving tables from one tablespace to another?
    
    Are you suggesting that doing dbname/locname is somehow harder to do
    that?  If you are, I don't understand why.
    
    The general issue of moving tables between tablespaces can be done from
    within the database.  I don't think it is reasonable to shut down the db to
    do that.  However, I can see moving tablespaces to different symlinked
    locations may require a shutdown.
    
    > 
    > I think the way you appear to be implying above (shut down the server
    > so that you can rearrange subdirectories by hand) is the wrong way to
    > go about it.  For one thing, lots of people don't want to shut down
    > their servers completely for that long, but it's difficult to avoid
    > doing so if you want to move files by filesystem commands.  For another
    > thing, the above approach requires guessing in advance --- maybe long
    > in advance --- how you are going to want to repartition your database
    > when it gets too big for your existing storage.
    > 
    > The right way to address this problem is to invent a "move table to
    > new tablespace" command.  This'd be pretty trivial to implement based
    > on a file-versioning approach: the new version of the pg_class tuple
    > has a new tablespace identifier in it.
    
    Agreed.
    
    
    
  174. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-21T16:17:37Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    >> Sorry Bruce -- I understand and am sympathetic to your position, and, at
    >> one time, I agreed with it.  But not any more.
    
    > I thought the most recent proposal was to just throw ~16 chars of the
    > table name on the end of the file name, and that should not be used for
    > anything except visibility.  WAL would not need to store that.  It could
    > just grab the file name that matches the oid/sequence number.
    
    But that's extra complexity in WAL, plus extra complexity in renaming
    tables (if you want the filename to track the logical table name, which
    I expect you would), plus extra complexity in smgr and bufmgr and other
    places.
    
    I think people are coming around to the notion that it's better to keep
    these low-level operations simple, even if we need to expend more work
    on high-level admin tools as a result.
    
    But we do need to remember to expend that effort on tools!  Let's not
    drop the ball on that, folks.
    
    			regards, tom lane
    
    
  175. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-21T16:24:44Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    >> Well, that opens up a completely different issue, which is what about
    >> moving tables from one tablespace to another?
    
    > Are you suggesting that doing dbname/locname is somehow harder to do
    > that?  If you are, I don't understand why.
    
    It doesn't make it harder, but it still seems pointless to have the
    extra directory level.  Bear in mind that if we go with all-OID
    filenames then you're not going to be looking at "loc1" and "loc2"
    anyway, but at "5938171" and "8583727".  It's not much of a convenience
    to the admin to see that, so we might as well save a level of directory
    lookup.
    
    > The general issue of moving tables between tablespaces can be done from
    > within the database.  I don't think it is reasonable to shut down the db to
    > do that.  However, I can see moving tablespaces to different symlinked
    > locations may require a shutdown.
    
    Only if you insist on doing it outside the database using filesystem
    tools.  Another way is to create a new tablespace in the desired new
    location, then move the tables one-by-one to that new tablespace.
    
    I suppose either one might be preferable depending on your access
    patterns --- locking your most critical tables while they're being moved
    might be as bad as a total shutdown.
    
    			regards, tom lane
    
    
  176. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T16:40:35Z

    > Bruce Momjian <pgman@candle.pha.pa.us> writes:
    > >> Well, that opens up a completely different issue, which is what about
    > >> moving tables from one tablespace to another?
    > 
    > > Are you suggesting that doing dbname/locname is somehow harder to do
    > > that?  If you are, I don't understand why.
    > 
    > It doesn't make it harder, but it still seems pointless to have the
    > extra directory level.  Bear in mind that if we go with all-OID
    > filenames then you're not going to be looking at "loc1" and "loc2"
    > anyway, but at "5938171" and "8583727".  It's not much of a convenience
    > to the admin to see that, so we might as well save a level of directory
    > lookup.
    
    Just seems easier to have stuff segregated into separate per-db
    directories for clarity.  Also, as directories get bigger, finding a
    specific file in there becomes harder.  Putting 10 databases all in the
    same directory seems bad in this regard.
    
    > 
    > > The general issue of moving tables between tablespaces can be done from
    > > within the database.  I don't think it is reasonable to shut down the db to
    > > do that.  However, I can see moving tablespaces to different symlinked
    > > locations may require a shutdown.
    > 
    > Only if you insist on doing it outside the database using filesystem
    > tools.  Another way is to create a new tablespace in the desired new
    > location, then move the tables one-by-one to that new tablespace.
    > 
    > I suppose either one might be preferable depending on your access
    > patterns --- locking your most critical tables while they're being moved
    > might be as bad as a total shutdown.
    
    Seems we are better having the directory be a symlink so we don't have
    symlink overhead for every file open.  Also, symlinks when removed just
    remove symlink and not the file.  I don't think we want to be using
    symlinks for tables if we can avoid it.
    
    
    
  177. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-21T16:46:34Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    >>>> Are you suggesting that doing dbname/locname is somehow harder to do
    >>>> that?  If you are, I don't understand why.
    >> 
    >> It doesn't make it harder, but it still seems pointless to have the
    >> extra directory level.  Bear in mind that if we go with all-OID
    >> filenames then you're not going to be looking at "loc1" and "loc2"
    >> anyway, but at "5938171" and "8583727".  It's not much of a convenience
    >> to the admin to see that, so we might as well save a level of directory
    >> lookup.
    
    > Just seems easier to have stuff segregated into separate per-db
    > directories for clarity.  Also, as directories get bigger, finding a
    > specific file in there becomes harder.  Putting 10 databases all in the
    > same directory seems bad in this regard.
    
    Huh?  I wasn't arguing against making a db-specific directory below the
    tablespace point.  I was arguing against making *another* directory
    below that one.
    
    > I don't think we want to be using
    > symlinks for tables if we can avoid it.
    
    Agreed, but where did that come from?  None of these proposals mentioned
    symlinks for anything but directories, AFAIR.
    
    			regards, tom lane
    
    
  178. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T17:05:39Z

    > > Just seems easier to have stuff segregated into separate per-db
    > > directories for clarity.  Also, as directories get bigger, finding a
    > > specific file in there becomes harder.  Putting 10 databases all in the
    > > same directory seems bad in this regard.
    > 
    > Huh?  I wasn't arguing against making a db-specific directory below the
    > tablespace point.  I was arguing against making *another* directory
    > below that one.
    
    I was suggesting:
    
    	ln -s /var/pgsql/dbname/loc data/base/dbname/loc
    
    I thought you were suggesting:
    
    	ln -s /var/pgsql/dbname data/base/dbname/loc
    
    With this system:
    
    	ln -s /var/pgsql/dbname data/base/dbname/loc1
    	ln -s /var/pgsql/dbname data/base/dbname/loc2
    
    go into the same directory, which makes it impossible to move loc1
    easily using the file system.  Seems cheap to add the extra directory.
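
    The difference between the two layouts can be sketched as follows. This is
    only an illustration in Python, not PostgreSQL code: the function names, the
    temporary directories, and the table file names "1001" and "1002" are all
    hypothetical.

```python
import os
import tempfile


def make_tablespace(data_dir, storage_root, dbname, loc, per_loc_subdir):
    """Create data_dir/base/dbname/loc as a symlink into storage_root."""
    target = os.path.join(storage_root, dbname)
    if per_loc_subdir:
        # Extra directory level: storage_root/dbname/loc
        target = os.path.join(target, loc)
    os.makedirs(target, mode=0o700, exist_ok=True)
    db_dir = os.path.join(data_dir, "base", dbname)
    os.makedirs(db_dir, exist_ok=True)
    link = os.path.join(db_dir, loc)
    os.symlink(target, link)
    return link


def demo(per_loc_subdir):
    """Put one table file in loc1 and one in loc2, then list both tablespaces."""
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        storage = os.path.join(root, "var_pgsql")
        loc1 = make_tablespace(data_dir, storage, "dbname", "loc1", per_loc_subdir)
        loc2 = make_tablespace(data_dir, storage, "dbname", "loc2", per_loc_subdir)
        open(os.path.join(loc1, "1001"), "w").close()
        open(os.path.join(loc2, "1002"), "w").close()
        return sorted(os.listdir(loc1)), sorted(os.listdir(loc2))
```

    With per_loc_subdir=False both symlinks resolve to the same directory, so
    each listing shows both files and there is no way to tell from the file
    system which file belongs to loc1. With per_loc_subdir=True each listing
    shows only its own file.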
    
    > > I don't think we want to be using
    > > symlinks for tables if we can avoid it.
    > 
    > Agreed, but where did that come from?  None of these proposals mentioned
    > symlinks for anything but directories, AFAIR.
    
    I thought you mentioned it.  Sorry.
    
    
    
  179. Re: Big 7.1 open items

    Peter Eisentraut <peter_e@gmx.net> — 2000-06-21T18:16:10Z

    Tom Lane writes:
    
    > I think Peter was holding out for storing purely numeric tablespace OID
    > and table version in pg_class and having a hardwired mapping to pathname
    > somewhere in smgr.  However, I think that doing it that way gains only
    > micro-efficiency compared to passing a "name" around, while using the
    > name approach buys us flexibility that's needed for at least some of
    > the variants under discussion.
    
    But that name can only be a dozen or so characters, contain no slash or
    other funny characters, etc. That's really poor. Then the alternative is
    to have an internal name and an external canonical name. Then you have two
    names to worry about. Also consider that when you store both the table
    space oid and the internal name in pg_class you create redundant data.
    What if you rename the table space? Do you leave the internal name out of
    sync? Then what good is the internal name? I'm just concerned that we are
    creating at the table space level problems similar to that we're trying to
    get rid of at the relation and database level.
    
    
    -- 
    Peter Eisentraut                  Sernanders väg 10:115
    peter_e@gmx.net                   75262 Uppsala
    http://yi.org/peter-e/            Sweden
    
    
    
  180. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-21T18:42:21Z

    > Tom Lane writes:
    > 
    > > I think Peter was holding out for storing purely numeric tablespace OID
    > > and table version in pg_class and having a hardwired mapping to pathname
    > > somewhere in smgr.  However, I think that doing it that way gains only
    > > micro-efficiency compared to passing a "name" around, while using the
    > > name approach buys us flexibility that's needed for at least some of
    > > the variants under discussion.
    > 
    > But that name can only be a dozen or so characters, contain no slash or
    > other funny characters, etc. That's really poor. Then the alternative is
    > to have an internal name and an external canonical name. Then you have two
    > names to worry about. Also consider that when you store both the table
    > space oid and the internal name in pg_class you create redundant data.
    > What if you rename the table space? Do you leave the internal name out of
    > sync? Then what good is the internal name? I'm just concerned that we are
    > creating at the table space level problems similar to that we're trying to
    > get rid of at the relation and database level.
    
    Agreed.  Having table spaces stored by directories named by oid just
    seems very complicated for no reason.
    
    
    
  181. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-21T21:39:38Z

    Bruce Momjian <pgman@candle.pha.pa.us> writes:
    >> But that name can only be a dozen or so characters, contain no slash or
    >> other funny characters, etc. That's really poor. Then the alternative is
    >> to have an internal name and an external canonical name. Then you have two
    >> names to worry about. Also consider that when you store both the table
    >> space oid and the internal name in pg_class you create redundant data.
    >> What if you rename the table space? Do you leave the internal name out of
    >> sync? Then what good is the internal name? I'm just concerned that we are
    >> creating at the table space level problems similar to that we're trying to
    >> get rid of at the relation and database level.
    
    > Agreed.  Having table spaces stored by directories named by oid just
    > seems very complicated for no reason.
    
    Huh?  He just gave you two very good reasons: avoid Unix-derived
    limitations on the naming of tablespaces (and tables), and avoid
    problems with renaming tablespaces.
    
    I'm pretty much firmly back in the "OID and nothing but" camp.
    Or perhaps I should say "OID, file version, and nothing but",
    since we still need a version number to do CLUSTER etc.
    
    			regards, tom lane
    
    
  182. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-21T23:37:42Z

    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    >
    > No argument from me ;-).  I've been looking for compromise positions
    > but I still think that pure numeric filenames are the cleanest solution.
    >
    > There's something else that should be taken into account: for WAL, the
    > log will need to record the table file that each insert/delete/update
    > operation affects.  To do that with the smgr-token-is-a-pathname
    > approach I was suggesting yesterday, I think you have to record the
    > database name and pathname in each WAL log entry.  That's 64 bytes/log
    > entry which is a *lot*.  If we bit the bullet and restricted ourselves
    > to numeric filenames then the log would need just four numeric values:
    > 	database OID
    > 	tablespace OID
    
    I strongly object to keep tablespace OID for smgr file reference token
    though we have to keep it for another purpose of course. I've mentioned
    many times tablespace(where to store) info should be distinguished from
    *where it is stored* info. Generally tablespace isn't sufficiently
    restrictive for this purpose. e.g. there was an idea about round-robin.
    e.g. Oracle's tablespace could have plural files... etc.
    IMHO, it is misleading to use tablespace OID as (a part of) reference token.
    
    > 	relation OID
    > 	relation version number
    > (this set of 4 values would also be an smgr file reference token).
    > 16 bytes/log entry looks much better than 64.
    >
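
    The size comparison in the quoted text can be checked with a small sketch.
    It assumes 4-byte unsigned OIDs and, for the name-based variant, two
    fixed-width 32-byte name fields; the 32-byte width is an assumption made
    here to arrive at the 64 bytes mentioned, and the function names are
    hypothetical.

```python
import struct

# Four 4-byte unsigned integers: database OID, tablespace OID,
# relation OID, relation version number.
NUMERIC_TOKEN = struct.Struct("<4I")

# Assumed width of one fixed-size name field (not taken from the thread).
NAME_FIELD = 32


def pack_numeric_token(db_oid, tablespace_oid, rel_oid, version):
    """Pack the all-numeric file reference token."""
    return NUMERIC_TOKEN.pack(db_oid, tablespace_oid, rel_oid, version)


def pack_name_token(dbname, pathname):
    """Pack a name-based token as two NUL-padded fixed-width fields."""
    fmt = "<%ds%ds" % (NAME_FIELD, NAME_FIELD)
    return struct.pack(fmt, dbname.encode(), pathname.encode())
```

    Under these assumptions len(pack_numeric_token(...)) is 16 and
    len(pack_name_token(...)) is 64, matching the figures above.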
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
    
  183. Re: Big 7.1 open items

    Chris <chrisb@nimrod.itg.telstra.com.au> — 2000-06-22T00:43:20Z

    Bruce Momjian wrote:
    
    > The symlink solution where the actual symlink location is not stored
    > in the database is certainly abstract.  We store that info in the file
    > system, which is where it belongs.  We only query the symlink location
    > when we need it for database location dumping.
    
    how would that work? would pg_dump dump the tablespace locations or not?
    
    
  184. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-22T01:15:01Z

    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    > 
    > At the moment I can recall the following opinions:
    > 
    > Pure OID filenames: Thomas, Tom, Marc, Peter E.
    > 
    > OID+relname filenames: Bruce
    >
    
    Please add my opinion to the list.
    
    Unique-id filename: Hiroshi
     (Unique-id is irrelevant to OID/relname).
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
  185. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-22T02:29:42Z

    > Bruce Momjian wrote:
    > 
    > > The symlink solution where the actual symlink location is not stored
    > > in the database is certainly abstract.  We store that info in the file
    > > system, which is where it belongs.  We only query the symlink location
    > > when we need it for database location dumping.
    > 
    > how would that work? would pg_dump dump the tablespace locations or not?
    > 
    
    pg_dump would recreate a CREATE TABLESPACE command:
    
    	printf("CREATE TABLESPACE %s USING %s", loc, symloc);
    
    where symloc would be SELECT symloc(loc) and return the value into a
    variable that is used by pg_dump.  The backend would do the lstat() and
    return the value to the client.
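
    A rough sketch of what such a lookup could do, written in Python rather
    than backend C. Note that lstat() only reports that a path is a symlink;
    the target itself is read with readlink(). The function names here are
    hypothetical, and symloc() is the name proposed above, not an existing
    function.

```python
import os
import tempfile


def symloc(link_path):
    """Return the target of a tablespace symlink, or None for a plain directory."""
    if os.path.islink(link_path):       # lstat()-based check: is it a symlink?
        return os.readlink(link_path)   # readlink() returns the stored target
    return None


def create_tablespace_stmt(loc, link_path):
    """Build the statement a dump tool might emit for one tablespace."""
    target = symloc(link_path)
    if target is None:
        return None
    return "CREATE TABLESPACE %s USING '%s'" % (loc, target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "var_pgsql")
        os.mkdir(target)
        link = os.path.join(root, "loc")
        os.symlink(target, link)
        print(create_tablespace_stmt("loc", link))
```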
    
    
    
  186. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-22T03:27:10Z

    "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > Please add my opinion to the list.
    > Unique-id filename: Hiroshi
    >  (Unique-id is irrelevant to OID/relname).
    
    "Unique ID" is more or less equivalent to "OID + version number",
    right?
    
    I was trying earlier to convince myself that a single unique-ID value
    would be better than OID+version for the smgr interface, because it'd
    certainly be easier to pass around.  I failed to convince myself though,
    and the thing that bothered me was this.  Suppose you are trying to
    recover a corrupted database manually, and the only information you have
    about which table is which is a somewhat out-of-date listing of OIDs
    versus table names.  (Maybe it's out of date because you got it from
    your last backup tape.)  If the files are named OID+version you're not
    going to have much trouble seeing which is which, even if some of the
    versions are higher than what was on the tape.  But if version-updated
    tables are given entirely new unique IDs, you've got no hope at all of
    telling which one corresponds to what you had in the listing.  Maybe
    you can tell by looking through the physical file contents, but
    certainly this way is more fragile from the point of view of data
    recovery.
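
    The recovery argument can be illustrated with a sketch. It assumes, purely
    for illustration, that files are named "OID.version"; the thread does not
    fix a filename format, and the function name and sample OIDs are
    hypothetical.

```python
def identify_files(filenames, oid_to_name):
    """Map each data file to a table name using a possibly stale OID listing.

    The part of the filename before the first "." is treated as the OID;
    anything after it (the version) is ignored for matching.
    """
    result = {}
    for fname in filenames:
        oid_text, _, _version = fname.partition(".")
        result[fname] = oid_to_name.get(int(oid_text))
    return result


# Listing taken from an old backup: OID -> table name.
old_listing = {16384: "orders", 16390: "customers"}
```

    With OID+version names, identify_files(["16384.3", "16390.0"], old_listing)
    still finds both tables even though the versions have moved on. If each new
    version got a fresh unique ID instead, say "70211" and "70212", nothing in
    the old listing would match.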
    
    			regards, tom lane
    
    
  187. Re: Big 7.1 open items

    Chris <chrisb@nimrod.itg.telstra.com.au> — 2000-06-22T03:43:56Z

    Bruce Momjian wrote:
    > 
    > > Bruce Momjian wrote:
    > >
    > > > The symlink solution where the actual symlink location is not stored
    > > > in the database is certainly abstract.  We store that info in the file
    > > > system, which is where it belongs.  We only query the symlink location
    > > > when we need it for database location dumping.
    > >
    > > how would that work? would pg_dump dump the tablespace locations or not?
    > >
    > 
    > pg_dump would recreate a CREATE TABLESPACE command:
    > 
    >         printf("CREATE TABLESPACE %s USING %s", loc, symloc);
    > 
    > where symloc would be SELECT symloc(loc) and return the value into a
    > variable that is used by pg_dump.  The backend would do the lstat() and
    > return the value to the client.
    
    I'm wondering if pg_dump should store the location of the tablespace. If
    your machine dies, you get a new machine to re-create the database, you
    may not want the tablespace in the same spot. And text-editing a
    gigabyte file would be extremely painful.
    
    
  188. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-22T04:03:27Z

    > > where symloc would be SELECT symloc(loc) and return the value into a
    > > variable that is used by pg_dump.  The backend would do the lstat() and
    > > return the value to the client.
    > 
    > I'm wondering if pg_dump should store the location of the tablespace. If
    > your machine dies, you get a new machine to re-create the database, you
    > may not want the tablespace in the same spot. And text-editing a
    > gigabyte file would be extremely painful.
    
    If the symlink create fails in CREATE TABLESPACE, it just creates an
    ordinary directory.
    
    
    
  189. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-22T04:29:42Z

    "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > I strongly object to keep tablespace OID for smgr file reference token
    > though we have to keep it for another purpose of course. I've mentioned
    > many times tablespace(where to store) info should be distinguished from
    > *where it is stored* info.
    
    Sure.  But this proposal assumes that we're relying on symlinks to
    carry the information about physical locations corresponding to
    tablespace OIDs.  The backend just needs to know enough to access a
    relation file at a relative pathname like
    	tablespaceOID/relationOID
    (ignoring version and segment numbers for now).  Under the hood,
    a symlink for tablespaceOID gets the work done.
    
    Certainly this is not a perfect mechanism.  But it is simple, it
    is reliable, it is portable to most of the platforms we care about
    (yeah, I know we have a Win port, but you wouldn't ever recommend
    someone to run a *serious* database on it would you?), and in general
    I think the bang-for-the-buck ratio is enormous.  I do not want to
    have to deal with explicit tablespace bookkeeping in the backend,
    but that seems like what we'd have to do in order to improve on
    symlinks.
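
    A minimal sketch of that division of labour, in Python for brevity. The
    path-building function knows only the two OIDs; where the data physically
    lands is decided entirely by the symlink. The base directory, the "disk2"
    location, and the relation OID 16384 are hypothetical.

```python
import os
import tempfile


def relation_path(base_dir, tablespace_oid, relation_oid):
    """Path as the backend would see it: tablespaceOID/relationOID under base_dir."""
    return os.path.join(base_dir, str(tablespace_oid), str(relation_oid))


def demo():
    with tempfile.TemporaryDirectory() as root:
        base_dir = os.path.join(root, "data")
        physical = os.path.join(root, "disk2", "pgsql")
        os.makedirs(base_dir)
        os.makedirs(physical)
        # The only place the physical location is recorded is this symlink.
        os.symlink(physical, os.path.join(base_dir, "5938171"))

        path = relation_path(base_dir, 5938171, 16384)
        with open(path, "w") as f:
            f.write("page data")

        same_file = os.path.realpath(path) == os.path.realpath(
            os.path.join(physical, "16384")
        )
        return os.listdir(physical), same_file
```

    demo() writes through the OID-only path and the file appears in the
    physical directory, with no tablespace bookkeeping in the calling code.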
    
    			regards, tom lane
    
    
  190. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-22T05:41:22Z

    At 01:43 PM 6/22/00 +1000, Chris Bitmead wrote:
    
    >I'm wondering if pg_dump should store the location of the tablespace. If
    >your machine dies, you get a new machine to re-create the database, you
    >may not want the tablespace in the same spot. And text-editing a
    >gigabyte file would be extremely painful.
    
    So you don't dump your create tablespace statements, recognizing that on
    a new machine (due to upgrades or crashing) you might assign them to
    different directories/mount points/whatever.  That's the reason for
    wanting to hide physical allocation in tablespaces ... the rest of
    your datamodel doesn't need to know.
    
    Or you do dump your tablespaces, and knowing the paths assigned
    to various ones set up your new machine accordingly.
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  191. Re: Big 7.1 open items

    Don Baccus <dhogaza@pacifier.com> — 2000-06-22T05:51:49Z

    At 12:03 AM 6/22/00 -0400, Bruce Momjian wrote:
    
    >If the symlink create fails in CREATE TABLESPACE, it just creates an
    >ordinary directory.
    
    Silent surprises - the earmark of truly professional software ...
    
    
    
    - Don Baccus, Portland OR <dhogaza@pacifier.com>
      Nature photos, on-line guides, Pacific Northwest
      Rare Bird Alert Service and other goodies at
      http://donb.photo.net.
    
    
  192. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-22T05:56:07Z

    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    > 
    > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > > I strongly object to keep tablespace OID for smgr file reference token
    > > though we have to keep it for another purpose of course. I've mentioned
    > > many times tablespace(where to store) info should be distinguished from
    > > *where it is stored* info.
    > 
    > Sure.  But this proposal assumes that we're relying on symlinks to
    > carry the information about physical locations corresponding to
    > tablespace OIDs.  The backend just needs to know enough to access a
    > relation file at a relative pathname like
    > 	tablespaceOID/relationOID
    > (ignoring version and segment numbers for now).  Under the hood,
    > a symlink for tablespaceOID gets the work done.
    >
    
    I think tablespaceOID is an easy substitution for the purpose.
    I don't like to depend on poor directory tree structure in a DBMS
    either.
     
    > Certainly this is not a perfect mechanism.  But it is simple, it
    > is reliable, it is portable to most of the platforms we care about
    > (yeah, I know we have a Win port, but you wouldn't ever recommend
    > someone to run a *serious* database on it would you?), and in general
    > I think the bang-for-the-buck ratio is enormous.  I do not want to
    > have to deal with explicit tablespace bookkeeping in the backend,
    > but that seems like what we'd have to do in order to improve on
    > symlinks.
    >
    
    I've already mentioned it 10 times or so, but unfortunately
    I see no one on my side yet.
    OK, I've given up the discussion about it.  I don't want to waste
    my time any more.
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
  193. Re: Big 7.1 open items

    Philip Warner <pjw@rhyme.com.au> — 2000-06-22T06:31:33Z

    At 23:27 21/06/00 -0400, Tom Lane wrote:
    >"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    >> Please add my opinion to the list.
    >> Unique-id filename: Hiroshi
    >>  (Unique-id is irrelevant to OID/relname).
    >
    >I was trying earlier to convince myself that a single unique-ID value
    >would be better than OID+version for the smgr interface, because it'd
    >certainly be easier to pass around.  I failed to convince myself though,
    >and the thing that bothered me was this.  Suppose you are trying to
    >recover a corrupted database manually, and the only information you have
    >about which table is which is a somewhat out-of-date listing of OIDs
    >versus table names.
    
    This worries me a little; in the Dec/RDB world it is a very long time since
    database backups were done by copying the files. There is a database
    backup/restore utility which runs while the database is on-line and makes
    sure a valid snapshot is taken. Backing up storage areas (tablespaces)
    can be done separately by the same utility, and again, it records enough
    information to ensure integrity. Maybe the thing to do is write a pg_backup
    utility, which in a first pass could, presumably, be synonymous with pg_dump?
    
    Am I missing something here? Is there a problem with backing up using
    'pg_dump | gzip'?
    
    
    >  (Maybe it's out of date because you got it from
    >your last backup tape.)  If the files are named OID+version you're not
    >going to have much trouble seeing which is which, even if some of the
    >versions are higher than what was on the tape.
    
    Unfortunately here you hit severe RI problems, unless you use a 'proper'
    database backup.
    
    
    >  But if version-updated
    >tables are given entirely new unique IDs, you've got no hope at all of
    >telling which one corresponds to what you had in the listing.  Maybe
    >you can tell by looking through the physical file contents, but
    >certainly this way is more fragile from the point of view of data
    >recovery.
    
    In the Dec/RDB world, one has to very occasionally restore from files (this
    only happens if multiple prior database backups and after-image journals
    are corrupt). In this case, there is a utility for examining and changing
    storage area file information. This is probably way over the top for
    PostgreSQL.
    
    [Aside: FWIW, the Dec/RDB storage area files are named by DBAs to be
    something meaningful to the DBA (eg. EMPLOYEE_ACHIVE), and can contain one
    or more tables etc. The files are never renamed or moved by the database
    without an instruction from the DBA. The 'storage manager' manages the
    datafiles internally. Usually, tables are allocated in chunks of multiples
    of some file-based buffer size, and the file grows as needed. This allows
    for disk read-ahead to be useful, while storing multiple tables in one
    file. As stated in a previous message, tables can also be split across
    storage areas]
    
    Once again, I hope I have not missed a fundamental fact...
    
    ----------------------------------------------------------------
    Philip Warner                    |     __---_____
    Albatross Consulting Pty. Ltd.   |----/       -  \
    (A.C.N. 008 659 498)             |          /(@)   ______---_
    Tel: (+61) 0500 83 82 81         |                 _________  \
    Fax: (+61) 0500 83 82 82         |                 ___________ |
    Http://www.rhyme.com.au          |                /           \|
                                     |    --________--
    PGP key available upon request,  |  /
    and from pgp5.ai.mit.edu:11371   |/
    
    
  194. Re: Big 7.1 open items

    Philip Warner <pjw@rhyme.com.au> — 2000-06-22T06:32:56Z

    At 13:43 22/06/00 +1000, Chris Bitmead wrote:
    >Bruce Momjian wrote:
    >
    >I'm wondering if pg_dump should store the location of the tablespace. If
    >your machine dies, you get a new machine to re-create the database, you
    >may not want the tablespace in the same spot. And text-editing a
    >gigabyte file would be extremely painful.
    >
    
    This is a very good point; the way Dec/RDB gets around it is to allow the
    'pg_restore' command to override storage settings when restoring a backup
    file.
    
    
    
    
  195. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-22T07:05:00Z

    Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
    > I'm wondering if pg_dump should store the location of the tablespace. If
    > your machine dies, you get a new machine to re-create the database, you
    > may not want the tablespace in the same spot. And text-editing a
    > gigabyte file would be extremely painful.
    
    Might make sense to store the tablespace setup separately from the bulk
    of the data, but certainly you want some way to dump that info in a
    restorable form.
    
    I've been thinking lately that the pg_dump shove-it-all-in-one-file
    approach doesn't scale anyway.  We ought to start thinking about ways
    to make the standard dump method store schema separately from bulk
    data, for example.  That's offtopic for this thread but ought to be
    on the TODO list someplace...
    
    			regards, tom lane
    
    
  196. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-22T07:17:45Z

    "Philip J. Warner" <pjw@rhyme.com.au> writes:
    >> ... the thing that bothered me was this.  Suppose you are trying to
    >> recover a corrupted database manually, and the only information you have
    >> about which table is which is a somewhat out-of-date listing of OIDs
    >> versus table names.
    
    > This worries me a little; in the Dec/RDB world it is a very long time since
    > database backups were done by copying the files. There is a database
    > backup/restore utility which runs while the database is on-line and makes
    > sure a valid snapshot is taken. Backing up storage areas (tablespaces)
    > can be done separately by the same utility, and again, it records enough
    > information to ensure integrity. Maybe the thing to do is write a pg_backup
    > utility, which in a first pass could, presumably, be synonymous with pg_dump?
    
    pg_dump already does the consistent-snapshot trick (it just has to run
    inside a single transaction).
    
    > Am I missing something here? Is there a problem with backing up using
    > 'pg_dump | gzip'?
    
    None, as long as your ambition extends no further than restoring your
    data to where it was at your last pg_dump.  I was thinking about the
    all-too-common-in-the-real-world scenario where you're hoping to recover
    some data more recent than your last backup from the fractured shards
    of your database...
    
    			regards, tom lane
    
    
  197. Re: Big 7.1 open items

    Philip Warner <pjw@rhyme.com.au> — 2000-06-22T07:50:15Z

    At 03:17 22/06/00 -0400, Tom Lane wrote:
    >
    >> This worries me a little; in the Dec/RDB world it is a very long time since
    >> database backups were done by copying the files. There is a database
    >> backup/restore utility which runs while the database is on-line and makes
    >> sure a valid snapshot is taken. Backing up storage areas (tablespaces)
    >> can be done separately by the same utility, and again, it records enough
    >> information to ensure integrity. Maybe the thing to do is write a pg_backup
    >> utility, which in a first pass could, presumably, be synonymous with pg_dump?
    >
    >pg_dump already does the consistent-snapshot trick (it just has to run
    >inside a single transaction).
    >
    >> Am I missing something here? Is there a problem with backing up using
    >> 'pg_dump | gzip'?
    >
    >None, as long as your ambition extends no further than restoring your
    >data to where it was at your last pg_dump.  I was thinking about the
    >all-too-common-in-the-real-world scenario where you're hoping to recover
    >some data more recent than your last backup from the fractured shards
    >of your database...
    >
    
    pg_dump is a good basis for any pg_backup utility; perhaps, as you indicated
    elsewhere, more careful formatting of the dump files would make
    table-based restoration possible. In another response, I also suggested
    allowing overrides of placement information in a restore operation; the
    simplest approach would be an 'ignore-storage-parameters' flag. Does this
    sound reasonable? If so, then discussion of file IDs based on OID need not
    be too concerned about how db restoration is done.
    
    
    
    
    
    
    
  198. RE: Big 7.1 open items

    Hiroshi Inoue <inoue@tpf.co.jp> — 2000-06-22T11:09:07Z

    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    > 
    > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > > Please add my opinion to the list.
    > > Unique-id filename: Hiroshi
    > >  (Unique-id is irrelevant to OID/relname).
    > 
    > "Unique ID" is more or less equivalent to "OID + version number",
    > right?
    >
    
    Hmm, no one seems to be on my side on this point either.
    OK, I'll change my position as follows.
    
       OID except on cygwin, unique-id on cygwin
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
  199. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-22T14:35:19Z

    > At 01:43 PM 6/22/00 +1000, Chris Bitmead wrote:
    > 
    > >I'm wondering if pg_dump should store the location of the tablespace. If
    > >your machine dies, you get a new machine to re-create the database, you
    > >may not want the tablespace in the same spot. And text-editing a
    > >gigabyte file would be extremely painful.
    > 
    > So you don't dump your create tablespace statements, recognizing that on
    > a new machine (due to upgrades or crashing) you might assign them to
    > different directories/mount points/whatever.  That's the reason for
    > wanting to hide physical allocation in tablespaces ... the rest of
    > your datamodel doesn't need to know.
    > 
    > Or you do dump your tablespaces, and knowing the paths assigned
    > to various ones set up your new machine accordingly.
    
    I imagine we will have a -l flag to pg_dump to dump tablespace
    locations.  If they exist on the new machine, we use them.  If not, we
    create just directories with no symlinks.
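
    The fallback described above could look roughly like the following
    sketch.  It assumes the dump records one location per tablespace OID;
    restore_tablespace(), the directory names and the OIDs are hypothetical,
    and no such pg_dump flag or function is confirmed by this thread.

```python
import os
import tempfile


def restore_tablespace(data_dir, tablespace_oid, dumped_location):
    """Recreate one tablespace entry under data_dir.

    Returns "symlink" if the dumped location exists on this machine,
    otherwise "directory" (a plain directory with no symlink).
    """
    entry = os.path.join(data_dir, str(tablespace_oid))
    if dumped_location is not None and os.path.isdir(dumped_location):
        os.symlink(dumped_location, entry)
        return "symlink"
    os.mkdir(entry)
    return "directory"


def demo():
    root = tempfile.mkdtemp()
    data_dir = os.path.join(root, "data")
    existing = os.path.join(root, "disk2")
    os.makedirs(data_dir)
    os.makedirs(existing)
    # Made-up OIDs; the second location is absent on the "new machine".
    return [
        restore_tablespace(data_dir, 16384, existing),
        restore_tablespace(data_dir, 16385, os.path.join(root, "missing")),
    ]
```

    Either way the data directory ends up with an entry named after the
    tablespace OID, so relation paths resolve the same on the new machine.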
    
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      pgman@candle.pha.pa.us               |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  200. Re: Big 7.1 open items

    Tom Lane <tgl@sss.pgh.pa.us> — 2000-06-22T15:27:30Z

    "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
    > OK, I'll change my position as follows.
    >    OID except on cygwin, unique-id on cygwin
    
    We don't really want to do that, do we?  That's a huge difference in
    behavior to have in just one port --- especially a port that none of
    the primary developers use (AFAIK anyway).  The cygwin port's normal
    state of existence will be "broken", surely, if we go that way.
    
    Besides which, OID alone doesn't give us a possibility of file
    versioning, and as I commented to Vadim I think we will want that,
    WAL or no WAL.  So it seems to me the two viable choices are
    unique-id or OID+version-number.  Either way, the file-naming behavior
    should be the same across all platforms.
    
    			regards, tom lane
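
    The recovery argument for OID+version names can be sketched as follows.
    The "OID.version" filename format is an assumption made purely for
    illustration (the thread does not settle on a format), and
    identify_files() is a hypothetical helper.

```python
def identify_files(filenames, oid_listing):
    """Match 'OID.version' filenames against an old OID -> table listing.

    Files whose OID is not in the listing map to None.
    """
    result = {}
    for name in filenames:
        oid_text, _, _version = name.partition(".")
        if not oid_text.isdigit():
            continue  # not a relation file under the assumed naming scheme
        result[name] = oid_listing.get(int(oid_text))
    return result


# An out-of-date listing still identifies a file whose version has moved on,
# because the OID part of the name is unchanged.
old_listing = {18721: "employee"}
found = identify_files(["18721.3", "18900.0"], old_listing)
```

    With an entirely new unique ID per file version there would be no stable
    part of the name to look up, which is the fragility raised earlier in
    the thread.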
    
    
  201. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-22T20:11:56Z

    > pg_dump is a good basis for any pg_backup utility; perhaps, as you indicated
    > elsewhere, more careful formatting of the dump files would make
    > table-based restoration possible. In another response, I also suggested
    > allowing overrides of placement information in a restore operation; the
    > simplest approach would be an 'ignore-storage-parameters' flag. Does this
    > sound reasonable? If so, then discussion of file IDs based on OID need not
    > be too concerned about how db restoration is done.
    
    My idea was to make dumping of tablespace locations/symlinks optional.
    If you try to control it on the load end, you basically need some
    way of telling the backend to ignore the symlinks on load.  Right now,
    pg_dump just creates SQL commands and COPY commands.
    
    
    
  202. Re: Big 7.1 open items

    Marc G. Fournier <scrappy@hub.org> — 2000-06-22T22:05:38Z

    On Wed, 21 Jun 2000, Don Baccus wrote:
    
    > At 01:43 PM 6/22/00 +1000, Chris Bitmead wrote:
    > 
    > >I'm wondering if pg_dump should store the location of the tablespace. If
    > >your machine dies, you get a new machine to re-create the database, you
    > >may not want the tablespace in the same spot. And text-editing a
    > >gigabyte file would be extremely painful.
    > 
    > So you don't dump your create tablespace statements, recognizing that on
    > a new machine (due to upgrades or crashing) you might assign them to
    > different directories/mount points/whatever.  That's the reason for
    > wanting to hide physical allocation in tablespaces ... the rest of
    > your datamodel doesn't need to know.
    > 
    > Or you do dump your tablespaces, and knowing the paths assigned
    > to various ones set up your new machine accordingly.
    
    Or, modify pg_dump so that it auto-dumps to two files, one for schema, one
    for data.  Then it's easier to modify the schema on a large database if
    tablespaces change ...
    
    
    
    
  203. Re: Big 7.1 open items

    Chris <chrisb@nimrod.itg.telstra.com.au> — 2000-06-22T23:55:15Z

    The Hermit Hacker wrote:
    
    > Or, modify pg_dump so that it auto-dumps to two files, one for schema, one
    > for data.  Then it's easier to modify the schema on a large database if
    > tablespaces change ...
    
    That's a pretty good idea as an option. But I'd say keep the schema
    separate from the tablespace locations. And if you're going down that
    path why not create a directory automatically and dump each table into a
    separate file. On occasion I've had to restore one table by hand-editing
    the pg_dump, and that's a real pain.
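
    A rough sketch of the split being discussed, assuming a plain-text dump
    in which each table's rows follow a line of the form
    'COPY <table> ... FROM stdin;' and end with a line containing only '\.'.
    split_dump() is a hypothetical helper, not part of pg_dump.

```python
def split_dump(lines):
    """Separate schema statements from per-table COPY blocks.

    Returns (schema_lines, {table_name: copy_block_lines}).
    """
    schema, data = [], {}
    current = None  # COPY block being collected, if any
    for line in lines:
        text = line.rstrip("\n")
        if current is None:
            if text.startswith("COPY ") and text.endswith("FROM stdin;"):
                table = text.split()[1].strip('"')
                current = data.setdefault(table, [])
                current.append(line)
            else:
                schema.append(line)
        else:
            current.append(line)
            if text == "\\.":  # end-of-data marker closes the block
                current = None
    return schema, data


dump = [
    'CREATE TABLE "emp" (id int4);',
    'COPY "emp" FROM stdin;',
    "1",
    "2",
    "\\.",
    'CREATE INDEX emp_idx ON "emp" (id);',
]
schema_lines, table_data = split_dump(dump)
```

    Writing each entry of table_data to its own file would give the
    one-file-per-table layout suggested above, so a single table could be
    restored without hand-editing the whole dump.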
    
    
  204. Re: Big 7.1 open items

    Philip Warner <pjw@rhyme.com.au> — 2000-06-23T01:52:49Z

    At 09:55 23/06/00 +1000, Chris Bitmead wrote:
    >The Hermit Hacker wrote:
    >
    >> Or, modify pg_dump so that it auto-dumps to two files, one for schema, one
    >> for data.  Then it's easier to modify the schema on a large database if
    >> tablespaces change ...
    >
    >That's a pretty good idea as an option. But I'd say keep the schema
    >separate from the tablespace locations. And if you're going down that
    >path why not create a directory automatically and dump each table into a
    >separate file. On occasion I've had to restore one table by hand-editing
    >the pg_dump, and that's a real pain.
    >
    
    Have a look at my message entitled:
    
    Proposal: More flexible backup/restore via pg_dump
    
    It's supposed to address these issues.
    
    
    
    
    
  205. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-23T16:19:07Z

    > Bruce Momjian writes:
    > 
    > > Here is the list I have gotten of open 7.1 items:
    > 
    > > 	new location for config files
    > 
    > I'm on that task now, more or less by accident but I might as well get it
    > done. I'm reorganizing all the file name handling code for pg_hba.conf,
    > pg_ident.conf, pg_control, etc. so they have consistent accessor
    > routines. The DataDir global variable will disappear; you'll have to use
    > GetDataDir().
    > 
    
    Can we get agreement to remove our secondary password files, and make
    something that makes more sense?
    
    
    
    
  206. Re: Big 7.1 open items

    Peter Eisentraut <peter_e@gmx.net> — 2000-06-23T16:20:26Z

    Bruce Momjian writes:
    
    > Here is the list I have gotten of open 7.1 items:
    
    > 	new location for config files
    
    I'm on that task now, more or less by accident but I might as well get it
    done. I'm reorganizing all the file name handling code for pg_hba.conf,
    pg_ident.conf, pg_control, etc. so they have consistent accessor
    routines. The DataDir global variable will disappear; you'll have to use
    GetDataDir().
    
    -- 
    Peter Eisentraut                  Sernanders väg 10:115
    peter_e@gmx.net                   75262 Uppsala
    http://yi.org/peter-e/            Sweden
    
    
    
  207. Re: Big 7.1 open items

    Bruce Momjian <pgman@candle.pha.pa.us> — 2000-06-25T00:59:19Z

    > Bruce Momjian writes:
    > 
    > > Can we get agreement to remove our secondary password files, and make
    > > something that makes more sense?
    > 
    > How about this: Normally secondary password files look like
    > 
    > username:ABS5SGh1EL6bk
    > 
    > We could add the option of making them look like
    > 
    > username:+
    > 
    > which means "look into pg_shadow". That would be fully backward
    > compatible, allows the use of alter user with password, and avoids
    > creating any extra system tables (that would need to be dumped to plain
    > text). And the coding looks very simple.
    
    Yes, perfect. In fact, how about:
    
    > username
    
    as doing that.  Any username with no colon uses pg_shadow.
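
    The three line forms discussed here can be sketched as a small parser.
    It assumes each line holds only a username and an optional password
    field, as in the examples above; parse_password_line() and its return
    shape are hypothetical.

```python
def parse_password_line(line):
    """Classify one secondary-password-file line.

    Returns (username, source, password) where source is "file" when the
    line carries its own encrypted password, or "pg_shadow" when the
    password should be looked up in pg_shadow.  Blank lines give None.
    """
    line = line.strip()
    if not line:
        return None
    username, colon, password = line.partition(":")
    # "username:+" and a bare "username" both defer to pg_shadow.
    if not colon or password == "+":
        return (username, "pg_shadow", None)
    return (username, "file", password)
```

    Existing "username:password" lines keep their meaning under this
    reading, which is what makes the change backward compatible.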
    
    
    
  208. Re: Big 7.1 open items

    Peter Eisentraut <peter_e@gmx.net> — 2000-06-25T01:00:51Z

    Bruce Momjian writes:
    
    > Can we get agreement to remove our secondary password files, and make
    > something that makes more sense?
    
    How about this: Normally secondary password files look like
    
    username:ABS5SGh1EL6bk
    
    We could add the option of making them look like
    
    username:+
    
    which means "look into pg_shadow". That would be fully backward
    compatible, allows the use of alter user with password, and avoids
    creating any extra system tables (that would need to be dumped to plain
    text). And the coding looks very simple.
    
    -- 
    Peter Eisentraut                  Sernanders väg 10:115
    peter_e@gmx.net                   75262 Uppsala
    http://yi.org/peter-e/            Sweden