Thread

  1. Volunteer: Large Tuples / Tuple chaining

    Christof Petig <christof.petig@wtal.de> — 1999-12-09T22:13:51Z

    Hello,
    
    I'll donate some (read: all freely available) of my spare time to
    implementing tuple chaining. It looks like this feature is the most
    wanted, and it would be a pity to hold it back until after 7.0.
    Personally I don't need it yet ... but I will definitely find a use for
    it once it is available ;-) And it looks like a good start for hacking
    on pgsql.
    
    I have already dived into the depths of pgsql's page and tuple structures
    and it looks like it is possible. But before I start coding I would like
    to hear some more experienced opinions on how to implement it.
    
    Have you already discussed the technical details of the implementation?
    How can I catch up on that discussion? (Simply browse the mailing list
    archives?)
    
    Here is an outline of how I imagine the work:
    
    What is needed:
    - Lay out a tuple continuation structure.
    - Split a tuple into multiple chunks when it does not fit into one page,
      and reassemble it when it is loaded from disk.
      (How to continue a tuple? A structure is needed.)
      How is a tuple (read: a page item) addressed? By ItemPointerData.
      I imagine storing a continuation address in the last bytes of the
      tuple, unless it fits into one page.
      Large tuples need to be marked (how? just one flag in the tuple?).
      How to tell a last block that happens to be of maximum possible size
      from a continued one (which carries a pointer to the next one at its
      end)?
      Or don't care: mark the item as continued and put the last 6(?) bytes
      into a new block.
    - Note that the continuation chunks are not referenced directly
      (vacuum?), so they must be marked as used. I hope vacuum operates on a
      tuple basis and has no concept of pages.
    - I guess that the tuple pointer points into page memory. If multiple
      pages are concatenated for a tuple, these pages need not stay in
      memory, but memory for the full tuple must be allocated (from a pool
      similar to the one used for pages; shared mem?).
    - This should also be possible for memory-only pages;
      see PageGetPageSize, but od_pagesize is 16 bit!
      Reuse another variable? Another type of page? (32 bit od_pagesize)
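
    The continuation idea in the list above can be sketched in C as follows.
    This is only a minimal toy model, not PostgreSQL code: the names Item,
    ITEM_SIZE, ADDR_SIZE, split_tuple and join_tuple are hypothetical, the
    item capacity is tiny on purpose, and item i is simply assumed to live
    in "block" i. A continued item gives up its last 6 bytes to the address
    of the next item, and the flag is what tells a full last item from a
    continued one.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy model with hypothetical names; this is not PostgreSQL source. */
#define ITEM_SIZE 16   /* usable bytes in one page item (tiny, for the demo) */
#define ADDR_SIZE 6    /* 4-byte block number + 2-byte slot within the page */

typedef struct {
    uint8_t  continued;          /* the "one flag": another chunk follows */
    uint16_t len;                /* bytes used in bytes[] */
    uint8_t  bytes[ITEM_SIZE];   /* payload; tail holds next address if continued */
} Item;

/* Store a 6-byte item address in big-endian order. */
static void put_addr(uint8_t *dst, uint32_t block, uint16_t slot)
{
    dst[0] = (uint8_t)(block >> 24);
    dst[1] = (uint8_t)(block >> 16);
    dst[2] = (uint8_t)(block >> 8);
    dst[3] = (uint8_t)block;
    dst[4] = (uint8_t)(slot >> 8);
    dst[5] = (uint8_t)slot;
}

/* Read back the block number from a 6-byte address (the slot is unused here). */
static uint32_t get_block(const uint8_t *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}

/* Split tuple[0..n) into a chain starting at items[0].
 * Returns the number of items used, or 0 if max_items is too small. */
static size_t split_tuple(const uint8_t *tuple, size_t n,
                          Item *items, size_t max_items)
{
    size_t used = 0, pos = 0;
    do {
        if (used == max_items)
            return 0;
        Item *it = &items[used];
        size_t remaining = n - pos;
        if (remaining <= ITEM_SIZE) {        /* last chunk: whole item is payload */
            it->continued = 0;
            it->len = (uint16_t)remaining;
            memcpy(it->bytes, tuple + pos, remaining);
            pos += remaining;
        } else {                             /* continued chunk: reserve the tail */
            size_t payload = ITEM_SIZE - ADDR_SIZE;
            it->continued = 1;
            it->len = ITEM_SIZE;
            memcpy(it->bytes, tuple + pos, payload);
            put_addr(it->bytes + payload, (uint32_t)(used + 1), 1);
            pos += payload;
        }
        used++;
    } while (pos < n);
    return used;
}

/* Follow the chain from items[0] and rebuild the tuple in freshly allocated
 * memory. Returns NULL on a broken chain; the caller frees the result. */
static uint8_t *join_tuple(const Item *items, size_t n_items, size_t *out_len)
{
    if (n_items == 0)
        return NULL;
    uint8_t *buf = malloc(n_items * ITEM_SIZE);
    if (buf == NULL)
        return NULL;
    size_t len = 0, i = 0;
    for (size_t hops = 0; hops < n_items && i < n_items; hops++) {
        const Item *it = &items[i];
        if (!it->continued) {                /* end of chain reached */
            memcpy(buf + len, it->bytes, it->len);
            *out_len = len + it->len;
            return buf;
        }
        memcpy(buf + len, it->bytes, ITEM_SIZE - ADDR_SIZE);
        len += ITEM_SIZE - ADDR_SIZE;
        i = get_block(it->bytes + ITEM_SIZE - ADDR_SIZE);
    }
    free(buf);                               /* chain ran off the end or looped */
    return NULL;
}
```

    In this model a 16-byte tuple fits into one item, while a 17-byte tuple
    already needs two, because the first item loses 6 bytes to the
    continuation address. That is the "put the last 6 bytes into a new
    block" case from the list.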
      
    Very fascinated by this large beast of ancient code to explore
          Christof
    
    PS: I think the documentation on page layout is badly outdated (or points
    into the future, since it speaks about ItemContinuationData structures).
    Should I update it?
    The table doesn't match the actual structure members. At least I don't
    understand what it's about. The source code describes a different page
    layout.
    
    PPS: Do not pity me, I have ten+ years of coding experience in C.
    
    PPPS: Could someone tell me in a few words what an access method is (is a
    tuple an access method, and log pages another?)
    
    
    
  2. RE: [HACKERS] Volunteer: Large Tuples / Tuple chaining

    Hiroshi Inoue <inoue@tpf.co.jp> — 1999-12-10T15:33:36Z

    > -----Original Message-----
    > From: owner-pgsql-hackers@postgreSQL.org 
    > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Christof Petig
    > 
    > Hello,
    > 
    > I'll donate some (read all freely available) of my spare time to
    > implementing tuple chaining. It looks like this feature is most wanted
    > and it would be a pity to hold this until post 7.0. Personally I don't
    > need it, yet ... But I will definitely find a use for it once
    > available ;-) And it looks like a good start for hacking on pgsql.
    > 
    > I already dived into the depth of pgsql's page and tuple structures and
    > it looks like it is possible. But before I start coding I would like to
    > hear some more experienced opinions on how to implement it.
    >
    
    Will you put a long tuple into a long logical page (spanning multiple
    physical pages)?
    I'm suspicious of an approach that allows non-page-formatted pages.
    
    Anyway, it would need a big change around bufmgr/smgr etc.
    Could someone estimate the impact/danger before going forward?
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp
     
    
    
    
  3. Re: [HACKERS] Volunteer: Large Tuples / Tuple chaining

    Christof Petig <christof.petig@wtal.de> — 1999-12-13T21:59:27Z

    Hiroshi Inoue wrote:
    > 
    > Will you put a long tuple into a long logical page(continued multiple
    > phisical(?) pages) ?
    > I'm suspicious about the way that allows non-page-formatted page.
    > 
    > Anyway it would need a big change around bufmgr/smgr etc.
    > Could someone estimate the influence/danger before going forward ?
    > 
    
    I planned to use as many of PostgreSQL's data structures unaltered as
    possible. Storing one Tuple in multiple Items should not pose too much
    danger to bufmgr and smgr unless they access tuple internals (I haven't
    checked that yet). This would mean that on-disk Items no longer
    correspond one-to-one to Tuples (several of them might form one tuple).
    
    I dropped the plan for unformatted pages very early on. But the issue of
    in-memory tuple storage remains (I don't know the internals of
    allocating/freeing yet).
    
    Christof
    
    
    
    
  4. Re: [HACKERS] Volunteer: Large Tuples / Tuple chaining

    Bruce Momjian <pgman@candle.pha.pa.us> — 1999-12-14T01:59:38Z

    Thanks.  Seems like Jan is going to be doing this.
    
    
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      maillist@candle.pha.pa.us            |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
  5. RE: [HACKERS] Volunteer: Large Tuples / Tuple chaining

    Hiroshi Inoue <inoue@tpf.co.jp> — 1999-12-14T08:58:57Z

    > -----Original Message-----
    > From: christof@to.wtal.de [mailto:christof@to.wtal.de]On Behalf Of
    > Christof Petig
    > 
    > Hiroshi Inoue wrote:
    > > 
    > > Will you put a long tuple into a long logical page(continued multiple
    > > phisical(?) pages) ?
    > > I'm suspicious about the way that allows non-page-formatted page.
    > > 
    > > Anyway it would need a big change around bufmgr/smgr etc.
    > > Could someone estimate the influence/danger before going forward ?
    > > 
    > 
    > I planned to use as many of PostgreSQL data structures unaltered as
    > possible. Storing one Tuple in multiple Items should not pose too much
    > danger on bufmgr and smgr unless they access tuple internals. (I didn't
    > check that yet). This would mean that on disk Items do no longer
    > correspond to Tuples. (Some of them might form one tuple).
    >
    
    Hmm, we have discussed LONG.
    The change for LONG is transparent to users and would mostly resolve
    the big tuple problem.
    I doubt that tuple chaining is worth the work now.
    
    At least a consensus is needed before going ahead, I think.
    A bad design would only introduce confusion.
    
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp 
    
    
  6. Re: [HACKERS] Volunteer: Large Tuples / Tuple chaining

    Bruce Momjian <pgman@candle.pha.pa.us> — 1999-12-14T16:25:40Z

    > > I planned to use as many of PostgreSQL data structures unaltered as
    > > possible. Storing one Tuple in multiple Items should not pose too much
    > > danger on bufmgr and smgr unless they access tuple internals. (I didn't
    > > check that yet). This would mean that on disk Items do no longer
    > > correspond to Tuples. (Some of them might form one tuple).
    > >
    > 
    > Hmm,we have discussed about LONG.
    > Change by LONG is transparent to users and would resolve
    > the big tuple problem mostly.
    > I'm suspicious that tuple chaining is worth the work now.
    > 
    > At least a consensus is needed before going,I think.
    > Bad design would only introduce a confusion.
    
    Agreed.
    
    
    
  7. Re: [HACKERS] Volunteer: Large Tuples / Tuple chaining

    Jan Wieck <wieck@debis.com> — 1999-12-14T18:45:11Z

    Bruce Momjian wrote:
    
    > > > I planned to use as many of PostgreSQL data structures unaltered as
    > > > possible. Storing one Tuple in multiple Items should not pose too much
    > > > danger on bufmgr and smgr unless they access tuple internals. (I didn't
    > > > check that yet). This would mean that on disk Items do no longer
    > > > correspond to Tuples. (Some of them might form one tuple).
    > > >
    > >
    > > Hmm,we have discussed about LONG.
    > > Change by LONG is transparent to users and would resolve
    > > the big tuple problem mostly.
    > > I'm suspicious that tuple chaining is worth the work now.
    > >
    > > At least a consensus is needed before going,I think.
    > > Bad design would only introduce a confusion.
    >
    > Agreed.
    
    Me too.
    
        I think that only a combination of LONG attributes and split
        tuples will be a complete solution.
    
        What I'm worried about is making the segments of a large
        tuple specialized things in the main table. The reliability
        of vacuum is one of the most important things for any system
        in production. While the general operation of vacuum seems to
        be well known, its requirements for the atomicity of some
        actions appear to be less so. The more chunks a tuple consists
        of, the more likely an abort of vacuum in the middle of moving
        them becomes. So keeping the links of chained tuples intact in
        a fail-safe way is IMHO an issue that is a little
        underestimated in this discussion.
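
        A rough way to put numbers on this worry, under an assumption
        made purely for illustration (the thread gives no figures):
        suppose vacuum is interrupted while moving any single chunk
        independently with a small probability p, and a tuple consists
        of k chunks. Then the chance that the move of the whole tuple
        is interrupted somewhere along its chain is

```latex
P(\text{interrupted}) = 1 - (1 - p)^{k} \approx k\,p \qquad (p \ll 1)
```

        For k = 1 this is just p; under this simplified model the risk
        grows roughly in proportion to the number of chunks.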
    
        Maybe we can split tuples in another way; I must think about
        it for another hour - 'til later.
    
    
    Jan
    
    --
    
    #======================================================================#
    # It's easier to get forgiveness for being wrong than for being right. #
    # Let's break this rule - forgive me.                                  #
    #========================================= wieck@debis.com (Jan Wieck) #
    
    
    
    
  8. RE: [HACKERS] Volunteer: Large Tuples / Tuple chaining

    Hiroshi Inoue <inoue@tpf.co.jp> — 1999-12-15T02:43:51Z

    > -----Original Message-----
    > From: Jan Wieck [mailto:wieck@debis.com]
    > Sent: Wednesday, December 15, 1999 3:45 AM
    > 
    > Bruce Momjian wrote:
    > 
    > > > > I planned to use as many of PostgreSQL data structures unaltered as
    > > > > possible. Storing one Tuple in multiple Items should not pose too much
    > > > > danger on bufmgr and smgr unless they access tuple internals. (I didn't
    > > > > check that yet). This would mean that on disk Items do no longer
    > > > > correspond to Tuples. (Some of them might form one tuple).
    > > > >
    > > >
    > > > Hmm,we have discussed about LONG.
    > > > Change by LONG is transparent to users and would resolve
    > > > the big tuple problem mostly.
    > > > I'm suspicious that tuple chaining is worth the work now.
    > > >
    > > > At least a consensus is needed before going,I think.
    > > > Bad design would only introduce a confusion.
    > >
    > > Agreed.
    > 
    > Me too.
    > 
    >     I  think that only a combination of LONG attributes and split
    >     tuples will be a complete solution.
    > 
    >     What I'm worried about is to make the  segments  of  a  large
    >     tuple  specialized  things in the main table. The reliability
    >     of Vacuum is one of the most important things for any  system
    >     in production. While the general operation of vacuum seems to
    >     be well known, it's requirements for atomicy of some  actions
    >     appears  to  be  lesser. The more chunks a tuple consists of,
    >     the more possible an abort of vacuum in the middle  of  their
    >     moving  becomes.  So keeping the links of chained tuples fail
    >     safe intact is IMHO an issue, a little underestimated in this
    >     discussion.
    >
    
    There is another related problem.
    Vacuum could hardly move big tuples if some tuples on each page
    live long. Although we have to move a long tuple all at once, there
    won't be many clean pages.
    
    Probably vacuum couldn't move even an 8K tuple in some cases.
    The problem is already there, more or less.
    But it seems very difficult to solve this problem without giving up
    on preserving consistency in case of a crash.
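
    The fragmentation problem can be illustrated with a small C sketch. It
    is a toy model only: the function names and the free-space figures are
    hypothetical, and real vacuum works quite differently. The point is
    that plenty of total free space does not help if no single page can
    take the whole tuple at once.

```c
#include <stddef.h>

/* Toy model with hypothetical names: free_bytes[i] is the free space on page i. */

/* Index of the first page that can take the whole tuple, or -1 if none can. */
static long find_page_for_tuple(const size_t *free_bytes, size_t n_pages,
                                size_t tuple_size)
{
    for (size_t i = 0; i < n_pages; i++)
        if (free_bytes[i] >= tuple_size)
            return (long)i;
    return -1;
}

/* Total free space over all pages, for comparison. */
static size_t total_free(const size_t *free_bytes, size_t n_pages)
{
    size_t sum = 0;
    for (size_t i = 0; i < n_pages; i++)
        sum += free_bytes[i];
    return sum;
}
```

    With four pages that have 6000, 7000, 5000 and 7900 bytes free, the
    total is 25900 bytes, yet find_page_for_tuple reports no page for an
    8000-byte tuple, because a few long-lived tuples on every page keep
    each one from being clean.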
    
    Regards.
     
    Hiroshi Inoue
    Inoue@tpf.co.jp
    
    
  9. Re: [HACKERS] Volunteer: Large Tuples / Tuple chaining

    Bruce Momjian <pgman@candle.pha.pa.us> — 1999-12-15T02:52:14Z

    Remember, chaining tuples had all sorts of performance, vacuum, code
    handling, and UPDATE problems.  They buy us very little, and almost
    nothing if we have LONG tables.
    
    
    
    
  10. Re: [HACKERS] Volunteer: Large Tuples / Tuple chaining

    Christof Petig <christof.petig@wtal.de> — 1999-12-15T08:27:39Z

    Bruce Momjian wrote:
    > 
    > Remember, chaining tuples had all sorts of performance, vacuum, code
    > handling, and UPDATE problems.  They buy us very little, and almost
    > nothing if we have LONG tables.
    > 
    
    I had already contacted Jan by private email. Since we share a country,
    a native language and a time zone, this is the most comfortable way.
    
    I agree with the concerns you raised and will (most likely) start
    helping Jan to implement LONG. I saw your LONG discussion only
    _after_ my original post, so this was a strange coincidence. But I
    have been following it with interest.
    
         Christof