Thread

  1. Re: [HACKERS] vacuum process size

    Tom Lane <tgl@sss.pgh.pa.us> — 1999-08-24T16:20:22Z

    I have been looking some more at the vacuum-process-size issue, and
    I am having a hard time understanding why the VPageList data structure
    is the critical one.  As far as I can see, there should be at most one
    pointer in it for each disk page of the relation.  OK, you were
    vacuuming a table with something like a quarter million pages, so
    the end size of the VPageList would have been something like a megabyte,
    and given the inefficient usage of repalloc() in the original code,
    a lot more space than that would have been wasted as the list grew.
    So doubling the array size at each step is a good change.
    
    But there are a lot more tuples than pages in most relations.
    
    I see two lists with per-tuple data in vacuum.c, "vtlinks" in
    vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
    essentially the same technique of repalloc() after every N entries.
    I'm not entirely clear on how many tuples get put into each of these
    lists, but it sure seems like in ordinary circumstances they'd be much
    bigger space hogs than any of the three VPageList lists.
    
    I recommend going to a doubling approach for each of these lists as
    well as for VPageList.
    
    There is a fourth usage of repalloc with the same method, for "ioid"
    in vc_getindices.  This only gets one entry per index on the current
    relation, so it's unlikely to be worth changing on its own merit.
    But it might be worth building a single subroutine that expands a
    growable list of entries (taking sizeof() each entry as a parameter)
    and applying it in all four places.
    
    			regards, tom lane
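
The single growable-list subroutine suggested above might look roughly like the following. This is only a sketch under stated assumptions: the names GrowList, growlist_init and growlist_append are hypothetical and do not appear in vacuum.c, the initial capacity of 16 is arbitrary, and standard realloc() stands in for the backend's repalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list; not one of the actual vacuum.c structures. */
typedef struct GrowList
{
    void   *items;      /* storage for the entries */
    size_t  num_items;  /* entries currently in use */
    size_t  capacity;   /* entries currently allocated */
    size_t  elem_size;  /* sizeof() one entry, supplied by the caller */
} GrowList;

static void
growlist_init(GrowList *list, size_t elem_size)
{
    assert(elem_size > 0);
    list->items = NULL;
    list->num_items = 0;
    list->capacity = 0;
    list->elem_size = elem_size;
}

/*
 * Append a copy of *entry.  When the array is full its capacity is
 * doubled, so n appends cost O(log n) reallocations rather than one
 * reallocation per fixed-size chunk of entries.
 * Returns 0 on success, -1 on overflow or allocation failure.
 */
static int
growlist_append(GrowList *list, const void *entry)
{
    if (list->num_items == list->capacity)
    {
        /* 16 is an arbitrary starting capacity (assumption) */
        size_t  new_cap = (list->capacity == 0) ? 16 : list->capacity * 2;
        void   *new_items;

        if (new_cap > SIZE_MAX / list->elem_size)
            return -1;          /* byte count would overflow */

        /* realloc(NULL, n) behaves like malloc(n) */
        new_items = realloc(list->items, new_cap * list->elem_size);
        if (new_items == NULL)
            return -1;          /* old block is still valid */

        list->items = new_items;
        list->capacity = new_cap;
    }

    memcpy((char *) list->items + list->num_items * list->elem_size,
           entry, list->elem_size);
    list->num_items++;
    return 0;
}
```

A caller would initialize one list per structure, for example growlist_init(&list, sizeof(entry)), and then append entries as the scan finds them. Passing the entry size as a parameter is what would let the same routine serve all four lists.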
    
    
  2. Re: [HACKERS] vacuum process size

    Brian E Gallew <geek+@cmu.edu> — 1999-08-24T17:01:12Z

    Then <tgl@sss.pgh.pa.us> spoke up and said:
    > So doubling the array size at each step is a good change.
    > 
    > But there are a lot more tuples than pages in most relations.
    > 
    > I see two lists with per-tuple data in vacuum.c, "vtlinks" in
    > vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
    > essentially the same technique of repalloc() after every N entries.
    > I'm not entirely clear on how many tuples get put into each of these
    > lists, but it sure seems like in ordinary circumstances they'd be much
    > bigger space hogs than any of the three VPageList lists.
    > 
    > I recommend going to a doubling approach for each of these lists as
    > well as for VPageList.
    
    Question: is there reliable information in pg_statistic (or other
    system tables) which can be used to make a reasonable estimate for the
    sizes of these structures before initial allocation?  Certainly the
    file size can be obtained with stat() (with some portability and
    sparse-file issues).
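
As a rough illustration of the stat() idea, the sketch below turns a file size into a page count, which is an upper bound on the number of entries in a per-page list. It rests on assumptions: the 8192-byte block size is the usual default rather than something read from the installation, the function names are hypothetical, and the relation is assumed to live in a single file. It says nothing about the per-tuple lists, and for a sparse file st_size overstates the space actually used.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Assumed disk block size; a real build would use its configured value. */
#define ASSUMED_BLCKSZ 8192

/* Number of pages needed to hold file_size bytes, rounding up. */
static long
estimate_pages_from_size(off_t file_size)
{
    assert(file_size >= 0);
    return (long) ((file_size + ASSUMED_BLCKSZ - 1) / ASSUMED_BLCKSZ);
}

/*
 * Estimate the page count of the file at path (hypothetical helper).
 * Returns -1 if stat() fails.
 */
static long
estimate_pages(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return estimate_pages_from_size(st.st_size);
}
```

The estimate could then be used as the initial capacity of a per-page array, so that the array rarely or never needs to grow.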
    
    
    -- 
    =====================================================================
    | JAVA must have been developed in the wilds of West Virginia.      |
    | After all, why else would it support only single inheritance??    |
    =====================================================================
    | Finger geek@cmu.edu for my public key.                            |
    =====================================================================
    
  3. RE: [HACKERS] vacuum process size

    Hiroshi Inoue <inoue@tpf.co.jp> — 1999-08-25T01:11:42Z

    > -----Original Message-----
    > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    > Sent: Wednesday, August 25, 1999 1:20 AM
    > To: t-ishii@sra.co.jp
    > Cc: Mike Mascari; Hiroshi Inoue; pgsql-hackers@postgreSQL.org
    > Subject: Re: [HACKERS] vacuum process size 
    > 
    > 
    > I have been looking some more at the vacuum-process-size issue, and
    > I am having a hard time understanding why the VPageList data structure
    > is the critical one.  As far as I can see, there should be at most one
    > pointer in it for each disk page of the relation.  OK, you were
    > vacuuming a table with something like a quarter million pages, so
    > the end size of the VPageList would have been something like a megabyte,
    > and given the inefficient usage of repalloc() in the original code,
    > a lot more space than that would have been wasted as the list grew.
    > So doubling the array size at each step is a good change.
    > 
    > But there are a lot more tuples than pages in most relations.
    > 
    > I see two lists with per-tuple data in vacuum.c, "vtlinks" in
    > vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
    > essentially the same technique of repalloc() after every N entries.
    > I'm not entirely clear on how many tuples get put into each of these
    > lists, but it sure seems like in ordinary circumstances they'd be much
    > bigger space hogs than any of the three VPageList lists.
    >
    
    AFAIK, both vtlinks and vtmove are NULL if vacuum is executed
    without concurrent transactions.
    They won't get very big unless very long-running concurrent
    transactions exist.
     
    Regards.
    
    Hiroshi Inoue
    Inoue@tpf.co.jp