Thread
-
Re: [HACKERS] vacuum process size
Tom Lane <tgl@sss.pgh.pa.us> — 1999-08-24T16:20:22Z
I have been looking some more at the vacuum-process-size issue, and I am having a hard time understanding why the VPageList data structure is the critical one. As far as I can see, there should be at most one pointer in it for each disk page of the relation. OK, you were vacuuming a table with something like a quarter million pages, so the end size of the VPageList would have been something like a megabyte, and given the inefficient usage of repalloc() in the original code, a lot more space than that would have been wasted as the list grew. So doubling the array size at each step is a good change.

But there are a lot more tuples than pages in most relations.

I see two lists with per-tuple data in vacuum.c, "vtlinks" in vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with essentially the same technique of repalloc() after every N entries. I'm not entirely clear on how many tuples get put into each of these lists, but it sure seems like in ordinary circumstances they'd be much bigger space hogs than any of the three VPageList lists.

I recommend going to a doubling approach for each of these lists as well as for VPageList.

There is a fourth usage of repalloc with the same method, for "ioid" in vc_getindices. This only gets one entry per index on the current relation, so it's unlikely to be worth changing on its own merit. But it might be worth building a single subroutine that expands a growable list of entries (taking sizeof() each entry as a parameter) and applying it in all four places.

regards, tom lane
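For illustration, the "single subroutine" idea above might look roughly like the sketch below. It is not code from vacuum.c: the names GrowList, growlist_init, growlist_append and growlist_free are hypothetical, the starting capacity of 16 is arbitrary, and standard realloc() stands in for repalloc() so that the example compiles outside the backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic growable list; not a structure from vacuum.c. */
typedef struct GrowList
{
    void   *items;      /* backing array, NULL until the first append */
    size_t  elem_size;  /* sizeof() one entry, supplied by the caller */
    size_t  count;      /* entries currently in use */
    size_t  capacity;   /* entries currently allocated */
} GrowList;

void
growlist_init(GrowList *list, size_t elem_size)
{
    assert(elem_size > 0);
    list->items = NULL;
    list->elem_size = elem_size;
    list->count = 0;
    list->capacity = 0;
}

/*
 * Append one entry, doubling the allocation whenever the array is full.
 * Returns 0 on success, -1 if the array cannot be grown.
 */
int
growlist_append(GrowList *list, const void *entry)
{
    if (list->count == list->capacity)
    {
        size_t  new_cap;
        void   *new_items;

        /* Arbitrary initial capacity (an assumption), then double. */
        if (list->capacity == 0)
            new_cap = 16;
        else if (list->capacity > SIZE_MAX / 2)
            return -1;          /* doubling would overflow */
        else
            new_cap = list->capacity * 2;

        if (new_cap > SIZE_MAX / list->elem_size)
            return -1;          /* byte count would overflow */

        /* realloc() stands in for repalloc() in this standalone sketch. */
        new_items = realloc(list->items, new_cap * list->elem_size);
        if (new_items == NULL)
            return -1;

        list->items = new_items;
        list->capacity = new_cap;
    }

    memcpy((char *) list->items + list->count * list->elem_size,
           entry, list->elem_size);
    list->count++;
    return 0;
}

void
growlist_free(GrowList *list)
{
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

With doubling, the number of reallocations grows with the logarithm of the final length rather than linearly, which is the point of the change discussed above. The price is that up to half of the final allocation can sit unused.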
-
Re: [HACKERS] vacuum process size
Brian E Gallew <geek+@cmu.edu> — 1999-08-24T17:01:12Z
Then <tgl@sss.pgh.pa.us> spoke up and said:
> So doubling the array size at each step is a good change.
>
> But there are a lot more tuples than pages in most relations.
>
> I see two lists with per-tuple data in vacuum.c, "vtlinks" in
> vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
> essentially the same technique of repalloc() after every N entries.
> I'm not entirely clear on how many tuples get put into each of these
> lists, but it sure seems like in ordinary circumstances they'd be much
> bigger space hogs than any of the three VPageList lists.
>
> I recommend going to a doubling approach for each of these lists as
> well as for VPageList.

Question: is there reliable information in pg_statistics (or other system tables) which can be used to make a reasonable estimate for the sizes of these structures before initial allocation? Certainly the file size can be gotten from a stat (some portability issues, sparse file issues).

--
=====================================================================
| JAVA must have been developed in the wilds of West Virginia.      |
| After all, why else would it support only single inheritance??    |
=====================================================================
| Finger geek@cmu.edu for my public key.                            |
=====================================================================
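As a rough illustration of the stat-based estimate raised in the question above, the sketch below derives a page count from a file's size. The helper names pages_from_bytes and estimate_pages are hypothetical, and the 8192-byte block size is an assumption (PostgreSQL's default BLCKSZ). It looks at a single file only and inherits the caveats already mentioned: st_size is the logical size, so a sparse file can report more pages than are actually stored on disk.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Assumed block size; 8192 bytes is PostgreSQL's default BLCKSZ. */
#define ASSUMED_BLOCK_SIZE 8192

/* Number of blocks needed to hold nbytes, rounding a partial block up. */
long long
pages_from_bytes(long long nbytes, long long block_size)
{
    assert(block_size > 0);
    if (nbytes <= 0)
        return 0;
    return (nbytes + block_size - 1) / block_size;
}

/*
 * Estimate how many pages the file at 'path' holds, from stat() alone.
 * Returns -1 if the file cannot be examined.
 */
long long
estimate_pages(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return pages_from_bytes((long long) st.st_size, ASSUMED_BLOCK_SIZE);
}
```

Such an estimate would only size the per-page lists up front. It says nothing about how many tuples end up in the per-tuple lists, so a growth strategy would presumably still be needed there.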
-
RE: [HACKERS] vacuum process size
Hiroshi Inoue <inoue@tpf.co.jp> — 1999-08-25T01:11:42Z
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, August 25, 1999 1:20 AM
> To: t-ishii@sra.co.jp
> Cc: Mike Mascari; Hiroshi Inoue; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] vacuum process size
>
>
> I have been looking some more at the vacuum-process-size issue, and
> I am having a hard time understanding why the VPageList data structure
> is the critical one. As far as I can see, there should be at most one
> pointer in it for each disk page of the relation. OK, you were
> vacuuming a table with something like a quarter million pages, so
> the end size of the VPageList would have been something like a megabyte,
> and given the inefficient usage of repalloc() in the original code,
> a lot more space than that would have been wasted as the list grew.
> So doubling the array size at each step is a good change.
>
> But there are a lot more tuples than pages in most relations.
>
> I see two lists with per-tuple data in vacuum.c, "vtlinks" in
> vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with
> essentially the same technique of repalloc() after every N entries.
> I'm not entirely clear on how many tuples get put into each of these
> lists, but it sure seems like in ordinary circumstances they'd be much
> bigger space hogs than any of the three VPageList lists.
>

AFAIK, both vtlinks and vtmove are NULL if vacuum is executed without concurrent transactions. They won't be so big unless loooong concurrent transactions exist.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp