Thread

  1. Re: Small patch for GiST: move childoffnum to child

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-07-13T18:35:44Z

    On 30.06.2011 07:50, Jeff Janes wrote:
    > My concern is that I am unable to prove to myself simply by reading
    > the code that the 24 line chunk deleted from gistFindPath (near ***
    > 919,947 ****) are no longer needed.  My familiarity with the gist code
    > is low enough that it is not surprising that I cannot prove this to
    > myself from first principles.  I have no reason to believe it is not
    > correct, it is just that I can't convince myself that it is correct.
    
    This is the piece of code we're talking about:
    
    > ***************
    > *** 919,947 **** gistFindPath(Relation r, BlockNumber child)
    >   			blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
    >   			if (blkno == child)
    >   			{
    > - 				OffsetNumber poff = InvalidOffsetNumber;
    > -
    > - 				/* make childs links */
    > - 				ptr = top;
    > - 				while (ptr->parent)
    > - 				{
    > - 					/* move childoffnum.. */
    > - 					if (ptr == top)
    > - 					{
    > - 						/* first iteration */
    > - 						poff = ptr->parent->childoffnum;
    > - 						ptr->parent->childoffnum = ptr->childoffnum;
    > - 					}
    > - 					else
    > - 					{
    > - 						OffsetNumber tmp = ptr->parent->childoffnum;
    > -
    > - 						ptr->parent->childoffnum = poff;
    > - 						poff = tmp;
    > - 					}
    > - 					ptr = ptr->parent;
    > - 				}
    > - 				top->childoffnum = i;
    >   				UnlockReleaseBuffer(buffer);
    >   				return top;
    >   			}
    
    Now that I look closer at the patch, I think it's in fact incorrect. The 
    removed code used to store the offset of the downlink in the direct 
    parent of the child that was searched for, in top->childoffnum. That's 
    the last removed line: top->childoffnum = i. With the patch, that is 
    stored nowhere. gistFindPath() needs to return it somehow, so that it 
    gets updated in the stack returned by gistFindCorrectParent.
    
    Attached is a modified patch that fixes that. I couldn't resist some 
    cosmetic changes along the way, sorry about that. I made gistFindPath 
    use a regular List instead of carrying the extra 'next' field in 
    GISTInsertStack. That seems much cleaner as that field is only needed 
    for local storage in the highly unlikely case that gistFindPath() is 
    called. I also made the error cases use elog() instead of assertions.
    
    > So I tried provoking situations where this surrounding code section
    > would get executed, both patched and unpatched.  I have been unable to
    > do so--apparently this code is for an incredibly obscure situation
    > which I can't induce at will.
    
    You'll need a concurrent split of the root page, while you're splitting 
    a page at some lower level. For example:
    
         R
    L1     L2
    
    R is the root page, and L1 and L2 are leaf pages. Now, imagine that you 
    insert a tuple into L2, causing it to split into pages L2* and L3. Your 
    insertion stack looks like R->L2. Before you have a chance to insert the 
    downlink for L3 into R, someone else splits the root page:
    
           R
       I1         I2
    L1 L3      L2* L3
    
    The new parent of L2 is the new internal page I2, but 
    gistFindCorrectParent() will never visit that page. The insertion stack 
    is R->L2, so gistFindCorrectParent() will only search R, and won't find 
    the downlink for L2 there anymore.
    
    The only practical way to test that is to use a debugger or add some 
    debugging statements to the code. Here's what I did:
    
    1. Create a test table and gist index:
    
    CREATE TABLE gisttest (p point);
    CREATE INDEX i_gisttest ON gisttest USING gist (p)
    
    2. Insert some test data. Use two different values so that you can 
    conveniently later insert into distinct branches of the gist tree.
    
    INSERT INTO gisttest SELECT point(1,1) FROM generate_series(1,1000); 
    INSERT INTO gisttest SELECT point(10,10) FROM generate_series(1,1000);
    
    3. Attach a debugger to the backend process, and create a couple of 
    breakpoints:
    
    (gdb) break gistSplit
    Breakpoint 1 at 0x46ace1: file gist.c, line 1295.
    (gdb) break gistFindPath
    Breakpoint 2 at 0x469740: file gist.c, line 884.
    (gdb) cont
    Continuing.
    
    4. Insert some more tuples to the table using the debugged backend:
    
    INSERT INTO gisttest SELECT point(1,1) FROM generate_series(1,1000);
    
    This hits the breakpoint at gistSplit:
    
    Breakpoint 1, gistSplit (r=0x7f84f4bad328, page=0x7f84f1b8b180 "",
         itup=0x2032118, len=186, giststate=0x7fff8615e9e0) at gist.c:1295
    1295		SplitedPageLayout *res = NULL;
    
    5. The backend is now in the middle of splitting a leaf page. Now we 
    need to make its GISTInsertStack obsolete by concurrently splitting the 
    root page. Open another psql session, and insert more data elsewhere in 
    the gist index, to cause the root to split:
    
    postgres=# INSERT INTO gisttest SELECT point(10,10) FROM 
    generate_series(1,100000);
    INSERT 0 100000
    
    6. You can now let the first backend continue. It will hit the 
    breakpoint in gistFindPath():
    
    (gdb) cont
    Continuing.
    
    Breakpoint 2, gistFindPath (r=0x7f84f4bad328, child=6,
         newchildoffnum=0x20320d8) at gist.c:884
    884		top = (GISTInsertStack *) palloc0(sizeof(GISTInsertStack));
    
    
    
    BTW, the b-tree code deals with this scenario slightly differently. 
    Splitting the root in b-tree splits the root page like any other page, 
    and creates a new root page on a different block, while in GiST the root 
    page is always at block number 0, and root split moves all the existing 
    tuples on the root page to different blocks. Correspondingly, the code 
    in b-tree to re-find the parent of a page also works slightly 
    differently. The b-tree re-find algorithm just moves right until it 
    finds the new parent. It will always find the parent, because it can 
    only have moved right at the same level it used to be. However, in 
    b-tree it's possible that the page that used to be the root page is not 
    the root anymore. In that case the b-tree code does something similar to 
    gistFindPath(), and starts scanning from the leftmost page at the right 
    level. Search for "concurrent ROOT page split" in nbtinsert.c to find 
    that code.
    
    -- 
       Heikki Linnakangas
       EnterpriseDB   http://www.enterprisedb.com