Thread

  1. Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-07-02T03:01:46Z

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
    > I couldn't resist looking into this, and came up with the attached 
    > patch. I tested this with:
    
    > CREATE TABLE words (word text);
    > COPY words FROM '/usr/share/dict/words';
    > CREATE INDEX i_words ON words USING gist (word gist_trgm_ops);
    
    > And then ran "REINDEX INDEX i_words" a few times with and without the 
    > patch. Without the patch, reindex takes about 4.7 seconds. With the 
    > patch, 3.7 seconds. That's a worthwhile gain on its own, but becomes 
    > even more important with Alexander's fast GiST build patch, which calls 
    > the penalty function more.
    
    I tested this on HPPA (big-endian, picky about alignment) to verify that
    you had that code path right.  It's right, but on that platform the
    speedup in the "reindex dict/words" test is only about 6%.  I'm afraid
    that the benefit is a lot more machine-specific than we could wish.
    
    I tweaked the patch a bit further (don't pessimize the boundary case
    where there's exactly 4n+1 trigrams, avoid forcing trg1 into memory,
    improve the comment) and attach that version below.  This is a little
    bit faster but I still wonder if it's worth the price of adding an
    obscure dependency on the header size.
    
    > I used the attached showsign-debug.patch to verify that the patched 
    > makesign function produces the same results as the existing code. I 
    > haven't tested the big-endian code, however.
    
    That didn't look terribly convincing.  I added a direct validation that
    old and new code give the same results, a la
    
      	if (ISARRKEY(newval))
      	{
      		BITVEC		sign;
    + 		BITVEC		osign;
      
      		makesign(sign, newval);
    + 		origmakesign(osign, newval);
    + 		Assert(memcmp(sign, osign, sizeof(BITVEC)) == 0);
      
      		if (ISALLTRUE(origval))
      			*penalty = ((float) (SIGLENBIT - sizebitvec(sign))) / (float) (SIGLENBIT + 1);
    
    and ran the regression tests and the dict/words example with that.
    
    			regards, tom lane