Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)

Tom Lane <tgl@sss.pgh.pa.us>

From: Tom Lane <tgl@sss.pgh.pa.us>
To: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>, Alexander Korotkov <aekorotkov@gmail.com>
Date: 2011-07-02T03:01:46Z
Lists: pgsql-hackers

Attachments

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I couldn't resist looking into this, and came up with the attached 
> patch. I tested this with:

> CREATE TABLE words (word text);
> COPY words FROM '/usr/share/dict/words';
> CREATE INDEX i_words ON words USING gist (word gist_trgm_ops);

> And then ran "REINDEX INDEX i_words" a few times with and without the 
> patch. Without the patch, reindex takes about 4.7 seconds. With the 
> patch, 3.7 seconds. That's a worthwhile gain on its own, but becomes 
> even more important with Alexander's fast GiST build patch, which calls 
> the penalty function more.

I tested this on HPPA (big-endian, picky about alignment) to verify that
you had that code path right.  It's right, but on that platform the
speedup in the "reindex dict/words" test is only about 6%.  I'm afraid
that the benefit is a lot more machine-specific than we could wish.

I tweaked the patch a bit further (don't pessimize the boundary case
where there's exactly 4n+1 trigrams, avoid forcing trg1 into memory,
improve the comment) and attach that version below.  This is a little
bit faster but I still wonder if it's worth the price of adding an
obscure dependency on the header size.

> I used the attached showsign-debug.patch to verify that the patched 
> makesign function produces the same results as the existing code. I 
> haven't tested the big-endian code, however.

That didn't look terribly convincing.  I added a direct validation that
old and new code give the same results, a la

  	if (ISARRKEY(newval))
  	{
  		BITVEC		sign;
+ 		BITVEC		osign;
  
  		makesign(sign, newval);
+ 		origmakesign(osign, newval);
+ 		Assert(memcmp(sign, osign, sizeof(BITVEC)) == 0);
  
  		if (ISALLTRUE(origval))
  			*penalty = ((float) (SIGLENBIT - sizebitvec(sign))) / (float) (SIGLENBIT + 1);

and ran the regression tests and the dict/words example with that.

			regards, tom lane