Thread
-
Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Tom Lane <tgl@sss.pgh.pa.us> — 2011-07-02T03:01:46Z
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > I couldn't resist looking into this, and came up with the attached > patch. I tested this with: > CREATE TABLE words (word text); > COPY words FROM '/usr/share/dict/words'; > CREATE INDEX i_words ON words USING gist (word gist_trgm_ops); > And then ran "REINDEX INDEX i_words" a few times with and without the > patch. Without the patch, reindex takes about 4.7 seconds. With the > patch, 3.7 seconds. That's a worthwhile gain on its own, but becomes > even more important with Alexander's fast GiST build patch, which calls > the penalty function more. I tested this on HPPA (big-endian, picky about alignment) to verify that you had that code path right. It's right, but on that platform the speedup in the "reindex dict/words" test is only about 6%. I'm afraid that the benefit is a lot more machine-specific than we could wish. I tweaked the patch a bit further (don't pessimize the boundary case where there's exactly 4n+1 trigrams, avoid forcing trg1 into memory, improve the comment) and attach that version below. This is a little bit faster but I still wonder if it's worth the price of adding an obscure dependency on the header size. > I used the attached showsign-debug.patch to verify that the patched > makesign function produces the same results as the existing code. I > haven't tested the big-endian code, however. That didn't look terribly convincing. I added a direct validation that old and new code give the same results, a la if (ISARRKEY(newval)) { BITVEC sign; + BITVEC osign; makesign(sign, newval); + origmakesign(osign, newval); + Assert(memcmp(sign, osign, sizeof(BITVEC)) == 0); if (ISALLTRUE(origval)) *penalty = ((float) (SIGLENBIT - sizebitvec(sign))) / (float) (SIGLENBIT + 1); and ran the regression tests and the dict/words example with that. regards, tom lane