Thread

  1. Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-07-02T18:37:28Z

    On 02.07.2011 21:07, Tom Lane wrote:
    > I wrote:
    >> I tweaked the patch a bit further (don't pessimize the boundary case
    >> where there's exactly 4n+1 trigrams, avoid forcing trg1 into memory,
    >> improve the comment) and attach that version below.  This is a little
    >> bit faster but I still wonder if it's worth the price of adding an
    >> obscure dependency on the header size.
    >
    > It occurred to me that we could very easily remove that objection by
    > making the code dynamically detect when it's reached a suitably aligned
    > trigram.  The attached version of the patch does it that way.  It seems
    > to be a percent or so slower than my previous version, but I think that
    > making the function robust against header changes is probably well worth
    > that price.
    
    Ah, that opens the door to do something I considered earlier but 
    rejected because of alignment: instead of three 32-bit word fetches, we 
    could fetch one 64-bit word and 32-bit word. Might shave a few more 
    cycles...
    
    Meanwhile, I experimented with optimizing the other part of the loop: 
    the HASH() macros for setting the right bits in the signature. With the 
    default compile-time settings, the signature array is 95 bits. 
    Currently, it's stored in a BITVEC, which is a byte array, but we could 
    store it in two 64-bit integers, which makes it possible to write SETBIT 
    differently. I experimented with a few approaches, first was essentially 
    this:
    
    + /* Set the nth bit in the signature, in s1 or s2 */
    + #define HASH_S(h) \
    + 	do {									\
    + 		unsigned int n = HASHVAL(h);		\
    + 		if (((uint64) (n)) < 64)			\
    + 			s1 |= (uint64) 1<<(n);			\
    + 		else								\
    + 			s2 |= (uint64) 1<<((n) - 64);	\
    + 	} while(0)
    
    That was a bit faster on my x64 laptop, but slightly slower on my ia64 
    HP-UX box. My second try was to use lookup tables, patch attached. That 
    was yet faster on x64, and a small win on the ia64 box too. I'm not sure 
    it's worth the added code complexity, though.
    
    Here's a summary of the timings I got with different versions:
    
    ia64 HP-UX (anole):
    
    unpatched: 11194.038 ms
    fast_makesign-tom: 10064.980 ms
    fast_makesign-2int: 10649.726 ms
    fast_makesign-tbl: 9951.547 ms
    
    x64 laptop:
    
    unpatched: 4595,209 ms
    fast_makesign-tom: 3346,548 ms
    fast_makesign-2int: 3102,874 ms
    fast_makesign-tbl: 2997,854 ms
    
    I used the same "REINDEX INDEX i_words" test I used earlier, repeated 
    each run a couple of times, and took the lowest number. 
    fast_makesign-tom is the first patch you posted, I haven't tested your 
    latest one. fast_makesign-2int is with the HASH_S() macro above, and 
    has_makesign-tbl is with the attached patch.
    
    PS. in case you wonder why the HP-UX box is so much slower than my 
    laptop; this box isn't really meant for performance testing. It's just 
    something I happen to have access to, I think it's a virtual machine of 
    some sort. The numbers are very repeatable, however.
    
    -- 
       Heikki Linnakangas
       EnterpriseDB   http://www.enterprisedb.com