Thread

  1. Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-06-24T16:51:37Z

    On 24.06.2011 11:40, Heikki Linnakangas wrote:
    > On 21.06.2011 13:08, Alexander Korotkov wrote:
    >> I believe it's due to relatively expensive penalty
    >> method in that opclass.
    >
    > Hmm, I wonder if it could be optimized. I did a quick test, creating a
    > gist_trgm_ops index on a list of English words from
    > /usr/share/dict/words. oprofile shows that with the patch, 60% of the
    > CPU time is spent in the makesign() function.
    
    I couldn't resist looking into this, and came up with the attached 
    patch. I tested this with:
    
    CREATE TABLE words (word text);
    COPY words FROM '/usr/share/dict/words';
    CREATE INDEX i_words ON words USING gist (word gist_trgm_ops);
    
    And then ran "REINDEX INDEX i_words" a few times with and without the 
    patch. Without the patch, reindex takes about 4.7 seconds. With the 
    patch, 3.7 seconds. That's a worthwhile gain on its own, but becomes 
    even more important with Alexander's fast GiST build patch, which calls 
    the penalty function more often.
    
    I used the attached showsign-debug.patch to verify that the patched 
    makesign function produces the same results as the existing code. I 
    haven't tested the big-endian code, however.
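
    To illustrate the kind of equivalence check described above, here is a
    minimal sketch in C. It is not the pg_trgm code and not the attached
    patch: the signature size (SIG_BYTES), the helper names (set_hash_bit,
    makesign_ref, makesign_alt, signatures_match) and the way the three
    trigram bytes are packed into an integer are all assumptions made for
    the example. It builds a bit signature from an array of 3-byte trigrams
    in two ways, one using explicit shifts and one using a memcpy load with
    a fix-up on big-endian hosts, and then compares the results byte for
    byte. The fix-up branch is the part that a little-endian test machine
    never exercises.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sizes only; assumed here, not taken from the pg_trgm source. */
#define SIG_BYTES 12
#define SIG_BITS  (SIG_BYTES * 8 - 1)

/* Set the signature bit selected by one trigram's integer value. */
static void
set_hash_bit(unsigned char *sign, uint32_t val)
{
    uint32_t bit = val % SIG_BITS;

    sign[bit / 8] |= (unsigned char) (1u << (bit % 8));
}

/*
 * Reference version: pack each trigram with explicit shifts, so the
 * integer value does not depend on the host byte order.
 * "arr" holds len trigrams of 3 bytes each, back to back.
 */
static void
makesign_ref(unsigned char *sign, const unsigned char *arr, int len)
{
    memset(sign, 0, SIG_BYTES);
    for (int k = 0; k < len; k++)
    {
        const unsigned char *t = arr + 3 * k;

        set_hash_bit(sign, (uint32_t) t[0]
                           | ((uint32_t) t[1] << 8)
                           | ((uint32_t) t[2] << 16));
    }
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int
host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/*
 * Alternative version: load the three bytes with memcpy, then rearrange
 * the value on big-endian hosts so it matches the reference packing.
 */
static void
makesign_alt(unsigned char *sign, const unsigned char *arr, int len)
{
    const int little = host_is_little_endian();

    memset(sign, 0, SIG_BYTES);
    for (int k = 0; k < len; k++)
    {
        uint32_t tmp = 0;

        memcpy(&tmp, arr + 3 * k, 3);
        if (!little)
            tmp = ((tmp >> 24) & 0xff)
                | (((tmp >> 16) & 0xff) << 8)
                | (((tmp >> 8) & 0xff) << 16);
        set_hash_bit(sign, tmp);
    }
}

/* Returns 1 if both versions produce byte-identical signatures. */
static int
signatures_match(const unsigned char *arr, int len)
{
    unsigned char a[SIG_BYTES];
    unsigned char b[SIG_BYTES];

    assert(len >= 0);
    makesign_ref(a, arr, len);
    makesign_alt(b, arr, len);
    return memcmp(a, b, SIG_BYTES) == 0;
}
```

    Calling signatures_match() over every trigram array produced from the
    test data gives a check comparable in spirit to the debug patch. It
    only proves equivalence for the byte order of the machine it runs on.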
    
    -- 
       Heikki Linnakangas
       EnterpriseDB   http://www.enterprisedb.com