Thread

  1. Re: EUC_JP and SJIS conversion improvement

    Tatsuo Ishii <t-ishii@sra.co.jp> — 2005-06-24T02:43:41Z

    > The character-code conversion from EUC_JP to SJIS is performed in
    > two stages. The first stage converts EUC_JP to MIC; the second
    > stage converts MIC to SJIS. (Conversion from SJIS to EUC_JP is
    > similar.)
    > 
    > This is not very efficient, because it requires allocating a
    > buffer for MIC and performing the conversion calculation twice.
    > 
    > The attached patch enables direct conversion between EUC_JP and
    > SJIS. Additionally, it reduces the number of calls to
    > pg_mic_mblen.
    > 
    > The effect of the patch that I measured is as follows:
    > 
    > o The test data was created by 'pgbench -i'.
    > 
    > o Test SQL:
    > set client_encoding to 'SJIS';
    > select * from accounts;
    > 
    > o Test results: Linux (CPU: Pentium III, compiler option: -O2)
    >  - original: 2.920s
    >  - patched : 2.278s
    > 
    > regards,
    > 
    > ---
    > Atsushi Ogawa
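
To make the idea of a direct conversion concrete, here is a minimal sketch in C. It is not the code from the attached patch: the function name `euc_jp_to_sjis_sketch` is hypothetical, and it uses the well-known arithmetic mapping between the EUC_JP and SJIS forms of JIS X 0208. It assumes only ASCII, half-width katakana and JIS X 0208 input, and rejects JIS X 0212 (three-byte) sequences instead of mapping them.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not taken from the patch): convert an EUC_JP byte
 * string straight to SJIS in a single pass, with no intermediate MIC buffer.
 *
 * Handles ASCII, half-width katakana (SS2, 0x8E) and JIS X 0208 two-byte
 * characters. JIS X 0212 (SS3, 0x8F) and malformed input are rejected.
 *
 * dst must have room for at least len bytes; the output is never longer
 * than the input. Returns the number of bytes written, or -1 on error.
 */
static long
euc_jp_to_sjis_sketch(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i = 0;
    size_t o = 0;

    while (i < len)
    {
        unsigned char c1 = src[i];

        if (c1 < 0x80)
        {
            /* ASCII is identical in both encodings. */
            dst[o++] = c1;
            i += 1;
        }
        else if (c1 == 0x8E)
        {
            /* SS2 + one byte: half-width katakana, a single byte in SJIS. */
            if (i + 1 >= len || src[i + 1] < 0xA1 || src[i + 1] > 0xDF)
                return -1;
            dst[o++] = src[i + 1];
            i += 2;
        }
        else if (c1 >= 0xA1 && c1 <= 0xFE)
        {
            unsigned int j1, j2, s1, s2;

            if (i + 1 >= len || src[i + 1] < 0xA1 || src[i + 1] > 0xFE)
                return -1;

            /* Strip the high bits to get the 7-bit JIS X 0208 row and cell. */
            j1 = c1 - 0x80;
            j2 = src[i + 1] - 0x80;

            /* Two JIS rows share one SJIS lead byte; skip the 0xA0-0xDF gap. */
            s1 = ((j1 + 1) >> 1) + 0x70;
            if (s1 > 0x9F)
                s1 += 0x40;

            /* Odd rows: trail 0x40-0x9E (skipping 0x7F); even rows: 0x9F-0xFC. */
            if (j1 & 1)
                s2 = j2 + 0x1F + (j2 >= 0x60 ? 1 : 0);
            else
                s2 = j2 + 0x7E;

            dst[o++] = (unsigned char) s1;
            dst[o++] = (unsigned char) s2;
            i += 2;
        }
        else
            return -1;  /* 0x8F (JIS X 0212) and other bytes: not handled here */
    }
    return (long) o;
}
```

Because every input character maps to an output of equal or shorter length, the caller can size the output buffer from the input length alone, and no temporary buffer is needed between the two encodings.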
    
    I have tested Atsushi's patches with PostgreSQL 8.0.3 on my notebook
    PC running Linux 2.4 and got the following results (the database
    encoding is EUC_JP):
    
    1) without patches
    
    $ time psql -c 'set client_encoding to 'SJIS';select * from accounts;'  test >/dev/null
    
    real	0m4.926s
    user	0m1.680s
    sys	0m0.090s
    
    2) with patches
    
    $ time psql -c 'set client_encoding to 'SJIS';select * from accounts;'  test >/dev/null
    
    real	0m3.816s
    user	0m1.560s
    sys	0m0.070s
    
    3) no encoding conversions
    
    $ time psql -c 'set client_encoding to 'EUC_JP';select * from accounts;'  test >/dev/null
    
    real	0m3.220s
    user	0m1.760s
    sys	0m0.070s
    
    Measured by real time against case 3, the conversion overhead drops
    from 52% to 18% with the patches. This is a huge improvement! I will
    commit to current if there are no objections.
    --
    Tatsuo Ishii
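
The overhead percentages in the reply appear to be derived from the "real" times, taking case 3 (no encoding conversion) as the baseline. A small C sketch of that arithmetic follows; the helper name `overhead_percent` is hypothetical and the baseline choice is an assumption inferred from the numbers.

```c
/*
 * Hypothetical helper: extra wall-clock time of a run relative to a
 * baseline run, expressed as a percentage of the baseline.
 */
static double
overhead_percent(double measured_sec, double baseline_sec)
{
    return (measured_sec - baseline_sec) / baseline_sec * 100.0;
}

/*
 * With the "real" times reported above:
 *   overhead_percent(4.926, 3.220)  -> about 53.0 (without patches)
 *   overhead_percent(3.816, 3.220)  -> about 18.5 (with patches)
 */
```

These values agree with the quoted 52% and 18% up to rounding.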