Re: EUC_JP and SJIS conversion improvement

Tatsuo Ishii <t-ishii@sra.co.jp>

From: Tatsuo Ishii <t-ishii@sra.co.jp>
To: a_ogawa@hi-ho.ne.jp
Cc: pgsql-hackers@postgresql.org, pgsql-patches@postgresql.org
Date: 2005-06-24T02:43:41Z
Lists: pgsql-hackers
> Character-code conversion from EUC_JP to SJIS is performed in two
> stages. The first stage converts EUC_JP to MIC. The second stage
> converts MIC to SJIS. (Conversion from SJIS to EUC_JP works the same
> way in reverse.)
> 
> This is not very efficient, because a buffer must be allocated for
> the MIC text and the conversion calculation is executed twice.
> 
> The attached patch enables direct conversion between EUC_JP and
> SJIS. It also reduces the number of calls to pg_mic_mblen.
> 
> The effect of the patch that I measured is as follows:
> 
> o The test data was created with 'pgbench -i'.
> 
> o Test SQL:
> set client_encoding to 'SJIS';
> select * from accounts;
> 
> o Test results: Linux(CPU: Pentium III, Compiler option: -O2)
>  - original: 2.920s
>  - patched : 2.278s
> 
> regards,
> 
> ---
> Atsushi Ogawa
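
For readers who want to see what a single-pass conversion looks like, here is a minimal Python sketch of direct EUC_JP to SJIS conversion using the standard JIS X 0208 arithmetic. It is not the code from the patch (which is written in C); the function name euc_jp_to_sjis is hypothetical, and the sketch assumes only ASCII, half-width katakana (the 0x8E prefix), and two-byte JIS X 0208 characters, rejecting everything else.

```python
def euc_jp_to_sjis(data: bytes) -> bytes:
    """Convert EUC_JP bytes to SJIS in one pass, with no intermediate buffer.

    Simplified illustration: JIS X 0212 (0x8F prefix) and vendor
    extensions are not handled and raise ValueError.
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        c1 = data[i]
        if c1 < 0x80:
            # ASCII is identical in both encodings.
            out.append(c1)
            i += 1
        elif c1 == 0x8E and i + 1 < n and 0xA1 <= data[i + 1] <= 0xDF:
            # Half-width katakana: drop the 0x8E prefix.
            out.append(data[i + 1])
            i += 2
        elif 0xA1 <= c1 <= 0xFE and i + 1 < n and 0xA1 <= data[i + 1] <= 0xFE:
            # JIS X 0208: strip the high bits to get the JIS row/cell bytes.
            j1 = c1 - 0x80
            j2 = data[i + 1] - 0x80
            # Two JIS rows share one SJIS lead byte.
            s1 = ((j1 - 0x21) >> 1) + 0x81
            if s1 > 0x9F:
                s1 += 0x40  # skip the single-byte katakana range
            if j1 & 1:
                s2 = j2 + 0x1F
                if s2 >= 0x7F:
                    s2 += 1  # 0x7F is never a trail byte
            else:
                s2 = j2 + 0x7E
            out.append(s1)
            out.append(s2)
            i += 2
        else:
            raise ValueError(f"unsupported or truncated sequence at offset {i}")
    return bytes(out)


print(euc_jp_to_sjis(b"\xa4\xa2").hex())  # 82a0 (HIRAGANA LETTER A)
```

Each input character is examined once and written straight to the output, which is the saving the patch description refers to: no MIC buffer and no second pass.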

I have tested Atsushi's patches with PostgreSQL 8.0.3 on my notebook
PC running Linux 2.4 and got the following results (the database
encoding is EUC_JP):

1) without patches

$ time psql -c "set client_encoding to 'SJIS'; select * from accounts;" test >/dev/null

real	0m4.926s
user	0m1.680s
sys	0m0.090s

2) with patches

$ time psql -c "set client_encoding to 'SJIS'; select * from accounts;" test >/dev/null

real	0m3.816s
user	0m1.560s
sys	0m0.070s

3) no encoding conversions

$ time psql -c "set client_encoding to 'EUC_JP'; select * from accounts;" test >/dev/null

real	0m3.220s
user	0m1.760s
sys	0m0.070s

With the patches, the conversion overhead (real time, relative to the
no-conversion case) decreases from 52% to 18%. This is a huge
improvement! I will commit the patches to current if there are no
objections.
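
The overhead percentages can be reproduced from the timings above. The short Python sketch below assumes that overhead means the extra real (wall-clock) time relative to the no-conversion run; the helper name overhead_pct is hypothetical.

```python
def overhead_pct(elapsed: float, baseline: float) -> float:
    """Extra time over the baseline, as a percentage of the baseline."""
    return (elapsed - baseline) / baseline * 100.0


baseline = 3.220  # real time, no encoding conversion (case 3)
unpatched = 4.926  # real time, EUC_JP -> SJIS without patches (case 1)
patched = 3.816  # real time, EUC_JP -> SJIS with patches (case 2)

print(f"{overhead_pct(unpatched, baseline):.1f}%")  # 53.0%
print(f"{overhead_pct(patched, baseline):.1f}%")    # 18.5%
```

Truncated to whole percentages, these are the 52% and 18% figures quoted above.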
--
Tatsuo Ishii