  1. Re: Patch for collation using ICU

    Tatsuo Ishii <t-ishii@sra.co.jp> — 2005-05-08T00:08:45Z

    > Bruce Momjian wrote:
    > > 
    > > There are two reasons for that optimization --- first, some 
    > > locale support is broken and Unicode encoding with a C locale 
    > > crashes (not an issue for ICU), and second, it is an 
    > > optimization for languages like Japanese that want to use 
    > > unicode, but don't need a locale because upper/lower means 
    > > nothing in those character sets.
    > 
    > No, upper/lower means nothing in those languages, so why would you need
    > to optimize upper/lower if they're not used??
    > And if they are, it's obviously because the text contains characters
    > from other languages (probably english) and as such they should behave
    > correctly.
    
    Yes, Japanese (and probably Chinese and Korean) text includes ASCII
    characters. More precisely, ASCII is part of the Japanese encodings
    (LATIN1 is not, however). And we have no problem at all with glibc
    and the C locale. See below ("unitest" is a UNICODE database).
    
    unitest=# create table t1(t text);
    CREATE TABLE
    unitest=# \encoding EUC_JP
    unitest=# insert into t1 values('abcあいう');
    INSERT 1842628 1
    unitest=# select upper(t) from t1;
       upper   
    -----------
     ABCあいう
    (1 row)
    
    So Japanese (including ASCII) behavior in a UNICODE database is
    perfectly correct at the moment, and I strongly object to removing
    that optimization.
    --
    Tatsuo Ishii