Thread

  1. Re: Patch for collation using ICU

    Tatsuo Ishii <t-ishii@sra.co.jp> — 2005-05-08T05:41:17Z

    > Alvaro Herrera wrote:
    > > Sent: Sunday, May 08, 2005 2:49 PM
    > > To: John Hansen
    > > Cc: Tatsuo Ishii; pgman@candle.pha.pa.us; 
    > > girgen@pingpong.net; pgsql-hackers@postgresql.org
    > > Subject: Re: [HACKERS] Patch for collation using ICU
    > > 
    > > On Sun, May 08, 2005 at 02:07:29PM +1000, John Hansen wrote:
    > > > Tatsuo Ishii wrote:
    > > 
    > > > > So Japanese (including ASCII)/UNICODE behavior is perfectly correct
    > > > > at this moment.
    > > > 
    > > > Right, so you _never_ use accented ascii characters in Japanese? 
    > > > (like è for example, whose uppercase is È)
    > > 
    > > That isn't ASCII.  It's latin1 or some other ASCII extension.
    > 
    > Point taken...
    > But...
    > 
    > If you want EUC_JP (Japanese + ASCII) then use that as your backend encoding, not UTF-8 (unicode).
    > UTF-8 encoded databases are very useful for representing multiple languages in the same database,
    > but this usefulness vanishes if functions like upper/lower don't work correctly.
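    A minimal Python sketch of the ASCII vs. Latin-1 point and of why upper/lower
    matter for UTF-8 data. It shows that è (U+00E8) lies outside 7-bit ASCII, is
    one byte in Latin-1 and two bytes in UTF-8, and contrasts a hypothetical
    ASCII-only `ascii_upper()` with Python's Unicode-aware `str.upper()`. The
    ASCII-only function is an assumption standing in for case mapping that only
    knows a-z; it is not claimed to be PostgreSQL's actual implementation.

```python
# 'è' (U+00E8) is outside the 7-bit ASCII range 0-127, so it is Latin-1
# (or another ASCII extension), not ASCII.
ch = "è"
print(hex(ord(ch)))          # 0xe8
print(ch.encode("latin-1"))  # b'\xe8'      (one byte in Latin-1)
print(ch.encode("utf-8"))    # b'\xc3\xa8'  (two bytes in UTF-8)


def ascii_upper(s: str) -> str:
    """Hypothetical ASCII-only uppercasing: maps a-z to A-Z and leaves
    every other character (including 'è') untouched."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


print(ascii_upper("crème"))  # CRèME  (the accented letter is not converted)
print("crème".upper())       # CRÈME  (Unicode-aware case mapping)
```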
    
    I'm just curious whether mixed German/French/Spanish text can be sorted
    correctly. I think these languages need their own locales even with
    UNICODE/ICU.
    
    > So optimizing for 3 languages breaks more than a hundred; that doesn't seem fair!
    
    Why don't you add a GUC variable or some such to control the
    upper/lower behavior?
    --
    Tatsuo Ishii