Thread

  1. Re: Turkish locale bug

    Sezai YILMAZ <sezaiy@ata.cs.hun.edu.tr> — 2001-02-20T09:24:59Z

    
    Tom Lane wrote:
    > 
    > Sezai YILMAZ <sezaiy@ata.cs.hun.edu.tr> writes:
    > > With the Turkish locale it is not possible to write SQL queries in
    > > CAPITAL letters. SQL keywords like "INSERT" and "UNION" are first
    > > folded to "ınsert" and "unıon". Then "ınsert" and "unıon"
    > > do not match any SQL keyword.
    > 
    > Ugh.
    > 
    > >                     for(i = 0; yytext[i]; i++)
    > >                           if (isascii((unsigned char)yytext[i]) &&
    > >                                 isupper(yytext[i]))
    > >                                 yytext[i] = tolower(yytext[i]);
    > 
    > > I think it would be better to use something that does what
    > > tolower() does, but only by the rules of English. This step should
    > > stay in the English locale. I think this will solve the problem.
    > 
    > > yytext[i] += 32;
    > 
    > Hm.  Several problems here:
    > 
    > (1) This solution would break in other locales where isupper() may
    > return TRUE for characters other than 'A'..'Z'.
    > 
    > (2) We could fix that by gutting the isascii/isupper test as well,
    > reducing it to "yytext[i] >= 'A' && yytext[i] <= 'Z'", but I'd prefer to
    > still be able to say that "identifiers fold to lower case" works for
    > whatever the local locale thinks is upper and lower case.  It would be
    > strange if identifier folding did not agree with the SQL lower()
    > function.
    > 
    > (3) I do not like the idea of hard-wiring knowledge of ASCII encoding
    > here, even if it's unlikely that anyone would ever try to run Postgres
    > on a non-ASCII-based system.
    > 
    > I see your problem, but I'm not sure of a solution that doesn't have bad
    > side-effects elsewhere.  Ideas anyone?
    > 
    >                         regards, tom lane
    
    You are right. What about this one?
    
    ================================================================
    {identifier}    {
                        int i;
                        ScanKeyword             *keyword;
    
                        /* I think many platforms understand the
                           following call and switch to the "C" locale
                           (7-bit ASCII character set, English rules) */

                        setlocale(LC_ALL, "C");

                        for(i = 0; yytext[i]; i++)
                              if (isascii((unsigned char)yytext[i]) &&
                                    isupper((unsigned char)yytext[i]))
                                    yytext[i] = tolower((unsigned char)yytext[i]);

                        /* This sets the locale back to the default
                           locale which the user prefers to use */

                        setlocale(LC_ALL, "");
    ================================================================
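
One caveat with the snippet above: setlocale(LC_ALL, "") selects the locale named by the environment, which is not necessarily the locale that was active before the switch. A minimal standalone sketch of the same idea that saves and restores the previous setting could look like the following. The function name fold_identifier_c_locale is hypothetical and not part of the Postgres scanner, and the sketch touches only LC_CTYPE on the assumption that character classification is all the loop needs.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not Postgres code): fold the upper-case letters
 * in text to lower case using the rules of the "C" locale, then put
 * the previous LC_CTYPE setting back.
 * Returns 0 on success, -1 if the locale could not be queried or set.
 */
int fold_identifier_c_locale(char *text)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved;
    size_t i;

    if (current == NULL)
        return -1;

    /* Copy the name: later setlocale() calls may overwrite the string. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, "C") == NULL) {
        free(saved);
        return -1;
    }

    for (i = 0; text[i] != '\0'; i++) {
        unsigned char ch = (unsigned char) text[i];

        /* In the "C" locale isupper() is true only for 'A'..'Z'. */
        if (isupper(ch))
            text[i] = (char) tolower(ch);
    }

    /* Restore whatever was active before, not the environment default. */
    setlocale(LC_CTYPE, saved);
    free(saved);
    return 0;
}
```

Calling it on a buffer holding "UNION" should leave "union" in the buffer whatever LC_CTYPE was before the call, because the folding itself always runs under "C".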
    
    This works on my Linux box, but I am not sure about other
    platforms. What do you think about the performance?
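
For comparison on the performance question: the version above calls setlocale() twice for every identifier the scanner sees. A sketch that never touches the locale is the plain range test from point (2) in the quoted mail. It is shown here only as an illustration, it carries the drawbacks listed in points (2) and (3), and the function name fold_ascii_upper is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not Postgres code): fold only 'A'..'Z' to
 * 'a'..'z' and leave every other byte alone. No locale is consulted.
 * Assumes the letters are contiguous in the execution character set,
 * as they are in ASCII.
 */
void fold_ascii_upper(char *text)
{
    size_t i;

    for (i = 0; text[i] != '\0'; i++) {
        if (text[i] >= 'A' && text[i] <= 'Z')
            text[i] = (char) (text[i] - 'A' + 'a');
    }
}
```

With this, "INSERT" becomes "insert" under any locale setting, but non-ASCII upper-case letters in identifiers are no longer folded at all.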
    
    regards
    -sezai