Thread

  1. Turkish locale bug

    Sezai YILMAZ <sezaiy@ata.cs.hun.edu.tr> — 2001-02-19T11:50:05Z

    Your name               : Sezai YILMAZ      
    Your email address      : sezaiy@ata.cs.hun.edu.tr
    
    
    System Configuration
    ---------------------
      Architecture (example: Intel Pentium)         : AMD Duron
    
      Operating System (example: Linux 2.0.26 ELF)  : Linux 2.2.17 ELF
    
      PostgreSQL version (example: PostgreSQL-7.0):   PostgreSQL-7.0.3
    
      Compiler used (example:  gcc 2.8.0)           : gcc 2.95.3
    
    
    Please enter a FULL description of your problem:
    ------------------------------------------------
    
    Locale support for Turkish causes a problem with the character 
    'I' (the capital of the 9th letter of the English alphabet). 
    When 'I' is passed to the tolower() function and the locale is 
    set to "tr_TR", it is lowercased to the special Turkish character 
    'ı' (dotless i, which displays as "y acute" in Latin-1), not 'i'. 
    This causes the following problem:
    
    With the Turkish locale it is not possible to write SQL queries in 
    CAPITAL letters. SQL keywords like "INSERT" and "UNION" are first 
    lowercased to "ınsert" and "unıon", and those strings then do not 
    match any SQL keyword.
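
The failure described above can be sketched in a few lines of C. This is only an illustration, not PostgreSQL code: fold_with_tolower() and folds_to_keyword() are hypothetical names, the 64-byte buffer is an arbitrary choice, and the `c < 0x80` check stands in for isascii(). The strcmp() call is a stand-in for the real keyword lookup.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Lowercase a word in place with the locale-dependent tolower(),
 * roughly as the scanner loop does.
 */
void
fold_with_tolower(char *s)
{
    assert(s != NULL);
    for (; *s; s++)
    {
        unsigned char c = (unsigned char) *s;

        if (c < 0x80 && isupper(c))     /* same range as isascii() */
            *s = (char) tolower(c);
    }
}

/*
 * Hypothetical helper: switch to the named locale, fold the word and
 * compare it with the expected keyword.
 * Returns 1 on a match, 0 on a mismatch, -1 if the locale cannot be set.
 */
int
folds_to_keyword(const char *locale_name, const char *word,
                 const char *keyword)
{
    char buf[64];

    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;              /* locale not installed on this system */

    snprintf(buf, sizeof(buf), "%s", word);
    fold_with_tolower(buf);
    return strcmp(buf, keyword) == 0;
}
```

Calling folds_to_keyword("C", "INSERT", "insert") reports a match. With "tr_TR" in place of "C", a system behaving as described in this report would be expected to report a mismatch, and a system without that locale installed gets -1.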
    
    
    
    Please describe a way to repeat the problem.   Please try to provide a
    concise reproducible example, if at all possible: 
    ----------------------------------------------------------------------
    
    Set the "LC_ALL" environment variable to "tr_TR" in the shell that 
    starts postmaster, then run any query written in capital letters.
    
    
    
    If you know how this problem might be fixed, list the solution below:
    ---------------------------------------------------------------------
    
    In file:
    
    [postgresqlsourcepath]/src/backend/parser/scan.l
    
    This block uses the tolower() function, which is affected by the 
    locale settings of the shell that runs postmaster.
    
    ================================================================
    {identifier}    {
                        int i;
                        ScanKeyword             *keyword;
    
                        for(i = 0; yytext[i]; i++)
                              if (isascii((unsigned char)yytext[i]) &&
                                    isupper(yytext[i]))
                                    yytext[i] = tolower(yytext[i]);
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ================================================================
    
    I think it would be better to use something that does what 
    tolower() does, but only for English (ASCII) letters, so that the 
    result stays the same in every locale. I think this will solve the 
    problem.
    
    In ASCII, 'a' - 'A' = 32.
    
    So we can use the following line instead of the marked last line 
    in the block above.
    
    yytext[i] += 32;
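
A slightly more defensive variant of the same idea, as a minimal sketch: it assumes an ASCII-compatible encoding and checks the 'A'..'Z' range directly, so it does not depend on isupper() either. The names ascii_tolower(), fold_ascii() and ascii_fold_equals() are hypothetical and do not come from the PostgreSQL source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fold only ASCII 'A'..'Z'; the current locale is never consulted. */
char
ascii_tolower(char c)
{
    if (c >= 'A' && c <= 'Z')
        return (char) (c + ('a' - 'A'));    /* 'a' - 'A' is 32 in ASCII */
    return c;
}

/* Lowercase a NUL-terminated string in place, ASCII letters only. */
void
fold_ascii(char *s)
{
    assert(s != NULL);
    for (; *s; s++)
        *s = ascii_tolower(*s);
}

/*
 * Hypothetical helper: fold a copy of the word and compare it with
 * the expected keyword. Returns 1 on a match, 0 otherwise.
 */
int
ascii_fold_equals(const char *word, const char *keyword)
{
    char buf[64];

    snprintf(buf, sizeof(buf), "%s", word);
    fold_ascii(buf);
    return strcmp(buf, keyword) == 0;
}
```

Because the range check leaves every byte outside 'A'..'Z' untouched, non-ASCII characters in identifiers pass through unchanged, which matches the intent of the isascii() guard in the scanner loop.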