Thread

  1. Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

    Hannu Krosing <hannu@trust.ee> — 1999-06-09T18:32:03Z

    Tom Lane wrote:
    > 
    > Bruce Momjian <maillist@candle.pha.pa.us> writes:
    > > This certainly explains it.  With locale enabled, LIKE does not use
    > > indexes because we can't figure out how to do the indexing trick with
    > > non-ASCII character sets because we can't figure out the maximum
    > > character value for a particular encoding.
    > 
    > We don't actually need the *maximum* character value, what we need is
    > to be able to generate a *slightly larger* character value.
    > 
    > For example, what the parser is doing now:
    >         fld LIKE 'abc%' ==> fld <= 'abc\377'
    > is not even really right in ASCII locale, because it will reject a
    > data value like 'abc\377x'.
    > 
    > I think what we really want is to generate the "next value of the
    > same length" and use a < comparison.  In ASCII locale this means
    >         fld LIKE 'abc%' ==> fld < 'abd'
    > which is reliable regardless of what comes after abc in the data.
    > The trick is to figure out a "next" value without assuming a lot
    > about the local character set and collation sequence.
    
    in single-byte locales it should be easy:
    
    1. sort a char[256] array from 0-255 using the current locale settings,
     do it once, either at startup or when first needed.
    
    2. use binary search to locate, in this sorted array, the last char
     that comes before % in the LIKE string:
     if (it is not the last char in the sorted array)
     then (replace that char with the one at index+1)
     else (
       if (it is not the first char in the LIKE string)
       then (discard the last char and goto 2)
       else (don't do the end restriction)
     )
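
    The two steps above could be sketched in C roughly as follows. This is
    only an illustration, not code from the PostgreSQL tree: the names
    coll_char, cmp_char, init_sorted_chars and make_upper_bound are made up
    here, byte 0 is left out of the table because it terminates C strings,
    and the sketch assumes that comparing single bytes with strcoll() says
    enough about how whole strings sort, which many real locales do not
    guarantee. It also skips table entries that collate equal to the
    current char, a small addition to the steps as written.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

#define NCHARS 255              /* bytes 1..255; byte 0 ends a C string */

static unsigned char sorted_chars[NCHARS];
static int sorted_ready = 0;    /* table is not rebuilt if the locale changes */

/* Collate two single bytes under the current LC_COLLATE setting. */
static int
coll_char(unsigned char a, unsigned char b)
{
    char sa[2] = { (char) a, '\0' };
    char sb[2] = { (char) b, '\0' };

    return strcoll(sa, sb);
}

/* qsort/bsearch comparator: collation order, ties broken by byte value. */
static int
cmp_char(const void *a, const void *b)
{
    unsigned char ca = *(const unsigned char *) a;
    unsigned char cb = *(const unsigned char *) b;
    int r = coll_char(ca, cb);

    return (r != 0) ? r : (int) ca - (int) cb;
}

/* Step 1: build the collation-ordered table once, when first needed. */
static void
init_sorted_chars(void)
{
    int i;

    for (i = 0; i < NCHARS; i++)
        sorted_chars[i] = (unsigned char) (i + 1);
    qsort(sorted_chars, NCHARS, 1, cmp_char);
    sorted_ready = 1;
}

/*
 * Step 2: turn the fixed part of a LIKE pattern (the text before %) into
 * an exclusive upper bound.  "out" must hold strlen(prefix) + 1 bytes.
 * Returns 1 on success, 0 if there is no bound (no end restriction).
 */
int
make_upper_bound(const char *prefix, char *out)
{
    size_t len = strlen(prefix);
    unsigned char *end = sorted_chars + NCHARS;

    if (!sorted_ready)
        init_sorted_chars();

    while (len > 0)
    {
        unsigned char last = (unsigned char) prefix[len - 1];
        unsigned char *pos = bsearch(&last, sorted_chars, NCHARS, 1, cmp_char);
        unsigned char *next;

        assert(pos != NULL);    /* every byte 1..255 is in the table */

        /* Find the next entry that collates strictly after "last". */
        next = pos + 1;
        while (next < end && coll_char(*next, last) == 0)
            next++;

        if (next < end)
        {
            memcpy(out, prefix, len - 1);
            out[len - 1] = (char) *next;
            out[len] = '\0';
            return 1;
        }
        len--;                  /* no larger char: drop it and try again */
    }
    return 0;
}
```

    With a bound in hand, fld LIKE 'abc%' would get the extra condition
    fld < bound; when the function returns 0 only the lower bound is used.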
    
    some locales where the array is already in sorted order, i.e. byte
    order matches collation order (ASCII, CYRILLIC), may use special
    treatment
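
    For such locales the special treatment could be as simple as the
    sketch below, which needs no table at all. It assumes that byte order
    and collation order really are the same; the function name
    make_upper_bound_bytewise is invented for this example.

```c
#include <assert.h>
#include <string.h>

/*
 * Build an exclusive upper bound for the fixed part of a LIKE pattern by
 * incrementing its last byte.  "out" must hold strlen(prefix) + 1 bytes.
 * Returns 1 on success, 0 if there is no bound (no end restriction).
 */
int
make_upper_bound_bytewise(const char *prefix, char *out)
{
    size_t len;

    assert(prefix != NULL && out != NULL);
    len = strlen(prefix);

    /* Trailing 0xFF bytes have no successor, so discard them first. */
    while (len > 0 && (unsigned char) prefix[len - 1] == 0xFF)
        len--;
    if (len == 0)
        return 0;

    memcpy(out, prefix, len);
    out[len - 1] = (char) ((unsigned char) out[len - 1] + 1);
    out[len] = '\0';
    return 1;
}
```

    This gives 'abd' for 'abc', matching the ASCII example quoted above.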
    
    > But I am worried whether this trick will work in multibyte locales ---
    > incrementing the last byte might generate an invalid character sequence
    > and produce unpredictable results from strcmp.  So we need some help
    > from someone who knows a lot about collation orders and multibyte
    > character representations.
    
    for double-byte locales something similar should work, but getting
    the initial array is probably tricky
    
    ----------------
    Hannu