Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
From: Hannu Krosing <hannu@trust.ee>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Bruce Momjian <maillist@candle.pha.pa.us>, Daniel Kalchev <daniel@digsys.bg>, Hiroshi Inoue <Inoue@tpf.co.jp>, pgsql-hackers@postgreSQL.org
Date: 1999-06-09T18:32:03Z
Lists: pgsql-hackers

Tom Lane wrote:
>
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > This certainly explains it.  With locale enabled, LIKE does not use
> > indexes because we can't figure out how to do the indexing trick with
> > non-ASCII character sets because we can't figure out the maximum
> > character value for a particular encoding.
>
> We don't actually need the *maximum* character value, what we need is
> to be able to generate a *slightly larger* character value.
>
> For example, what the parser is doing now:
>         fld LIKE 'abc%' ==> fld <= 'abc\377'
> is not even really right in ASCII locale, because it will reject a
> data value like 'abc\377x'.
>
> I think what we really want is to generate the "next value of the
> same length" and use a < comparison.  In ASCII locale this means
>         fld LIKE 'abc%' ==> fld < 'abd'
> which is reliable regardless of what comes after abc in the data.
> The trick is to figure out a "next" value without assuming a lot
> about the local character set and collation sequence.

in single-byte locales it should be easy:

1. sort a char[256] array from 0-255 using the current locale settings,
   do it once, either at startup or when first needed.

2. use binary search on that array to locate the last char before %
   in this sorted array:

   if (it is not the last char in sorted array)
   then (replace that char with the one at index+1)
   else (
     if (it is not the first char in like string)
     then (discard the last char and goto 2.)
     else (don't do the end restriction)
   )

some locales where the string is already sorted may use special
treatment (ASCII, CYRILLIC)

> But I am worried whether this trick will work in multibyte locales ---
> incrementing the last byte might generate an invalid character sequence
> and produce unpredictable results from strcmp.  So we need some help
> from someone who knows a lot about collation orders and multibyte
> character representations.

for double-byte locales something similar should work, but getting
the initial array is probably tricky

----------------
Hannu
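
A rough sketch of the single-byte procedure above, in C, might look like
the following. It is not PostgreSQL code: the names sorted_chars,
cmp_byte, build_sorted_chars and like_upper_bound are made up for
illustration. It assumes that comparing one-byte strings with strcoll()
gives an order that agrees with how the locale compares whole strings,
which holds in the C locale but is not guaranteed everywhere. Byte 0 is
left out of the array because it cannot appear inside a C string.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Bytes 1..255 in the collation order of the current locale (hypothetical helper state). */
static unsigned char sorted_chars[255];
static int sorted_ready = 0;

/* Compare two single bytes via strcoll(); ties are broken by byte value
 * so that qsort() and bsearch() see the same total order. */
static int cmp_byte(const void *a, const void *b)
{
    unsigned char ca = *(const unsigned char *) a;
    unsigned char cb = *(const unsigned char *) b;
    char sa[2] = { (char) ca, '\0' };
    char sb[2] = { (char) cb, '\0' };
    int r = strcoll(sa, sb);

    if (r != 0)
        return r;
    return (int) ca - (int) cb;
}

/* Step 1: build the sorted array once, when first needed. */
static void build_sorted_chars(void)
{
    for (int i = 0; i < 255; i++)
        sorted_chars[i] = (unsigned char) (i + 1);
    qsort(sorted_chars, 255, 1, cmp_byte);
    sorted_ready = 1;
}

/*
 * Step 2: given the fixed prefix of a LIKE pattern (the part before %),
 * write into buf a string usable as an exclusive upper bound (fld < buf).
 * buf must hold at least strlen(prefix) + 1 bytes.
 * Returns 1 if a bound was produced, 0 if no end restriction is possible.
 */
int like_upper_bound(const char *prefix, char *buf)
{
    size_t len = strlen(prefix);

    if (!sorted_ready)
        build_sorted_chars();
    memcpy(buf, prefix, len + 1);

    while (len > 0)
    {
        unsigned char last = (unsigned char) buf[len - 1];
        unsigned char *pos = bsearch(&last, sorted_chars, 255, 1, cmp_byte);

        if (pos != NULL && pos < sorted_chars + 254)
        {
            /* Not the last char in collation order: take the next one. */
            buf[len - 1] = (char) pos[1];
            return 1;
        }
        /* Last char in collation order: discard it and retry one char earlier. */
        len--;
        buf[len] = '\0';
    }
    return 0;                   /* nothing left to increment: no end restriction */
}
```

With no setlocale() call the process runs in the C locale, so the prefix
'abc' gives the bound 'abd' as in Tom's example, and a prefix made only
of \377 bytes gives no bound at all. In another locale one would call
setlocale(LC_COLLATE, "") before the array is first built.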