Re: Remaining dependency on setlocale()
From: Jeff Davis <pgsql@j-davis.com>
To: Peter Eisentraut <peter@eisentraut.org>, Chao Li <li.evan.chao@gmail.com>
Cc: Thomas Munro <thomas.munro@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-12-23T20:09:08Z
Lists: pgsql-hackers
Attachments
- v13-0001-fuzzystrmatch-use-pg_ascii_toupper.patch (text/x-patch)
- v13-0002-Control-LC_COLLATE-with-GUC.patch (text/x-patch)
On Wed, 2025-12-17 at 11:39 +0100, Peter Eisentraut wrote:
> For Metaphone, I found the reference implementation linked from its
> Wikipedia page, and it looks like our implementation is pretty closely
> aligned to that. That reference implementation also contains the
> C-with-cedilla case explicitly. The correct fix here would probably be
> to change the implementation to work on wide characters. But I think
> for the moment you could try a shortcut like, use pg_ascii_toupper(),
> but if the encoding is LATIN1 (or LATIN9 or whichever other encodings
> also contain C-with-cedilla at that code point), then explicitly
> uppercase that one as well. This would preserve the existing
> behavior.

Done, attached new patches.

Interestingly, WIN1256 encodes only the SMALL LETTER C WITH CEDILLA. I
think, for the purposes here, we can still consider it to "uppercase"
to \xc7, so that it can still be treated as the same sound.

Technically I think that would be an improvement over the current code
in this edge case, and suggests that case folding would be a better
approach than uppercasing.

Regards,
	Jeff Davis
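
For illustration only, the shortcut discussed above could look roughly like the sketch below. It is not taken from the attached patches. The enum, the helper names and the exact list of encodings are assumptions made up for this example, and sketch_ascii_toupper() is a local stand-in for pg_ascii_toupper() so that the snippet compiles on its own.

```c
#include <stdbool.h>

/* Hypothetical encoding IDs for this sketch; not PostgreSQL's real enum. */
typedef enum
{
	SKETCH_ENC_UTF8,
	SKETCH_ENC_LATIN1,
	SKETCH_ENC_LATIN9,
	SKETCH_ENC_WIN1256
} sketch_encoding;

/* Local stand-in for pg_ascii_toupper(): uppercases ASCII a-z only. */
static unsigned char
sketch_ascii_toupper(unsigned char ch)
{
	if (ch >= 'a' && ch <= 'z')
		ch = (unsigned char) (ch - 'a' + 'A');
	return ch;
}

/*
 * Assumed list of single-byte encodings with SMALL LETTER C WITH CEDILLA
 * at 0xE7. The real patch may cover a different set.
 */
static bool
sketch_has_c_cedilla_at_e7(sketch_encoding enc)
{
	return enc == SKETCH_ENC_LATIN1 ||
		enc == SKETCH_ENC_LATIN9 ||
		enc == SKETCH_ENC_WIN1256;
}

/*
 * Locale-independent "uppercase" for Metaphone: ASCII rules, plus one
 * explicit mapping of 0xE7 to 0xC7 where 0xE7 is c-with-cedilla.
 * For WIN1256, 0xC7 is not the capital letter; it is used here only as
 * a common byte value so that both spellings are treated as one sound.
 */
static unsigned char
sketch_metaphone_toupper(unsigned char ch, sketch_encoding enc)
{
	if (ch == 0xE7 && sketch_has_c_cedilla_at_e7(enc))
		return 0xC7;
	return sketch_ascii_toupper(ch);
}
```

In a multibyte encoding such as UTF8, the byte 0xE7 is left untouched, since it is part of a longer sequence there rather than a character of its own.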