Re: Remaining dependency on setlocale()
From: Jeff Davis <pgsql@j-davis.com>
To: Peter Eisentraut <peter@eisentraut.org>, Chao Li <li.evan.chao@gmail.com>
Cc: Thomas Munro <thomas.munro@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-12-12T20:11:40Z
Lists: pgsql-hackers
Attachments
- v12-0001-Use-multibyte-aware-extraction-of-pattern-prefix.patch (text/x-patch)
- v12-0002-Remove-unused-single-byte-char_is_cased-API.patch (text/x-patch)
- v12-0003-Fix-multibyte-issue-in-ltree_strncasecmp.patch (text/x-patch)
- v12-0004-Fix-inconsistency-between-ltree_strncasecmp-and-.patch (text/x-patch)
- v12-0005-downcase_identifier-use-method-table-from-locale.patch (text/x-patch)
- v12-0006-Avoid-global-LC_CTYPE-dependency-in-pg_locale_ic.patch (text/x-patch)
- v12-0007-fuzzystrmatch-use-pg_ascii_toupper.patch (text/x-patch)
- v12-0008-Control-LC_COLLATE-with-GUC.patch (text/x-patch)
On Fri, 2025-12-05 at 16:01 +0100, Peter Eisentraut wrote:
> v11-0003-Fix-inconsistency-between-ltree_strncasecmp-and-.patch
>
> The function comment reads "Check if b has a prefix of a." -- Is that
> the same as "Check if a is a prefix of b."? The latter might be
> clearer.

Yes, fixed.

Note: I separated this into two patches. 0003 fixes the multibyte
mishandling issue, and 0004 consistently performs case folding. 0003 is
backpatchable, I believe.

> but the patch removes SB_lower_char().

Fixed and committed.

> v11-0006-Use-multibyte-aware-extraction-of-pattern-prefix.patch
>
> Might have a small typo in the commit message:
>
> ; and preserve and char-at-a-time logic for bytea.

Fixed.

I also changed it into two functions: like_fixed_prefix(), which is
almost unchanged from the original; and like_fixed_prefix_ci(), which
is multibyte and locale-aware. It was too confusing to have single-byte
and multi-byte logic in the same function, and they didn't share much
code anyway.

> case '\xc7': /* C with cedilla */
>
> so the premise that "fuzzystrmatch is designed for ASCII" does not
> appear to be correct. Needs more analysis.
>
> (But apparently it's not multibyte aware at all, so I don't know what
> to do about that.)

I didn't notice that, thank you. Agreed, we need a bit more discussion
around this case as well as soundex().

> v11-0008-downcase_identifier-use-method-table-from-locale.patch
>
> I'm confused here about the name of the function pg_strfold_ident().
> In general, case "folding" results in an opaque string that is really
> only useful for comparing against other case-folded strings. But for
> identifiers we are actually interested lower-casing. I think this
> should be corrected in the API naming.

Agreed and fixed.

Also, I added 0006, which saves a locale_t object for ICU in this one
case where it's required. Surely that's not what we want in the long
term, but we don't have the infrastructure for decoding pg_wchar into
code points yet, and 0006 avoids the dependency on the global LC_CTYPE
setting.

> v11-0009-Control-LC_COLLATE-with-GUC.patch
>
> I know there were some complaints about compatibility with
> extensions, but I don't think anything concrete was presented. I
> would like to see more evidence that we need this.
>
> Also, recall that we used to have a lc_collate GUC, and in the end
> people got confused that it didn't actually show a meaningful value
> when you used ICU. So we removed that. It seems adding this back in
> would create a similar kind of confusion. So to avoid that, maybe
> this should be called fallback_lc_collate or something like that.

Yes, this is a POC patch and needs more discussion.

What are your thoughts about a similar lc_ctype GUC, though? That has
slightly different trade-offs.

I believe v12 0001-0005 are about ready for commit, and 0003 should be
backported.

Regards,
	Jeff Davis
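
For reference on the fuzzystrmatch point above, here is a minimal sketch of what ASCII-only upper-casing means in practice. It is not code from the 0007 patch: the helper names ascii_toupper_sketch() and ascii_upcase_buffer() are hypothetical, and the only assumption is that the conversion touches the bytes 'a' through 'z' and nothing else, without consulting setlocale().

```c
#include <stddef.h>

/*
 * Hypothetical helper (illustration only, not taken from the patch set):
 * upper-case a single byte if and only if it is ASCII 'a'..'z'.
 * The result does not depend on the global LC_CTYPE setting.
 */
static unsigned char
ascii_toupper_sketch(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch = (unsigned char) (ch - 'a' + 'A');
    return ch;
}

/*
 * Hypothetical helper: upper-case a NUL-terminated buffer in place,
 * one byte at a time.  Bytes outside 'a'..'z' (including every byte
 * of a multibyte sequence) are left untouched.
 */
static void
ascii_upcase_buffer(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) ascii_toupper_sketch((unsigned char) *s);
}
```

Under this sketch, a byte such as 0xE7 (lower-case c with cedilla in Latin-1) is returned unchanged and so never becomes 0xC7, which is the byte the existing `case '\xc7'` branch matches. That is one way to see why the "designed for ASCII" premise needs more analysis before switching to an ASCII-only conversion.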