Re: Remaining dependency on setlocale()

Peter Eisentraut <peter@eisentraut.org>

From: Peter Eisentraut <peter@eisentraut.org>
To: Jeff Davis <pgsql@j-davis.com>, Chao Li <li.evan.chao@gmail.com>
Cc: Thomas Munro <thomas.munro@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-12-05T15:01:35Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On 29.11.25 21:50, Jeff Davis wrote:
> All fixed, thank you! (I apologize for posting a patch in that state to
> begin with...)
> 
> I also reorganized slightly to separate out the pg_iswcased() API into
> its own patch, and moved the like_support.c changes from the ctype_is_c
> patch (already committed: 1476028225) into the pattern prefixes patch.

I reviewed the v11 patches.  But I wasn't able to apply them locally 
(couldn't find a starting commit where they applied cleanly), so I 
haven't tested them.

Patches 0001 through 0006 seem generally ok, with some small comments:

v11-0003-Fix-inconsistency-between-ltree_strncasecmp-and-.patch

The function comment reads "Check if b has a prefix of a." -- Is that 
the same as "Check if a is a prefix of b."?  The latter might be clearer.
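To illustrate the suggested wording, here is a minimal sketch of "check if a is a prefix of b" with case-insensitive comparison.  It is not the code from the patch: the names sketch_ascii_tolower() and sketch_is_prefix_ci() are hypothetical, and the sketch assumes ASCII-only case mapping on single bytes, whereas the real function has to deal with multibyte characters and the locale.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: ASCII-only lower-casing.  Bytes outside 'A'..'Z'
 * are returned unchanged.
 */
static unsigned char
sketch_ascii_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    return ch;
}

/*
 * Check if a is a prefix of b, ignoring ASCII case.
 *
 * Hypothetical illustration only; it compares byte by byte and is
 * therefore not multibyte-aware.
 */
static bool
sketch_is_prefix_ci(const char *a, size_t alen, const char *b, size_t blen)
{
    /* a longer string can never be a prefix of a shorter one */
    if (alen > blen)
        return false;

    for (size_t i = 0; i < alen; i++)
    {
        if (sketch_ascii_tolower((unsigned char) a[i]) !=
            sketch_ascii_tolower((unsigned char) b[i]))
            return false;
    }
    return true;
}
```

With this reading, the first argument is the candidate prefix and the second is the string being searched, which is what the proposed comment wording says.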


v11-0004-Remove-char_tolower-API.patch

The updated comment reads

+        * For efficiency reasons, in the C locale we don't call lower() on the
+        * pattern and text, but instead call SB_lower_char on each character.

but the patch removes SB_lower_char().


v11-0006-Use-multibyte-aware-extraction-of-pattern-prefix.patch

Might have a small typo in the commit message:

; and preserve and char-at-a-time logic for bytea.


For the remaining patches I have some more substantial questions.

v11-0007-fuzzystrmatch-use-pg_ascii_toupper.patch

dmetaphone.c has a comment

case '\xc7':        /* C with cedilla */

so the premise that "fuzzystrmatch is designed for ASCII" does not
appear to be correct.  Needs more analysis.

(But apparently it's not multibyte aware at all, so I don't know what to 
do about that.)
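To make the concern concrete: in a single-byte encoding such as LATIN1, byte 0xE7 is a lower-case c with cedilla and 0xC7 is the upper-case form.  The sketch below assumes an ASCII-only mapping of the kind the patch title implies; the name sketch_ascii_toupper() is hypothetical and this is not the actual pg_ascii_toupper() source.  Such a mapping leaves 0xE7 untouched, so input that previously reached the '\xc7' case through a locale-aware toupper() might no longer do so.

```c
/*
 * Hypothetical sketch of an ASCII-only upper-casing function.
 * Only 'a'..'z' are mapped; every other byte, including bytes with
 * the high bit set, is returned unchanged.
 */
static unsigned char
sketch_ascii_toupper(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch += 'A' - 'a';
    return ch;
}
```

Whether any real input depends on the old behavior is exactly the part that needs more analysis.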


v11-0008-downcase_identifier-use-method-table-from-locale.patch

I'm confused here about the name of the function pg_strfold_ident().  In 
general, case "folding" results in an opaque string that is really only 
useful for comparing against other case-folded strings.  But for 
identifiers we are actually interested in lower-casing.  I think this 
should be corrected in the API naming.


v11-0009-Control-LC_COLLATE-with-GUC.patch

I know there were some complaints about compatibility with extensions, 
but I don't think anything concrete was presented.  I would like to see 
more evidence that we need this.

Also, recall that we used to have an lc_collate GUC, and in the end 
people got confused that it didn't actually show a meaningful value when 
you used ICU.  So we removed that.  It seems adding this back in would 
create a similar kind of confusion.  So to avoid that, maybe this should 
be called fallback_lc_collate or something like that.

If we were to proceed with this patch, it should have some documentation 
and tests.