Re: Remaining dependency on setlocale()

Peter Eisentraut <peter@eisentraut.org>

From: Peter Eisentraut <peter@eisentraut.org>
To: Jeff Davis <pgsql@j-davis.com>, Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-11-12T18:59:33Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On 29.10.25 01:19, Jeff Davis wrote:
> On Wed, 2025-07-23 at 19:11 -0700, Jeff Davis wrote:
>> On Fri, 2025-07-11 at 11:48 +1200, Thomas Munro wrote:
>>> On Fri, Jul 11, 2025 at 6:22 AM Jeff Davis <pgsql@j-davis.com> wrote:
>>>> I don't have a great windows development environment, and it appears
>>>> CI and the buildfarm don't offer great coverage either. Can I ask for
>>>> a volunteer to do the windows side of this work?
>>>
>>> Me neither but I'm willing to help with that, and have done lots of
>>> closely related things through trial-by-CI...
> 
> Attached a new patch series, v6.
> 
> Rather than creating new global locale_t objects, this series (along
> with a separate patch for NLS[1]) removes the dependency on the global
> LC_CTYPE entirely. It's a bunch of small patches that replace direct
> calls to tolower()/toupper() with calls into the provider.
> 
> An assumption of these patches is that, in the UTF-8 encoding, the
> logic in pg_tolower()/pg_toupper() is equivalent to
> pg_ascii_tolower()/pg_ascii_toupper().

I'm getting a bit confused by all these different variants of function
names.  We now have

tolower
TOLOWER
char_tolower
pg_tolower
pg_strlower
pg_ascii_tolower
downcase_identifier

and maybe more, plus the corresponding upper-case versions.

This patch set makes changes like

-           else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-               ch2 = tolower(ch2);
+           else if (IS_HIGHBIT_SET(ch2))
+               ch2 = TOLOWER(ch2);

So there is apparently some semantic difference between tolower() and 
TOLOWER(), which is represented by the fact that the function name is 
all upper case?  Actually, it's a macro and could mean different things 
in different contexts.
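
For illustration only, here is a minimal sketch of the kind of semantic
difference at stake, assuming the removed lines in the diff above fall
through to libc for high-bit bytes while an ASCII-only variant never
consults the locale.  The names sketch_ascii_tolower(),
sketch_locale_tolower() and sketch_count_mismatches() are hypothetical and
are not taken from the patch set; what TOLOWER() actually expands to in any
given file is not shown here.

```c
#include <ctype.h>

/* Hypothetical helpers, modeled on the diff above; not PostgreSQL code. */

/* ASCII-only folding: never consults the process-wide locale. */
static unsigned char
sketch_ascii_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    return ch;
}

/* Old-style folding: ASCII fast path, then libc for high-bit bytes. */
static unsigned char
sketch_locale_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    else if ((ch & 0x80) && isupper(ch))
        ch = (unsigned char) tolower(ch);   /* depends on global LC_CTYPE */
    return ch;
}

/* Count byte values on which the two disagree under the current LC_CTYPE. */
static int
sketch_count_mismatches(void)
{
    int n = 0;

    for (int c = 0; c < 256; c++)
    {
        if (sketch_ascii_tolower((unsigned char) c) !=
            sketch_locale_tolower((unsigned char) c))
            n++;
    }
    return n;
}
```

In the "C" locale, which is what a program starts with, the two helpers
agree on every byte, so sketch_count_mismatches() returns 0.  After
setlocale(LC_CTYPE, ...) selects a single-byte locale such as one using
ISO-8859-1, the count may become nonzero, which is the global-state
dependency the quoted series is trying to remove.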

And there is very little documentation accompanying all these different 
functions.  For example, struct collate_methods and struct ctype_methods 
contain barely any documentation at all.

Many of these issues are pre-existing, but I just figured it has reached 
a point where we need to do something about it.