Re: Remaining dependency on setlocale()

From: Jeff Davis <pgsql@j-davis.com>
To: Peter Eisentraut <peter@eisentraut.org>, Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-07-08T00:56:03Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Wed, 2025-06-11 at 12:15 -0700, Jeff Davis wrote:
> > v1-0008-Set-process-LC_COLLATE-C-and-LC_CTYPE-C.patch
> > 
> > As I mentioned earlier in the thread, I don't think we can do this
> > for LC_CTYPE, because otherwise system error messages would not come
> > out in the right encoding.
> 
> Changed it so that it only sets LC_COLLATE to C, and leaves LC_CTYPE
> set to datctype.
> 
> Unfortunately, as long as LC_CTYPE is set to a real locale, there's a
> danger of accidentally depending on that setting. Can the encoding be
> controlled with LC_MESSAGES instead of LC_CTYPE?
> 
> Do you have an example of how things can go wrong?

I looked into this a bit, and if I understand correctly, the only
problem is with strerror() and strerror_r(), which depend on
LC_MESSAGES for the language but on LC_CTYPE for the encoding.

I attached some example C code to illustrate how strerror() is affected
by both LC_MESSAGES and LC_CTYPE. For example:

   $ ./strerror de_DE.UTF-8 de_DE.UTF-8
   LC_CTYPE set to: de_DE.UTF-8
   LC_MESSAGES set to: de_DE.UTF-8
   Error message (from strerror(EILSEQ)): Ungültiges oder unvollständiges Multi-Byte- oder Wide-Zeichen
   $ ./strerror C de_DE.UTF-8
   LC_CTYPE set to: C
   LC_MESSAGES set to: de_DE.UTF-8
   Error message (from strerror(EILSEQ)): Ung?ltiges oder unvollst?ndiges Multi-Byte- oder Wide-Zeichen
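
For readers without the attachment, a program along these lines would
produce output of that shape. This is only a minimal sketch, not the
attached file itself: the function name show_strerror() is made up for
illustration, and it assumes a POSIX system where LC_MESSAGES exists
and the requested locales (e.g. de_DE.UTF-8) are installed.

```c
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: set LC_CTYPE and LC_MESSAGES independently for
 * the whole process, then print what strerror(EILSEQ) returns.
 *
 * Returns 0 on success, -1 if either locale could not be set.
 */
int
show_strerror(const char *ctype, const char *messages)
{
	assert(ctype != NULL && messages != NULL);

	if (setlocale(LC_CTYPE, ctype) == NULL ||
		setlocale(LC_MESSAGES, messages) == NULL)
	{
		fprintf(stderr, "could not set locale\n");
		return -1;
	}

	/* setlocale(category, NULL) reports the current setting. */
	printf("LC_CTYPE set to: %s\n", setlocale(LC_CTYPE, NULL));
	printf("LC_MESSAGES set to: %s\n", setlocale(LC_MESSAGES, NULL));

	/* Language comes from LC_MESSAGES, encoding from LC_CTYPE. */
	printf("Error message (from strerror(EILSEQ)): %s\n",
		   strerror(EILSEQ));
	return 0;
}
```

Calling show_strerror(argv[1], argv[2]) from main() gives the
command-line form used in the runs above.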

On Unix-based systems, we can use newlocale() to initialize a global
locale_t variable with both LC_CTYPE and LC_MESSAGES set. The
LC_MESSAGES portion would need to be updated every time the GUC
changes, which is not great.
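
To make the idea concrete, here is a rough sketch of what that could
look like. All names (error_locale, init_error_locale(),
update_error_messages(), error_message()) are hypothetical and not from
any patch in this thread. It also assumes strerror_l() from
POSIX.1-2008 is available, which is a swapped-in choice on my part; the
alternative would be uselocale() around strerror_r().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical global holding LC_CTYPE + LC_MESSAGES for error text. */
static locale_t error_locale = (locale_t) 0;

/* Build the object from scratch; returns 0 on success, -1 on failure. */
int
init_error_locale(const char *ctype, const char *messages)
{
	locale_t	base;
	locale_t	loc;

	assert(ctype != NULL && messages != NULL);

	base = newlocale(LC_CTYPE_MASK, ctype, (locale_t) 0);
	if (base == (locale_t) 0)
		return -1;

	/* On success newlocale() consumes "base"; on failure it is untouched. */
	loc = newlocale(LC_MESSAGES_MASK, messages, base);
	if (loc == (locale_t) 0)
	{
		freelocale(base);
		return -1;
	}

	if (error_locale != (locale_t) 0)
		freelocale(error_locale);
	error_locale = loc;
	return 0;
}

/* Would be called whenever the lc_messages setting changes. */
int
update_error_messages(const char *messages)
{
	locale_t	loc;

	assert(messages != NULL);

	if (error_locale == (locale_t) 0)
		return -1;

	/* Replace only the LC_MESSAGES portion; LC_CTYPE is preserved. */
	loc = newlocale(LC_MESSAGES_MASK, messages, error_locale);
	if (loc == (locale_t) 0)
		return -1;				/* old object is still valid */

	error_locale = loc;
	return 0;
}

/* Look up an error string without consulting the global locale. */
const char *
error_message(int errnum)
{
	if (error_locale == (locale_t) 0)
		return strerror(errnum);
	return strerror_l(errnum, error_locale);
}
```

The point of the sketch is that the process-global LC_CTYPE could then
stay at C, since the only consumer of the real setting would be the
error_locale object.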

Windows would be a different story, though: strerror() doesn't seem to
have a variant that accepts a _locale_t object, and even if it did, I
don't see a way to create a _locale_t object with LC_MESSAGES and
LC_CTYPE set to different values. One idea is to call
_configthreadlocale(_ENABLE_PER_THREAD_LOCALE) [1], so that setlocale()
affects only the calling thread, and then use setlocale() there much as
we use uselocale() on other systems. That would be awkward, though.
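
A rough sketch of that pattern, for illustration only. The helper name
strerror_in_ctype() is hypothetical, I have not tested this on Windows,
and whether the CRT's strerror() output actually varies with the locale
there is an assumption. On non-Windows builds the _configthreadlocale()
call is compiled out and the function simply does a process-wide
setlocale() save/restore.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: temporarily switch LC_CTYPE, fetch the strerror()
 * text into buf, then restore the previous LC_CTYPE.
 *
 * Returns 0 on success, -1 on failure.
 */
int
strerror_in_ctype(int errnum, const char *ctype, char *buf, size_t buflen)
{
	char		saved[256];
	const char *cur;

	assert(ctype != NULL && buf != NULL && buflen > 0);

#ifdef _WIN32
	/* From here on, setlocale() affects only the calling thread. */
	if (_configthreadlocale(_ENABLE_PER_THREAD_LOCALE) == -1)
		return -1;
#endif

	/* Copy the current name; the returned pointer may be overwritten. */
	cur = setlocale(LC_CTYPE, NULL);
	if (cur == NULL || strlen(cur) >= sizeof(saved))
		return -1;
	strcpy(saved, cur);

	if (setlocale(LC_CTYPE, ctype) == NULL)
		return -1;

	/* Copy out before restoring; strerror() may return static storage. */
	snprintf(buf, buflen, "%s", strerror(errnum));

	setlocale(LC_CTYPE, saved);
	return 0;
}
```

Even in this small form, every caller pays for a save/switch/restore
cycle, which is part of why it feels awkward.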

Thoughts? That seems like a lot of work just for the case of
strerror()/strerror_r().

Regards,
	Jeff Davis

[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/configthreadlocale?view=msvc-170