Re: Remaining dependency on setlocale()

From: Jeff Davis <pgsql@j-davis.com>
To: Peter Eisentraut <peter@eisentraut.org>, Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-07-09T22:52:34Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Mon, 2025-07-07 at 17:56 -0700, Jeff Davis wrote:
> I looked into this a bit, and if I understand correctly, the only
> problem is with strerror() and strerror_r(), which depend on
> LC_MESSAGES for the language but LC_CTYPE to find the right encoding.

...

> Windows would be a different story, though: strerror() doesn't seem to
> have a variant that accepts a _locale_t object, and even if it did, I
> don't see a way to create a _locale_t object with LC_MESSAGES and
> LC_CTYPE set to different values.

I think I have an answer to the second part here:

"For information about the format of the locale argument, see Locale
names, Languages, and Country/Region strings."

https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/create-locale-wcreate-locale?view=msvc-170

and when I follow that link, I see:

  "You can specify multiple category types, separated by semicolons.
Category types that aren't specified use the current locale setting.
For example, this code snippet sets the current locale for all
categories to de-DE, and then sets the categories LC_MONETARY to en-GB
and LC_TIME to es-ES:

  _wsetlocale(LC_ALL, L"de-DE");
  _wsetlocale(LC_ALL, L"LC_MONETARY=en-GB;LC_TIME=es-ES");"

https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-170

So we just need to construct a string of the right form, and we can
have a _locale_t object representing the global locale for all
categories. I'm not sure exactly how to escape the individual locale
names, but it might be enough to just reject ';' in the locale name (at
least on Windows).
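
A rough sketch of what constructing that string could look like. The
helper names (CategorySetting, build_locale_string,
create_combined_locale) are made up for illustration, and whether
_create_locale() accepts the combined form with LC_ALL is an assumption
based on the documentation quoted above, not something I've verified:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pair: a category name such as "LC_CTYPE" and its locale name. */
typedef struct
{
	const char *category;
	const char *name;
} CategorySetting;

/*
 * Build "CAT1=name1;CAT2=name2;..." into buf.
 * Returns 0 on success, -1 if a locale name contains ';' (which would be
 * ambiguous in the combined string) or if buf is too small.
 */
static int
build_locale_string(const CategorySetting *settings, size_t nsettings,
					char *buf, size_t buflen)
{
	size_t		used = 0;
	size_t		i;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';

	for (i = 0; i < nsettings; i++)
	{
		int			n;

		/* Simplest possible "escaping": refuse names containing ';'. */
		if (strchr(settings[i].name, ';') != NULL)
			return -1;

		n = snprintf(buf + used, buflen - used, "%s%s=%s",
					 i > 0 ? ";" : "",
					 settings[i].category, settings[i].name);
		if (n < 0 || (size_t) n >= buflen - used)
			return -1;			/* truncated */
		used += (size_t) n;
	}
	return 0;
}

#ifdef _WIN32
#include <locale.h>

/*
 * Assumption: _create_locale(LC_ALL, ...) understands the same
 * semicolon-separated form that the docs show for _wsetlocale().
 */
static _locale_t
create_combined_locale(const char *combined)
{
	return _create_locale(LC_ALL, combined);
}
#endif
```

For example, passing {"LC_COLLATE", "de-DE"} and {"LC_CTYPE", "en-GB"}
produces "LC_COLLATE=de-DE;LC_CTYPE=en-GB", while a name like "en;GB"
is rejected.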

The first problem -- how to affect the encoding of strings returned by
strerror() on Windows -- may be solvable as well. It looks like
LC_MESSAGES is not supported at all on Windows, so the only thing to be
concerned about is the encoding, which is affected by LC_CTYPE. But
Windows doesn't offer uselocale() or strerror_l(). The only way seems
to be to call _configthreadlocale(_ENABLE_PER_THREAD_LOCALE) and then
setlocale(LC_CTYPE, datctype) right before strerror(), and switch it
back to "C" right afterward. Comments welcome.
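
To make the idea concrete, here is a minimal sketch of such a wrapper.
The function name strerror_with_ctype is hypothetical, the ctype
argument stands in for datctype, and the non-Windows branch exists only
so the sketch compiles elsewhere (it ignores ctype entirely). I have not
checked how this interacts with other threads or with error handling in
the backend:

```c
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the message for errnum into buf, temporarily switching the
 * calling thread's LC_CTYPE to ctype on Windows.
 * Returns 0 on success, -1 if a locale call fails.
 */
static int
strerror_with_ctype(int errnum, const char *ctype, char *buf, size_t buflen)
{
	int			rc = 0;

	assert(buf != NULL && buflen > 0);

#ifdef _WIN32
	/* From here on, setlocale() affects only the calling thread. */
	if (_configthreadlocale(_ENABLE_PER_THREAD_LOCALE) == -1)
		return -1;

	if (setlocale(LC_CTYPE, ctype) == NULL)
		return -1;

	/* Copy right away: strerror()'s buffer may be reused later. */
	snprintf(buf, buflen, "%s", strerror(errnum));

	/* Switch back to "C" right afterward. */
	if (setlocale(LC_CTYPE, "C") == NULL)
		rc = -1;
#else
	/* Placeholder so the sketch builds on non-Windows systems. */
	(void) ctype;
	snprintf(buf, buflen, "%s", strerror(errnum));
#endif

	return rc;
}
```

A caller would do something like strerror_with_ctype(EINVAL, datctype,
buf, sizeof(buf)) and then use buf.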

Regards,
	Jeff Davis