Re: Remaining dependency on setlocale()

From: Jeff Davis <pgsql@j-davis.com>
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Peter Eisentraut <peter@eisentraut.org>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-07-24T18:10:40Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Wed, 2025-07-23 at 19:11 -0700, Jeff Davis wrote:
> The patch feels a bit over-engineered, but I'd like to know what you
> think. It would be great if you could test/debug the windows NLS-
> enabled paths.

Let me explain how it ended up looking over-engineered, and perhaps
someone has a simpler solution.

For gettext, we already configure the encoding with
bind_textdomain_codeset(). All it needs is LC_MESSAGES set properly,
which can be done with uselocale(), as a semi-permanent setting until
the next GUC change, just like setlocale() today. There are a couple of
minor problems on platforms without uselocale(). For Windows, we could
just permanently do:

  _configthreadlocale(_ENABLE_PER_THREAD_LOCALE)

and then use _wsetlocale(). For NetBSD, I don't have a solution, but
perhaps we can just reject new lc_messages settings after startup, or
just defer the problem until threading actually becomes a pressing
issue.
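
To make the uselocale() idea concrete, here is a minimal POSIX-only sketch
of a semi-permanent, per-thread LC_MESSAGES setting. The function name
set_thread_lc_messages() and the static variable are hypothetical and not
taken from the patch; the Windows and NetBSD cases are not covered.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for newlocale(), uselocale(), duplocale() */
#endif

#include <locale.h>

/* Hypothetical: the locale object currently installed by this sketch. */
static locale_t current_messages_locale = (locale_t) 0;

/*
 * Hypothetical helper: make 'name' the calling thread's LC_MESSAGES locale
 * until the next call. All other categories are copied from the thread's
 * current locale. Returns 0 on success, -1 on failure.
 */
int
set_thread_lc_messages(const char *name)
{
	locale_t	base;
	locale_t	loc;

	/* Copy whatever locale the thread uses now (possibly the global one). */
	base = duplocale(uselocale((locale_t) 0));
	if (base == (locale_t) 0)
		return -1;

	/* Replace only LC_MESSAGES; on success 'base' is consumed. */
	loc = newlocale(LC_MESSAGES_MASK, name, base);
	if (loc == (locale_t) 0)
	{
		freelocale(base);		/* on failure 'base' is still valid */
		return -1;
	}

	/* Install the new locale for this thread only. */
	uselocale(loc);

	/* The previously installed object is no longer in use; release it. */
	if (current_messages_locale != (locale_t) 0)
		freelocale(current_messages_locale);
	current_messages_locale = loc;

	return 0;
}
```

Something like this would be called whenever lc_messages changes, and the
setting stays in effect for the thread until the next call, which mirrors
how setlocale() is used today but without touching the global locale.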

The main problem is with strerror_r(). To get the right LC_MESSAGES
setting, we need the separate path for Windows (which has neither
uselocale() nor strerror_l()). Because we need to keep track of that
path anyway, I used it for gettext as well to have a cleaner separation
for the entire message translation locale. That means we can avoid
permanent locale settings, and reduce the chances that we accidentally
depend on the global locale.

Regards,
	Jeff Davis