Re: Remaining dependency on setlocale()

From: Jeff Davis <pgsql@j-davis.com>
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Peter Eisentraut <peter@eisentraut.org>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-10-14T23:26:47Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Thu, 2025-07-24 at 11:10 -0700, Jeff Davis wrote:
> The main problem is with strerror_r()...

Postgres messages, like "division by zero", are translated just fine
without LC_CTYPE; gettext() only needs LC_MESSAGES and the server
encoding. So these are fine.

We use strerror_r() to translate the system errno into a readable
message, like "No such file or directory", i.e. the %m replacements.
That needs LC_CTYPE set (just for the encoding, not the
language/region) as well as LC_MESSAGES (for the language/region).
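
To make the two inputs concrete, here is a minimal sketch (not PostgreSQL
code) that formats an errno value with explicit LC_CTYPE and LC_MESSAGES
settings. It assumes a platform that provides the POSIX.1-2008 functions
newlocale() and strerror_l(); errno_message() is a hypothetical helper
introduced only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): format errnum using the
 * given LC_CTYPE and LC_MESSAGES locale names, without touching the global
 * locale.  Returns 0 on success, -1 if a locale could not be created.
 */
static int
errno_message(int errnum, const char *ctype, const char *messages,
			  char *buf, size_t buflen)
{
	locale_t	base;
	locale_t	loc;

	assert(buf != NULL && buflen > 0);

	/* LC_CTYPE supplies the encoding of the returned string. */
	base = newlocale(LC_CTYPE_MASK, ctype, (locale_t) 0);
	if (base == (locale_t) 0)
		return -1;

	/* LC_MESSAGES supplies the language/region of the message. */
	loc = newlocale(LC_MESSAGES_MASK, messages, base);
	if (loc == (locale_t) 0)
	{
		/* On failure the base locale is left unmodified, so free it. */
		freelocale(base);
		return -1;
	}

	/* Copy the result before the locale object is released. */
	snprintf(buf, buflen, "%s", strerror_l(errnum, loc));
	freelocale(loc);
	return 0;
}
```

Calling errno_message(ENOENT, "C", "C", buf, sizeof(buf)) should yield the
plain ASCII text. Passing other locale names changes the language and
encoding only if the platform has those locales and message catalogs
installed.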

When using a locale provider other than libc, it's unfortunate to
require LC_CTYPE to be set for this single purpose. The language/region
part of the setting, e.g. "en_US", is not used at all; only the encoding
part is relevant. And there is no value other than "C" that works on all
platforms. It's fairly confusing to explain why the LC_CTYPE setting is
required for the builtin or ICU providers at all. Also, while it's far
from the biggest challenge when it comes to multithreading, it does
cause thread-safety headaches on platforms without uselocale().

Perhaps we could get the ASCII message and run it through gettext()?
That would be extra work for translators, but perhaps not a lot, given
that it's a small and static set of messages in practice. It would also
make translation all-or-nothing, depending on whether NLS is enabled:
right now, since the translation happens in two different ways, you can
end up with partially translated messages. It would also result in
consistent translations across platforms.
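
A rough sketch of that idea, under stated assumptions: fetch the message
in the "C" locale so it is plain ASCII, then use it as a gettext() msgid.
The text domain name "errno-messages" and the function
translated_errno_message() are hypothetical, and the sketch assumes
newlocale(), strerror_l() and <libintl.h> are available. It is not a
proposed patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical text domain; a real catalog name would be a project decision. */
#define ERRNO_TEXTDOMAIN "errno-messages"

/*
 * Hypothetical helper: look up the untranslated ("C" locale) message for
 * errnum and pass it through gettext as a msgid.  Returns 0 on success,
 * -1 if the "C" locale object could not be created.
 */
static int
translated_errno_message(int errnum, char *buf, size_t buflen)
{
	locale_t	c_loc;
	const char *ascii_msg;

	assert(buf != NULL && buflen > 0);

	/* A "C" locale object gives the ASCII message regardless of LC_CTYPE. */
	c_loc = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
	if (c_loc == (locale_t) 0)
		return -1;

	ascii_msg = strerror_l(errnum, c_loc);

	/*
	 * dgettext() returns the msgid itself when no translation is found, so
	 * the result degrades to the ASCII message rather than failing.
	 */
	snprintf(buf, buflen, "%s", dgettext(ERRNO_TEXTDOMAIN, ascii_msg));
	freelocale(c_loc);
	return 0;
}
```

With no catalog installed for the hypothetical domain, the function simply
returns the ASCII text. With a catalog, translation would go through the
same gettext machinery as other messages, needing only LC_MESSAGES and the
target encoding.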

Regards,
	Jeff Davis