Re: Remaining dependency on setlocale()

Thomas Munro <thomas.munro@gmail.com>

From: Thomas Munro <thomas.munro@gmail.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Jeff Davis <pgsql@j-davis.com>, pgsql-hackers@postgresql.org
Date: 2024-08-09T21:42:29Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Wed, Aug 7, 2024 at 7:07 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Aug 7, 2024 at 10:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Jeff Davis <pgsql@j-davis.com> writes:
> > > But there are a couple problems:
> >
> > > 1. I don't think it's supported on Windows.
> >
> > Can't help with that, but surely Windows has some thread-safe way.
>
> It does.  It's not exactly the same, instead there is a thing you can
> call that puts setlocale() itself into a thread-local mode, but last
> time I checked that mode was missing on MinGW so that's a bit of an
> obstacle.

Actually the MinGW situation might be better than that these days.  I
know of three environments where we currently have to keep code
working on MinGW: build farm animal fairywren (msys2 compiler
toolchain), CI's optional "Windows - Server 2019, MinGW64 - Meson"
task, and CI's "CompilerWarnings" task, in the "mingw_cross_warning"
step (which actually runs on Linux, and uses configure rather than
meson).  All three environments show that they have
_configthreadlocale.  So could we simply require it on Windows?
Then it might be possible to write a replacement implementation of
uselocale() that does a two-step dance with _configthreadlocale() and
setlocale(), restoring both afterwards if they changed.  That's what
ECPG open-codes already.

The NetBSD situation is more vexing.  I was trying to find out if
someone is working on it and unfortunately it looks like there is a
principled stand against adding it:

https://mail-index.netbsd.org/tech-userlevel/2015/12/28/msg009546.html
https://mail-index.netbsd.org/netbsd-users/2017/02/14/msg019352.html

They're right that we really just want to use "C" in some places, and
their LC_C_LOCALE is a very useful system-provided value to be able to
pass into _l functions.  It's a shame it's non-standard, because
without it you have to allocate a locale_t for "C" and keep it
somewhere to feed to _l functions...