Re: Remaining dependency on setlocale()

From: Thomas Munro <thomas.munro@gmail.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Jeff Davis <pgsql@j-davis.com>, pgsql-hackers@postgresql.org
Date: 2024-08-07T07:07:40Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Wed, Aug 7, 2024 at 10:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > But there are a couple problems:
>
> > 1. I don't think it's supported on Windows.
>
> Can't help with that, but surely Windows has some thread-safe way.

It does.  It's not exactly the same: instead, there is a call that
puts setlocale() itself into a thread-local mode.  Last time I checked,
though, that mode was missing on MinGW, so that's a bit of an
obstacle.
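
For concreteness, the call I have in mind is, I believe,
_configthreadlocale() from the Microsoft C runtime.  The following is
only a rough sketch: the helper name enable_thread_local_setlocale()
is made up for illustration, and the _MSC_VER test is an assumption
standing in for a proper configure-time check, since MinGW may or may
not provide the function depending on the runtime it targets.

```c
#include <locale.h>

/*
 * Hypothetical helper: try to switch setlocale() into per-thread mode.
 * Returns 1 if the mode was enabled, 0 if it is unavailable or failed.
 */
static int
enable_thread_local_setlocale(void)
{
#if defined(_MSC_VER)
    /* Returns the previous setting, or -1 on failure. */
    return _configthreadlocale(_ENABLE_PER_THREAD_LOCALE) != -1;
#else
    /* Assumption: no equivalent mode here; callers need another route. */
    return 0;
#endif
}
```

After a successful call, setlocale() in that thread affects only that
thread, which is different from the POSIX model of passing or
installing a locale_t.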

How far can we get by using more _l() functions?  For example, [1]
shows a use of strftime() that I think can be converted to
strftime_l() so that it doesn't depend on setlocale().  Since POSIX
doesn't specify every obvious _l() function, we might need to provide
wrappers for the missing ones that save and restore the thread's
locale with uselocale().  Windows doesn't have uselocale(), but it
generally doesn't need such wrappers because it does have most of the
obvious _l() functions.
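
To illustrate both halves of that on a POSIX system, here is a minimal
sketch, not a proposed patch.  The names format_time_in_locale() and
my_strtod_l() are hypothetical, and strtod() is used only as an
example of a function for which, as far as I know, POSIX.1-2008
defines no _l() variant.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <locale.h>
#include <stdlib.h>
#include <time.h>

/* Format a time with an explicit locale; setlocale() is never called. */
static size_t
format_time_in_locale(char *buf, size_t len, const char *fmt,
                      const struct tm *tm, locale_t loc)
{
    return strftime_l(buf, len, fmt, tm, loc);
}

/*
 * Hypothetical wrapper for a function that lacks an _l() variant:
 * install the locale for this thread only, call the plain function,
 * then restore whatever the thread was using before.
 */
static double
my_strtod_l(const char *s, char **endptr, locale_t loc)
{
    locale_t    save = uselocale(loc);  /* previous thread locale */
    double      result = strtod(s, endptr);

    uselocale(save);                    /* may be LC_GLOBAL_LOCALE */
    return result;
}
```

A caller would create the locale_t once with newlocale(), pass it to
these functions as needed, and release it with freelocale().  The
global locale is untouched throughout.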

> > 2. I don't see a good way to canonicalize a locale name, like in
> > check_locale(), which uses the result of setlocale().
>
> What I can tell you about that is that check_locale's expectation
> that setlocale does any useful canonicalization is mostly wishful
> thinking [1].  On a lot of platforms you just get the input string
> back again.  If that's the only thing keeping us on setlocale,
> I think we could drop it.  (Perhaps we should do some canonicalization
> of our own instead?)

+1

I know it does something on Windows (we know the EDB installer gives
it strings like "Language,Country" and it converts them to
"Language_Country.Encoding", see various threads about it all going
wrong), but I'm not sure it does anything we actually want to
encourage.  I'm hoping we can gradually tighten things up so that we
only have sane BCP 47 tags in the system on that OS, and I don't see
why we wouldn't just use them verbatim.

[1] https://www.postgresql.org/message-id/CA%2BhUKGJ%3Dca39Cg%3Dy%3DS89EaCYvvCF8NrZRO%3Duog-cnz0VzC6Kfg%40mail.gmail.com