Re: Remaining dependency on setlocale()

Thomas Munro <thomas.munro@gmail.com>

From: Thomas Munro <thomas.munro@gmail.com>
To: Jeff Davis <pgsql@j-davis.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2024-08-12T22:43:14Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Mon, Aug 12, 2024 at 4:53 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> ... though if we wanted to replace all use of localeconv and struct
> lconv with nl_langinfo_l() calls,

Gah.  I realised while trying the above that you can't really replace
localeconv() with nl_langinfo_l() as the GNU documentation recommends,
because some of the lconv fields we're using are missing from
langinfo.h in POSIX (only GNU added them all, that was a good idea).
:-(

Next idea:

Windows: its localeconv() returns pointer to thread-local storage,
good, but it still needs setlocale().  So the options are: make our
own lconv-populator function for Windows, using GetLocaleInfoEx(), or
do that _configthreadlocale() dance (possibly excluding some MinGW
configurations from working)
Systems that have localeconv_l(): use that
POSIX: use uselocale() and also put a big global lock around
localeconv() call + accessing result (optionally skipping that on an
OS-by-OS basis after confirming that its implementation doesn't really
need it)

The reason the uselocale() + localeconv() seems to require a Big Lock
(by default at least) is that the uselocale() deals only with the
"current locale" aspect, not the output buffer aspect.  Clearly the
standard allows for it to be thread-local storage (that's why since
2008 it says that after thread-exit you can't access the result, and I
guess that's how it works on real systems (?)), but it also seems to
allow for a single static buffer (that's why it says that it's not
re-entrant, and any call to localeconv() might clobber it).  That
might be OK in practice because we tend to cache that stuff, eg when
assigning GUC lc_monetary (that cache would presumably become
thread-local in the first phase of the multithreading plan), so the
locking shouldn't really hurt.

The reason we'd have to have three ways, and not just two, is again
that NetBSD declined to implement uselocale().

I'll try this in a bit unless someone else has better ideas or plans
for this part... sorry for the drip-feeding.