Re: Remaining dependency on setlocale()

From: Thomas Munro <thomas.munro@gmail.com>
To: Robert Haas <robertmhaas@gmail.com>
Cc: Joe Conway <mail@joeconway.com>, Jeff Davis <pgsql@j-davis.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2024-08-07T21:40:56Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Thu, Aug 8, 2024 at 6:18 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Aug 7, 2024 at 1:29 PM Joe Conway <mail@joeconway.com> wrote:
> > FWIW I see all of these in glibc:
> >
> > isalnum_l, isalpha_l, isascii_l, isblank_l, iscntrl_l, isdigit_l,
> > isgraph_l,  islower_l, isprint_l, ispunct_l, isspace_l, isupper_l,
> > isxdigit_l
>
> On my MacBook (Ventura, 13.6.7), I see all of these except for isascii_l.

Those (except isascii_l) are from POSIX 2008[1].  They were absorbed
from "Extended API Set Part 4"[2], along with locale_t (that's why
there is a header <xlocale.h> on a couple of systems even though after
absorption they are supposed to be in <locale.h>).  We already
decided that all computers have that stuff (commit 8d9a9f03), but the
reality is a little messier than that... NetBSD hasn't implemented
uselocale() yet[3], though it has a good set of _l functions.  As
discussed in [3], ECPG code is therefore currently broken in
multithreaded clients because it's falling back to a setlocale() path,
and I think Windows+MinGW must be too (it lacks
HAVE__CONFIGTHREADLOCALE), but those both have a good set of _l
functions.  In that thread I tried to figure out how to use _l
functions to fix that problem, but ...
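
To illustrate what the _l functions buy you, here is a minimal sketch of classifying characters against an explicit locale_t instead of whatever setlocale() last installed. It assumes a system where newlocale() and isalpha_l() are declared in <locale.h> and <ctype.h> (some systems need <xlocale.h> instead); the helper name count_alpha_c_locale() is hypothetical and does not come from the PostgreSQL tree.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX 2008 locale_t API */
#endif
#include <ctype.h>
#include <locale.h>

/*
 * Hypothetical helper: count the bytes of s that are alphabetic in the
 * "C" locale, without reading or changing the process-wide locale.
 * Returns -1 if the locale object cannot be created.
 */
int
count_alpha_c_locale(const char *s)
{
    /* A real caller would create this once and cache it. */
    locale_t    c_loc = newlocale(LC_CTYPE_MASK, "C", (locale_t) 0);
    int         n = 0;

    if (c_loc == (locale_t) 0)
        return -1;

    for (; *s != '\0'; s++)
    {
        /* The locale is passed explicitly, so no global state is consulted. */
        if (isalpha_l((unsigned char) *s, c_loc))
            n++;
    }

    freelocale(c_loc);
    return n;
}
```

Because the locale travels as an argument, the result is the same no matter what another thread has done with setlocale().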

The issue there is that we have our own snprintf.c, that implicitly
requires LC_NUMERIC to be "C" (it is documented as always printing
floats a certain way ignoring locale and that's what the callers there
want in frontend and backend code, but in reality it punts to system
snprintf for floats, assuming that LC_NUMERIC is "C", which we
configure early in backend startup, but frontend code has to do it for
itself!).  So we could use snprintf_l or strtod_l instead, but POSIX
hasn't got those yet.  Or we could use our own Ryu code (fairly
specific), but integrating Ryu into our snprintf.c (and correctly
implementing all the %... stuff?) sounds like quite a hard,
devil-in-the-details kind of an undertaking to me.  Or maybe it's
easy, I dunno.  As for the _l functions, you could probably get away
with "every computer has either uselocale() or snprintf_l() (or
strtod_l()?)" and have two code paths in our snprintf.c.  But then we'd
also need a place to track a locale_t for a long-lived newlocale("C"),
which was too messy in my latest attempt...
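
The uselocale() half of that idea can be sketched as follows. This is only an illustration under stated assumptions, not the code from any posted patch: it assumes a system that has newlocale() and uselocale() in <locale.h>, and the helper name format_double_c() is hypothetical. It creates and frees the "C" locale on every call, which sidesteps the question of where a long-lived locale_t would be kept.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX 2008 locale_t API */
#endif
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper: format v with "%g" using a "C" LC_NUMERIC, whatever
 * the process-wide LC_NUMERIC is.  Returns what snprintf() returns, or -1
 * if the locale object cannot be created.
 */
int
format_double_c(char *buf, size_t len, double v)
{
    locale_t    c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
    locale_t    old;
    int         n;

    if (c_loc == (locale_t) 0)
        return -1;

    /* uselocale() affects only the calling thread, unlike setlocale(). */
    old = uselocale(c_loc);
    n = snprintf(buf, len, "%g", v);
    /* Restore the thread's previous locale before freeing ours. */
    uselocale(old);

    freelocale(c_loc);
    return n;
}
```

On a system that provides snprintf_l() rather than uselocale(), the second code path would presumably pass the locale_t to the formatting call explicitly instead of switching the thread's locale around it.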

[1] https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/isspace.html
[2] https://pubs.opengroup.org/onlinepubs/9699939499/toc.pdf
[3] https://www.postgresql.org/message-id/flat/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech