Re: Remaining dependency on setlocale()

Thomas Munro <thomas.munro@gmail.com>

From: Thomas Munro <thomas.munro@gmail.com>
To: Jeff Davis <pgsql@j-davis.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2024-12-13T09:44:06Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Fri, Dec 13, 2024 at 8:22 AM Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2024-08-14 at 12:00 -0700, Jeff Davis wrote:
> > On Wed, 2024-08-14 at 14:31 +1200, Thomas Munro wrote:
> > > 1.  The process global locale is always "C".  If you ever call
> > > uselocale(), it can only be for short stretches, and you have to
> > > restore it straight after; perhaps it is only ever used in
> > > replacement _l() functions for systems that lack them.  You need to
> > > use _l() functions for all non-"C" locales.  The current database
> > > default needs to be available as a variable (in future: thread-local
> > > variable, or reachable from one), so you can use it in _l()
> > > functions.  The "C" locale can be accessed implicitly with non-_l()
> > > functions, or you could ban those to reduce confusion and use
> > > foo_l(..., LC_GLOBAL_LOCALE) for "C".  Or a name like PG_C_LOCALE,
> > > which in backend code could be just LC_GLOBAL_LOCALE, while in
> > > frontend/library code it could be the singleton mechanism I showed
> > > in CF#5166.
> >
> > +1 to this approach. It makes things more consistent across platforms
> > and avoids surprising dependencies on the global setting.
> >
> > We'll have to be careful that each call site is either OK with C, or
> > that it gets changed to an _l() variant. We also have to be careful
> > about extensions.
>
> Did we reach a conclusion here? Any thoughts on moving in this
> direction, and whether 18 is the right time to do it?

I think this is the best way, and I haven't seen anyone supporting any
other idea.  (I'm working on those setlocale()-removal patches I
mentioned, more very soon...)