Re: Remaining dependency on setlocale()

From: Peter Eisentraut <peter@eisentraut.org>
To: Thomas Munro <thomas.munro@gmail.com>, Jeff Davis <pgsql@j-davis.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2024-12-17T12:14:21Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On 13.12.24 10:44, Thomas Munro wrote:
> On Fri, Dec 13, 2024 at 8:22 AM Jeff Davis <pgsql@j-davis.com> wrote:
>> On Wed, 2024-08-14 at 12:00 -0700, Jeff Davis wrote:
>>> On Wed, 2024-08-14 at 14:31 +1200, Thomas Munro wrote:
>>>> 1.  The process global locale is always "C".  If you ever call
>>>> uselocale(), it can only be for short stretches, and you have to
>>>> restore it straight after; perhaps it is only ever used in
>>>> replacement _l() functions for systems that lack them.  You need to
>>>> use _l() functions for all non-"C" locales.  The current database
>>>> default needs to be available as a variable (in future: thread-local
>>>> variable, or reachable from one), so you can use it in _l()
>>>> functions.  The "C" locale can be accessed implicitly with non-l()
>>>> functions, or you could ban those to reduce confusion and use
>>>> foo_l(..., LC_GLOBAL_LOCALE) for "C".  Or a name like PG_C_LOCALE,
>>>> which, in backend code could be just LC_GLOBAL_LOCALE, while in
>>>> frontend/library code it could be the singleton mechanism I showed
>>>> in CF#5166.
>>>
>>> +1 to this approach. It makes things more consistent across platforms
>>> and avoids surprising dependencies on the global setting.
>>>
>>> We'll have to be careful that each call site is either OK with C, or
>>> that it gets changed to an _l() variant. We also have to be careful
>>> about extensions.
>>
>> Did we reach a conclusion here? Any thoughts on moving in this
>> direction, and whether 18 is the right time to do it?
> 
> I think this is the best way, and I haven't seen anyone supporting any
> other idea.  (I'm working on those setlocale()-removal patches I
> mentioned, more very soon...)

I also think this is the right direction, and we'll get closer with the 
remaining patches that Thomas has lined up.
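
For illustration only, the pattern quoted above (global locale left at "C", every locale-sensitive call taking an explicit locale_t) might look roughly like the sketch below. It assumes a POSIX.1-2008 system that declares newlocale() and toupper_l() in the standard headers (glibc does; other platforms may need <xlocale.h>). The demo_* names are hypothetical and are not PostgreSQL APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical stand-in for the "current database default" variable. */
static locale_t demo_default_locale = (locale_t) 0;

/*
 * Build the locale object once, without touching the global locale.
 * Returns 0 on success, -1 if the locale name is not available.
 */
int
demo_set_default_locale(const char *name)
{
    locale_t loc = newlocale(LC_ALL_MASK, name, (locale_t) 0);

    if (loc == (locale_t) 0)
        return -1;
    if (demo_default_locale != (locale_t) 0)
        freelocale(demo_default_locale);
    demo_default_locale = loc;
    return 0;
}

/* Accessor so callers never reach for the global locale by accident. */
locale_t
demo_get_default_locale(void)
{
    return demo_default_locale;
}

/* Uppercase a single-byte string in place using an explicit locale. */
void
demo_upcase(char *s, locale_t loc)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char) toupper_l((unsigned char) *s, loc);
}
```

A caller would run demo_set_default_locale() once at startup and then pass demo_get_default_locale() to each locale-sensitive call; setlocale() and uselocale() are never invoked, so the process-global locale stays whatever it was at startup.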

I think at this point, we could already remove all locale settings 
related to LC_COLLATE.  Nothing uses that anymore.

I think we will need to keep the global LC_CTYPE setting set to 
something useful, for example so that system error messages come out in 
the right encoding.

But I'm concerned about the Perl_setlocale() dance in plperl.c.
Perl apparently does a setlocale(LC_ALL, "") during startup, and that 
code is a workaround to reset everything back afterwards.  We need to be 
careful not to break that.

(Perl has fixed that in 5.19, but the fix requires that you set another 
environment variable before launching Perl, which you can't do in a 
threaded system, so we'd probably need another fix eventually.  See 
<https://github.com/Perl/perl5/issues/8274>.)
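
To make the concern concrete, here is a minimal sketch of a save-and-restore workaround of that general shape. It is not the actual plperl.c code: the demo_* names are hypothetical, and demo_library_startup() merely imitates an embedded library that calls setlocale(LC_ALL, "") while initializing. Like any setlocale()-based approach, it is only safe in a single-threaded process.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Categories to preserve across the foreign startup call. */
static const int demo_categories[] = {
    LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC, LC_TIME
};
#define DEMO_NCATS (sizeof(demo_categories) / sizeof(demo_categories[0]))

/* Copy a string with malloc(); returns NULL on failure or NULL input. */
static char *
demo_copy_string(const char *s)
{
    char   *copy;
    size_t  len;

    if (s == NULL)
        return NULL;
    len = strlen(s) + 1;
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Hypothetical stand-in for a library that resets the locale at startup. */
void
demo_library_startup(void)
{
    setlocale(LC_ALL, "");
}

/*
 * Call fn(), then put every saved category back.
 * Returns 0 if all categories were restored, -1 otherwise.
 */
int
demo_call_preserving_locale(void (*fn) (void))
{
    char   *saved[DEMO_NCATS];
    size_t  i;
    int     result = 0;

    /* setlocale(cat, NULL) only queries; copy because the buffer is reused. */
    for (i = 0; i < DEMO_NCATS; i++)
        saved[i] = demo_copy_string(setlocale(demo_categories[i], NULL));

    fn();

    for (i = 0; i < DEMO_NCATS; i++)
    {
        if (saved[i] == NULL ||
            setlocale(demo_categories[i], saved[i]) == NULL)
            result = -1;
        free(saved[i]);
    }
    return result;
}
```

Any change to how the global locale is managed has to keep a step like this working, or replace it with something that does not rely on setlocale() at all.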