Re: Remaining dependency on setlocale()

From: Peter Eisentraut <peter@eisentraut.org>
To: Jeff Davis <pgsql@j-davis.com>, Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-10-31T09:40:41Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On 30.10.25 18:17, Jeff Davis wrote:
> On Tue, 2025-10-28 at 17:19 -0700, Jeff Davis wrote:
>> Attached a new patch series, v6.
> 
> I'm eager to start committing this series so that we have plenty of
> time to sort out any problems. I welcome feedback before or after
> commit, and I can revert if necessary.

What is one supposed to do with this statement?  You post a series of 9 
patches and the next day you say you are eager to commit it?  Do you not 
want to give others the time to properly review this?  The patches say 
they are "v6", but AFAICT the previous patches "v5" and "v4" in this 
thread are substantially different from these.

> The goal here is to do a permanent:
> 
>     setlocale(LC_CTYPE, "C")
> 
> in the postmaster, and instead use _l() variants where necessary.
> 
> Forcing the global LC_CTYPE to C will avoid platform-specific nuances
> spread throughout the code, and prevent new code from accidentally
> depending on platform-specific libc behavior. Instead, libc ctype
> behavior will only happen through a pg_locale_t object.
> 
> It also takes us a step closer to thread safety.

At first glance, these patches seem like reasonable steps in that direction.

But I'm not sure that we actually want to make that switch.  It would be 
good if our code were independent of the global locale settings, but that 
doesn't mean that there couldn't be code in extensions, other libraries, 
or other corners of the operating system that relies on them.  In 
general, and I haven't looked this up in the applicable standards, it 
seems like a good idea to accurately declare what encoding you operate in.
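
For readers following the thread, here is a minimal sketch of the pattern under discussion: the global LC_CTYPE is pinned to "C", and locale-sensitive behavior is requested only through an explicit locale_t passed to an _l() variant. It assumes a POSIX.1-2008 system where tolower_l() is declared in <ctype.h> (some platforms need an extra header). The helper names force_global_ctype_c(), lower_in_locale(), ascii_lower() and run_sketch() are hypothetical and are not PostgreSQL functions; ascii_lower() only imitates the idea behind pg_ascii_tolower().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: pin the process-wide LC_CTYPE to "C".
 * Returns 1 on success, 0 on failure. */
static int
force_global_ctype_c(void)
{
    return setlocale(LC_CTYPE, "C") != NULL;
}

/* Hypothetical helper: lowercase one single-byte character using an
 * explicit locale object instead of the global locale. */
static int
lower_in_locale(int ch, locale_t loc)
{
    return tolower_l((unsigned char) ch, loc);
}

/* ASCII-only case folding that never consults any locale.
 * Illustrative only; not the PostgreSQL implementation. */
static int
ascii_lower(int ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return ch + ('a' - 'A');
    return ch;
}

/* Exercise the three helpers together. */
static void
run_sketch(void)
{
    locale_t    loc;

    assert(force_global_ctype_c());

    /* The "C" locale is used here only because it exists everywhere;
     * real code would name the locale it actually wants. */
    loc = newlocale(LC_CTYPE_MASK, "C", (locale_t) 0);
    assert(loc != (locale_t) 0);

    assert(lower_in_locale('A', loc) == 'a');   /* explicit locale */
    assert(ascii_lower('Z') == 'z');            /* no locale at all */
    assert(tolower('Q') == 'q');                /* global locale, now "C" */

    freelocale(loc);
}
```

The last assertion is where the concern raised above applies: any code that still calls plain tolower(), including code in extensions or third-party libraries, sees "C" behavior after the global setting is forced, whatever the database locale is.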