Re: Remaining dependency on setlocale()
From: Jeff Davis <pgsql@j-davis.com>
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Peter Eisentraut <peter@eisentraut.org>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-10-29T00:19:50Z
Lists: pgsql-hackers
Attachments
- v6-0001-Avoid-global-LC_CTYPE-dependency-in-pg_locale_lib.patch (text/x-patch)
- v6-0002-Define-char_tolower-char_toupper-for-all-locale-p.patch (text/x-patch)
- v6-0003-Avoid-global-LC_CTYPE-dependency-in-like.c.patch (text/x-patch)
- v6-0004-Avoid-global-LC_CTYPE-dependency-in-scansup.c.patch (text/x-patch)
- v6-0005-Avoid-global-LC_CTYPE-dependency-in-pg_locale_icu.patch (text/x-patch)
- v6-0006-Avoid-global-LC_CTYPE-dependency-in-ltree-crc32.c.patch (text/x-patch)
- v6-0007-Avoid-global-LC_CTYPE-dependency-in-fuzzystrmatch.patch (text/x-patch)
- v6-0008-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch)
- v6-0009-Avoid-global-LC_CTYPE-dependency-in-strcasecmp.c-.patch (text/x-patch)
On Wed, 2025-07-23 at 19:11 -0700, Jeff Davis wrote:
> On Fri, 2025-07-11 at 11:48 +1200, Thomas Munro wrote:
> > On Fri, Jul 11, 2025 at 6:22 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > > I don't have a great windows development environment, and it
> > > appears CI and the buildfarm don't offer great coverage either.
> > > Can I ask for a volunteer to do the windows side of this work?
> >
> > Me neither but I'm willing to help with that, and have done lots of
> > closely related things through trial-by-CI...

Attached a new patch series, v6.

Rather than creating new global locale_t objects, this series (along
with a separate patch for NLS[1]) removes the dependency on the global
LC_CTYPE entirely. It's a bunch of small patches that replace direct
calls to tolower()/toupper() with calls into the provider.

An assumption of these patches is that, in the UTF-8 encoding, the
logic in pg_tolower()/pg_toupper() is equivalent to
pg_ascii_tolower()/pg_ascii_toupper().

Generally these preserve existing behavior, but there are a couple
differences:

* If using the builtin C locale (not C.UTF-8) along with a datctype
  that's a non-C locale with single-byte encoding, it could affect the
  results of downcase_identifier(), ltree, and fuzzystrmatch on
  characters > 127. For ICU, I went to a bit of extra effort to preserve
  the existing behavior here, because it's more likely to be used for
  single-byte encodings.

* When using ICU or builtin C.UTF-8, along with a datctype of
  "tr_TR.UTF-8", then it will affect ltree's and fuzzystrmatch's
  treatment of i/I.

If these are a concern we can fix them with some hacks, but those
behaviors seem fairly obscure to me.

Regards,
	Jeff Davis

[1] https://www.postgresql.org/message-id/90f176c5b85b9da26a3265b2630ece3552068566.camel@j-davis.com