Re: Remaining dependency on setlocale()
Jeff Davis <pgsql@j-davis.com>
Commits
-
fuzzystrmatch: use pg_ascii_toupper().
- b96a9fd76f32 19 (unreleased) landed
-
Avoid global LC_CTYPE dependency in pg_locale_icu.c.
- 0a90df58cf38 19 (unreleased) landed
-
downcase_identifier(): use method table from locale provider.
- 87b2968df0f8 19 (unreleased) landed
-
ltree: fix case-insensitive matching.
- 806555e3000d 18.2 landed
- 7f007e4a044a 19 (unreleased) landed
-
Fix multibyte issue in ltree_strncasecmp().
- 898991966bc9 14.21 landed
- 335b2f30b468 15.16 landed
- b80227c0a54c 16.12 landed
- b8cfe9dc2e7f 17.8 landed
- f79e239e0bc6 18.2 landed
- 84d5efa7e3eb 19 (unreleased) landed
-
Use multibyte-aware extraction of pattern prefixes.
- 9c8de1596912 19 (unreleased) landed
-
Add pg_iswcased().
- 630706ced04e 19 (unreleased) landed
-
Remove char_tolower() API.
- 1e493158d3d2 19 (unreleased) landed
-
Make regex "max_chr" depend on encoding, not provider.
- 19b966243c38 19 (unreleased) landed
-
Change some callers to use pg_ascii_toupper().
- 99cd8890beca 19 (unreleased) landed
-
Allow pg_locale_t APIs to work when ctype_is_c.
- 147602822597 19 (unreleased) landed
-
Add #define for UNICODE_CASEMAP_BUFSZ.
- 8d299052fe58 19 (unreleased) landed
-
Inline pg_ascii_tolower() and pg_ascii_toupper().
- ec4997a9d733 19 (unreleased) landed
-
Avoid global LC_CTYPE dependency in pg_locale_libc.c.
- f81bf78ce12b 19 (unreleased) landed
-
Force LC_COLLATE to C in postmaster.
- 5e6e42e44fe1 19 (unreleased) landed
-
Change wchar2char() and char2wchar() to accept a locale_t.
- 53cd0b71ee2e 19 (unreleased) landed
-
Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.
- d81dcc8d6243 19 (unreleased) landed
-
inet_net_pton.c: use pg_ascii_tolower() rather than tolower().
- 8898082a5d3e 18.0 landed
-
isn.c: use pg_ascii_toupper() instead of toupper().
- 7a6880fadc17 18.0 landed
-
contrib/spi/refint.c: use pg_ascii_tolower() instead.
- 78bd364ee39c 18.0 landed
-
copyfromparse.c: use pg_ascii_tolower() rather than tolower().
- 4c787a24e7e2 18.0 landed
-
Revert "Tidy up locale thread safety in ECPG library."
- 3c8e463b0d88 18.0 cited
-
Tidy up locale thread safety in ECPG library.
- 8e993bff5326 18.0 cited
-
All supported systems have locale_t.
- 8d9a9f034e92 17.0 cited
Attachments
- v2-0001-Hold-datcollate-datctype-in-global_libc_locale.patch (text/x-patch) patch v2-0001
- v2-0002-fuzzystrmatch-use-global_libc_locale.patch (text/x-patch) patch v2-0002
- v2-0003-ltree-use-global_libc_locale.patch (text/x-patch) patch v2-0003
- v2-0004-Use-global_libc_locale-for-downcase_identifier-an.patch (text/x-patch) patch v2-0004
- v2-0005-Change-wchar2char-and-char2wchar-to-accept-a-loca.patch (text/x-patch) patch v2-0005
- v2-0006-tsearch-use-global_libc_locale.patch (text/x-patch) patch v2-0006
- v2-0007-Force-LC_COLLATE-to-C-in-postmaster.patch (text/x-patch) patch v2-0007
On Tue, 2025-06-10 at 17:32 +0200, Peter Eisentraut wrote:
> v1-0001-copyfromparse.c-use-pg_ascii_tolower-rather-than-.patch
> v1-0002-contrib-spi-refint.c-use-pg_ascii_tolower-instead.patch
> v1-0003-isn.c-use-pg_ascii_toupper-instead-of-toupper.patch
> v1-0004-inet_net_pton.c-use-pg_ascii_tolower-rather-than-.patch
>
> These look good to me.

Committed. (That means they're in 18, which was not my intention, but
others seemed to think it was harmless enough, so I didn't revert. I
will wait for the branch before I commit any more of these.)

> v1-0005-Add-global_lc_ctype-to-hold-locale_t-for-datctype.patch
>
> This looks ok (but might depend on how patch 0006 turns out).

I changed this to a global_libc_locale that includes both LC_COLLATE
and LC_CTYPE (from datcollate and datctype), in case an extension is
relying on strcoll for some reason.

> v1-0006-Use-global_lc_ctype-for-callers-of-locale-aware-f.patch
>
> I think these need further individual analysis and explanation why
> these should use the global lc_ctype setting.

This patch series, at least so far, is designed to have zero behavior
changes. Anything with a potential for a behavior change should be a
separate commit, so that if we need to revert it, we can revert the
behavior change without reintroducing a setlocale() dependency.

> For example, you could argue that the SQL-callable soundex(text)
> function should use the collation object of its input value, not the
> global locale.

That would be a behavior change.

> But furthermore, soundex_code() could actually just use
> pg_ascii_toupper() instead.

Is that a behavior change?

> And in ts_locale.c, the isalnum_l() call should use mylocale that
> already exists in that function. The problem to solve it getting a
> good value into mylocale. Using the global setting confuses the
> issue a bit, I think.

I reworked it to be less confusing by changing wchar2char/char2wchar to
take a locale_t instead of pg_locale_t. Hopefully it's an improvement.
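The "is that a behavior change?" question above comes down to whether toupper() can ever return something other than an ASCII-only fold for the bytes that reach the call. A minimal sketch of one way to check that for a given LC_CTYPE follows. ascii_toupper() is a hypothetical stand-in written for this example, not the actual pg_ascii_toupper() source, and toupper_matches_ascii() is likewise an illustrative helper.

```c
#include <ctype.h>

/*
 * Hypothetical stand-in for an ASCII-only upcase: folds only a-z and
 * ignores whatever locale setlocale() has installed.
 */
static unsigned char
ascii_toupper(unsigned char ch)
{
	if (ch >= 'a' && ch <= 'z')
		ch = (unsigned char) (ch - 'a' + 'A');
	return ch;
}

/*
 * Returns 1 if toupper() and ascii_toupper() agree for every byte value
 * under the process's current LC_CTYPE, 0 otherwise.
 */
static int
toupper_matches_ascii(void)
{
	for (int c = 0; c < 256; c++)
	{
		if ((unsigned char) toupper(c) != ascii_toupper((unsigned char) c))
			return 0;
	}
	return 1;
}
```

If toupper_matches_ascii() returns 0 after setlocale(LC_CTYPE, ...) has installed a single-byte locale, then swapping toupper() for the ASCII-only version could change results for non-ASCII input bytes in that locale. Whether such bytes actually reach a given caller has to be checked per call site.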

In get_iso_localename(), there's a comment saying that it doesn't
matter which locale is used (because it's ASCII), but to use the "_l"
variants, we need to pick some locale. At that point it's not clear to
me that global_libc_locale will be set yet, so I used LC_C_LOCALE. I'm
not sure whether we can rely on LC_C_LOCALE being available, but it
passed in CI, and if it's not available somewhere it might be a good
idea to create it on those platforms anyway.

> v1-0007-Fix-the-last-remaining-callers-relying-on-setloca.patch
>
> Do we have any data what platforms we'd need these checks for?

https://cirrus-ci.com/build/5167600088383488

Looks like windows doesn't have iswxdigit_l or isxdigit_l.

> Also, if you look into wparser_def.c what p_isxdigit is used for,
> it's used for parsing XML (presumably HTML) files, so we just need
> ASCII-only behavior and no locale dependency.

iswxdigit() does seem to be dependent on locale, so this could be a
subtle behavior change.

> v1-0008-Set-process-LC_COLLATE-C-and-LC_CTYPE-C.patch
>
> As I mentioned earlier in the thread, I don't think we can do this
> for LC_CTYPE, because otherwise system error messages would not come
> out in the right encoding.

Changed it so that it only sets LC_COLLATE to C, and leaves LC_CTYPE
set to datctype. Unfortunately, as long as LC_CTYPE is set to a real
locale, there's a danger of accidentally depending on that setting.

Can the encoding be controlled with LC_MESSAGES instead of LC_CTYPE? Do
you have an example of how things can go wrong?

> For the LC_COLLATE settings, I think we could just do the setting in
> main(), where the other non-database-specific locale categories are
> set.

Done.

Regards,
	Jeff Davis
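
For readers following along, the LC_COLLATE/LC_CTYPE split described above can be sketched in plain C as below. This is only an illustration of the idea, not the code in the attached patch. init_process_locale() and collate_is_c() are hypothetical helpers, and the ctype argument stands in for datctype.

```c
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: pin the process-wide LC_COLLATE to "C" and take
 * LC_CTYPE from the caller (datctype in the scenario discussed above).
 * Returns 1 on success, 0 if either setlocale() call fails.
 */
static int
init_process_locale(const char *ctype)
{
	if (setlocale(LC_COLLATE, "C") == NULL)
		return 0;
	if (setlocale(LC_CTYPE, ctype) == NULL)
		return 0;
	return 1;
}

/* Returns 1 if the process-wide LC_COLLATE is currently "C". */
static int
collate_is_c(void)
{
	const char *cur = setlocale(LC_COLLATE, NULL);

	return cur != NULL && strcmp(cur, "C") == 0;
}
```

With LC_COLLATE pinned to "C", a stray strcoll() call orders strings the way strcmp() does, whatever datcollate is. Code that calls isalpha() or toupper() without an explicit locale_t still sees the real LC_CTYPE, which is the remaining hazard the message points out.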