Re: Remaining dependency on setlocale()
Thomas Munro <thomas.munro@gmail.com>
Commits
- fuzzystrmatch: use pg_ascii_toupper().
  - b96a9fd76f32 19 (unreleased) landed
- Avoid global LC_CTYPE dependency in pg_locale_icu.c.
  - 0a90df58cf38 19 (unreleased) landed
- downcase_identifier(): use method table from locale provider.
  - 87b2968df0f8 19 (unreleased) landed
- ltree: fix case-insensitive matching.
  - 806555e3000d 18.2 landed
  - 7f007e4a044a 19 (unreleased) landed
- Fix multibyte issue in ltree_strncasecmp().
  - 898991966bc9 14.21 landed
  - 335b2f30b468 15.16 landed
  - b80227c0a54c 16.12 landed
  - b8cfe9dc2e7f 17.8 landed
  - f79e239e0bc6 18.2 landed
  - 84d5efa7e3eb 19 (unreleased) landed
- Use multibyte-aware extraction of pattern prefixes.
  - 9c8de1596912 19 (unreleased) landed
- Add pg_iswcased().
  - 630706ced04e 19 (unreleased) landed
- Remove char_tolower() API.
  - 1e493158d3d2 19 (unreleased) landed
- Make regex "max_chr" depend on encoding, not provider.
  - 19b966243c38 19 (unreleased) landed
- Change some callers to use pg_ascii_toupper().
  - 99cd8890beca 19 (unreleased) landed
- Allow pg_locale_t APIs to work when ctype_is_c.
  - 147602822597 19 (unreleased) landed
- Add #define for UNICODE_CASEMAP_BUFSZ.
  - 8d299052fe58 19 (unreleased) landed
- Inline pg_ascii_tolower() and pg_ascii_toupper().
  - ec4997a9d733 19 (unreleased) landed
- Avoid global LC_CTYPE dependency in pg_locale_libc.c.
  - f81bf78ce12b 19 (unreleased) landed
- Force LC_COLLATE to C in postmaster.
  - 5e6e42e44fe1 19 (unreleased) landed
- Change wchar2char() and char2wchar() to accept a locale_t.
  - 53cd0b71ee2e 19 (unreleased) landed
- Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.
  - d81dcc8d6243 19 (unreleased) landed
- inet_net_pton.c: use pg_ascii_tolower() rather than tolower().
  - 8898082a5d3e 18.0 landed
- isn.c: use pg_ascii_toupper() instead of toupper().
  - 7a6880fadc17 18.0 landed
- contrib/spi/refint.c: use pg_ascii_tolower() instead.
  - 78bd364ee39c 18.0 landed
- copyfromparse.c: use pg_ascii_tolower() rather than tolower().
  - 4c787a24e7e2 18.0 landed
- Revert "Tidy up locale thread safety in ECPG library."
  - 3c8e463b0d88 18.0 cited
- Tidy up locale thread safety in ECPG library.
  - 8e993bff5326 18.0 cited
- All supported systems have locale_t.
  - 8d9a9f034e92 17.0 cited

On Wed, Aug 14, 2024 at 1:05 PM Jeff Davis <pgsql@j-davis.com> wrote:
> The only alternative is to essentially ban the use of non-_l variants,
> which is fine I suppose, but causes a fair amount of code churn.

Let's zoom out a bit and consider some ways we could set up the
process, threads and individual calls:

1. The process global locale is always "C". If you ever call
uselocale(), it can only be for short stretches, and you have to
restore it straight after; perhaps it is only ever used in replacement
_l() functions for systems that lack them. You need to use _l()
functions for all non-"C" locales. The current database default needs
to be available as a variable (in future: thread-local variable, or
reachable from one), so you can use it in _l() functions. The "C"
locale can be accessed implicitly with non-_l() functions, or you could
ban those to reduce confusion and use foo_l(..., LC_GLOBAL_LOCALE) for
"C". Or a name like PG_C_LOCALE, which, in backend code could be just
LC_GLOBAL_LOCALE, while in frontend/library code it could be the
singleton mechanism I showed in CF#5166.

XXX Note that nailing LC_ALL to "C" in the backend would extend our
existing policy for LC_NUMERIC to all categories. That's why we can
use strtod() in the backend and expect the radix character to be '.'.
It's interesting to contemplate the strtod() calls in CF#5166: they are
code copied-and-pasted from backend and frontend; in the backend we can
use strtod() currently but in the frontend code I showed a change to
strtod_l(..., PG_C_LOCALE), in order to be able to delete some ugly
deeply nested uselocale()/setlocale() stuff of the exact sort that
NetBSD hackers (and I) hate. It's obviously a bit of a code smell that
it's copied-and-pasted in the first place, and should really share code.
Supposing some of that stuff finishes up in src/common, then I think
you'd want a strtod_l(..., PG_C_LOCALE) that could be allowed to take
advantage of the knowledge that the global locale is "C" in the
backend. Just thoughts...

2. The process global locale is always "C". Each backend process (in
future: thread) calls uselocale() to set the thread-local locale to the
database default, so it can keep using the non-_l() functions as a way
to access the database default, and otherwise uses _l() functions if it
wants something else (as we do already). The "C" locale can be
accessed with foo_l(..., LC_GLOBAL_LOCALE) or PG_C_LOCALE etc.

XXX This option is blocked by NetBSD's rejection of uselocale(). I
guess if you really wanted to work around NetBSD's policy you could
make your own wrapper for all affected functions, translating foo() to
foo_l(..., pg_thread_current_locale), so you could write uselocale(),
which is pretty much what every other libc does... But eughhh

3. The process global locale is inherited from the system or can be
set by the user however they want for the benefit of extensions, but we
never change it after startup or refer to it. Then we do the same as 1
or 2, except if we ever want "C" we'll need a locale_t for that, again
perhaps using the PG_C_LOCALE mechanism. Non-_l() functions are
effectively useless except in cases where you really want to use the
system's settings inherited from startup, eg for messages, so they'd
mostly be banned.

What else?

> > They're right that we really just want to use "C" in some places, and
> > their LC_C_LOCALE is a very useful system-provided value to be able to
> > pass into _l functions. It's a shame it's non-standard, because
> > without it you have to allocate a locale_t for "C" and keep it
> > somewhere to feed to _l functions...
>
> If we're going to do that, why not just have ascii-only variants of our
> own? pg_ascii_isspace(...) is at least as readable as isspace_l(...,
> LC_C_LOCALE).

Yeah, I agree there are some easy things we should do that way. In
fact we've already established that scanner_isspace() needs to be used
in lots more places for that, even aside from thread-safety.