Re: Remaining dependency on setlocale()

From: Andreas Karlsson <andreas@proxel.se>

To: Jeff Davis <pgsql@j-davis.com>, Joe Conway <mail@joeconway.com>, Thomas Munro <thomas.munro@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>

Cc: pgsql-hackers@postgresql.org

Date: 2024-08-28T16:26:04Z

Lists: pgsql-hackers

Commits

fuzzystrmatch: use pg_ascii_toupper().
- b96a9fd76f32 19 (unreleased) landed
Avoid global LC_CTYPE dependency in pg_locale_icu.c.
- 0a90df58cf38 19 (unreleased) landed
downcase_identifier(): use method table from locale provider.
- 87b2968df0f8 19 (unreleased) landed
ltree: fix case-insensitive matching.
- 806555e3000d 18.2 landed
- 7f007e4a044a 19 (unreleased) landed
Fix multibyte issue in ltree_strncasecmp().
- 898991966bc9 14.21 landed
- 335b2f30b468 15.16 landed
- b80227c0a54c 16.12 landed
- b8cfe9dc2e7f 17.8 landed
- f79e239e0bc6 18.2 landed
- 84d5efa7e3eb 19 (unreleased) landed
Use multibyte-aware extraction of pattern prefixes.
- 9c8de1596912 19 (unreleased) landed
Add pg_iswcased().
- 630706ced04e 19 (unreleased) landed
Remove char_tolower() API.
- 1e493158d3d2 19 (unreleased) landed
Make regex "max_chr" depend on encoding, not provider.
- 19b966243c38 19 (unreleased) landed
Change some callers to use pg_ascii_toupper().
- 99cd8890beca 19 (unreleased) landed
Allow pg_locale_t APIs to work when ctype_is_c.
- 147602822597 19 (unreleased) landed
Add #define for UNICODE_CASEMAP_BUFSZ.
- 8d299052fe58 19 (unreleased) landed
Inline pg_ascii_tolower() and pg_ascii_toupper().
- ec4997a9d733 19 (unreleased) landed
Avoid global LC_CTYPE dependency in pg_locale_libc.c.
- f81bf78ce12b 19 (unreleased) landed
Force LC_COLLATE to C in postmaster.
- 5e6e42e44fe1 19 (unreleased) landed
Change wchar2char() and char2wchar() to accept a locale_t.
- 53cd0b71ee2e 19 (unreleased) landed
Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.
- d81dcc8d6243 19 (unreleased) landed
inet_net_pton.c: use pg_ascii_tolower() rather than tolower().
- 8898082a5d3e 18.0 landed
isn.c: use pg_ascii_toupper() instead of toupper().
- 7a6880fadc17 18.0 landed
contrib/spi/refint.c: use pg_ascii_tolower() instead.
- 78bd364ee39c 18.0 landed
copyfromparse.c: use pg_ascii_tolower() rather than tolower().
- 4c787a24e7e2 18.0 landed
Revert "Tidy up locale thread safety in ECPG library."
- 3c8e463b0d88 18.0 cited
Tidy up locale thread safety in ECPG library.
- 8e993bff5326 18.0 cited
All supported systems have locale_t.
- 8d9a9f034e92 17.0 cited

On 8/9/24 8:24 PM, Jeff Davis wrote:
> On Fri, 2024-08-09 at 13:41 +0200, Andreas Karlsson wrote:
>> I am leaning towards that we should write our own pure ascii
>> functions for this.
> 
> That makes sense for a lot of call sites, but it could cause breakage
> if we aren't careful.
> 
>> Since we do not support any non-ascii compatible encodings
>> anyway I do not see the point in having locale support in most of
>> these call-sites.
> 
> An ascii-compatible encoding just means that the code points in the
> ascii range are represented as ascii. I'm not clear on whether code
> points in the ascii range can return different results for things like
> isspace(), but it sounds plausible -- toupper() can return different
> results for 'i' in tr_TR.
> 
> Also, what about the values 128-255, outside the ascii range, which
> are still valid input to isspace()?
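
A minimal sketch of the ASCII-only approach for the two cases raised
above, assuming the callers only ever need to recognize ASCII letters
and the C-locale whitespace set. ascii_toupper() and ascii_isspace() are
hypothetical names, loosely modeled on pg_ascii_toupper() from the
commit list above. Neither consults LC_CTYPE, so 'i' always maps to 'I'
(even in tr_TR) and bytes 128-255 are never treated as whitespace.

```c
#include <stdbool.h>

/*
 * Hypothetical ASCII-only helpers (not existing PostgreSQL functions).
 * Neither consults LC_CTYPE, so results are identical in every locale.
 */

/* Map 'a'..'z' to 'A'..'Z'; all other bytes, including 128-255, pass through. */
static unsigned char
ascii_toupper(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch += 'A' - 'a';
    return ch;
}

/* True only for the six characters isspace() accepts in the C locale. */
static bool
ascii_isspace(unsigned char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n' ||
           ch == '\v' || ch == '\f' || ch == '\r';
}
```

The trade-off is the one pointed out above: a caller that genuinely
needs locale-aware results for non-ASCII input would silently change
behavior if switched to helpers like these.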

My idea was that in a lot of those cases we only want to parse, e.g.,
0-9 as digits and only '.' as the decimal separator, so we should make
that obvious, either by using the C locale or by writing our own
ASCII-only functions. These strings are primarily meant to be read by
machines, not humans.
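
For the parsing case, a sketch under the assumption that the input is
a plain non-negative decimal such as "12.5". ascii_isdigit() and
parse_ascii_fixed() are hypothetical helpers, not PostgreSQL functions.
Unlike strtod(), whose decimal-point character comes from LC_NUMERIC,
this parser only ever accepts '.', and it returns a scaled integer to
sidestep floating-point rounding.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept only '0'..'9'. The C standard already restricts isdigit() to
 * these, but a dedicated ASCII helper makes the intent explicit.
 * (Hypothetical helper, not an existing PostgreSQL function.)
 */
static bool
ascii_isdigit(unsigned char ch)
{
    return ch >= '0' && ch <= '9';
}

/*
 * Hypothetical parser for "digits[.digits]" that always uses '.' as the
 * decimal separator, independent of LC_NUMERIC. The result is scaled by
 * 10^scale (scale >= 0), e.g. "12.5" at scale 2 gives 1250. Returns false
 * on malformed input, too many fractional digits, or int64 overflow.
 */
static bool
parse_ascii_fixed(const char *s, int scale, int64_t *result)
{
    int64_t value = 0;
    int     frac_digits = -1;   /* -1 until '.' has been seen */

    if (!ascii_isdigit((unsigned char) *s))
        return false;           /* need at least one integer digit */

    for (; *s != '\0'; s++)
    {
        unsigned char ch = (unsigned char) *s;

        if (ch == '.' && frac_digits < 0)
        {
            frac_digits = 0;    /* first '.' switches to the fraction */
            continue;
        }
        if (!ascii_isdigit(ch))
            return false;       /* rejects ',', a second '.', etc. */
        if (frac_digits >= 0 && ++frac_digits > scale)
            return false;
        if (value > (INT64_MAX - (ch - '0')) / 10)
            return false;
        value = value * 10 + (ch - '0');
    }
    if (frac_digits == 0)
        return false;           /* "12." has no digits after the point */

    /* Pad missing fractional digits: "12.5" at scale 2 -> 1250. */
    for (int i = frac_digits < 0 ? 0 : frac_digits; i < scale; i++)
    {
        if (value > INT64_MAX / 10)
            return false;
        value *= 10;
    }
    *result = value;
    return true;
}
```

For example, parse_ascii_fixed("12.5", 2, &v) sets v to 1250, while
"12,5" is rejected regardless of the locale the process runs in.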

Andreas