Re: Remaining dependency on setlocale()

Jeff Davis <pgsql@j-davis.com>

From: Jeff Davis <pgsql@j-davis.com>
To: Andreas Karlsson <andreas@proxel.se>, Joe Conway <mail@joeconway.com>, Thomas Munro <thomas.munro@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>
Cc: pgsql-hackers@postgresql.org
Date: 2024-08-09T18:24:03Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Fri, 2024-08-09 at 13:41 +0200, Andreas Karlsson wrote:
> I am leaning towards that we should write our own pure ascii
> functions 
> for this.

That makes sense for a lot of call sites, but it could cause breakage
if we aren't careful.

>  Since we do not support any non-ascii compatible encodings 
> anyway I do not see the point in having locale support in most of
> these 
> call-sites.

An ascii-compatible encoding just means that the code points in the
ascii range are represented as ascii. I'm not clear on whether code
points in the ascii range can return different results for things like
isspace(), but it sounds plausible -- toupper() can return different
results for 'i' in tr_TR.

Also, what about the values outside 128-255, which are still valid
input to isspace()?

Regards,
	Jeff Davis