Re: Remaining dependency on setlocale()

Daniel Verite <daniel@manitou-mail.org>

From: "Daniel Verite" <daniel@manitou-mail.org>
To: "Jeff Davis" <pgsql@j-davis.com>
Cc: Thomas Munro <thomas.munro@gmail.com>, Peter Eisentraut <peter@eisentraut.org>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-10-31T14:01:39Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

	Jeff Davis wrote:

> On Thu, 2025-10-30 at 21:41 +0100, Daniel Verite wrote:
> > What about code in extensions? AFAIU a user can control the 
> > locale in effect by setting the LC_CTYPE argument of
> > CREATE DATABASE, which ends up in the environment
> > of backends serving that database.
> > If it's forced to "C", how can an extension use locale-aware
> > libc functions?
> 
> Extensions often need to be updated for a new major version.

I think forcing the C locale is not comparable to API changes,
and the consequences are not even necessarily fixable for extensions.

For instance, consider the following function, when run in a database
with en_US.utf8 as locale.

CREATE FUNCTION lt_test(text,text) RETURNS boolean as $$
 use locale; return ($_[0] lt $_[1])?1:0;
$$ LANGUAGE plperlu;

select lt_test('a', 'B');

With PG 18 it returns true.
With 19devel it returns false.

This has been the case since commit 5e6e42e4, which does this:

+	 * Collation is handled by pg_locale.c, and the behavior is dependent on
+	 * the provider. strcoll(), etc., should not be called directly.
+	 */
+	init_locale("LC_COLLATE", LC_COLLATE, "C");
+
+	/*

Obviously libperl is not going to be updated to call Postgres
string comparison functions instead of strcoll().
The same is probably true for other languages available as
extensions that expose POSIX locale-aware functions.

Extending this logic to LC_CTYPE will extend the breakage.


While I agree with the goal of not depending on setlocale()
in the core code for anything that should be locale-provider
dependent, making this goal leak into extensions seems
unnecessarily user-hostile. What it's saying to users is,
before v19 you could choose your locale, and starting
with v19 you'll have "C" whether you want it or not.


Best regards,
-- 
Daniel Vérité 
https://postgresql.verite.pro/