Re: Remaining dependency on setlocale()

Chao Li <li.evan.chao@gmail.com>

From: Chao Li <li.evan.chao@gmail.com>
To: Jeff Davis <pgsql@j-davis.com>
Cc: Peter Eisentraut <peter@eisentraut.org>, Thomas Munro <thomas.munro@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-11-27T01:08:16Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.


> On Nov 26, 2025, at 09:50, Chao Li <li.evan.chao@gmail.com> wrote:
> 
> I will review the rest 3 commits tomorrow.

10 - 0009
```
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = pg_ascii_toupper((unsigned char) c);
```

Just curious. As isaplha() and toupper() come from the same header file ctype.h, if we replace toupper with pg_ascii_toupper, does isapha also need to be handled?

11 - 0010
```
-	for (i = 0; i < len; i++)
-	{
-		unsigned char ch = (unsigned char) ident[i];
+	dstsize = len + 1;
+	result = palloc(dstsize);
 
-		if (ch >= 'A' && ch <= 'Z')
-			ch += 'a' - 'A';
-		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
-		result[i] = (char) ch;
-	}
-	result[i] = '\0';
+	needed = pg_strfold_ident(result, dstsize, ident, len);
+	Assert(needed + 1 == dstsize);
+	Assert(needed == len);
```

I think assert both dstsize and len are redundant, because dstsize=len+1, and no place to change their values.

12 - 0010
```
+/*
+ * Fold an identifier using the database default locale.
+ *
+ * For historical reasons, does not use ordinary locale behavior. Should only
+ * be used for identifier folding. XXX: can we make this equivalent to
+ * pg_strfold(..., default_locale)?
+ */
+size_t
+pg_strfold_ident(char *dest, size_t destsize, const char *src, ssize_t srclen)
+{
+	if (default_locale == NULL || default_locale->ctype == NULL)
+	{
+		int			i;
+
+		for (i = 0; i < srclen && i < destsize; i++)
+		{
+			unsigned char ch = (unsigned char) src[i];
+
+			if (ch >= 'A' && ch <= 'Z')
+				ch += 'a' - 'A';
+			dest[i] = (char) ch;
+		}
+
+		if (i < destsize)
+			dest[i] = '\0';
+
+		return srclen;
+	}
+	return default_locale->ctype->strfold_ident(dest, destsize, src, srclen,
+												default_locale);
+}
```

Given default_local can be NULL only at some specific moment, can we do something like

Local = default_local;
If (local == NULL || local->ctype == NULL)
  Local = libc or other fallback;
Return default_locale->ctype->strfold_ident(dest, destsize, src, srclen, local);

This way avoids the duplicate code.

13 - 0011
```
+{ name => 'lc_collate', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
+  short_desc => 'Sets the locale for text ordering in extensions.',
```

I just feel the GUC name is very misleading. Without carefully reading the doc, users may very easy to consider lc_collate the system’s locale. If it only affects extensions, then let’s name it accordingly, for example, “extension_lc_collate”, or “legacy_lc_collate”.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/