Re: Remaining dependency on setlocale()

Chao Li <li.evan.chao@gmail.com>

From: Chao Li <li.evan.chao@gmail.com>
To: Jeff Davis <pgsql@j-davis.com>
Cc: Peter Eisentraut <peter@eisentraut.org>, Thomas Munro <thomas.munro@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-11-25T01:44:51Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

Hi Jeff,

I have reviewed 0001-0004 and got a few comments:

> On Nov 25, 2025, at 07:57, Jeff Davis <pgsql@j-davis.com> wrote:
> 
> 
> 0001-0004: Pure refactoring patches. I intend to commit a couple of
> these soon.
> 

1 - 0001
```
+/*
+ * Fold a character to upper case, following C/POSIX locale rules.
+ */
+static inline unsigned char
+pg_ascii_toupper(unsigned char ch)
```

I was curious why “inline” is needed, then I figured out when I tried to build. Without “inline”, compile will raise warnings of “unused function”. So I guess it’s better to explain why “inline” is used in the function comment, otherwise other readers may get the same confusion.

2 - 0002
```
+ * three output codepoints. See Unicode 5.18.2, "Change in Length".
```

With “change in length”, I confirmed “Unicode 5.18.2” means the Unicode Standard Section 5.18.2 “Complications for Case Mapping”. Why don’t we just give the URL in the comment. https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-5/#G29675

3 - 0004
```
@@ -1282,10 +1327,18 @@ size_t
 pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		   pg_locale_t locale)
 {
-	if (locale->ctype->strfold)
-		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
-	else
-		return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
+	if (locale->ctype == NULL)
+	{
+		int			i;
+
+		srclen = (srclen >= 0) ? srclen : strlen(src);
+		for (i = 0; i < srclen && i < dstsize; i++)
+			dst[i] = pg_ascii_tolower(src[i]);
+		if (i < dstsize)
+			dst[i] = '\0';
+		return srclen;
+	}
+	return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 }
```

I don’t get this change. In old code, depending on locale->ctype->strfold, it calls strfold or strlower. But in this patch, it only calls strfold. Why? If that’s intentional, maybe better to add a comment to explain that.

4 - 0004

In pg_strfold, the ctype==NULL fallback code is exactly the same as pg_strlower. I guess you intentionally to not call pg_strlower here for performance consideration, is that true?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/