Re: Remaining dependency on setlocale()

From: Jeff Davis <pgsql@j-davis.com>
To: Peter Eisentraut <peter@eisentraut.org>, Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-06-11T19:15:14Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Tue, 2025-06-10 at 17:32 +0200, Peter Eisentraut wrote:
> v1-0001-copyfromparse.c-use-pg_ascii_tolower-rather-than-.patch
> v1-0002-contrib-spi-refint.c-use-pg_ascii_tolower-instead.patch
> v1-0003-isn.c-use-pg_ascii_toupper-instead-of-toupper.patch
> v1-0004-inet_net_pton.c-use-pg_ascii_tolower-rather-than-.patch
> 
> These look good to me.

Committed. (That means they're in 18, which was not my intention, but
others seemed to think it was harmless enough, so I didn't revert. I
will wait for the branch before I commit any more of these.)
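
For readers who have not looked at those four patches, the idea behind
swapping tolower()/toupper() for the ASCII-only helpers can be sketched
roughly as below. This is an illustration, not the committed code: the
names ascii_tolower_sketch() and ascii_toupper_sketch() are hypothetical
stand-ins for pg_ascii_tolower()/pg_ascii_toupper(), and the sketch
assumes an ASCII-compatible character set.

```c
/*
 * Hypothetical stand-ins for pg_ascii_tolower()/pg_ascii_toupper().
 * Only the 26 ASCII letters are folded; every other byte value is
 * returned unchanged, so the result cannot depend on setlocale().
 */
unsigned char
ascii_tolower_sketch(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    return ch;
}

unsigned char
ascii_toupper_sketch(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch -= 'a' - 'A';
    return ch;
}
```

Bytes with the high bit set pass through untouched, which is where this
can differ from tolower()/toupper() under a single-byte non-C locale.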

> v1-0005-Add-global_lc_ctype-to-hold-locale_t-for-datctype.patch
> 
> This looks ok (but might depend on how patch 0006 turns out).

I changed this to a global_libc_locale that includes both LC_COLLATE
and LC_CTYPE (from datcollate and datctype), in case an extension is
relying on strcoll for some reason.
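
As a rough sketch of what a combined object could look like (an
assumption about the general shape, not the patch itself): POSIX
newlocale() can be called twice to stack LC_COLLATE and LC_CTYPE from
two different locale names into one locale_t. The helper name
make_libc_locale_sketch() and its parameters are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* expose newlocale() etc. on glibc */
#endif
#include <locale.h>
#include <stddef.h>
#if defined(__APPLE__)
#include <xlocale.h>
#endif

/*
 * Hypothetical helper: build one locale_t carrying LC_COLLATE and
 * LC_CTYPE from two separately named locales (think datcollate and
 * datctype). Returns (locale_t) 0 on failure.
 */
locale_t
make_libc_locale_sketch(const char *collate, const char *ctype)
{
    locale_t    base;
    locale_t    result;

    /* Categories not named in the mask default to the POSIX locale. */
    base = newlocale(LC_COLLATE_MASK, collate, (locale_t) 0);
    if (base == (locale_t) 0)
        return (locale_t) 0;

    /* On success newlocale() takes over "base"; on failure we still own it. */
    result = newlocale(LC_CTYPE_MASK, ctype, base);
    if (result == (locale_t) 0)
        freelocale(base);
    return result;
}
```

The resulting object can then be passed to strcoll_l() and the
isalpha_l() family without consulting the process-wide setting.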

> v1-0006-Use-global_lc_ctype-for-callers-of-locale-aware-f.patch
> 
> I think these need further individual analysis and explanation why
> these 
> should use the global lc_ctype setting.

This patch series, at least so far, is designed to have zero behavior
changes. Anything with a potential for a behavior change should be a
separate commit, so that if we need to revert it, we can revert the
behavior change without reintroducing a setlocale() dependency.

>   For example, you could argue
> that the SQL-callable soundex(text) function should use the collation
> object of its input value, not the global locale.

That would be a behavior change.

>   But furthermore, 
> soundex_code() could actually just use pg_ascii_toupper() instead.  

Is that a behavior change?

> And 
> in ts_locale.c, the isalnum_l() call should use mylocale that already
> exists in that function.  The problem to solve is getting a good
> value 
> into mylocale.  Using the global setting confuses the issue a bit, I
> think.

I reworked it to be less confusing by changing wchar2char/char2wchar to
take a locale_t instead of pg_locale_t. Hopefully it's an improvement.

In get_iso_localename(), there's a comment saying that it doesn't
matter which locale is used (because it's ASCII), but to use the "_l"
variants, we need to pick some locale. At that point it's not clear to
me that global_libc_locale will be set yet, so I used LC_C_LOCALE.

I'm not sure whether we can rely on LC_C_LOCALE being available, but it
passed in CI, and if it's not available somewhere it might be a good
idea to create it on those platforms anyway.
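
One way to "create it on those platforms" could look like the sketch
below. It is only an illustration under assumptions: c_locale_sketch()
and alnum_in_c_locale_sketch() are hypothetical names, LC_C_LOCALE is
assumed to be a preprocessor macro wherever it exists, and the fallback
is not thread-safe as written.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* expose newlocale()/isalnum_l() on glibc */
#endif
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#if defined(__APPLE__)
#include <xlocale.h>
#endif

/*
 * Hypothetical helper: return a locale_t for the "C" locale. Uses
 * LC_C_LOCALE where the platform defines it; otherwise creates an
 * equivalent object once with newlocale(). Failure of newlocale() is
 * not handled here; a real implementation would have to report it.
 */
locale_t
c_locale_sketch(void)
{
#ifdef LC_C_LOCALE
    return LC_C_LOCALE;
#else
    static locale_t c_locale = (locale_t) 0;

    if (c_locale == (locale_t) 0)
        c_locale = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    return c_locale;
#endif
}

/* The data is ASCII, so any locale would do; the "_l" call just needs one. */
int
alnum_in_c_locale_sketch(unsigned char ch)
{
    return isalnum_l(ch, c_locale_sketch()) != 0;
}
```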

> v1-0007-Fix-the-last-remaining-callers-relying-on-setloca.patch
> 
> Do we have any data what platforms we'd need these checks for?

https://cirrus-ci.com/build/5167600088383488

Looks like Windows doesn't have iswxdigit_l or isxdigit_l.

> Also, if you look into wparser_def.c what p_isxdigit is used for,
> it's 
> used for parsing XML (presumably HTML) files, so we just need ASCII-
> only 
> behavior and no locale dependency.

iswxdigit() does seem to be dependent on locale, so this could be a
subtle behavior change.
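
For comparison, the ASCII-only behavior suggested above could be
sketched like this. The function name is hypothetical and this is not
what wparser_def.c does today; whether it matches iswxdigit() in every
locale is exactly the open question.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical ASCII-only replacement for iswxdigit(): accepts exactly
 * 0-9, a-f and A-F and consults no locale at all.
 */
bool
ascii_iswxdigit_sketch(wchar_t c)
{
    return (c >= L'0' && c <= L'9') ||
        (c >= L'a' && c <= L'f') ||
        (c >= L'A' && c <= L'F');
}
```

Any non-ASCII character, such as a fullwidth digit, is rejected here
regardless of how a given platform's iswxdigit() treats it.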

> v1-0008-Set-process-LC_COLLATE-C-and-LC_CTYPE-C.patch
> 
> As I mentioned earlier in the thread, I don't think we can do this
> for 
> LC_CTYPE, because otherwise system error messages would not come out
> in 
> the right encoding.

Changed it so that it only sets LC_COLLATE to C, and leaves LC_CTYPE
set to datctype.

Unfortunately, as long as LC_CTYPE is set to a real locale, there's a
danger of accidentally depending on that setting. Can the encoding be
controlled with LC_MESSAGES instead of LC_CTYPE?

Do you have an example of how things can go wrong?

>   For the LC_COLLATE settings, I think we could just 
> do the setting in main(), where the other non-database-specific
> locale 
> categories are set.

Done.
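
In spirit, the startup step amounts to something like the following
sketch. The helper name is hypothetical and the real code in main()
may go through a different wrapper; the point is only that LC_COLLATE
is pinned once, process-wide, and the other categories are left alone.

```c
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical startup helper: pin the process-wide LC_COLLATE to "C"
 * without touching any other category. Returns false if setlocale()
 * rejects the request.
 */
bool
pin_collate_to_c_sketch(void)
{
    return setlocale(LC_COLLATE, "C") != NULL;
}
```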

Regards,
	Jeff Davis