Re: Remaining dependency on setlocale()

From: Jeff Davis <pgsql@j-davis.com>
To: Peter Eisentraut <peter@eisentraut.org>, Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2025-06-06T05:15:34Z
Lists: pgsql-hackers

Commits

  1. fuzzystrmatch: use pg_ascii_toupper().

  2. Avoid global LC_CTYPE dependency in pg_locale_icu.c.

  3. downcase_identifier(): use method table from locale provider.

  4. ltree: fix case-insensitive matching.

  5. Fix multibyte issue in ltree_strncasecmp().

  6. Use multibyte-aware extraction of pattern prefixes.

  7. Add pg_iswcased().

  8. Remove char_tolower() API.

  9. Make regex "max_chr" depend on encoding, not provider.

  10. Change some callers to use pg_ascii_toupper().

  11. Allow pg_locale_t APIs to work when ctype_is_c.

  12. Add #define for UNICODE_CASEMAP_BUFSZ.

  13. Inline pg_ascii_tolower() and pg_ascii_toupper().

  14. Avoid global LC_CTYPE dependency in pg_locale_libc.c.

  15. Force LC_COLLATE to C in postmaster.

  16. Change wchar2char() and char2wchar() to accept a locale_t.

  17. Use pg_ascii_tolower()/pg_ascii_toupper() where appropriate.

  18. inet_net_pton.c: use pg_ascii_tolower() rather than tolower().

  19. isn.c: use pg_ascii_toupper() instead of toupper().

  20. contrib/spi/refint.c: use pg_ascii_tolower() instead.

  21. copyfromparse.c: use pg_ascii_tolower() rather than tolower().

  22. Revert "Tidy up locale thread safety in ECPG library."

  23. Tidy up locale thread safety in ECPG library.

  24. All supported systems have locale_t.

On Tue, 2024-12-17 at 13:14 +0100, Peter Eisentraut wrote:
> > > > +1 to this approach. It makes things more consistent across
> > > > platforms and avoids surprising dependencies on the global
> > > > setting.
> > > > 
> > 
> > I think this is the best way, and I haven't seen anyone supporting
> > any other idea.  (I'm working on those setlocale()-removal patches
> > I mentioned, more very soon...)
> 
> I also think this is the right direction, and we'll get closer with
> the remaining patches that Thomas has lined up.
> 
> I think at this point, we could already remove all locale settings 
> related to LC_COLLATE.  Nothing uses that anymore.
> 
> I think we will need to keep the global LC_CTYPE setting set to
> something useful, for example so that system error messages come out
> in the right encoding.
> 
> But I'm concerned about the Perl_setlocale() dance in plperl.c.
> Perl apparently does a setlocale(LC_ALL, "") during startup, and that
> code is a workaround to reset everything back afterwards.  We need to
> be careful not to break that.
> 
> (Perl has fixed that in 5.19, but the fix requires that you set
> another environment variable before launching Perl, which you can't
> do in a threaded system, so we'd probably need another fix
> eventually.  See <https://github.com/Perl/perl5/issues/8274>.)

To continue this thread, I did a symbol search in the meson build
directory with a loop like this (patterns.txt attached):

  for f in `find . -name '*.o'`; do
    if ( nm --format=just-symbols "$f" | \
         grep -xE -f /tmp/patterns.txt > /dev/null ); then
      echo "$f"; fi; done

and it output:

./contrib/fuzzystrmatch/fuzzystrmatch.so.p/dmetaphone.c.o
./contrib/fuzzystrmatch/fuzzystrmatch.so.p/fuzzystrmatch.c.o
./contrib/isn/isn.so.p/isn.c.o
./contrib/spi/refint.so.p/refint.c.o
./contrib/ltree/ltree.so.p/crc32.c.o
./src/backend/postgres_lib.a.p/commands_copyfromparse.c.o
./src/backend/postgres_lib.a.p/utils_adt_pg_locale_libc.c.o
./src/backend/postgres_lib.a.p/tsearch_wparser_def.c.o
./src/backend/postgres_lib.a.p/parser_scansup.c.o
./src/backend/postgres_lib.a.p/utils_adt_inet_net_pton.c.o
./src/backend/postgres_lib.a.p/tsearch_ts_locale.c.o
./src/bin/psql/psql.p/meson-generated_.._tab-complete.c.o
./src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o
./src/interfaces/ecpg/compatlib/libecpg_compat.so.3.18.p/informix.c.o
./src/interfaces/ecpg/compatlib/libecpg_compat.a.p/informix.c.o
./src/port/libpgport_srv.a.p/pgstrcasecmp.c.o
./src/port/libpgport_shlib.a.p/pgstrcasecmp.c.o
./src/port/libpgport.a.p/pgstrcasecmp.c.o

Not a short list, but not a long list either, so it seems tractable.
Note that this misses things like isdigit(), which is inlined and so
never shows up as a symbol.

A few observations while spot-checking these files:

---------------------
pgstrcasecmp.c - has code like:

    else if (IS_HIGHBIT_SET(ch) && islower(ch))
        ch = toupper(ch);

and comments like "Note however that the whole thing is a bit bogus for
multibyte character sets."

Most of the callers compare directly against ASCII literals, so I'm
not sure what the point of the locale-dependent high-bit branch is.
There are probably some more interesting callers hidden in there.
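
For callers that only ever compare against ASCII literals, a
locale-independent comparison could look roughly like the sketch below.
This is only an illustration: ascii_fold_lower() and ascii_strcasecmp()
are hypothetical names, not existing PostgreSQL functions, and the
sketch assumes that bytes with the high bit set should simply be
compared as-is.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fold only ASCII 'A'..'Z' to lower case.
 * Every other byte, including high-bit bytes, passes through unchanged,
 * so the result never depends on the global LC_CTYPE setting.
 */
static inline unsigned char
ascii_fold_lower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    return ch;
}

/*
 * Hypothetical case-insensitive comparison that folds ASCII only.
 * Returns <0, 0 or >0, like strcmp().
 */
int
ascii_strcasecmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);

    for (;;)
    {
        unsigned char c1 = ascii_fold_lower((unsigned char) *s1++);
        unsigned char c2 = ascii_fold_lower((unsigned char) *s2++);

        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
        if (c1 == '\0')
            return 0;
    }
}
```

With this, ascii_strcasecmp("CSV", "csv") is 0 under any locale, while
two different high-bit bytes never compare equal, which avoids the
multibyte bogosity the comment warns about at the cost of not folding
non-ASCII characters at all.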
----------------------
pg_locale_libc.c -

char2wchar and wchar2char use mbstowcs and wcstombs when the input
locale is NULL. The main culprit seems to be full text search, which
has a bunch of /* TODO */ comments. Another caller is
get_iso_localename().

There are also a couple of false positives where mbstowcs_l/wcstombs_l
are emulated with uselocale() and mbstowcs/wcstombs. In those cases the
code is not actually sensitive to the global setting.
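
To illustrate why that emulation is a false positive, here is a minimal
sketch of the pattern, assuming a POSIX.1-2008 system that provides
locale_t and uselocale(). The function name sketch_mbstowcs_l() is made
up for this example and is not the code in pg_locale_libc.c.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical stand-in for mbstowcs_l() on platforms that lack it.
 * uselocale() changes the locale of the calling thread only, so the
 * plain mbstowcs() call below uses "loc" rather than whatever was set
 * globally with setlocale().
 */
size_t
sketch_mbstowcs_l(wchar_t *dest, const char *src, size_t n, locale_t loc)
{
    locale_t    save;
    size_t      result;

    assert(src != NULL);

    save = uselocale(loc);          /* switch this thread to "loc" */
    result = mbstowcs(dest, src, n);
    uselocale(save);                /* restore the previous thread locale */

    return result;
}
```

The object file still references mbstowcs(), so the symbol search flags
it, but the conversion is driven by the locale_t passed in rather than
by the global LC_CTYPE.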
-----------------------
copyfromparse.c - the input is ASCII, so it can use pg_ascii_tolower()
instead of tolower().
-----------------------

Regards,
	Jeff Davis