Re: Add CASEFOLD() function.

From: Jeff Davis <pgsql@j-davis.com>
To: Thom Brown <thom@linux.com>, Peter Eisentraut <peter@eisentraut.org>
Cc: Vik Fearing <vik@postgresfriends.org>, Joe Conway <mail@joeconway.com>, Ian Lawrence Barwick <barwick@gmail.com>, PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: 2025-06-19T16:33:41Z
Lists: pgsql-hackers

Commits

  1. Fix PDF doc build.

  2. Add SQL function CASEFOLD().

  3. Add support for Unicode case folding.

On Thu, 2025-06-19 at 16:36 +0100, Thom Brown wrote:
> Ease of use, perhaps. It seems easier to use:
> 
> column_name cftext
> 
> rather than:
> 
> CREATE COLLATION case_insensitive_collation (
>     PROVIDER = icu,
>     LOCALE = 'und-u-ks-level2',
>     DETERMINISTIC = FALSE
> );

We could auto-create such a collation at initdb time for ICU-enabled
builds.

> But I see the arguments against it. It creates an unnecessary
> dependency on an extension, and if someone wants to ignore both case
> and accents, they may resort to using 2 extensions (citext +
> unaccent)
> when none are needed.

There are at least three ways to do case insensitivity (or other kinds
of equivalence):

* Explicit function calls in queries, as well as index and constraint
definitions. E.g. expression index on LOWER(), queries that explicitly
do "LOWER(x) = ..."

* Wrap those function calls up in a separate data type, like citext.

* Non-deterministic collations.
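
For illustration, the three approaches above might look roughly like the
following sketch. The table and index names (users_fn, users_ci,
users_coll, users_fn_email_lower_idx) are hypothetical, the collation
definition is the one quoted earlier in this message, and the third
variant assumes an ICU-enabled build:

```sql
-- 1. Explicit function calls: the index definition and every query
--    must spell out LOWER() themselves.
CREATE TABLE users_fn (email text NOT NULL);            -- hypothetical table
CREATE UNIQUE INDEX users_fn_email_lower_idx
    ON users_fn (LOWER(email));
SELECT * FROM users_fn
 WHERE LOWER(email) = LOWER('Alice@Example.com');

-- 2. Separate data type: citext does the case mapping internally,
--    so comparisons look ordinary, but the extension is required.
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE users_ci (email citext PRIMARY KEY);       -- hypothetical table
SELECT * FROM users_ci
 WHERE email = 'Alice@Example.com';

-- 3. Non-deterministic collation: the column stays plain text and
--    the collation supplies the equivalence rule.
CREATE COLLATION case_insensitive_collation (
    PROVIDER = icu,
    LOCALE = 'und-u-ks-level2',
    DETERMINISTIC = FALSE
);
CREATE TABLE users_coll (                               -- hypothetical table
    email text COLLATE case_insensitive_collation PRIMARY KEY
);
SELECT * FROM users_coll
 WHERE email = 'Alice@Example.com';
```

In the second and third variants the query text is identical; the
difference is whether the behavior is attached to a separate type or to
a collation on the existing text type.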

Given that we have collations, which are a way of organizing alternate
behaviors for existing data types, I'm not sure I see the need for
creating an entirely separate data type.

> I guess I don't feel strongly about it either
> way.

Are you a user of citext? I'm genuinely interested in the use cases,
and whether the separate-data-type approach has merits that are missing
in the other approaches.

Regards,
	Jeff Davis