Re: Add CASEFOLD() function.

From: Jeff Davis <pgsql@j-davis.com>
To: Peter Eisentraut <peter@eisentraut.org>, Joe Conway <mail@joeconway.com>, Ian Lawrence Barwick <barwick@gmail.com>
Cc: pgsql-hackers@postgresql.org
Date: 2025-01-18T00:34:43Z
Lists: pgsql-hackers

Commits

  1. Fix PDF doc build.

  2. Add SQL function CASEFOLD().

  3. Add support for Unicode case folding.

On Fri, 2025-01-10 at 16:27 -0800, Jeff Davis wrote:
> New patch series attached.

v5 attached.

This version is rebased over the Full Case Mapping support, and
supports Default Case Folding when using the PG_UNICODE_FAST collation.

That means that "ẞ", "ß", "SS", "Ss", and "ss" all fold to "ss"; and
"Σ", "σ", and "ς" all fold to "σ".
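As a rough illustration outside PostgreSQL: Python's str.casefold() also applies Unicode full case folding from CaseFolding.txt, so it reproduces the mappings above. This is a sketch for comparison only, not the patch's implementation; fold_all() and lower_all() are hypothetical helpers.

```python
# Illustration only: Python's str.casefold() applies Unicode full case
# folding. It is not PostgreSQL's CASEFOLD(), but it shows the same mappings.

sharp_s_forms = ["\u1E9E", "\u00DF", "SS", "Ss", "ss"]  # ẞ, ß, SS, Ss, ss
sigma_forms = ["\u03A3", "\u03C3", "\u03C2"]             # Σ, σ, ς

def fold_all(words):
    """Return the set of distinct case-folded results (hypothetical helper)."""
    return {w.casefold() for w in words}

def lower_all(words):
    """Return the set of distinct lowercased results, for comparison."""
    return {w.lower() for w in words}

print(fold_all(sharp_s_forms))   # {'ss'}
print(fold_all(sigma_forms))     # {'σ'}
# Lowercasing does not unify these: "ß" and "ς" stay distinct.
print(lower_all(sharp_s_forms))
print(lower_all(sigma_forms))
```

Folding collapses each group to a single value, while lowercasing leaves two distinct values per group, which is why folding is the better basis for caseless comparison.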

CASEFOLD() is better than LOWER() (according to Unicode, anyway) for
caseless matching, or in an expression index to enforce case-insensitive
uniqueness without relying on ICU.
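To make the uniqueness point concrete, here is a small model of a unique index keyed on a computed expression, again using Python's str.casefold() and str.lower() as stand-ins for CASEFOLD() and LOWER(). UniqueFoldedIndex and accepts() are hypothetical names for illustration, not a PostgreSQL API.

```python
# Hypothetical model of a unique expression index over a computed key.
# str.casefold() stands in for CASEFOLD(); str.lower() stands in for LOWER().

class UniqueFoldedIndex:
    def __init__(self, key_func):
        self._key_func = key_func   # function computing the index key
        self._entries = {}          # index key -> original value

    def insert(self, value):
        key = self._key_func(value)
        if key in self._entries:
            raise ValueError(
                f"duplicate key: {value!r} conflicts with {self._entries[key]!r}"
            )
        self._entries[key] = value

def accepts(key_func, values):
    """Return True if all values insert without a uniqueness violation."""
    index = UniqueFoldedIndex(key_func)
    try:
        for v in values:
            index.insert(v)
    except ValueError:
        return False
    return True

names = ["Straße", "STRASSE"]
print(accepts(str.lower, names))     # True: a LOWER()-style key admits both
print(accepts(str.casefold, names))  # False: a CASEFOLD()-style key rejects the second
```

With a lowercase key, "straße" and "strasse" differ, so both rows are admitted; with a folded key, both become "strasse" and the second insert is rejected.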

Additionally, the infrastructure in this patch (as well as 286a365b9c)
could be used in the future for better case-insensitive pattern matching,
or for case-folding identifiers in the parser without relying on libc.

I feel this is about ready for commit. The main point of discussion was
whether CASEFOLD() would do normalization and, if so, what the SQL API
would look like. I concluded upthread[1] that normalization is not
necessary to meet the Unicode Default Case Folding behavior, so we should
leave normalization as a separate step. If anyone disagrees with that
reasoning, please let me know.
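A quick sketch of what "normalization as a separate step" means in practice, using Python's unicodedata.normalize() and str.casefold() as stand-ins; fold_only() and normalize_then_fold() are hypothetical helpers. Folding alone does not equate a precomposed "É" with "E" plus a combining acute accent; normalizing first does.

```python
import unicodedata

# Folding and normalization as independent steps.
# str.casefold() stands in for CASEFOLD(); unicodedata.normalize() for a
# separate normalization step.

precomposed = "\u00C9"   # É as a single code point
decomposed = "E\u0301"   # E followed by COMBINING ACUTE ACCENT

def fold_only(s):
    """Case folding without any normalization."""
    return s.casefold()

def normalize_then_fold(s, form="NFC"):
    """Normalize first, then fold (a simplified caseless match key)."""
    return unicodedata.normalize(form, s).casefold()

print(fold_only(precomposed) == fold_only(decomposed))                      # False
print(normalize_then_fold(precomposed) == normalize_then_fold(decomposed))  # True
```

This is a simplification: the Unicode Standard's canonical caseless match (Section 3.13) applies NFD both before and after folding, because folding can occasionally produce unnormalized output. Keeping the two operations separate lets callers choose the combination they need.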

Regards,
	Jeff Davis

[1]
https://www.postgresql.org/message-id/610a56de2bd958e96c149ca60420db30e7d51588.camel%40j-davis.com