Re: Add CASEFOLD() function.
From: Vik Fearing <vik@postgresfriends.org>
To: Jeff Davis <pgsql@j-davis.com>, Joe Conway <mail@joeconway.com>,
Ian Lawrence Barwick <barwick@gmail.com>
Cc: pgsql-hackers@postgresql.org, Peter Eisentraut <peter@eisentraut.org>
Date: 2025-06-18T17:09:04Z
Lists: pgsql-hackers
Commits
- Fix PDF doc build. (d2ca16bb509c, 18.0, landed)
- Add SQL function CASEFOLD(). (bfc5992069cf, 18.0, landed)
- Add support for Unicode case folding. (4e7f62bc386a, 18.0, landed)

On 17/06/2025 20:14, Jeff Davis wrote:
> On Tue, 2025-06-17 at 17:37 +0200, Vik Fearing wrote:
>> If the character set of <character factor> is UTF8, UTF16, or UTF32,
>> then FR is replaced by
>>   Case:
>>   i) If the <search condition> S IS NORMALIZED evaluates to
>>      True, then NORMALIZE (FR)
>>   ii) Otherwise, FR.
> I read that as "if the input is normalized, then the output should be
> normalized", IOW preserve the normalization. But does it mean "preserve
> whatever the input normal form is" or "preserve NFC if the input is
> NFC, otherwise the normalization is undefined"?
>
> The above wording seems to mean "preserve NFC if the input is NFC",
> because that's what NORMALIZE(FR) does when the normal form is
> unspecified.

Yes, and that is also the default for <normalized predicate>.

>> It does not appear to me that our LOWER and UPPER functions obey this
>> rule,
> You are correct:
>
>   WITH s(t) AS
>     (SELECT NORMALIZE(U&'\00C1\00DF\0301' COLLATE "en-US-x-icu"))
>   SELECT UPPER(t) = NORMALIZE(UPPER(t)) FROM s;
>    ?column?
>   ----------
>    f
>
>> so there is a valid argument that we should continue to ignore it.
>> Or, we can say that we have at least one of three compliant.
> What do other databases do?

I don't know. I am just pointing out what the Standard says. I think we should either comply, or say that we don't do it for LOWER and UPPER so let's keep things implementation-consistent.

> Given how costly normalization can be, imposing that on every caller
> seems like a bit much.

How much does it cost to check for NFC? I honestly don't know the answer to that question, but that is the only case where we need to maintain normalization.

> And favoring NFC for the user unconditionally
> might not be the best thing. Then again, NFC is good most of the time,
> and there are patches to speed up normalization.

It's not unconditionally, it's only if the input was NFC.

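To make the rule concrete, here is a minimal sketch in Python rather than PostgreSQL code. It assumes that Python's `str.casefold()` and `unicodedata` are a fair stand-in for Unicode case folding and NFC handling; the helper name `fold_preserving_nfc` is hypothetical and only illustrates "normalize the result if, and only if, the input was NFC".

```python
import unicodedata


def fold_preserving_nfc(s: str) -> str:
    """Case-fold s; if s was already in NFC, return the result in NFC too.

    Hypothetical helper illustrating the rule quoted from the Standard;
    this is not how PostgreSQL implements CASEFOLD().
    """
    folded = s.casefold()  # Unicode full case folding (e.g. U+00DF -> "ss")
    if unicodedata.is_normalized("NFC", s):
        # Input was NFC: re-normalize, because folding can break NFC.
        return unicodedata.normalize("NFC", folded)
    # Input was not NFC: the rule leaves the folded result untouched.
    return folded


if __name__ == "__main__":
    # Same code points as the quoted SQL example: U+00C1 U+00DF U+0301.
    t = unicodedata.normalize("NFC", "\u00C1\u00DF\u0301")
    # Plain case mapping does not preserve NFC for this input.
    print(unicodedata.is_normalized("NFC", t.upper()))
    # The helper restores NFC because the input was NFC.
    print(unicodedata.is_normalized("NFC", fold_preserving_nfc(t)))
```

In this sketch the only work added for non-NFC input is the `is_normalized` check; the full normalization pass runs only when the input was already NFC.
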
> I tend to think that a lot of users who want casefolding would also
> want normalization, but it's hard to weigh that against the performance
> cost. It might not matter outside of a few edge cases, though I'm not
> sure exactly how many.

I defer to you and others in the thread to make this decision.

--
Vik Fearing