Re: Add CASEFOLD() function.
From: Peter Eisentraut <peter@eisentraut.org>
To: Jeff Davis <pgsql@j-davis.com>, Joe Conway <mail@joeconway.com>,
Ian Lawrence Barwick <barwick@gmail.com>
Cc: pgsql-hackers@postgresql.org
Date: 2024-12-19T16:18:31Z
Lists: pgsql-hackers
Commits
- Fix PDF doc build. (d2ca16bb509c, 18.0, landed)
- Add SQL function CASEFOLD(). (bfc5992069cf, 18.0, landed)
- Add support for Unicode case folding. (4e7f62bc386a, 18.0, landed)
On 16.12.24 18:49, Jeff Davis wrote:
> One question I have is whether we want this function to normalize the
> output.
>
> I believe most use cases would want the output normalized, because
> normalization differences (e.g. "a" U+0061 followed by "combining
> acute" U+0301 vs "a with acute" U+00E1) are more minor than differences
> in case.
Can you explain this in further detail? I don't quite follow why this
would be required.
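
To make the case under discussion concrete, here is a minimal sketch in Python rather than SQL. It assumes that Python's str.casefold() and unicodedata.normalize() are close enough stand-ins for the proposed CASEFOLD() and the existing normalize(); the helper name casefold_nfc is hypothetical and not part of any patch. It shows that case folding alone leaves the decomposed and precomposed spellings of the same letter unequal, and that normalizing the folded result reconciles them.

```python
import unicodedata

# Two spellings of the same user-visible character, differing in case
# and in normalization form.
decomposed = "A\u0301"   # "A" U+0041 followed by combining acute U+0301
precomposed = "\u00e1"   # "a with acute" U+00E1

# Case folding removes the case difference but not the normalization one.
folded_decomposed = decomposed.casefold()    # "a" + U+0301
folded_precomposed = precomposed.casefold()  # U+00E1
fold_only_equal = folded_decomposed == folded_precomposed


def casefold_nfc(s: str) -> str:
    """Hypothetical combined operation: case fold, then normalize to NFC."""
    return unicodedata.normalize("NFC", s.casefold())


fold_and_normalize_equal = casefold_nfc(decomposed) == casefold_nfc(precomposed)

print(fold_only_equal, fold_and_normalize_equal)
```

Whether that second comparison is what most callers expect is exactly the open question; the sketch only illustrates the difference and does not argue for either behavior.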
> Of course, a user could wrap it with the normalize() function, but
> that's verbose and easy to forget. I'm also not sure that it can be
> made as fast as a combined function that does both.
>
> And a follow-up question: if it does normalize, the second parameter
> would be the requested normal form. But to accept the keyword forms
> (NFC, NFD in gram.y) rather than the string forms ('NFC', 'NFD') then
> we'd also need to add CASEFOLD to gram.y (like NORMALIZE). Is
> that a reasonable thing to do?
That's maybe one reason to keep it separate.
Another might be that it's not entirely clear how this should work in
encodings other than UTF-8. For example, the normalized string might
not be representable in the encoding.
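
As a rough illustration of the representability concern, the following Python sketch uses Python's "latin-1" codec as a stand-in for a non-UTF-8 server encoding. The helper representable() is hypothetical, and nothing here reflects how PostgreSQL itself would handle the situation. The precomposed character fits in the single-byte encoding, while its NFD form does not, because the combining acute accent U+0301 has no Latin-1 code point.

```python
import unicodedata


def representable(s: str, encoding: str) -> bool:
    """Return True if every character of s can be encoded in `encoding`."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


precomposed = "\u00e1"                                  # "a with acute"
decomposed = unicodedata.normalize("NFD", precomposed)  # "a" + U+0301

precomposed_fits = representable(precomposed, "latin-1")
decomposed_fits = representable(decomposed, "latin-1")

print(precomposed_fits, decomposed_fits)
```

So a function that both folds and normalizes would need some defined behavior for this case (for example raising an error, or being restricted to UTF-8), whereas a fold-only function might avoid part of the question.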