Re: Add CASEFOLD() function.
From: Jeff Davis <pgsql@j-davis.com>
To: Peter Eisentraut <peter@eisentraut.org>, Joe Conway
<mail@joeconway.com>, Ian Lawrence Barwick <barwick@gmail.com>
Cc: pgsql-hackers@postgresql.org
Date: 2024-12-19T17:51:32Z
Lists: pgsql-hackers
Commits
- Fix PDF doc build. (d2ca16bb509c, 18.0, landed)
- Add SQL function CASEFOLD(). (bfc5992069cf, 18.0, landed)
- Add support for Unicode case folding. (4e7f62bc386a, 18.0, landed)

On Thu, 2024-12-19 at 17:18 +0100, Peter Eisentraut wrote:
> Can you explain this in further detail?  I don't quite follow why this
> would be required.

I am unsure now.

My initial reasoning was based on the idea that users would want to use
CASEFOLD(t) in a unique expression index as an improvement over
LOWER(t). And if you do that, you'd be surprised if some equivalent
strings ended up in the index.

I don't think that's a huge problem, because in other contexts we leave
it up to the user to keep things normalized consistently, and a
CHECK(t IS NFC NORMALIZED) is a good way to do that.

But there's a problem: full case folding doesn't preserve the normal
form, so even if the input is NFC normalized, the output might not be.
If we solve this problem, then we can just say that CASEFOLD() preserves
the normal form, consistently with how the spec defines LOWER()/UPPER(),
and I think that would be the best outcome. I'm not sure if that problem
is solvable, though, because what if the input string is in both NFC and
NFD, how do we know which normal form to preserve?

We could tell users to use an expression index on
NORMALIZE(CASEFOLD(t)) instead, but that feels like inefficient
boilerplate.

> Another might be that's not entirely clear how this should work in
> encodings other than UTF-8.  For example, the normalized string might
> not be representable in the encoding.

That's a good point.

Regards,
	Jeff Davis
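
Illustration of the normal-form problem described above, as a minimal
sketch rather than PostgreSQL code. It assumes Python's str.casefold()
applies Unicode full case folding, and it uses U+01F0 (LATIN SMALL
LETTER J WITH CARON) as an example character chosen here for
illustration; neither comes from the thread itself.

```python
import unicodedata

def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text

# Example input (an assumption for illustration): a single precomposed
# code point, which is NFC-normalized as given.
s = "\u01f0"

# Full case folding maps this character to a base letter followed by a
# combining mark, so the result has more code points than the input.
folded = s.casefold()

# Re-normalizing after folding recomposes the sequence.
renormalized = unicodedata.normalize("NFC", folded)

print("input is NFC:      ", is_nfc(s))
print("folded code points:", [f"U+{ord(c):04X}" for c in folded])
print("folded is NFC:     ", is_nfc(folded))
print("renormalized is NFC:", is_nfc(renormalized))
```

The final re-normalization step plays the role that
NORMALIZE(CASEFOLD(t)) would play in an expression index.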