Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation

From: Heikki Linnakangas <hlinnaka@iki.fi>
To: Laurenz Albe <laurenz.albe@cybertec.at>, adam.warland@infor.com, pgsql-bugs@lists.postgresql.org
Date: 2025-12-02T16:36:06Z
Lists: pgsql-bugs

On 02/12/2025 18:24, Laurenz Albe wrote:
> On Tue, 2025-12-02 at 10:03 +0000, PG Bug reporting form wrote:
>> PostgreSQL version: 18.1
>>
>> When using a nondeterministic ICU collation, the replace() function fails to
>> replace a substring when that substring appears at the end of the input
>> string.
>>
>> Occurrences of the same substring earlier in the string are replaced
>> normally.
>>
>> Specific collation used:
>> create collation test_nondeterministic (
>>      provider = icu,
>>      locale = 'und-u-ks-level2',
>>      deterministic = false
>> )
>>
>> -- Replace final character under nondeterministic collation
>> SELECT replace(
>>      'testx' COLLATE "test_nondeterministic",
>>      'x'     COLLATE "test_nondeterministic",
>>      'y') AS res1;
> 
> I can reproduce the problem, and the attached patch fixes it for me.

+1, looks good to me. Let's also add a regression test for this.

> I am not certain if it is safe to apply pg_mblen() to "haystack_end", though.

It doesn't do that though, does it? There are two pg_mblen() calls in
the vicinity:

> 			for (const char *test_end = hptr; test_end <= haystack_end; test_end += pg_mblen(test_end))
> 			{
> 				if (pg_strncoll(hptr, (test_end - hptr), needle, needle_len, state->locale) == 0)
> 				{
> 					state->last_match_len_tmp = (test_end - hptr);
> 					result_hptr = hptr;
> 					if (!state->greedy)
> 						break;
> 				}
> 			}
> 			if (result_hptr)
> 				break;
> 
> 			hptr += pg_mblen(hptr);

Neither of those will get called with 'haystack_end' as far as I can see.
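
To illustrate why the inclusive bound (test_end <= haystack_end) in the quoted loop matters for the reported case, here is a minimal standalone sketch. It is not the PostgreSQL code. ci_compare() and find_match() are hypothetical names, ci_compare() is only a case-insensitive ASCII stand-in for pg_strncoll() with a nondeterministic collation, and the exclusive variant is my assumption about what the loop looked like before the patch. The sketch also assumes single-byte characters and steps one byte at a time, so it says nothing about the pg_mblen() question.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_strncoll(): returns 0 if the two byte ranges
 * are equal ignoring ASCII case, nonzero otherwise.
 */
static int
ci_compare(const char *a, size_t alen, const char *b, size_t blen)
{
	if (alen != blen)
		return 1;
	for (size_t i = 0; i < alen; i++)
	{
		if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
			return 1;
	}
	return 0;
}

/*
 * Return the byte offset of the first match of needle in haystack, or -1.
 * "inclusive" selects whether a candidate substring may end exactly at
 * haystack_end (as in the quoted loop) or must end before it (assumed
 * pre-patch behavior).
 */
static long
find_match(const char *haystack, const char *needle, int inclusive)
{
	const char *haystack_end = haystack + strlen(haystack);
	size_t		needle_len = strlen(needle);

	for (const char *hptr = haystack; hptr < haystack_end; hptr++)
	{
		/* Try every candidate end position for a substring starting at hptr. */
		for (const char *test_end = hptr;
			 inclusive ? test_end <= haystack_end : test_end < haystack_end;
			 test_end++)
		{
			if (ci_compare(hptr, (size_t) (test_end - hptr),
						   needle, needle_len) == 0)
				return (long) (hptr - haystack);
		}
	}
	return -1;
}
```

With the exclusive bound, find_match("testx", "X", 0) never compares the range that ends at haystack_end, so the trailing "x" is missed and the result is -1. With the inclusive bound, find_match("testx", "X", 1) returns 4. A match earlier in the string, as in find_match("txest", "X", 0), is found either way, which is consistent with the behavior described in the report.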

- Heikki