
  1. Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-12-02T16:36:06Z

    On 02/12/2025 18:24, Laurenz Albe wrote:
    > On Tue, 2025-12-02 at 10:03 +0000, PG Bug reporting form wrote:
    >> PostgreSQL version: 18.1
    >>
    >> When using a nondeterministic ICU collation, the replace() function fails to
    >> replace a substring when that substring appears at the end of the input
    >> string.
    >>
    >> Occurrences of the same substring earlier in the string are replaced
    >> normally.
    >>
    >> Specific collation used:
    >> create collation test_nondeterministic (
    >>      provider = icu,
    >>      locale = 'und-u-ks-level2',
    >>      deterministic = false
    >> );
    >>
    >> -- Replace final character under nondeterministic collation
    >> SELECT replace(
    >>      'testx' COLLATE "test_nondeterministic",
    >>      'x'     COLLATE "test_nondeterministic",
    >>      'y') AS res1;
    > 
    > I can reproduce the problem, and the attached patch fixes it for me.
    
    +1, looks good to me. Let's also add a regression test for this.
    
    > I am not certain if it is safe to apply pg_mblen() to "haystack_end", though.
    
    It doesn't do that though, does it? There are two pg_mblen() calls in 
    the vicinity:
    
    > 			for (const char *test_end = hptr; test_end <= haystack_end; test_end += pg_mblen(test_end))
    > 			{
    > 				if (pg_strncoll(hptr, (test_end - hptr), needle, needle_len, state->locale) == 0)
    > 				{
    > 					state->last_match_len_tmp = (test_end - hptr);
    > 					result_hptr = hptr;
    > 					if (!state->greedy)
    > 						break;
    > 				}
    > 			}
    > 			if (result_hptr)
    > 				break;
    > 
    > 			hptr += pg_mblen(hptr);
    
    Neither of those will get called with 'haystack_end' as far as I can see.
    
    - Heikki
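
The quoted loop tries every candidate end position for a match that starts at `hptr`. The loop bound therefore decides whether a match ending exactly at `haystack_end` is ever tested. The C program below is a minimal, hypothetical model of that search and is not PostgreSQL's code. `find_first()` and `ci_coll_eq()` are invented names. `ci_coll_eq()` stands in for `pg_strncoll(...) == 0` under a case-insensitive collation and handles ASCII only. The model assumes a single-byte encoding, so positions advance by one byte rather than by `pg_mblen()`. It also assumes the unpatched bound was `<`, which is what the reported symptom suggests.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_strncoll(...) == 0 under a case-insensitive
 * ("level2"-like) collation. ASCII only; only equal-length ranges can match.
 */
static bool
ci_coll_eq(const char *a, size_t alen, const char *b, size_t blen)
{
    if (alen != blen)
        return false;
    for (size_t i = 0; i < alen; i++)
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
            return false;
    return true;
}

/*
 * Return the byte offset of the first match of needle in haystack, or -1.
 * For each start position, every candidate end position is tried, as in the
 * quoted loop. 'inclusive' selects the bound on the candidate end:
 *   false: end <  hlen, so a candidate ending at the end of haystack is never tried
 *   true:  end <= hlen, the bound shown in the quoted patch
 * Offsets advance by one byte (single-byte encoding assumed).
 */
static long
find_first(const char *haystack, const char *needle, bool inclusive,
           size_t *match_len)
{
    assert(haystack != NULL && needle != NULL && match_len != NULL);
    size_t hlen = strlen(haystack);
    size_t nlen = strlen(needle);

    for (size_t start = 0; start < hlen; start++)
        for (size_t end = start; inclusive ? end <= hlen : end < hlen; end++)
            if (ci_coll_eq(haystack + start, end - start, needle, nlen))
            {
                *match_len = end - start;
                return (long) start;    /* non-greedy: first candidate wins */
            }
    return -1;
}
```

With the exclusive bound, `find_first("testx", "x", false, &len)` returns -1. With the inclusive bound it returns 4 and sets `len` to 1. A leading occurrence, as in `"xtest"`, is found with either bound. This mirrors the report that only the occurrence at the end of the string is missed.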