Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation

From: Laurenz Albe <laurenz.albe@cybertec.at>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Heikki Linnakangas <hlinnaka@iki.fi>, adam.warland@infor.com, pgsql-bugs@lists.postgresql.org
Date: 2025-12-03T07:51:22Z
Lists: pgsql-bugs

On Tue, 2025-12-02 at 15:53 -0500, Tom Lane wrote:
> > The attached patch v3 turns it into a while loop to avoid
> > the problem.
> 
> Looking at the code overall, I wonder if the outer loop doesn't have
> the same issue.  The comments claim that we should be able to handle
> zero-length matches, but if the overall haystack is of length zero,
> we will fail to check for such a match.

If you can find zero-length matches at all, you could find a
zero-length match in a non-empty haystack.  Perhaps the function is
never called with an empty haystack...

> Also, since we have haystack <= haystack_end as a starting condition,
> I think both loops could omit the initial test.  I'd be inclined
> to code them like
> 
> 	test_ptr = start point;
> 	for (;;)
> 	{
> 		...
> 		if (test_ptr >= haystack_end)
> 			break;
> 		test_ptr += pg_mblen(test_ptr);
> 	}

True.  The attached v4 patch does it like that.
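For illustration only, here is a minimal standalone sketch of that loop shape. It is not the patch itself: sketch_mblen() is a hypothetical stand-in for pg_mblen() that assumes UTF-8 input, and count_test_positions() is an invented helper that merely counts the positions visited. The point is that the position at haystack_end is visited as well, so an empty haystack still gets exactly one test.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_mblen(): byte length of a UTF-8
 * character, derived from its lead byte.  Assumption: UTF-8 input.
 */
static int
sketch_mblen(const char *s)
{
	unsigned char c = (unsigned char) *s;

	if (c < 0x80)
		return 1;
	if ((c & 0xE0) == 0xC0)
		return 2;
	if ((c & 0xF0) == 0xE0)
		return 3;
	if ((c & 0xF8) == 0xF0)
		return 4;
	return 1;				/* invalid lead byte: advance one byte */
}

/*
 * Invented helper: count the candidate positions the loop visits,
 * including haystack_end itself.
 */
static int
count_test_positions(const char *haystack, const char *haystack_end)
{
	const char *test_ptr = haystack;
	int			n = 0;

	assert(haystack <= haystack_end);
	for (;;)
	{
		n++;				/* real code would attempt a match at test_ptr */
		if (test_ptr >= haystack_end)
			break;
		test_ptr += sketch_mblen(test_ptr);
	}
	return n;
}
```

With this shape, a haystack of N characters yields N + 1 tested positions, and a zero-length haystack yields one.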

> On the other hand ... is that comment really right about zero-length
> match being possible?  If it is, the API for this function is in
> need of redesign, because callers that try to find "the next match"
> would go into an infinite loop re-finding the same zero-length
> match over and over.

Right.  I'll see if I can trigger such a case.
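To make the concern concrete, here is a toy sketch that has nothing to do with the real function or with collations. toy_find_next() and count_matches() are hypothetical names, and the search is a plain byte comparison. The caller resumes at the end of the previous match, which is the "find the next match" pattern; the max_iterations cap exists only so that the demonstration terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy byte-wise search (hypothetical, not the PostgreSQL function).
 * Reports the first match at or after "start".  A zero-length needle
 * matches immediately at "start" with match_start == match_end.
 */
static bool
toy_find_next(const char *start, const char *end,
			  const char *needle, size_t needle_len,
			  const char **match_start, const char **match_end)
{
	const char *p;

	assert(start <= end);
	for (p = start; (size_t) (end - p) >= needle_len; p++)
	{
		if (memcmp(p, needle, needle_len) == 0)
		{
			*match_start = p;
			*match_end = p + needle_len;
			return true;
		}
	}
	return false;
}

/*
 * Caller that resumes the search at the end of the previous match.
 * The cap is only there to stop the demonstration.
 */
static int
count_matches(const char *haystack, const char *end,
			  const char *needle, size_t needle_len,
			  int max_iterations)
{
	const char *pos = haystack;
	const char *ms;
	const char *me;
	int			n = 0;

	while (n < max_iterations &&
		   toy_find_next(pos, end, needle, needle_len, &ms, &me))
	{
		n++;
		pos = me;			/* no progress when me == ms == pos */
	}
	return n;
}
```

In this toy, a non-empty needle terminates normally, while a zero-length needle keeps re-finding the same match and only stops because of the cap.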

Yours,
Laurenz Albe