Thread

  1. Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation

    Laurenz Albe <laurenz.albe@cybertec.at> — 2025-12-04T17:14:51Z

    On Wed, 2025-12-03 at 10:12 -0500, Tom Lane wrote:
    > Laurenz Albe <laurenz.albe@cybertec.at> writes:
    > > On Tue, 2025-12-02 at 15:53 -0500, Tom Lane wrote:
    > > > Looking at the code overall, I wonder if the outer loop doesn't have
    > > > the same issue.  The comments claim that we should be able to handle
    > > > zero-length matches, but if the overall haystack is of length zero,
    > > > we will fail to check for such a match.
    > 
    > 
    > After further thought, it seems to me that this comment is an
    > unjustified extrapolation from what Peter actually said, which was
    > that the match substring could be physically shorter than the needle.
    > Which is certainly true, for instance case-folding or accent-stripping
    > might shorten the string.  But it doesn't follow that a nonempty
    > needle could ever match an empty substring; and that does not seem
    > like it could be sane behavior to me.  We're considering string
    > comparison here, not regexes.
    > 
    > We do require callers to eliminate the empty-needle case, and given
    > that I think we could assume that match substrings must be at least
    > 1 byte long.  That assumption is what justifies the current API for
    > these functions, and perhaps we can also simplify this loop by
    > using it.
    
    I think I get it.  I don't see an explicit requirement for a non-empty
    needle, but all callers of text_position_next_internal() handle that
    case separately.
    
    The attached v5 patch simplifies the loop to a do-while loop, assuming
    that we cannot find a zero-length match.
    
    I have also updated the comments to no longer mention the possibility
    of an empty match, and for good measure I have added an Assert() that
    the needle cannot be empty.
    
    Yours,
    Laurenz Albe
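
The loop shape described in the message above can be illustrated with a small stand-alone sketch. This is not the v5 patch and not PostgreSQL code: toy_equal() and toy_find() are hypothetical names, the "collation" is a toy rule (case-insensitive, with runs of a repeated character collapsed) standing in for a nondeterministic ICU collation, and the scan advances byte by byte rather than by multibyte character. It only shows why a do-while fits once matches are assumed to be at least 1 byte long: the candidate end is advanced before each test, so the substring ending at the last byte of the haystack is always examined.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy equality (an assumption for illustration, not ICU): case-insensitive,
 * and a run of the same character counts once, so "AAB" equals "ab".
 * A non-empty string never equals an empty one under this rule.
 */
static bool
toy_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t i = 0, j = 0;

    while (i < alen && j < blen)
    {
        int ca = tolower((unsigned char) a[i]);
        int cb = tolower((unsigned char) b[j]);

        if (ca != cb)
            return false;
        /* consume the whole run of this character on both sides */
        while (i < alen && tolower((unsigned char) a[i]) == ca)
            i++;
        while (j < blen && tolower((unsigned char) b[j]) == cb)
            j++;
    }
    return i == alen && j == blen;
}

/*
 * Hypothetical search: report the shortest match at the earliest start.
 * The needle must be non-empty; callers handle that case separately.
 */
static bool
toy_find(const char *hay, size_t haylen,
         const char *needle, size_t needlelen,
         size_t *match_start, size_t *match_len)
{
    assert(needlelen > 0);

    for (size_t start = 0; start < haylen; start++)
    {
        size_t end = start;

        /* a match is at least 1 byte long, so advance before testing */
        do
        {
            end++;
            if (toy_equal(hay + start, end - start, needle, needlelen))
            {
                *match_start = start;
                *match_len = end - start;
                return true;
            }
        } while (end < haylen);
    }
    return false;
}
```

With this sketch, searching "xyZ" for "z" reports a match at offset 2 with length 1, which is the final-character case from the bug report, and searching "xxAAB" for "ab" reports offset 2 with length 3, a match physically longer than the needle. An empty haystack never enters the outer loop, which is consistent with the assumption that a non-empty needle cannot match an empty substring.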