Thread

  1. BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation

    PG Bug reporting form <noreply@postgresql.org> — 2025-12-02T10:03:44Z

    The following bug has been logged on the website:
    
    Bug reference:      19341
    Logged by:          Adam Warland
    Email address:      adam.warland@infor.com
    PostgreSQL version: 18.1
    Operating system:   Windows 11 Enterprise
    Description:        
    
    When using a nondeterministic ICU collation, the replace() function fails to
    replace a substring when that substring appears at the end of the input
    string.
    
    Occurrences of the same substring earlier in the string are replaced
    normally.
    
    This appears to be unintended and inconsistent with the documented
    limitations of nondeterministic collations. The failure seems specific to
    situations where:
            • a nondeterministic ICU collation is applied to both source and
    match strings, and
            • the substring being replaced appears as the final character of the
    source string.
    
    The behavior reproduces reliably.
    
    Expected Behavior
    replace() should replace all occurrences of the match substring, including
    one at the final position, regardless of collation — or, if nondeterministic
    collations cannot support this operation, documentation should explicitly
    state the limitation (as is already done for LIKE and regular expressions).
    
    Actual Behavior
    Under a nondeterministic ICU collation, the final occurrence of the
    substring is not replaced, even though earlier occurrences are.
    Example output (actual) replace x with y :
    res1  | testx     -- unchanged, incorrect (final character not replaced)
    res2  | yabcdx    -- first 'x' replaced, final 'x' not replaced
    Under a deterministic or C collation, output is correct:
    res_C | testy
    
    Environment
    SELECT version();
    → PostgreSQL 18.1 (Debian 18.1-1.pgdg13+2) on x86_64-pc-linux-gnu, compiled
    by gcc (Debian 14.2.0-19) 14.2.0, 64-bit.
    SHOW server_encoding;
    → UTF8
    SHOW lc_collate;
    SHOW lc_ctype;
    → en_US.utf8
    en_US.utf8
    
    Specific collation used:
    create collation test_nondeterministic (
        provider = icu,
        locale = 'und-u-ks-level2',
        deterministic = false
    )
    
    
    Minimal Reproduction Case
    
    drop COLLATION if EXISTS test_nondeterministic;
    create collation test_nondeterministic (
        provider = icu,
        locale = 'und-u-ks-level2',
        deterministic = false
    );
    -- Replace final character under nondeterministic collation
    SELECT replace(
        'testx' COLLATE "test_nondeterministic",
        'x'     COLLATE "test_nondeterministic",
        'y') AS res1;
    -- Replace substring appearing twice — final one fails
    SELECT replace(
        'xabcdx' COLLATE "test_nondeterministic",
        'x'      COLLATE "test_nondeterministic",
        'y') AS res2;
    -- Control test using deterministic collation
    SELECT replace(
        'testx' COLLATE "C",
        'x'     COLLATE "C",
        'y') AS res_C;
    
    Observed result:
    res1 and the final x in res2 are not replaced.
    
    Additional Notes
            • The issue does not occur with deterministic ICU collations or with
    C collation.
            • The failure seems tied specifically to the last character
    position, which may indicate an off-by-one issue or a limitation in
    substring matching under nondeterministic collation rules.
            • No documentation currently states that replace() is partially
    unsupported with nondeterministic collations, although other operations
    (LIKE, regex) have historically been restricted.
    
    Conclusion
    replace() appears to behave incorrectly when matching a substring at the end
    of a string under nondeterministic ICU collations. This is either:
            • a defect in how nondeterministic collations interact with
    substring matching functions, or
    an undocumented limitation that should be clarified.