Thread
-
BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation
PG Bug reporting form <noreply@postgresql.org> — 2025-12-02T10:03:44Z
The following bug has been logged on the website: Bug reference: 19341 Logged by: Adam Warland Email address: adam.warland@infor.com PostgreSQL version: 18.1 Operating system: Windows 11 Enterprise Description: When using a nondeterministic ICU collation, the replace() function fails to replace a substring when that substring appears at the end of the input string. Occurrences of the same substring earlier in the string are replaced normally. This appears to be unintended and inconsistent with the documented limitations of nondeterministic collations. The failure seems specific to situations where: • a nondeterministic ICU collation is applied to both source and match strings, and • the substring being replaced appears as the final character of the source string. The behavior reproduces reliably. Expected Behavior replace() should replace all occurrences of the match substring, including one at the final position, regardless of collation — or, if nondeterministic collations cannot support this operation, documentation should explicitly state the limitation (as is already done for LIKE and regular expressions). Actual Behavior Under a nondeterministic ICU collation, the final occurrence of the substring is not replaced, even though earlier occurrences are. Example output (actual) replace x with y : res1 | testx -- unchanged, incorrect (final character not replaced) res2 | yabcdx -- first 'x' replaced, final 'x' not replaced Under a deterministic or C collation, output is correct: res_C | testy Environment SELECT version(); → PostgreSQL 18.1 (Debian 18.1-1.pgdg13+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit. SHOW server_encoding; → UTF8 SHOW lc_collate; SHOW lc_ctype; → en_US.utf8 en_US.utf8 Specific collation used: create collation test_nondeterministic ( provider = icu, locale = 'und-u-ks-level2', deterministic = false ) Minimal Reproduction Case drop COLLATION if EXISTS test_nondeterministic; create collation test_nondeterministic ( provider = icu, locale = 'und-u-ks-level2', deterministic = false ); -- Replace final character under nondeterministic collation SELECT replace( 'testx' COLLATE "test_nondeterministic", 'x' COLLATE "test_nondeterministic", 'y') AS res1; -- Replace substring appearing twice — final one fails SELECT replace( 'xabcdx' COLLATE "test_nondeterministic", 'x' COLLATE "test_nondeterministic", 'y') AS res2; -- Control test using deterministic collation SELECT replace( 'testx' COLLATE "C", 'x' COLLATE "C", 'y') AS res_C; Observed result: res1 and the final x in res2 are not replaced. Additional Notes • The issue does not occur with deterministic ICU collations or with C collation. • The failure seems tied specifically to the last character position, which may indicate an off-by-one issue or a limitation in substring matching under nondeterministic collation rules. • No documentation currently states that replace() is partially unsupported with nondeterministic collations, although other operations (LIKE, regex) have historically been restricted. Conclusion replace() appears to behave incorrectly when matching a substring at the end of a string under nondeterministic ICU collations. This is either: • a defect in how nondeterministic collations interact with substring matching functions, or an undocumented limitation that should be clarified.