Re: BUG #19354: JOHAB rejects valid byte sequences
Michael Paquier <michael@paquier.xyz>
From: Michael Paquier <michael@paquier.xyz>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Robert Haas <robertmhaas@gmail.com>, Jeroen Vermeulen <jtvjtv@gmail.com>, VASUKI M <vasukianand0119@gmail.com>, pgsql-bugs@lists.postgresql.org
Date: 2025-12-17T02:59:17Z
Lists: pgsql-bugs
On Tue, Dec 16, 2025 at 10:41:46AM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I'm
>> left with the conclusions that (1) nobody ever actually tried using
>> this encoding for anything real until 3 days ago and (2) we don't have
>> any testing infrastructure that verifies that the characters in the
>> mapping tables are actually accepted by pg_verifymbstr(). I wonder how
>> many other encodings we have that don't actually work?
>
> Indeed. Anyone want to do some testing?

FWIW, a colleague made me aware a couple of weeks ago that SJIS and
SHIFT_JIS_2004 are used by some customers, and that the conversion
mappings in the tree are many years behind, with Postgres not
understanding some of the characters. These two are marginal in the
mostly-UTF8 world we live in these days, but it's annoying for byte
sequences that should not change across the years, just be refreshed
with new data.
--
Michael