Re: BUG #19354: JOHAB rejects valid byte sequences

From: Michael Paquier <michael@paquier.xyz>

To: Tom Lane <tgl@sss.pgh.pa.us>

Cc: Robert Haas <robertmhaas@gmail.com>, Jeroen Vermeulen <jtvjtv@gmail.com>, VASUKI M <vasukianand0119@gmail.com>, pgsql-bugs@lists.postgresql.org

Date: 2025-12-17T02:59:17Z

Lists: pgsql-bugs

On Tue, Dec 16, 2025 at 10:41:46AM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I'm
>> left with the conclusions that (1) nobody ever actually tried using
>> this encoding for anything real until 3 days ago and (2) we don't have
>> any testing infrastructure that verifies that the characters in the
>> mapping tables are actually accepted by pg_verifymbstr(). I wonder how
>> many other encodings we have that don't actually work?
> 
> Indeed.  Anyone want to do some testing?

FWIW, a colleague made me aware a couple of weeks ago that SJIS and
SHIFT_JIS_2004 are used by some customers, and that the conversion
mappings in the tree are many years behind, so Postgres does not
understand some of the characters.  These two are marginal in the
mostly-UTF8 world we live in these days, but it's annoying for
mappings whose existing byte sequences should not change across the
years and only need to be refreshed with new data.
--
Michael