BUG #19354: JOHAB rejects valid byte sequences
PG Bug reporting form <noreply@postgresql.org> — 2025-12-13T18:52:36Z
The following bug has been logged on the website:

Bug reference: 19354
Logged by: Jeroen Vermeulen
Email address: jtvjtv@gmail.com
PostgreSQL version: 18.1
Operating system: Debian unstable x86-64, macOS, Windows, etc.
Description:

Calling libpq, connecting to a UTF8 database and successfully setting client encoding to JOHAB, this statement:

    PQexec(connection, "SELECT '\x8a\x5c'");

Returned an empty result with this error message:

    ERROR: invalid byte sequence for encoding "JOHAB": 0x8a 0x5c

AFAICT, 0x8a 0x5c is a valid JOHAB sequence making up Hangul character "굎". Easily verified in Python:

    print(b'\x8a\x5c'.decode('johab'))

It's the same story for some other valid sequences I tried, including this character's "neighbours" 0x8a 0x5b and 0x8a 0x5d.

My test code did work with similar two-byte characters in BIG5, GB18030, UTF-8, SJIS, and UHC. It just breaks with these JOHAB characters on all of these x86-64 docker images: "archlinux", "debian", "debian:unstable", "fedora", and "ubuntu". And I got the same results on macOS+homebrew, Windows+MinGW with pacman-installed postgres, and a native Windows VM with whatever-postgres-they-preinstall.
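
For anyone wanting to repeat the Python check on all three sequences named above, here is a minimal sketch. It relies only on Python's built-in "johab" codec; the names SEQUENCES and check_johab are made up for this illustration and are not part of the original report. It shows what Python's codec accepts, not what the PostgreSQL server does.

```python
# Byte sequences named in the report: 0x8a 0x5c and its two "neighbours".
SEQUENCES = [b"\x8a\x5b", b"\x8a\x5c", b"\x8a\x5d"]


def check_johab(seq):
    """Decode seq with Python's johab codec and test the round trip.

    Returns (decoded_text, round_trips). Raises UnicodeDecodeError if
    Python's codec considers the sequence invalid.
    """
    text = seq.decode("johab")
    return text, text.encode("johab") == seq


if __name__ == "__main__":
    for seq in SEQUENCES:
        text, round_trips = check_johab(seq)
        hex_bytes = " ".join(f"0x{b:02x}" for b in seq)
        code_points = " ".join(f"U+{ord(c):04X}" for c in text)
        status = "round-trips" if round_trips else "does NOT round-trip"
        print(f"{hex_bytes} -> {text} ({code_points}), {status}")
```

If a sequence decodes to a single Hangul syllable and encodes back to the same two bytes, that is at least consistent with it being valid JOHAB as Python's codec understands the encoding.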