Thread

  1. BUG #19354: JOHAB rejects valid byte sequences

    PG Bug reporting form <noreply@postgresql.org> — 2025-12-13T18:52:36Z

    The following bug has been logged on the website:
    
    Bug reference:      19354
    Logged by:          Jeroen Vermeulen
    Email address:      jtvjtv@gmail.com
    PostgreSQL version: 18.1
    Operating system:   Debian unstable x86-64, macOS, Windows, etc.
    Description:        
    
    Using libpq, after connecting to a UTF8 database and successfully setting
    the client encoding to JOHAB, this statement:
    
        PQexec(connection, "SELECT '\x8a\x5c'");
    
    returned an empty result with this error message:
    
        ERROR:  invalid byte sequence for encoding "JOHAB": 0x8a 0x5c
    
    AFAICT, 0x8a 0x5c is a valid JOHAB sequence encoding the Hangul syllable
    "굎".  This is easily verified in Python:
    
        print(b'\x8a\x5c'.decode('johab'))
    
    It's the same story for some other valid sequences I tried, including this
    character's "neighbours" 0x8a 0x5b and 0x8a 0x5d.
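    
    A minimal sketch of that check for all three sequences, assuming Python's
    built-in "johab" codec is an acceptable reference for what counts as
    valid.  The helper name check_johab is hypothetical; it says nothing about
    how the PostgreSQL server validates input:

```python
# Byte pairs taken from the report: the failing sequence and its two neighbours.
SEQUENCES = [b"\x8a\x5b", b"\x8a\x5c", b"\x8a\x5d"]


def check_johab(seq: bytes) -> str:
    """Decode seq with Python's 'johab' codec and confirm it round-trips.

    This only reflects Python's view of JOHAB, not PostgreSQL's.
    """
    # Raises UnicodeDecodeError if Python considers the bytes invalid.
    text = seq.decode("johab")
    # Re-encoding should reproduce the original bytes exactly.
    if text.encode("johab") != seq:
        raise ValueError(f"{seq.hex(' ')} does not round-trip through johab")
    return text


if __name__ == "__main__":
    for seq in SEQUENCES:
        ch = check_johab(seq)
        print(f"{seq.hex(' ')} -> {ch} (U+{ord(ch):04X})")
```

    If Python accepts all three sequences, the script prints one line per
    sequence with the decoded character and its code point.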
    
    My test code did work with similar two-byte characters in BIG5, GB18030,
    UTF-8, SJIS, and UHC.  It breaks only with these JOHAB characters, and does
    so on all of these x86-64 Docker images: "archlinux", "debian",
    "debian:unstable", "fedora", and "ubuntu".  I got the same results on
    macOS+homebrew, Windows+MinGW with pacman-installed postgres, and a native
    Windows VM with whatever-postgres-they-preinstall.