Thread

  1. Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8

    Tatsuo Ishii <ishii@postgresql.org> — 2026-05-11T02:39:09Z

    [Add Cc: to pgsql-hackers]
    
    From: Zhongpu Chen <chenloveit@gmail.com>
    Subject: Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
    Date: Mon, 11 May 2026 09:56:20 +0800
    Message-ID: <CA+1gyqJWpDhOCiM2WrCTffbbTdQ2gWiVzZikiQFkKmTng5Hn_w@mail.gmail.com>
    
    > I see. The settings may be used in a finer way. For example, `set
    > euc-cn-encoding-valiation = 'read_compatible'`.
    
    That would break pg_dumpall. Suppose there is a database populated
    with `set euc-cn-encoding-valiation = 'native'`.
    
    1. Dump the database cluster using pg_dumpall.
    2. Create a new database cluster using initdb.
    3. Set euc-cn-encoding-valiation = 'read_compatible' in the postgresql.conf.
    4. Restore from the dump. This fails because the dump contains EUC_CN
       characters that the new setting disallows.
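
The four steps above can be simulated with a small Python sketch. It is only an illustration: the two validators are assumptions made for the example ('native' is taken to accept any high-bit lead byte followed by a trail byte in 0xA1..0xFE, and 'read_compatible' is taken to additionally require that the bytes decode with Python's `gb2312` codec), and `restore` is a hypothetical stand-in for loading a dump, not PostgreSQL code.

```python
def accepts_native(data: bytes) -> bool:
    """Assumed permissive rule: ASCII, or a high-bit lead byte plus a trail byte in 0xA1..0xFE."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1
            continue
        if i + 1 >= len(data) or not (0xA1 <= data[i + 1] <= 0xFE):
            return False
        i += 2
    return True


def accepts_read_compatible(data: bytes) -> bool:
    """Assumed stricter rule: the permissive rule, plus the bytes must convert to Unicode."""
    if not accepts_native(data):
        return False
    try:
        data.decode("gb2312")
    except UnicodeDecodeError:
        return False
    return True


def restore(rows, validator):
    """Hypothetical restore step: every row is re-validated by the target cluster's setting."""
    for row in rows:
        if not validator(row):
            raise ValueError(f"row rejected by target validator: {row!r}")
    return list(rows)


# Step 1: data that was stored while the permissive setting was active.
dump = [b"\xd6\xd0", b"\xaa\xa1"]  # second value lies in an unassigned GB2312 row

# Steps 2-4: the new cluster uses the stricter setting, so the restore fails.
try:
    restore(dump, accepts_read_compatible)
    restore_failed = False
except ValueError:
    restore_failed = True
```

The same dump restores cleanly when `accepts_native` is passed instead, which is the point: the outcome depends on a setting that lives outside the data.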
    
    I think encoding properties (including character validation) should
    belong to the encoding itself, rather than to GUC parameters. If you
    want to have "strict" EUC_CN and "non-strict" EUC_CN at the same time,
    I think the best way to implement that is to add a new EUC_CN variant
    encoding.
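
A minimal sketch of that design, in Python for brevity, might look like the following. Everything in it is hypothetical: the variant name `EUC_CN_STRICT`, the `Encoding` and `Database` types, and the strict rule itself (lead byte limited to 0xA1..0xF7) are placeholders chosen for illustration, not a proposed definition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


def two_byte_ok(data: bytes, lead_ok: Callable[[int], bool]) -> bool:
    """Check ASCII bytes and two-byte characters whose trail byte is in 0xA1..0xFE."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1
        elif lead_ok(data[i]) and i + 1 < len(data) and 0xA1 <= data[i + 1] <= 0xFE:
            i += 2
        else:
            return False
    return True


@dataclass(frozen=True)
class Encoding:
    name: str
    verify: Callable[[bytes], bool]  # validation is a property of the encoding


# Hypothetical registry: the strict variant is a separate encoding, not a setting.
ENCODINGS: Dict[str, Encoding] = {
    "EUC_CN": Encoding("EUC_CN", lambda d: two_byte_ok(d, lambda b: b >= 0x80)),
    "EUC_CN_STRICT": Encoding(
        "EUC_CN_STRICT", lambda d: two_byte_ok(d, lambda b: 0xA1 <= b <= 0xF7)
    ),
}


@dataclass
class Database:
    encoding: str
    rows: List[bytes]


def dump(db: Database) -> Tuple[str, List[bytes]]:
    """The dump records the encoding name alongside the data."""
    return db.encoding, list(db.rows)


def restore(dumped: Tuple[str, List[bytes]]) -> Database:
    """Rows are validated by the encoding named in the dump, not by server configuration."""
    name, rows = dumped
    enc = ENCODINGS[name]
    for row in rows:
        if not enc.verify(row):
            raise ValueError(f"row rejected by {name}: {row!r}")
    return Database(name, rows)


old = Database("EUC_CN", [b"\xd6\xd0", b"\xf8\xa1"])
restored = restore(dump(old))
```

Because the dump names its encoding, the rows are checked by the same rule that accepted them originally, so the round trip succeeds whatever the new cluster's configuration says.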
    
    Regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp