Re: GB18030-2022 Support in PostgreSQL

From: Chao Li <li.evan.chao@gmail.com>
To: John Naylor <johncnaylorls@gmail.com>
Cc: Peter Eisentraut <peter@eisentraut.org>, pgsql-hackers@lists.postgresql.org, Tom Lane <tgl@sss.pgh.pa.us>, Andrew Dunstan <andrew@dunslane.net>
Date: 2025-09-29T10:36:27Z
Lists: pgsql-hackers

Commits

  1. Generate EUC_CN mappings from gb18030-2022.ucm
  2. Update GB18030 encoding from version 2000 to 2022
  3. Generate GB18030 mappings from the Unicode Consortium's UCM file


> On Sep 29, 2025, at 17:32, John Naylor <johncnaylorls@gmail.com> wrote:
> 
> On Wed, Sep 24, 2025 at 4:18 PM Chao Li <li.evan.chao@gmail.com> wrote:
>> 
>> I found that both EUC_CN and UHC use the same XML file, so I updated both.
> 
> When you say "same file", that implies to me the file we have checked
> in our repo. They have different names and the UHC file is downloaded
> on demand, so it doesn't seem like we need to change UHC at all to
> delete gb-18030-2000.xml. Is that right?
> 
> -- 
> John Naylor
> Amazon Web Services


"Same file" was a mistake on my part. windows-949-2000.ucm is a different file from gb-18030-2000(2022).ucm.

In theory, we don't need to change UHC if our only goal is to delete gb-18030-2000.xml. However, as you can see, after switching to UCM files, UHC, EUC_CN and GB18030 now share the same download URL in the Makefile, and their Perl scripts use the same logic to process UCM files, so I think changing UHC as well is good for maintenance.
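
To illustrate what "the same logic to process UCM files" could look like, here is a minimal sketch in Python rather than Perl. It is not the code from the patch: the function name parse_ucm and the decision to keep only round-trip ("|0") entries are assumptions made for illustration. It relies only on the general UCM layout, where mapping lines of the form "<UXXXX> \xHH... |n" sit between CHARMAP and END CHARMAP.

```python
import re

# One UCM mapping line, e.g. "<U00A4> \xA1\xE8 |0":
# a Unicode code point, the encoded bytes, and a precision flag.
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)


def parse_ucm(lines):
    """Return {code point: encoded bytes} for round-trip (|0) entries.

    Hypothetical helper for illustration only; lines outside the
    CHARMAP section and lines that do not match UCM_LINE are skipped.
    """
    mapping = {}
    in_charmap = False
    for line in lines:
        line = line.strip()
        if line == "CHARMAP":
            in_charmap = True
            continue
        if line == "END CHARMAP":
            in_charmap = False
            continue
        if not in_charmap:
            continue
        m = UCM_LINE.match(line)
        if m is None:
            continue  # comments, blank lines, multi-code-point entries
        if m.group(3) != "0":
            continue  # assumption: keep only round-trip mappings
        ucs = int(m.group(1), 16)
        code = bytes.fromhex(m.group(2).replace("\\x", ""))
        mapping[ucs] = code
    return mapping
```

Because the parser depends only on the UCM line format and not on the encoding, the same function could in principle be fed lines from windows-949-2000.ucm or from the GB18030 UCM file, which is the maintenance benefit I have in mind.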

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/