Re: GB18030-2022 Support in PostgreSQL

From: Chao Li <li.evan.chao@gmail.com>
To: John Naylor <johncnaylorls@gmail.com>
Cc: Peter Eisentraut <peter@eisentraut.org>, pgsql-hackers@lists.postgresql.org, Tom Lane <tgl@sss.pgh.pa.us>, Andrew Dunstan <andrew@dunslane.net>
Date: 2025-08-13T08:08:45Z
Lists: pgsql-hackers

Commits

  1. Generate EUC_CN mappings from gb18030-2022.ucm

  2. Update GB18030 encoding from version 2000 to 2022

  3. Generate GB18030 mappings from the Unicode Consortium's UCM file

On 2025/8/13 15:20, Chao Li wrote:
>
>
> Sounds good. Let me recreate the patch.
>
>
Attached is the new patch. It downloads the UCM file as part of the make run:


```
Unicode % make gb18030_to_utf8.map
wget -O gb-18030-2000.ucm --no-use-server-timestamps https://raw.githubusercontent.com/unicode-org/icu-data/d9d3a6ed27bb98a7106763e940258f0be8cd995b/charset/data/ucm/gb-18030-2000.ucm
--2025-08-13 15:54:53--  https://raw.githubusercontent.com/unicode-org/icu-data/d9d3a6ed27bb98a7106763e940258f0be8cd995b/charset/data/ucm/gb-18030-2000.ucm
HTTP request sent, awaiting response... 200 OK
Length: 672885 (657K) [text/plain]
Saving to: ‘gb-18030-2000.ucm’

gb-18030-2000.ucm  100%[=====================================>] 657.11K  2.78MB/s    in 0.2s

2025-08-13 15:54:54 (2.78 MB/s) - ‘gb-18030-2000.ucm’ saved [672885/672885]

'/usr/bin/perl' -I . UCS_to_GB18030.pl
- Writing UTF8=>GB18030 conversion table: utf8_to_gb18030.map
- Writing GB18030=>UTF8 conversion table: gb18030_to_utf8.map
Unicode % git diff
Unicode %
```

After regenerating the map files, git diff shows no changes to them.
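
For anyone who wants to script that check instead of reading the git diff output by eye, here is a minimal sketch. It is not part of the patch. The function name maps_unchanged is only illustrative, and it assumes you run it inside the git work tree, in the directory that holds the regenerated map files.

```shell
#!/bin/sh
# Sketch only, not part of the patch: report whether the given files
# differ from what git has recorded for them.
# Assumption: the current directory is inside a git work tree.

maps_unchanged() {
    # "git diff --quiet" prints nothing and exits 0 when the listed
    # paths have no unstaged changes, and exits 1 when they do.
    git diff --quiet -- "$@"
}

# Only act when file names are passed on the command line.
if [ "$#" -gt 0 ]; then
    if maps_unchanged "$@"; then
        echo "map files unchanged"
    else
        echo "map files differ from the committed versions"
        exit 1
    fi
fi
```

Saved under a hypothetical name such as check_maps.sh, it could be run right after make as "sh check_maps.sh utf8_to_gb18030.map gb18030_to_utf8.map"; a nonzero exit status would mean the regenerated maps no longer match the tree.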


Best regards,

Chao Li (Evan)
--------------------
HighGo Software Co., Ltd.
https://www.highgo.com/