Re: GB18030-2022 Support in PostgreSQL
John Naylor <johncnaylorls@gmail.com>
From: John Naylor <johncnaylorls@gmail.com>
To: Chao Li <li.evan.chao@gmail.com>
Cc: pgsql-hackers@lists.postgresql.org, Tom Lane <tgl@sss.pgh.pa.us>, Andrew Dunstan <andrew@dunslane.net>
Date: 2025-08-11T05:50:48Z
Lists: pgsql-hackers
Commits
- Generate EUC_CN mappings from gb18030-2022.ucm (48566180efff; 19, unreleased; landed)
- Update GB18030 encoding from version 2000 to 2022 (5334620eef8f; 19, unreleased; landed)
- Generate GB18030 mappings from the Unicode Consortium's UCM file (cfa6cd29271e; 19, unreleased; landed)
On Mon, Aug 11, 2025 at 9:01 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
> I have created a patch https://commitfest.postgresql.org/patch/5954/. CommitFests requested a rebase, so I rebased the code and created the v2 patch.
>
> BTW, I have tested all 66 new characters, 9 not-required characters and 18 changed characters in a way as:

"9 characters are no longer required by the new standard, but are retained in this patch for compatibility"

How is that done?

> I added a test case with a mapping changed char, and the test passes:
>
> % make check
> ...
> # All 229 tests passed.
>
> For more details on the standard change, see https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132
>
> I am attaching the patch file.

Going from the old .xml file to the .ucm file makes it difficult to see the relevant changes. Also, there are nearly 1000 non-user-visible changes like this in the output file that are not explained:

- /*** Three byte table, leaf: efa8xx - offset 0x07aba ***/
+ /*** Three byte table, leaf: efa8xx - offset 0x07b3a ***/

The 2000 version is available in the .ucm format, so maybe converting to that first would be a good preparatory patch:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/gb-18030-2000.ucm

Looking at the history, it looks like that file has seen small revisions, so it may take some research to get the exact equivalent to the XML file we use. That will also tell us if anything will change for us besides the actual 2022 revision.

--
John Naylor
Amazon Web Services
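The suggestion above is to compare the 2000 and 2022 mapping data in the same .ucm format, so that only the real mapping differences show up. As a rough illustration only (not part of this thread and not PostgreSQL's own conversion tooling), the Python sketch below parses the explicit mapping lines between `CHARMAP` and `END CHARMAP` in two ICU .ucm files and lists the mappings that were removed or added. It assumes the common UCM line shape `<Uxxxx> \xNN... |n`; header directives and any line it does not recognize are simply skipped. The names `parse_ucm` and `diff_ucm` are hypothetical helpers.

```python
import re
from typing import List, Set, Tuple

# One UCM mapping: (Unicode string, encoded bytes, precision indicator).
# The indicator is kept as text, "" when the line has none.
Mapping = Tuple[str, bytes, str]

# Matches a mapping line such as "<U3000> \xA1\xA1 |0": one or more <Uxxxx>
# code points, the encoded bytes as \xNN escapes, and an optional |0..|4
# precision indicator (e.g. |0 = roundtrip, |1 = fallback, |3 = reverse fallback).
MAPPING_RE = re.compile(
    r"((?:<U[0-9A-Fa-f]{4,6}>)+)\s+((?:\\x[0-9A-Fa-f]{2})+)(?:\s*\|([0-4]))?"
)


def parse_ucm(text: str) -> Set[Mapping]:
    """Collect the explicit mappings found between CHARMAP and END CHARMAP."""
    mappings: Set[Mapping] = set()
    in_charmap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if line == "CHARMAP":
            in_charmap = True
        elif line == "END CHARMAP":
            in_charmap = False
        elif in_charmap and line:
            m = MAPPING_RE.fullmatch(line)
            if m is None:
                continue  # not a mapping line this sketch understands
            uni = "".join(
                chr(int(cp, 16)) for cp in re.findall(r"<U([0-9A-Fa-f]+)>", m.group(1))
            )
            enc = bytes(int(b, 16) for b in re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2)))
            mappings.add((uni, enc, m.group(3) or ""))
    return mappings


def diff_ucm(old_text: str, new_text: str) -> Tuple[List[Mapping], List[Mapping]]:
    """Return (removed, added) mappings; a changed mapping appears in both lists."""
    old, new = parse_ucm(old_text), parse_ucm(new_text)
    return sorted(old - new), sorted(new - old)


def show(sign: str, mapping: Mapping) -> str:
    """Format one mapping as e.g. '- U+0042 42 |0'."""
    uni, enc, prec = mapping
    cps = " ".join(f"U+{ord(c):04X}" for c in uni)
    return f"{sign} {cps} {enc.hex()} |{prec}" if prec else f"{sign} {cps} {enc.hex()}"


if __name__ == "__main__":
    # Toy input, NOT real GB18030 data: U+0042 moves from \x42 to \x43.
    OLD = r"""
    CHARMAP
    <U0041> \x41 |0
    <U0042> \x42 |0
    END CHARMAP
    """
    NEW = r"""
    CHARMAP
    <U0041> \x41 |0
    <U0042> \x43 |0   # changed
    END CHARMAP
    """
    removed, added = diff_ucm(OLD, NEW)
    for mapping in removed:
        print(show("-", mapping))
    for mapping in added:
        print(show("+", mapping))
```

Run against the real gb-18030-2000.ucm and gb18030-2022.ucm files, a diff at the level of individual mappings like this would separate actual character changes from the offset churn in the generated C tables, though it deliberately ignores anything outside the explicit mapping lines.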