Re: GB18030-2022 Support in PostgreSQL
John Naylor <johncnaylorls@gmail.com>
From: John Naylor <johncnaylorls@gmail.com>
To: Chao Li <li.evan.chao@gmail.com>
Cc: pgsql-hackers@lists.postgresql.org, Tom Lane <tgl@sss.pgh.pa.us>, Andrew Dunstan <andrew@dunslane.net>
Date: 2025-08-11T09:15:00Z
Lists: pgsql-hackers
Commits
- Generate EUC_CN mappings from gb18030-2022.ucm
  48566180efff, 19 (unreleased), landed
- Update GB18030 encoding from version 2000 to 2022
  5334620eef8f, 19 (unreleased), landed
- Generate GB18030 mappings from the Unicode Consortium's UCM file
  cfa6cd29271e, 19 (unreleased), landed
On Mon, Aug 11, 2025 at 3:22 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi,

For future reference, please don't quote my entire message below yours -- it clutters the archives and also removes context.

> Yes, I did a diff between 2000.ucm and 2022.ucm when I worked on the patch. The diff between 2000.ucm and 2022.ucm are quite small:

That would match my expectation. In case it wasn't clear before, my preference is to split this patch into two patches: First convert to .ucm, then update to 2022 revision. Then the small diff will be obvious to everyone who looks at the second commit.

> For your question:
>
> "9 characters are no longer required by the new standard, but are
> retained in this patch for compatibility"
>
> How is that done?
>
>
> The 9 mappings are not changed between 2000.ucm and 2022.ucm. For example, GB18030 code 0xFD9C is one of the 9 not-required code, but the mapping:
>
> <UF92C> \xFD\x9C |0
>
> Still appears in 2022.ucm, so that this character is retained.

Thanks for clarifying -- by saying "retained in the patch", the commit message implied to me that the patch added something not in the upstream file.

--
John Naylor
Amazon Web Services
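
For readers who want to reproduce the kind of comparison discussed above, here is a minimal Python sketch of diffing two .ucm mapping sets. It is not the tooling used in the patch. The function names (parse_ucm, diff_ucm) and the sample strings are hypothetical. Only the line <UF92C> \xFD\x9C |0 is taken from the thread. The other sample lines are illustrative and were not copied from the real 2000.ucm or 2022.ucm files.

```python
import re

# One UCM mapping line looks like: <UF92C> \xFD\x9C |0
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)


def parse_ucm(text):
    """Return {(code point, encoded bytes): fallback flag} for each mapping line."""
    mappings = {}
    for line in text.splitlines():
        m = UCM_LINE.match(line.strip())
        if m is None:
            continue  # header, comment, or CHARMAP delimiter
        code_point = int(m.group(1), 16)
        encoded = bytes.fromhex(m.group(2).replace("\\x", ""))
        mappings[(code_point, encoded)] = int(m.group(3))
    return mappings


def diff_ucm(old_text, new_text):
    """Return (removed, added) lists of (code point, encoded bytes) pairs."""
    old, new = parse_ucm(old_text), parse_ucm(new_text)
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    return removed, added


# Hypothetical sample data. Only the <UF92C> line comes from the thread.
OLD_SAMPLE = r"""
CHARMAP
<U0041> \x41 |0
<UE78D> \xA6\xD9 |0
<UF92C> \xFD\x9C |0
END CHARMAP
"""

NEW_SAMPLE = r"""
CHARMAP
<U0041> \x41 |0
<UFE10> \xA6\xD9 |0
<UF92C> \xFD\x9C |0
END CHARMAP
"""

removed, added = diff_ucm(OLD_SAMPLE, NEW_SAMPLE)
for code_point, encoded in removed:
    print(f"- U+{code_point:04X} {encoded.hex().upper()}")
for code_point, encoded in added:
    print(f"+ U+{code_point:04X} {encoded.hex().upper()}")
```

A mapping that is identical in both inputs, such as the <UF92C> line, appears in neither list, which is the sense in which such a character is "retained" without the patch adding anything beyond the upstream file.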