Re: GB18030-2022 Support in PostgreSQL

From: Chao Li <li.evan.chao@gmail.com>
To: John Naylor <johncnaylorls@gmail.com>
Cc: Peter Eisentraut <peter@eisentraut.org>, pgsql-hackers@lists.postgresql.org, Tom Lane <tgl@sss.pgh.pa.us>, Andrew Dunstan <andrew@dunslane.net>
Date: 2025-08-13T07:20:27Z
Lists: pgsql-hackers

Commits

  1. Generate EUC_CN mappings from gb18030-2022.ucm
  2. Update GB18030 encoding from version 2000 to 2022
  3. Generate GB18030 mappings from the Unicode Consortium's UCM file


> On Aug 13, 2025, at 15:17, John Naylor <johncnaylorls@gmail.com> wrote:
> 
> On Wed, Aug 13, 2025 at 2:41 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>> Could we download this file on demand, like we do for the other input
>> files for the conversion mappings?
> 
> That sounds like the way to go.
> 
> While poking around, I found that UCS_to_EUC_CN.pl also uses
> gb-18030-2000.xml for its input, so now it seems wrong to delete the
> XML file as a side effect of changing the source for GB18030. Maybe
> EUC_CN could use a downloaded-on-demand .ucm source as well (whether
> 2000 or 2022) but we can consider that later. For now let's leave the
> XML file alone.
> 

Sounds good. Let me recreate the patch.

Chao Li (Evan)
--------------------
HighGo Software Co., Ltd.
https://www.highgo.com/