Re: GB18030-2022 Support in PostgreSQL

From: John Naylor <johncnaylorls@gmail.com>
To: Chao Li <li.evan.chao@gmail.com>
Cc: Peter Eisentraut <peter@eisentraut.org>, pgsql-hackers@lists.postgresql.org, Tom Lane <tgl@sss.pgh.pa.us>, Andrew Dunstan <andrew@dunslane.net>
Date: 2025-09-18T07:59:32Z
Lists: pgsql-hackers

Commits

  1. Generate EUC_CN mappings from gb18030-2022.ucm

  2. Update GB18030 encoding from version 2000 to 2022

  3. Generate GB18030 mappings from the Unicode Consortium's UCM file

On Wed, Sep 17, 2025 at 9:08 AM Chao Li <li.evan.chao@gmail.com> wrote:
> I see you have updated the function comment in utf8_and_gb18030.c, so I removed it from the v8 patch.
>
> Attached is the v8 patch:

I've reworked the commit message I started in v5 to incorporate later
discussions. (I was not a fan of including a complete table there, nor
of using UTF-8 encoding instead of code points as a reference.)

The only change I made for v9 is to reword the regression test
addition from "upgrades" to "change". I'm planning to commit next week
unless there are objections. (If anyone otherwise busy with the PG18
release wants a chance to weigh in, let me know and I'll hold off.)

It'll be a good idea to communicate how to detect (unlikely but not
impossible) incompatibilities for existing systems, but I don't think
committing needs to wait for that piece.
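
As a rough illustration of what such a detection step might look like, here
is a minimal sketch that scans already-decoded text for a caller-supplied set
of code points. Everything in it is an assumption made for illustration: the
function name find_affected, the (row_id, text) input shape, and in particular
the placeholder set, which is NOT the actual list of mappings that differ
between GB18030-2000 and GB18030-2022. The real list would have to come from
the commit itself.

```python
from typing import FrozenSet, Iterable, Iterator, Tuple


def find_affected(
    rows: Iterable[Tuple[int, str]],
    affected: FrozenSet[int],
) -> Iterator[Tuple[int, str]]:
    """Yield (row_id, 'U+XXXX') for every occurrence of an affected code point.

    `rows` is any iterable of (identifier, decoded text) pairs, for example
    the result of a query over a text column. `affected` is the set of code
    points whose mapping changed; it must be supplied by the caller.
    """
    for row_id, text in rows:
        for ch in text:
            if ord(ch) in affected:
                yield row_id, f"U+{ord(ch):04X}"


# Placeholder values only: these are hypothetical and do not represent the
# real set of code points affected by the encoding update.
AFFECTED_PLACEHOLDER = frozenset({0xE000, 0xE001})

sample_rows = [(1, "plain text"), (2, "abc\ue000def")]
hits = list(find_affected(sample_rows, AFFECTED_PLACEHOLDER))
```

Rows that produce no hits contain none of the listed code points, so only the
reported rows would need a closer look.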

--
John Naylor
Amazon Web Services