Re: GB18030-2022 Support in PostgreSQL

From: Chao Li <li.evan.chao@gmail.com>

To: John Naylor <johncnaylorls@gmail.com>

Cc: Peter Eisentraut <peter@eisentraut.org>, pgsql-hackers@lists.postgresql.org, Tom Lane <tgl@sss.pgh.pa.us>, Andrew Dunstan <andrew@dunslane.net>

Date: 2025-09-10T11:54:08Z

Lists: pgsql-hackers

Commits

Generate EUC_CN mappings from gb18030-2022.ucm
- 48566180efff 19 (unreleased) landed
Update GB18030 encoding from version 2000 to 2022
- 5334620eef8f 19 (unreleased) landed
Generate GB18030 mappings from the Unicode Consortium's UCM file
- cfa6cd29271e 19 (unreleased) landed

Attachments

v5-0002-JCN-changes.patch (application/octet-stream) patch v5-0002
v5-0001-Generate-GB18030-mappings-from-the-Unicode-Consor.patch (application/octet-stream) patch v5-0001
v5-0003-Update-GB18030-encoding-from-version-2000-to-2022.patch (application/octet-stream) patch v5-0003

Hi John,

Thank you very much for taking care of this patch.

On Wed, Sep 10, 2025 at 14:38, John Naylor <johncnaylorls@gmail.com> wrote:

>
> - The URL at the top currently points to a directory in Github, but v3
> changed it to point to the actual file. A directory can be navigated
> for inspection, so I used:
>
> 2000:
> https://github.com/unicode-org/icu-data/tree/main/charset/data/ucm
>
> 2022:
> https://github.com/unicode-org/icu/blob/main/icu4c/source/data/mappings/
>
>
Looks good.


> - I also made the regex a multiline regex for readability, even though
> the previous one was not.
>
>
Thank you very much for polishing the Perl script. I am not a Perl
expert. I can get the script working, but not make it perfect.
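
To illustrate what a multi-line regex buys in readability, here is a minimal sketch in Python. It is not the regex from the patch: the pattern, the name `parse_ucm_line`, and the sample line are assumptions for illustration, based on the usual UCM line layout of `<Uxxxx>`, escaped bytes, and a `|n` precision flag.

```python
import re

# Verbose pattern for one UCM mapping line (hypothetical, not the patch's regex).
# With re.VERBOSE, whitespace and comments inside the pattern are ignored,
# so each part of the line can be documented where it is matched.
UCM_LINE = re.compile(
    r"""
    ^<U([0-9A-Fa-f]{4,6})>      # Unicode code point in hex
    \s+
    ((?:\\x[0-9A-Fa-f]{2})+)    # one or more escaped bytes
    \s+
    \|(\d)                      # precision flag (0 means round trip)
    """,
    re.VERBOSE,
)


def parse_ucm_line(line):
    """Return (code_point, encoded_bytes, flag), or None for non-mapping lines."""
    m = UCM_LINE.match(line)
    if m is None:
        return None
    code_point = int(m.group(1), 16)
    encoded = bytes.fromhex(m.group(2).replace("\\x", ""))
    return code_point, encoded, int(m.group(3))


# Illustrative sample line; comment lines do not match and yield None.
print(parse_ucm_line(r"<U4E02> \x81\x40 |0"))
print(parse_ucm_line("# a comment line"))
```

The same pattern written on one line would match identically; the verbose form only makes each field easier to review.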


> For 2022 version, I think it would be good to once run a test to
> verify that no mappings changed that we didn't expect. Perhaps the
> tests here can be used:
>
>
> https://www.postgresql.org/message-id/b9e3167f-f84b-7aa4-5738-be578a4db924%40iki.fi
>
>
I manually reran the tests I had done before, and everything works as expected.

I downloaded the tests from the referenced mail, but I cannot get them
to run. Applying the 2 patch files added src/test/encodings, but
"make check" does not seem to run them. I tried copying the .out and .sql
files to src/test/regress, but the tests still do not run.
Did I miss anything?

> The upstream correction to the 2000 version is not present in our
> mappings, so we should mention that, unless it was reverted in or
> before 2022.
>

I think the upstream correction to the 2000 version only affects a few
non-round-trip characters, which we ignore. So I feel we don't need to
mention it.


>
> In the documentation (charset.sgml), do we want to mention the version
> e.g. the following?
>
>  <entry><literal>GB18030</literal></entry>
> -<entry>National Standard</entry>
> +<entry>National Standard, version 2022</entry>
>

That's a good idea. I updated the sgml file:

[image: image.png]


>
> I've whacked around the commit messages, so those should be reviewed
> for accuracy.
>
> Your draft commit message had "9 characters are no longer required by
> the new standard, but are retained in this patch for compatibility"
> ...but those nine were introduced in the 2005 version, right? In which
> case it doesn't affect us. Please confirm.
>

I could not find any indication of whether the 9 characters were introduced
in the 2005 version.

But even without this patch, they are converted properly:
```
evantest=# SELECT encode(convert_from(decode('FD9D', 'hex'),
'GB18030')::bytea, 'hex');
 encode
--------
 efa5b9
(1 row)
```
So they should already be available in the 2000 version.
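
To read the query result, the hex string can be decoded back to a code point. The sketch below assumes the database encoding is UTF8, so that `efa5b9` is the UTF-8 form of the converted character; the variable names are only for illustration.

```python
# Hex output of the query above, assumed to be UTF-8 bytes
# (that is, the database encoding is assumed to be UTF8).
utf8_hex = "efa5b9"

# Decode the three bytes into a single character and format its code point.
ch = bytes.fromhex(utf8_hex).decode("utf-8")
code_point = f"U+{ord(ch):04X}"

print(code_point)  # U+F979
```

So the GB18030 byte sequence FD9D is mapped to U+F979 by the existing conversion.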


>
> "Author: Zheng Tao <taoz@highgo.com>" -- I haven't seen any messages
> from this address in this thread, so could you confirm this was
> intentional?
>
>
Yes, Zheng Tao is my colleague. He worked with me on this patch, so I want
to credit him.

I am attaching v5. The only change is in 0003, where I added the SGML change.

Best regards,
Chao Li (Evan)
---------------------
HighGo Software Co., Ltd.
https://www.highgo.com/