Re: GB18030-2022 Support in PostgreSQL

Peter Eisentraut <peter@eisentraut.org>

From: Peter Eisentraut <peter@eisentraut.org>
To: Chao Li <li.evan.chao@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>
Cc: Andrew Dunstan <andrew@dunslane.net>, John Naylor <johncnaylorls@gmail.com>, JiaoShuntian <jiaoshuntian@highgo.com.w.kunlunaq.com>, pgsql-hackers@lists.postgresql.org
Date: 2025-08-06T10:29:15Z
Lists: pgsql-hackers

Commits

  1. Generate EUC_CN mappings from gb18030-2022.ucm

  2. Update GB18030 encoding from version 2000 to 2022

  3. Generate GB18030 mappings from the Unicode Consortium's UCM file

On 05.08.25 08:22, Chao Li wrote:
> I agree with Tom that we may just redefine GB18030 to comply with the 
> 2022 standard.
> 
> As John Naylor pointed out, 2022 is not backward compatible, that is true. 
> However, I went through all the incompatible changes, those are all 
> characters rarely used. So I would guess most of the existing databases 
> won’t be impacted and the rest with encoding GB18030 need to do data 
> migration before upgrading to a PG version that switches to 
> GB18030-2022. I think PG may delegate data migration tasks to third 
> party PG service vendors. They may develop simple or complex migration 
> tools to help different use cases.

Note that you can also create custom conversions using CREATE
CONVERSION, so that would be an option for those who need the old
behavior.
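
For illustration only, a rough sketch of what that might look like. The
schema name, the function name, and the library name are all hypothetical;
the conversion function itself would have to be written in C by whoever
needs the old GB18030-2000 mapping. The argument list shown is the
conversion function signature used by PostgreSQL 14 and later.

```sql
-- Hypothetical schema to hold the legacy conversion.
CREATE SCHEMA legacy_conv;

-- Hypothetical C function implementing the old GB18030-2000 -> UTF8 mapping.
-- '$libdir/gb18030_2000_conv' is an assumed library name, not something
-- that ships with PostgreSQL.
CREATE FUNCTION legacy_conv.gb18030_2000_to_utf8(
    integer,   -- source encoding ID
    integer,   -- destination encoding ID
    cstring,   -- source string
    internal,  -- destination buffer
    integer,   -- source string length
    boolean    -- if true, do not throw an error on conversion failure
) RETURNS integer
    AS '$libdir/gb18030_2000_conv', 'gb18030_2000_to_utf8'
    LANGUAGE C STRICT;

-- Register the function as a conversion for this encoding pair.
CREATE DEFAULT CONVERSION legacy_conv.gb18030_2000_to_utf8
    FOR 'GB18030' TO 'UTF8'
    FROM legacy_conv.gb18030_2000_to_utf8;
```

A matching conversion for the UTF8 to GB18030 direction would be needed as
well. Which default conversion actually gets used depends on the schema
search path, so that part should be checked against the CREATE CONVERSION
documentation before relying on it.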