Re: [PATCH] Add CANONICAL option to xmlserialize

Jim Jones <jim.jones@uni-muenster.de>

From: Jim Jones <jim.jones@uni-muenster.de>
To: Andrew Dunstan <andrew@dunslane.net>, Tom Lane <tgl@sss.pgh.pa.us>
Cc: Pavel Stehule <pavel.stehule@gmail.com>, Chapman Flack <chap@anastigmatix.net>, vignesh C <vignesh21@gmail.com>, Thomas Munro <thomas.munro@gmail.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Vik Fearing <vik@postgresfriends.org>
Date: 2026-05-26T09:46:44Z
Lists: pgsql-hackers

On 30/03/2026 13:27, Jim Jones wrote:
> On 30/03/2026 11:44, Andrew Dunstan wrote:
>> I note that your function returns xml, whereas Tom's suggestion was for
>> a function returning text. I don't think there was any discussion on the
>> point.
> Indeed, there was no discussion regarding the return type.
> 
> My rationale for keeping it as xml was: the output is xml, callers can
> immediately use the xml without casting, and nearly all other xml*
> functions return xml. Is there a direct advantage of having this
> function return text?


After some consideration, I think returning text instead of xml is
the better choice here. The canonical form is a serialization
artifact rather than a document intended for further XML processing.
More practically, since xml has no = operator, the primary use case of
comparing documents requires a cast to text anyway, so returning text is
closer to real-world usage.
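
To illustrate the comparison use case outside the database, here is a
minimal sketch in Python. It uses the standard library's
xml.etree.ElementTree.canonicalize, which implements C14N 2.0 rather
than the C14N 1.1 discussed here, so it only stands in for the general
idea; the sample documents are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize

# Two spellings of the same document: attribute order, quoting,
# whitespace inside the start tag and empty-element syntax all differ.
doc_a = '<root b="2" a="1"><child/></root>'
doc_b = "<root a='1'   b='2'><child></child></root>"

# canonicalize() returns the canonical form as a plain string,
# so the results can be compared with ordinary string equality.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

print(doc_a == doc_b)      # raw text comparison
print(canon_a == canon_b)  # comparison of canonical forms
print(canon_a)
```

The raw strings differ while the canonical strings match, which is the
kind of text comparison the canonical form is meant to enable.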

I also noticed a correctness issue with database encoding: the C14N 1.1
specification mandates UTF-8 output, so xmlC14NDocDumpMemory always
returns UTF-8 [1], regardless of the server encoding. I added a
pg_any_to_server call to convert the output to the server encoding
before returning.
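
As a rough illustration of why that conversion matters, the sketch below
mimics the situation in Python. It is not the patch's C code: the sample
string and the choice of LATIN1 as the non-UTF-8 server encoding are
assumptions made purely for the example.

```python
# Stand-in for the buffer a C14N serializer hands back: always UTF-8.
c14n_utf8 = '<name>Münster</name>'.encode('utf-8')

# Hypothetical database encoding that is not UTF-8.
server_encoding = 'latin-1'

# Without conversion: the UTF-8 bytes are interpreted as if they were
# already in the server encoding, which garbles non-ASCII characters.
unconverted = c14n_utf8.decode(server_encoding)

# With conversion: decode as UTF-8 first, then encode for the server.
converted = c14n_utf8.decode('utf-8').encode(server_encoding)

print(unconverted)
print(converted.decode(server_encoding))
```

Only the converted bytes round-trip to the original text; the
unconverted ones turn the two-byte UTF-8 sequence into two separate
characters.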

Best, Jim

[1] https://github.com/GNOME/libxml2/blob/174201f747da93167354287a7599d0b385552599/c14n.c#L1964