Re: [PATCH] Add CANONICAL option to xmlserialize
From: Jim Jones <jim.jones@uni-muenster.de>
To: Andrew Dunstan <andrew@dunslane.net>, Tom Lane <tgl@sss.pgh.pa.us>
Cc: Pavel Stehule <pavel.stehule@gmail.com>,
Chapman Flack <chap@anastigmatix.net>, vignesh C <vignesh21@gmail.com>,
Thomas Munro <thomas.munro@gmail.com>,
PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>,
Vik Fearing <vik@postgresfriends.org>
Date: 2026-05-26T09:46:44Z
Lists: pgsql-hackers
Attachments
- v25-0001-Add-xmlcanonicalize-function.patch (text/x-patch) patch v25-0001
On 30/03/2026 13:27, Jim Jones wrote:
> On 30/03/2026 11:44, Andrew Dunstan wrote:
>> I note that your function returns xml, whereas Tom's suggestion was for
>> a function returning text. I don't think there was any discussion on the
>> point.
> Indeed, there was no discussion regarding the return type.
>
> My rationale for keeping it as xml was: the output is xml, callers can
> immediately use the xml without casting, and nearly all other xml*
> functions return xml. Is there a direct advantage of having this
> function return text?

After some consideration, I think returning text instead of xml is indeed
the better choice here. The canonical form is a serialization artifact
rather than a document intended for further XML processing. More
practically, since xml has no = operator, the primary use case of
comparing documents requires casting anyway -- returning text is indeed
closer to real-world usage.

I also noticed a correctness issue with database encoding: the C14N 1.1
specification mandates UTF8 output, so xmlC14NDocDumpMemory always
returns UTF8.[1] I added a pg_any_to_server call to convert the output to
the server encoding before returning.

Best, Jim

1 - https://github.com/GNOME/libxml2/blob/174201f747da93167354287a7599d0b385552599/c14n.c#L1964
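To illustrate the two points above (canonical text as the thing being compared, and UTF8 output needing conversion to another encoding), here is a minimal Python sketch. It is not the patch's code: Python's standard library implements C14N 2.0 rather than the C14N 1.1 that xmlC14NDocDumpMemory produces, and LATIN1 is only an assumed example of a non-UTF8 server encoding. All variable names are hypothetical.

```python
from xml.etree.ElementTree import canonicalize

# Two serializations of the same document: attribute order, quoting,
# whitespace inside the start tag, and empty-element syntax all differ.
doc_a = '<root b="2" a="1"><city>Münster</city><item/></root>'
doc_b = "<root a='1'   b='2'><city>Münster</city><item></item></root>"

# canonicalize() returns the canonical form as a text string (C14N 2.0).
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

# The raw strings differ, but the canonical text forms compare equal,
# which is the comparison use case described in the message.
same_raw = doc_a == doc_b
same_canonical = canon_a == canon_b

# A C14N serializer emits UTF-8 bytes. Converting them to an assumed
# LATIN1 "server encoding" changes the byte representation of "ü",
# analogous in spirit to the pg_any_to_server step.
canon_utf8 = canon_a.encode("utf-8")
canon_latin1 = canon_utf8.decode("utf-8").encode("latin-1")

print(same_raw, same_canonical)
print(len(canon_utf8), len(canon_latin1))
```

The byte lengths differ by one because "ü" takes two bytes in UTF8 and one in LATIN1, which is why returning the serializer's output unconverted would be wrong in a non-UTF8 database.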