Re: [PATCH] Add CANONICAL option to xmlserialize
From: Jim Jones <jim.jones@uni-muenster.de>
To: Thomas Munro <thomas.munro@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2023-03-06T10:50:54Z
Lists: pgsql-hackers
Attachments
- v2-0001-Add-CANONICAL-format-to-xmlserialize.patch (text/x-patch) patch v2-0001
On 06.03.23 00:32, Thomas Munro wrote:
> I couldn't reproduce that locally either, but I just tested on CI with
> your patch applied saw the failure, and then removed
> "PYTHONCOERCECLOCALE=0 LANG=C" and it's all green:
>
> https://github.com/macdice/postgres/commit/91999f5d13ac2df6f7237a301ed6cf73f2bb5b6d
>
> Without looking too closely, my first guess would have been that this
> just isn't going to work without UTF-8 database encoding, so you might
> need to skip the test (see for example
> src/test/regress/expected/unicode_1.out). It's annoying that "xml"
> already has 3 expected variants... hmm. BTW shouldn't it be failing
> in a more explicit way somewhere sooner if the database encoding is
> not UTF-8, rather than getting confused?

I guess this confusion is happening because xml_parse() was being called
with the database encoding from GetDatabaseEncoding().

I added a step before calling xml_parse() that reads the encoding from the
XML document's own declaration and falls back to UTF-8 when none is
declared, instead of using the database encoding:

    /* use the declared document encoding, defaulting to UTF-8 */
    parse_xml_decl(xml_text2xmlChar(data), NULL, NULL, &encodingStr, NULL);
    encoding = encodingStr ? xmlChar_to_encoding(encodingStr) : PG_UTF8;
    doc = xml_parse(data, XMLOPTION_DOCUMENT, false, encoding, NULL);

v2 attached.

Thanks!

Best,
Jim
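The encoding choice described above comes down to reading the encoding pseudo-attribute from the XML declaration and falling back to UTF-8 when it is absent. Below is a minimal standalone sketch of that lookup, assuming a NUL-terminated input string. xml_decl_encoding() is a hypothetical helper for illustration only; it is not the parse_xml_decl() from xml.c and it skips much of what a real parser checks (BOM detection, strict ordering of version/encoding/standalone).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL's parse_xml_decl): return the value of
 * the encoding pseudo-attribute of an XML declaration, copied into buf.
 * Returns "UTF-8" when there is no declaration or it declares no encoding,
 * mirroring the PG_UTF8 fallback in the patch. Returns NULL if the declared
 * name does not fit into buf.
 */
static const char *
xml_decl_encoding(const char *doc, char *buf, size_t buflen)
{
    const char *end;
    const char *p;
    const char *close;
    char        quote;
    size_t      len;

    assert(doc != NULL && buf != NULL && buflen > 0);

    /* An XML declaration must appear at the very start of the document. */
    if (strncmp(doc, "<?xml", 5) != 0)
        return "UTF-8";

    end = strstr(doc, "?>");
    if (end == NULL)
        return "UTF-8";         /* unterminated declaration: keep default */

    /* Naive search; only matches inside the declaration are accepted. */
    p = strstr(doc, "encoding");
    if (p == NULL || p > end)
        return "UTF-8";

    p += strlen("encoding");
    while (isspace((unsigned char) *p))
        p++;
    if (*p != '=')
        return "UTF-8";
    p++;
    while (isspace((unsigned char) *p))
        p++;

    /* The value may be quoted with either ' or ". */
    if (*p != '"' && *p != '\'')
        return "UTF-8";
    quote = *p++;
    close = strchr(p, quote);
    if (close == NULL || close > end)
        return "UTF-8";

    len = (size_t) (close - p);
    if (len >= buflen)
        return NULL;            /* name too long for the caller's buffer */
    memcpy(buf, p, len);
    buf[len] = '\0';
    return buf;
}
```

With this shape, a document declaring encoding="LATIN1" yields "LATIN1", while a document with no declaration yields "UTF-8" regardless of the database encoding, which is the behaviour the patch aims for.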