Thread

  1. Re: Regression with large XML data input

    Erik Wienhold <ewie@ewie.name> — 2025-07-28T10:49:02Z

    On 2025-07-28 09:45 +0200, Jim Jones wrote:
    > 
    > On 28.07.25 04:47, Michael Paquier wrote:
    > > I understand that from the point of view of a maintainer this is
    > > rather bad, but from the customer point of view the current
    > > situation is also bad to deal with in the scope of a minor upgrade,
    > > because applications suddenly break.
    > 
    > I totally get it --- from the user’s perspective, it’s hard to see
    > this as a bugfix.
    > 
    > I was wondering whether using XML_PARSE_HUGE in xml_parse's options
    > could help address this, for example:
    > 
    > options = XML_PARSE_NOENT | XML_PARSE_DTDATTR | XML_PARSE_HUGE
    >           | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS);
    
    This also came to my mind, but it was already tried and reverted soon
    after for security reasons. [1]
    
    > One idea would be to guard XML_PARSE_HUGE behind a GUC --- say,
    > xml_enable_huge_parsing. That would at least allow controlled
    > environments to opt in. But of course, that wouldn't help current
    > releases.
    
    +1 for new major releases.  But normal users must not be allowed to
    enable that GUC, so its context should probably be PGC_SU_BACKEND.
    
    I'm leaning towards Michael's proposal of adding a libxml2 version check
    in the stable branches before REL_18_STABLE and parsing the content with
    xmlParseBalancedChunkMemory on versions up to 2.12.x.
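
    A minimal sketch of how such a version check could look.  It assumes
    the cutoff is libxml2 2.13.0, and it relies on the LIBXML_VERSION
    encoding from <libxml/xmlversion.h> (2.12.5 is 21205, for example).
    The helper name and the constant are hypothetical and only illustrate
    the idea.  They are not taken from any patch.

```c
#include <stdbool.h>

/*
 * Assumption: 2.13.0 is the first libxml2 release that should NOT use
 * xmlParseBalancedChunkMemory ("versions up to 2.12.x" still use it).
 * The value follows the LIBXML_VERSION encoding:
 * major * 10000 + minor * 100 + micro.
 */
#define FIRST_VERSION_WITHOUT_CHUNK_PARSER 21300

/*
 * Hypothetical helper: decide whether content should be parsed with
 * xmlParseBalancedChunkMemory for the given libxml2 version number.
 */
static bool
use_balanced_chunk_parser(int libxml_version)
{
    return libxml_version < FIRST_VERSION_WITHOUT_CHUNK_PARSER;
}
```

    In the actual source this would more likely be a compile-time test
    such as "#if LIBXML_VERSION < 21300" around the call, since
    LIBXML_VERSION is a preprocessor constant.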
    
    [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f2743a7d70e7b2891277632121bb51e739743a47
    
    -- 
    Erik Wienhold