Thread

  1. Re: Regression with large XML data input

    Erik Wienhold <ewie@ewie.name> — 2025-07-24T19:01:11Z

    On 2025-07-24 05:12 +0200, Michael Paquier wrote:
    > Switching back to the previous code, where we rely on
    > xmlParseBalancedChunkMemory() fixes the issue.  A quick POC is
    > attached.  It fails one case in check-world with SERIALIZE because I
    > am not sure it is possible to pass down some options through
    > xmlParseBalancedChunkMemory(), still the regression is gone, and I am
    > wondering if there is not a better solution to be able to dodge the
    > original problem and still accept this case.
    
    Whitespace can be preserved by calling xmlKeepBlanksDefault() before
    parsing.  See attached v2.  That function is deprecated, but libxml2
    uses thread-local globals, so it should be safe.  Apart from that, I
    see no way to set XML_PARSE_NOBLANKS with
    xmlParseBalancedChunkMemory().
    
    [1] https://gitlab.gnome.org/GNOME/libxml2/-/blob/408bd0e18e6ddba5d18e51d52da0f7b3ca1b4421/parserInternals.c#L2833
    
    -- 
    Erik Wienhold