Thread

  1. Re: XPATH evaluation

    Radosław Smogura <rsmogura@softperience.eu> — 2011-06-18T16:51:26Z

    Nicolas Barbier <nicolas.barbier@gmail.com> Friday 17 of June 2011 17:29:57
    > 2011/6/17, Andrew Dunstan <andrew@dunslane.net>:
    > > On 06/17/2011 10:55 AM, Radosław Smogura wrote:
    > >> XML canonicalization preserves whitespace, if I remember
    > >> correctly; I think there is an example.
    > >> 
    > >> In any case, if I store an image in XML (I've seen this done),
    > >> preserving whitespace and newlines is important.
    > > 
    > > If you store images you should encode them anyway, in base64 or hex.
    > 
    > Whitespace that is not at certain obviously irrelevant places (such as
    > right after "<", between attributes, outside of the whole document,
    > etc), and that is not defined to be irrelevant by some schema (if the
    > parser is schema-aware), is relevant. You cannot just muck around with
    > it and consider that correct.
    > 
    > > More generally, data that needs that sort of preservation should
    > > possibly be in CDATA nodes.
    > 
    > CDATA sections are just syntactic sugar (a form of escaping):
    > 
    > <URL:http://www.w3.org/TR/xml-infoset/#omitted>
    > 
    > "Appendix D: What is not in the Information Set
    > [..]
    > 19. The boundaries of CDATA marked sections."
    > 
    > Therefore, there is no such thing as a "CDATA node" that would be
    > different from "just text" (Infoset-wise).
    > 
    > Note that that does not mean that binary data is never supposed to be
    > altered or that all binary data is to be accepted: e.g., whether
    > newlines are represented using "\n", "\r", or "\r\n" is irrelevant;
    > also, binary data that is not valid according to the used encoding
    > must of course not be accepted.
    > 
    > Nicolas
    
    I would like to send a patch that removes the formatting. I don't know
    how to deal with the collapsing of blank nodes.
    
    Regards,
    Radek
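
The points quoted above (CDATA is only a form of escaping, whitespace inside
character data is significant, and line endings are normalized) can be checked
with a small sketch. It uses Python's standard xml.etree.ElementTree purely as
an illustration; it is not the libxml2 code path PostgreSQL uses, and the
helper name text_of is hypothetical.

```python
import xml.etree.ElementTree as ET


def text_of(xml):
    """Parse a small XML document and return the root element's text."""
    return ET.fromstring(xml).text


# A CDATA section and an escaped text node yield the same character data,
# so the parsed tree has no separate "CDATA node".
assert text_of("<a><![CDATA[x < y]]></a>") == text_of("<a>x &lt; y</a>")

# Whitespace inside character data is kept as-is.
assert text_of("<a>  two  spaces  </a>") == "  two  spaces  "

# Whitespace between attributes is not significant.
assert ET.fromstring('<a  x="1"    y="2"/>').attrib == \
       ET.fromstring('<a x="1" y="2"/>').attrib

# Line endings are normalized by the parser: "\r\n" arrives as "\n".
assert text_of("<a>line1\r\nline2</a>") == "line1\nline2"
```

Because the CDATA boundaries are gone after parsing, any patch that changes
whitespace handling has to treat CDATA content and ordinary text the same way.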