Thread

  1. [PATCH] Bug in XPATH() if expression returns a scalar value

    Florian G. Pflug <fgp@phlo.org> — 2011-05-30T13:17:02Z

    Hi
    
    The in-core XPATH() function currently only handles XPath expressions which
    return a node set correctly. For XPath expressions which return boolean,
    numeric or string values, XPATH() returns an empty array. (This is the case
    for XPath expressions whose top-level syntactic construct isn't a path
    expression but rather one of the built-in functions like name() or count()).
    
    The XPath expression 'name(/*)', for example, is supposed to return 'root'
    when applied to the XML fragment '<root><nested/><nested/></root>'. Postgres,
    however, currently returns an empty array.
    
    A patch which fixes that is attached. It makes XPATH() return a single-element
    array containing a textual representation of the string, numeric value or boolean
    for XPath expressions which don't return a node set. For numeric and boolean
    results, the textual representation is obtained by calling float8out() respectively
    boolout(). These function's results are assumed to always be well-form XML. For
    string results, xml_in() is used to verify that the string returned by the XPath
    expression is a well-formed XML fragment. This is required because XPATH() could
    otherwise be used to insert invalid XML fragments into columns of type XML.
    
    The patch add a few tests of non-nodeset returning XPath expressions to the
    XML regression tests.
    
    I'm not 100% clear on whether I should add this to the next commit fest or not -
    after all, it's more a bug-fix rather than a new feature. But to prevent this
    from getting lost, I'll add it unless someone tells me not to.
    
    best regards,
    Florian Pflug
    
  2. Re: [PATCH] Bug in XPATH() if expression returns a scalar value

    Florian G. Pflug <fgp@phlo.org> — 2011-05-31T14:19:29Z

    Sorry for the self-reply but I figured it'd be helpful to add information
    that I discovered only after my initial post.
    
    On May30, 2011, at 15:17 , Florian Pflug wrote:
    > The XPath expression 'name(/*)', for example, is supposed to return 'root'
    > when applied to the XML fragment '<root><nested/><nested/></root>'. Postgres,
    > however, currently returns an empty array.
    
    In the mean while, I've discovered that this was discussed previously about
    a year ago here:
      http://archives.postgresql.org/pgsql-general/2010-07/msg00355.php
    
    The basic confusion seems to be whether XPATH() is supposed to work on
    everything that http://www.w3.org/TR/xpath/ consideres to be an "Expression",
    or only on what that document calls a "Location Path".
    
    The difference is basically that "Location Paths" server purely as
    predicates, i.e. *select* a subset of nodes from an XML fragment, while
    "Expressions" can produce node sets *or* arbitrary scalar values
    (boolean, numeric or string).
    
    According to the thread from last summer, XLST handles this by defining
    *two* constructs which evaluate XPath expressions, one for those which
    return node sets (<xsl:template match="...">) and one for those which
    return scalar values (<xsl:value-of select="...">).
    
    My patch makes XPATH() work for both nodset-returning
    *and* scalar-value-returning expressions. This has the advantage
    of being simpler, but it does force the scalar values produced
    by an XPath expression to be valid XML fragments. For boolean and
    numeric values this isn't a problem, but it does limit what you
    can do with string-returning XPath expressions.
    
    If people deem this to be a problem, we could instead add a separate
    function XPATH_VALUE() that returns VARCHAR, and make people use that
    for scalar-value-returning expressions. However, to avoid confusion,
    XPATH() should then be taught to raise an error if used for scalar-value
    returning expressions, instead of silently returning an empty array as
    it does now.
    
    Thoughts, anyone?
    
    best regards,
    Florian Pflug
    
    
    
  3. Re: [PATCH] Bug in XPATH() if expression returns a scalar value

    Ross Reedstrom <reedstrm@rice.edu> — 2011-05-31T17:15:36Z

    On Tue, May 31, 2011 at 04:19:29PM +0200, Florian Pflug wrote:
    > Sorry for the self-reply but I figured it'd be helpful to add information
    > that I discovered only after my initial post.
    > 
    > On May30, 2011, at 15:17 , Florian Pflug wrote:
    > > The XPath expression 'name(/*)', for example, is supposed to return 'root'
    > > when applied to the XML fragment '<root><nested/><nested/></root>'. Postgres,
    > > however, currently returns an empty array.
    > 
    > In the mean while, I've discovered that this was discussed previously about
    > a year ago here:
    >   http://archives.postgresql.org/pgsql-general/2010-07/msg00355.php
    > 
    > The basic confusion seems to be whether XPATH() is supposed to work on
    > everything that http://www.w3.org/TR/xpath/ consideres to be an "Expression",
    > or only on what that document calls a "Location Path".
    > 
    > The difference is basically that "Location Paths" server purely as
    > predicates, i.e. *select* a subset of nodes from an XML fragment, while
    > "Expressions" can produce node sets *or* arbitrary scalar values
    > (boolean, numeric or string).
    > 
    > According to the thread from last summer, XLST handles this by defining
    > *two* constructs which evaluate XPath expressions, one for those which
    > return node sets (<xsl:template match="...">) and one for those which
    > return scalar values (<xsl:value-of select="...">).
    >
    > My patch makes XPATH() work for both nodset-returning
    > *and* scalar-value-returning expressions. This has the advantage
    > of being simpler, but it does force the scalar values produced
    > by an XPath expression to be valid XML fragments. For boolean and
    > numeric values this isn't a problem, but it does limit what you
    > can do with string-returning XPath expressions.
    > 
    > If people deem this to be a problem, we could instead add a separate
    > function XPATH_VALUE() that returns VARCHAR, and make people use that
    > for scalar-value-returning expressions. However, to avoid confusion,
    > XPATH() should then be taught to raise an error if used for scalar-value
    > returning expressions, instead of silently returning an empty array as
    > it does now.
    > 
    > Thoughts, anyone?
    
    I think it's important to note here that the nodeset returning nature of
    XPATH in XSLT is a context setting functionality: these nodes are then
    further processed by the template. In the postgresql case, the
    distinction between returning a value and doing further processing isn't
    so clear. My one use-cases tend toward processing a table full of
    records with an XML field, using the XPATH to select out fragments and
    records ids into a secondary table for further processing/analysis.
    
    What you describe, making XPATH return something for the scalar
    functions, is sorely needed. Constraining the return values to be valid
    XML fragments is the sort of wart that makes XML processing in
    postgresql seem odd to those familiar with other tools, though. How
    about naming the other function XPATH_VALUE_OF, just to make it the XSLT
    connection clear?
    
    Ross
    -- 
    Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
    Systems Engineer & Admin, Research Scientist        phone: 713-348-6166
    Connexions                  http://cnx.org            fax: 713-348-3665
    Rice University MS-375, Houston, TX 77005
    GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE
    
    
  4. Re: [PATCH] Bug in XPATH() if expression returns a scalar value

    Florian G. Pflug <fgp@phlo.org> — 2011-05-31T19:28:50Z

    On May31, 2011, at 19:15 , Ross J. Reedstrom wrote:
    > What you describe, making XPATH return something for the scalar
    > functions, is sorely needed. Constraining the return values to be valid
    > XML fragments is the sort of wart that makes XML processing in
    > postgresql seem odd to those familiar with other tools, though.
    
    I've now changes things so that the results of scalar-value
    returning XPath expressions are correctly entity-encoded (i.e.,
    a literal "<" gets translated to "&lt;").
    
    After realizing that this is necessary even for node-set
    returning XPath expressions (see my other mail from today),
    because they may very well select text nodes, I came to the
    conclusion that doing this unconditionally (well, except for
    element nodes obviously) is the least surprising behaviour.
    
    The following subsumes the behavior with this and the patch
    from my other e-mail applied.
    
    SELECT
            (XPATH('namespace-uri(/*)', x))[1] AS namespace,
            (XPATH('/*/@value', x))[1] AS value,
            (XPATH('/*/text()', x))[1] AS text
    FROM (VALUES (XMLELEMENT(name "root",
            XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),
            '<t'
    ))) v(x);
    
     namespace | value | text  
    -----------+-------+-------
     &lt;n     | &lt;v | &lt;t
    
    Without the patch from the other mail, the "namespace" result
    stays the same, but "value" and "text" are "<v" and "<t"
    respectively.
    
    Updated patch is attached
    
    best regards,
    Florian Pflug
    
    PS: Btw, while trying this I think I found another problem. If you
    do 
    
    SELECT (XPATH(
            '/*',
            XMLELEMENT(NAME "root",
                       XMLATTRIBUTES('<n' AS xmlns,
                                     '<v' AS value))
    ))[1];
    
    you get
    
                  xpath               
    ----------------------------------
     <root xmlns="<n" value="&lt;v"/>
    
    i.e. the "<" in the namespace URI isn't quoted properly.
    Trying to cast that value to text and back to xml fails.
    Funnily enough, if you skip the XPATH() call, things work properly
    
    SELECT XMLELEMENT(NAME "root",
                      XMLATTRIBUTES('<n' AS xmlns,
                                    '<v' AS value))
    
    gives
    
                 xmlelement              
    -------------------------------------
     <root xmlns="&lt;n" value="&lt;v"/>
    
    I'll start a new thread for this issue...
    
  5. Re: [PATCH] Bug in XPATH() if expression returns a scalar value

    Peter Eisentraut <peter_e@gmx.net> — 2011-06-06T12:56:59Z

    On tis, 2011-05-31 at 16:19 +0200, Florian Pflug wrote:
    > If people deem this to be a problem, we could instead add a separate
    > function XPATH_VALUE() that returns VARCHAR, and make people use that
    > for scalar-value-returning expressions.
    
    Why not replicate what contrib/xml2 provides, namely
    
    xpath_string()
    xpath_number()
    xpath_bool()
    
    That way, types are preserved.
    
    > However, to avoid confusion,
    > XPATH() should then be taught to raise an error if used for
    > scalar-value
    > returning expressions, instead of silently returning an empty array as
    > it does now.
    
    Sounds reasonable.
    
    
    
  6. Re: [PATCH] Bug in XPATH() if expression returns a scalar value

    Florian G. Pflug <fgp@phlo.org> — 2011-06-08T08:14:09Z

    On Jun6, 2011, at 14:56 , Peter Eisentraut wrote:
    > On tis, 2011-05-31 at 16:19 +0200, Florian Pflug wrote:
    >> If people deem this to be a problem, we could instead add a separate
    >> function XPATH_VALUE() that returns VARCHAR, and make people use that
    >> for scalar-value-returning expressions.
    > 
    > Why not replicate what contrib/xml2 provides, namely
    > 
    > xpath_string()
    > xpath_number()
    > xpath_bool()
    > 
    > That way, types are preserved.
    
    
    But then you lose the ability to evaluate user-supplied
    XPath expressions, because there's no way of telling which of these
    function to use.
    
    Since XPATH_BOOL() can be emulated by doing XPATH(...)::text::boolean
    if XPATH() implements the more lenient behaviour I proposed, that
    seems like a bad tradeoff overall.
    
    best regards,
    Florian Pflug
    
    
    
  7. Re: [PATCH] Bug in XPATH() if expression returns a scalar value

    Florian G. Pflug <fgp@phlo.org> — 2011-06-09T23:28:21Z

    On Jun8, 2011, at 10:14 , Florian Pflug wrote:
    > On Jun6, 2011, at 14:56 , Peter Eisentraut wrote:
    >> On tis, 2011-05-31 at 16:19 +0200, Florian Pflug wrote:
    >>> If people deem this to be a problem, we could instead add a separate
    >>> function XPATH_VALUE() that returns VARCHAR, and make people use that
    >>> for scalar-value-returning expressions.
    >> 
    >> Why not replicate what contrib/xml2 provides, namely
    >> 
    >> xpath_string()
    >> xpath_number()
    >> xpath_bool()
    >> 
    >> That way, types are preserved.
    > 
    > But then you lose the ability to evaluate user-supplied
    > XPath expressions, because there's no way of telling which of these
    > function to use.
    > 
    > Since XPATH_BOOL() can be emulated by doing XPATH(...)::text::boolean
    > if XPATH() implements the more lenient behaviour I proposed, that
    > seems like a bad tradeoff overall.
    
    
    Patch rebased onto HEAD.
    
    best regards,
    Florian Pflug
    
  8. Re: [PATCH] Bug in XPATH() if expression returns a scalar value

    Peter Eisentraut <peter_e@gmx.net> — 2011-06-13T19:24:49Z

    On ons, 2011-06-08 at 10:14 +0200, Florian Pflug wrote:
    > But then you lose the ability to evaluate user-supplied
    > XPath expressions, because there's no way of telling which of these
    > function to use.
    
    Perhaps having both variants, one type-safe and one not, would work.  I
    don't agree with doing away with type-safety completely for the sake of
    convenience.
    
    
    
    
  9. Re: [PATCH] Bug in XPATH() if expression returns a scalar value

    Florian G. Pflug <fgp@phlo.org> — 2011-06-14T09:56:31Z

    On Jun13, 2011, at 21:24 , Peter Eisentraut wrote:
    > On ons, 2011-06-08 at 10:14 +0200, Florian Pflug wrote:
    >> But then you lose the ability to evaluate user-supplied
    >> XPath expressions, because there's no way of telling which of these
    >> function to use.
    > 
    > Perhaps having both variants, one type-safe and one not, would work.  I
    > don't agree with doing away with type-safety completely for the sake of
    > convenience.
    
    In theory, I agree.
    
    In practice, however, XPath 1.0 isn't very strongly typed itself. The
    built-in types are all auto-converted into one another (As if
    string(), number(), boolean() had been called). Also, only three of
    the functions defined by XPath 1.0 seem interesting. Here's the break-down
    
    The functions returning "string" are
      string(): Converts arbitrary values to strings
      local-name(): Name of node-set's first top-level w/o namespace prefix
      namespace-uri(): Namespace of node-set's first top-level
      name(): Namespace prefix and name of node-set's first top-level node
      concat()
      starts-with()
      contains()
      substring-before()
      substring-after()
      substring()
      string-length()
      translate()
      
    For all of these function postgres provides corresponding SQL functions,
    which the exception of
      local-name()
      namespace-uri()
      name()
    
    In fact, these three functions are the raison d'être for my patch and this thread.
    I needed to find the name of a tag returned by an XPath expression, and to my
    dismay discovered that XPATH('local-name(...)', ...) returns an empty array. The
    only reason I added support for boolean and numeric XPath expressions at all was
    for the sake of completeness.
    
    Here's the rest of the functions defined by XPath 1.0. I'm convinces that none
    of them are particularly useful as top-level functions, and therefore believe
    that adding XPATH_BOOLEAN() and XPATH_NUMBER() is largely overkill.
    
    The functions returning "number" are
      number(): Converts arbitrary values to numbers
      last()
      position()
      count()
      sum(): Sum over a node-set after implicit conversion of nodes to numbers
      floor()
      ceiling()
      round()
      operators +, -, *, div, mod
    
    The functions returning "boolean" are
      boolean(): Converts arbitrary to boolean
      not()
      true()
      false()
      operators or, and, =, !=, <=, <, >=, >
    
    best regards,
    Florian Pflug