Thread

  1. Re: Remaining dependency on setlocale()

    Jeff Davis <pgsql@j-davis.com> — 2025-11-29T20:50:56Z

    On Wed, 2025-11-26 at 09:50 +0800, Chao Li wrote:
    > * The retry logic implies that a single-byte char may become multiple
    > bytes after folding, otherwise retry is not needed because you have
    > allocated s+1 bytes for dest buffers. From this perspective, we
    > should use two needed variables: neededA and neededB, if neededA !=
    > neededB, then the two strings are different; if neededA == neededB,
    > then we should be perform strncmp, but here we should pass neededA
    > (or neededB as they are identical) to strncmp(al, bl, neededA).
    
    Thank you.
    
    It's actually worse than that: having a single 's' argument is just
    completely wrong. Consider:
    
      a: U&'x\0394\0394\0394'
      b: U&'\0394\0394\0394'
    
    There is no value for byte length 's' for which both 'a' and 'b' are
    properly-encoded strings. So, the current code passes invalid byte
    sequences to LOWER(), which is another pre-existing bug.
    
    ltree_strncasecmp() is only used for checking equality of the first s
    bytes of the query, so let's make it a safer API that just checks
    prefix equality. Attached.
    
    > * Based on the logic you implemented in 0004, first pg_strfold() has
    > copied as many chars as possible to dest buffer, so when retry,
    > ideally we can should resume instead of start over. However, if
    > single-byte->multi-byte folding happens, we have no information to
    > decide from where to resume.
    
    Right.
    
    That suggests that we might want some kind of lazy or iterator-based
    API for string folding. We'd need to find the right way to do that with
    ICU. If we find that it's a performance problem somewhere, we can look
    into that. Do you think we need that now?
    
    >  From this perspective, in 0004, do we really need to take the try-
    > the-best strategy for strlower_c()? If there are some other use cases
    > that require data to be placed in dest buffer even if dest buffer
    > doesn’t have enough space, then my patch [1] of changing
    > strlower_libc_sb() should be considered.
    
    I will look into that.
    
    > SB_lower_char should be changed to C_IMatchText.
    
    Updated comment.
    
    > I think the comment should be updated accordingly, like “for ILIKE in
    > the C locale”.
    
    Done, thank you.
    
    > * match is allocated with pattlen+1 bytes, is it long enough to hold
    > pattlen multiple-byte chars?
    > 
    > * match is not freed, but looks like it should be: 
    
    ...
    
    > Should “pos” be “part[pos]” assigning to match[match_pos++]?
    
    All fixed, thank you! (I apologize for posting a patch in that state to
    begin with...)
    
    I also reorganized slightly to separate out the pg_iswcased() API into
    its own patch, and moved the like_support.c changes from the ctype_is_c
    patch (already committed: 1476028225) into the pattern prefixes patch.
    
    Regards,
    	Jeff Davis