
  1. Re: Remaining dependency on setlocale()

    Jeff Davis <pgsql@j-davis.com> — 2025-12-12T20:11:40Z

    On Fri, 2025-12-05 at 16:01 +0100, Peter Eisentraut wrote:
    > v11-0003-Fix-inconsistency-between-ltree_strncasecmp-and-.patch
    > 
    > The function comment reads "Check if b has a prefix of a." -- Is that
    > the same as "Check if a is a prefix of b."?  The latter might be
    > clearer.
    
    Yes, fixed.
    
    Note: I separated this into two patches. 0003 fixes the multibyte
    mishandling issue, and 0004 consistently performs case folding. 0003 is
    backpatchable, I believe.
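    To make the "a is a prefix of b" contract concrete, here is a minimal C
    sketch of a case-insensitive prefix check that walks a one character at
    a time, so a multibyte character is always compared whole and never
    split. It assumes UTF-8 input and folds ASCII letters only; the names
    ci_is_prefix(), utf8_seq_len() and ascii_fold() are hypothetical, and
    this is not the ltree code from either patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Byte length of a UTF-8 sequence, judged from its lead byte.
 * Invalid lead bytes count as 1 so the caller always advances. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* ASCII-only case folding: locale-independent by construction. */
static unsigned char ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char) (c + ('a' - 'A')) : c;
}

/* True if a (alen bytes) is a case-insensitive prefix of b (blen bytes).
 * Non-ASCII characters must match byte-for-byte (assumption: no
 * locale-aware folding in this sketch). */
static bool ci_is_prefix(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t i = 0;

    while (i < alen)
    {
        size_t len = utf8_seq_len((unsigned char) a[i]);

        /* truncated character in a, or b too short to contain it */
        if (i + len > alen || i + len > blen)
            return false;

        if (len == 1)
        {
            if (ascii_fold((unsigned char) a[i]) != ascii_fold((unsigned char) b[i]))
                return false;
        }
        else if (memcmp(a + i, b + i, len) != 0)
            return false;

        i += len;
    }
    return true;
}
```

    Because both strings are compared at the same byte offsets and every
    earlier character matched exactly (modulo ASCII case), b stays aligned
    on a character boundary whenever a does.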
    
    > but the patch removes SB_lower_char().
    
    Fixed and committed.
    
    > v11-0006-Use-multibyte-aware-extraction-of-pattern-prefix.patch
    > 
    > Might have a small typo in the commit message:
    > 
    > ; and preserve and char-at-a-time logic for bytea.
    
    Fixed.
    
    I also changed it into two functions: like_fixed_prefix(), which is
    almost unchanged from the original; and like_fixed_prefix_ci(), which
    is multibyte and locale-aware. It was too confusing to have single-byte
    and multi-byte logic in the same function, and they didn't share much
    code anyway.
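    For reference, the single-byte, char-at-a-time logic that
    like_fixed_prefix() keeps can be sketched like this: copy pattern bytes
    until the first unescaped % or _, with backslash (LIKE's default escape
    character) quoting the next byte. like_literal_prefix() is a hypothetical,
    simplified helper working on a NUL-terminated string; the real function's
    interface differs.

```c
#include <stddef.h>

/* Copy the literal prefix of a LIKE pattern into out (capacity outcap,
 * including the terminating NUL) and return the number of bytes copied.
 * Stops at the first unescaped '%' or '_'. Works byte-at-a-time, so it
 * is only suitable for single-byte data such as bytea. */
static size_t like_literal_prefix(const char *pattern, char *out, size_t outcap)
{
    size_t n = 0;

    for (const char *p = pattern; *p != '\0'; p++)
    {
        if (*p == '%' || *p == '_')
            break;                  /* wildcard: literal prefix ends here */
        if (*p == '\\')
        {
            p++;                    /* escaped byte is taken literally */
            if (*p == '\0')
                break;              /* trailing backslash: stop */
        }
        if (n + 1 < outcap)
            out[n++] = *p;
    }
    if (outcap > 0)
        out[n] = '\0';
    return n;
}
```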
    
    > case '\xc7':        /* C with cedilla */
    > 
    > so the premise that "fuzzystrmatch is designed for ASCII" does not
    > appear to be correct.  Needs more analysis.
    > 
    > (But apparently it's not multibyte aware at all, so I don't know what
    > to do about that.)
    
    I didn't notice that, thank you. Agreed, we need a bit more discussion
    around this case as well as soundex().
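    A short C illustration of why a byte comparison against '\xc7' cannot
    work on UTF-8 data: 0xC7 is "C with cedilla" only in LATIN1. In UTF-8,
    U+00C7 is encoded as 0xC3 0x87, while 0xC7 is the lead byte of U+01C0
    through U+01FF. So such a test both misses the intended character and
    can fire on unrelated ones. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* True if the single byte 0xC7 occurs anywhere in s. */
static bool has_latin1_c_cedilla(const char *s)
{
    return strchr(s, '\xc7') != NULL;
}

/* U+00C7 as stored in a LATIN1 database vs. a UTF8 database. */
static const char latin1_c_cedilla[] = "\xc7";
static const char utf8_c_cedilla[] = "\xc3\x87";
```

    In a UTF8 database the check is false for "Ç" but true for, e.g.,
    U+01C0 (LATIN LETTER DENTAL CLICK), which is encoded as 0xC7 0x80.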
    
    > v11-0008-downcase_identifier-use-method-table-from-locale.patch
    > 
    > I'm confused here about the name of the function pg_strfold_ident().
    > In general, case "folding" results in an opaque string that is really
    > only useful for comparing against other case-folded strings.  But for
    > identifiers we are actually interested in lower-casing.  I think this
    > should be corrected in the API naming.
    
    Agreed and fixed.
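    The distinction matters because lowercasing yields a readable string,
    while full Unicode case folding may not (U+00DF "ß" folds to "ss", but
    its lowercase form is itself). A minimal sketch of locale-independent
    identifier lowering, assuming only ASCII letters are changed:
    PostgreSQL's existing downcase_identifier() applies the same ASCII rule
    and additionally calls tolower() on high-bit bytes in single-byte
    encodings, which is where its LC_CTYPE dependency comes from. The name
    downcase_ascii_ident() is hypothetical.

```c
#include <stddef.h>

/* Lowercase an identifier in place, ASCII letters only. High-bit bytes
 * are left untouched, so multibyte characters are never altered or split
 * and the result does not depend on the process's LC_CTYPE. */
static void downcase_ascii_ident(char *ident, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) ident[i];

        if (ch >= 'A' && ch <= 'Z')
            ident[i] = (char) (ch + ('a' - 'A'));
    }
}
```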
    
    Also, I added 0006, which saves a locale_t object for ICU in this one
    case where it's required. Surely that's not what we want in the long
    term, but we don't have the infrastructure for decoding pg_wchar into
    code points yet, and 0006 avoids the dependency on the global LC_CTYPE
    setting.
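    The general POSIX mechanism for avoiding the global LC_CTYPE is to create
    a locale_t with newlocale() and pass it to the *_l variants such as
    tolower_l(). A minimal sketch follows; the helper names are hypothetical,
    and it shows only libc's API, not how 0006 stores the object for ICU.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Create a private locale object covering only LC_CTYPE.
 * Returns (locale_t) 0 if the named locale is not available. */
static locale_t make_ctype_locale(const char *name)
{
    return newlocale(LC_CTYPE_MASK, name, (locale_t) 0);
}

/* Lowercase s in place using the explicit locale object, so a
 * setlocale() call elsewhere in the process cannot change the result. */
static void lower_with_locale(char *s, size_t len, locale_t loc)
{
    for (size_t i = 0; i < len; i++)
        s[i] = (char) tolower_l((unsigned char) s[i], loc);
}
```

    The object should be released with freelocale() once it is no longer
    needed.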
    
    > v11-0009-Control-LC_COLLATE-with-GUC.patch
    > 
    > I know there were some complaints about compatibility with
    > extensions, but I don't think anything concrete was presented.  I
    > would like to see more evidence that we need this.
    >
    > Also, recall that we used to have a lc_collate GUC, and in the end
    > people got confused that it didn't actually show a meaningful value
    > when you used ICU.  So we removed that.  It seems adding this back in
    > would create a similar kind of confusion.  So to avoid that, maybe
    > this should be called fallback_lc_collate or something like that.
    
    Yes, this is a POC patch and needs more discussion.
    
    What are your thoughts about a similar lc_ctype GUC, though? That has
    slightly different trade-offs.
    
    
    I believe v12 0001-0005 are about ready for commit, and 0003 should be
    backported.
    
    Regards,
    	Jeff Davis