Thread

  1. Re: Change initdb default to the builtin collation provider

    Jeff Davis <pgsql@j-davis.com> — 2025-10-24T16:54:45Z

    On Fri, 2025-10-17 at 15:02 -0700, Jeff Davis wrote:
    > On Fri, 2025-10-17 at 17:23 +0200, Peter Eisentraut wrote:
    > > I remain violently opposed to this idea.  I don't understand how it
    > > could be acceptable to just not provide a good display order by
    > > default and have everyone rewrite their queries.
    > 
    > I assume that you favor alternative 3 listed here[1], which is to use
    > ICU "und" as the default. Is that correct? Or do you prefer to get
    > the locale from the environment at initdb time?
    
    Right now we're still stuck with the worst possible default: libc. Can
    you make a more concrete counter-proposal here that sorts through some
    of the details?
    
    * Should we base the ICU locale on the environment, or just default
    everyone to the "und" locale?
    
    * If ICU support is disabled, how does that affect the defaults?
    
    * If using the environment, what happens if the locale is not supported
    by ICU (in particular "C" or "C.UTF-8")?
    
    * What would be the default encoding, or should that come from the
    environment?
    
    * The ICU provider has some weaknesses around non-UTF8 encodings
    because of casts from wchar_t and the use of tolower() in
    downcase_identifier(). Are those potential blockers, and if so, are
    they fixable?
    
    * Can we try harder to find an acceptable way to use memcmp() for the
    indexes by default, at least for primary keys, even if the database
    collation is ICU? I know that I've argued for this in the past and it's
    been soundly rejected[1], but some variation on this idea could be
    worthy of consideration.
    
    Regards,
    	Jeff Davis
    
    [1]
    https://www.postgresql.org/message-id/b7a9f32eee8d24518f791168bc6fb653d1f95f4d.camel@j-davis.com
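
Illustration of the ordering trade-off discussed in the message above. The disagreement turns on the difference between byte-wise comparison (what memcmp() gives on UTF-8 text, which is the same as code point order) and a linguistically meaningful "display order". The sketch below is not PostgreSQL or ICU code. The helper names memcmp_key and rough_display_key are hypothetical, and rough_display_key is only a crude stand-in for real collation (it strips accents and folds case), used here to make the contrast visible.

```python
import unicodedata


def memcmp_key(s: str) -> bytes:
    """Sort key that orders strings the way memcmp() orders their UTF-8 bytes."""
    return s.encode("utf-8")


def rough_display_key(s: str) -> tuple:
    """Crude approximation of a display order. This is NOT ICU collation:
    it only drops combining marks and folds case, then breaks ties with
    the original string."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)


# "\u00e9clair" is "eclair" with a precomposed e-acute as the first letter.
words = ["apple", "Zebra", "\u00e9clair", "banana", "Apple", "zebra"]

byte_order = sorted(words, key=memcmp_key)
display_order = sorted(words, key=rough_display_key)

print("byte order:   ", byte_order)
print("display order:", display_order)
```

In byte order, all uppercase ASCII letters sort before all lowercase ones and the accented word sorts last, which is why a memcmp()-based default would require an explicit collation in queries that need a display order. The comparison itself is cheap and does not depend on any collation library version, which is the appeal of using it for indexes.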