Thread

  1. Re: Change initdb default to the builtin collation provider

    Jeff Davis <pgsql@j-davis.com> — 2025-10-31T21:30:19Z

    On Fri, 2025-10-10 at 17:48 -0700, Jeff Davis wrote:
    > -------
    > Summary
    > -------
    > 
    > The libc collation provider is a bad default[1]. The builtin collation
    > provider is a good default, so let's use that.
    
    The attached patches implement a more modest proposal which does not
    conflict with Peter's objection about the display order:
    
    0001: If the encoding is unspecified, and cannot be determined from the
    locale (i.e. the locale is C), then use UTF-8 rather than SQL_ASCII.
    
    0002: If the provider is unspecified, and the locale is C or C.UTF-8,
    then use the builtin provider.
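
A rough sketch of the selection rules described for 0001 and 0002, written in Python for illustration only (initdb itself is C). The function name `choose_defaults`, its parameters, and the `patched` flag are hypothetical and do not come from the patches. `locale_encoding` stands in for whatever encoding initdb derives from the locale, or `None` when none can be determined (the "locale is C" case).

```python
from typing import Optional, Tuple


def choose_defaults(
    locale: str,
    encoding: Optional[str] = None,
    provider: Optional[str] = None,
    locale_encoding: Optional[str] = None,
    patched: bool = True,
) -> Tuple[str, str]:
    """Return (encoding, provider) under the rules sketched in this email.

    Hypothetical model, not initdb's code. `patched=False` models the
    behavior described as "previously" in the email.
    """
    # 0001: only applies when the encoding was not given explicitly
    # and the locale does not imply one.
    if encoding is None:
        if locale_encoding is not None:
            encoding = locale_encoding
        else:
            encoding = "UTF-8" if patched else "SQL_ASCII"

    # 0002: only applies when the provider was not given explicitly
    # and the locale is C or C.UTF-8. Otherwise libc stays the default.
    if provider is None:
        if patched and locale in ("C", "C.UTF-8"):
            provider = "builtin"
        else:
            provider = "libc"

    return encoding, provider


if __name__ == "__main__":
    # Locale "C" with nothing else specified, before and after.
    print(choose_defaults("C", patched=False))
    print(choose_defaults("C"))
    # A locale outside the scope of 0002 keeps the libc default.
    print(choose_defaults("en_US.UTF-8", locale_encoding="UTF-8"))
```

An explicitly specified encoding or provider always wins in this sketch, since both rules are conditioned on the value being unspecified.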
    
    Motivation:
    
    * UTF-8 seems safer than SQL_ASCII when the locale is compatible with
    either.
    
    * Whether the "C" locale uses the builtin provider or the libc provider
    is mostly about the catalog representation, because the implementation
    is the same. I don't have a strong motivation for this change; it just
    clarifies that libc is not actually being used when the locale is "C".
    
    * I think most users of the "C.UTF-8" locale would be better off with
    the builtin provider, which benefits from important optimizations.
    
    Note:
    
    This would mean that "initdb --no-locale" would select UTF-8 and the
    builtin provider with locale "C", whereas previously it would have
    selected SQL_ASCII and the libc provider (though it never actually
    used libc internally). I'm not sure if others want this behavior or if
    it would be surprising.
    
    Regards,
    	Jeff Davis