Re: Change initdb default to the builtin collation provider
Jeff Davis <pgsql@j-davis.com> — 2025-10-31T21:30:19Z
On Fri, 2025-10-10 at 17:48 -0700, Jeff Davis wrote:
> -------
> Summary
> -------
>
> The libc collation provider is a bad default[1]. The builtin collation
> provider is a good default, so let's use that.

The attached patches implement a more modest proposal which does not
conflict with Peter's objection about the display order:

  0001: If the encoding is unspecified, and cannot be determined from
  the locale (i.e. the locale is C), then use UTF-8 rather than
  SQL_ASCII.

  0002: If the provider is unspecified, and the locale is C or
  C.UTF-8, then use the builtin provider.

Motivation:

  * UTF-8 seems safer than SQL_ASCII when the locale is compatible
    with either.

  * Whether the "C" locale uses the builtin provider or the libc
    provider is mostly about the catalog representation, because the
    implementation is the same. I don't have a strong motivation for
    this change, it just clarifies that libc is not actually being
    used when the locale is "C".

  * I think most users of the "C.UTF-8" locale would be better off
    with the builtin provider, which benefits from important
    optimizations.

Note: This would mean that "initdb --no-locale" would select UTF-8 and
the builtin provider with locale "C", whereas previously it would have
selected SQL_ASCII and the libc provider (though it didn't ever really
use libc internally). I'm not sure if others want this behavior or if
it would be surprising.

Regards,
	Jeff Davis
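
For illustration only, the defaulting rules from 0001 and 0002 can be
sketched as a small decision function. This is not the C code from the
patches; the function name, its parameters, and the "patched" switch
are hypothetical, and it assumes the caller has already worked out
which encoding (if any) the locale implies.

```python
from typing import Optional, Tuple

# Locales for which 0002 would pick the builtin provider by default.
BUILTIN_ELIGIBLE_LOCALES = ("C", "C.UTF-8")


def choose_defaults(
    locale: str,
    locale_encoding: Optional[str],
    encoding: Optional[str] = None,
    provider: Optional[str] = None,
    patched: bool = True,
) -> Tuple[str, str]:
    """Return (encoding, provider) for a new cluster (hypothetical helper).

    locale_encoding is the encoding implied by the locale, or None when
    the locale (e.g. "C") does not determine one. Explicitly specified
    encoding/provider values are always kept as given.
    """
    if encoding is None:
        if locale_encoding is not None:
            encoding = locale_encoding
        else:
            # 0001: fall back to UTF-8 rather than SQL_ASCII.
            encoding = "UTF8" if patched else "SQL_ASCII"

    if provider is None:
        if patched and locale in BUILTIN_ELIGIBLE_LOCALES:
            # 0002: C and C.UTF-8 default to the builtin provider.
            provider = "builtin"
        else:
            provider = "libc"

    return encoding, provider


# "initdb --no-locale": locale is C, nothing else specified.
print(choose_defaults("C", None))                 # with the patches
print(choose_defaults("C", None, patched=False))  # previous behavior
```

Under these assumptions, the --no-locale case moves from SQL_ASCII
with libc to UTF-8 with builtin, while a locale such as en_US.UTF-8
is unaffected by either patch.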