Re: Change initdb default to the builtin collation provider

Jeff Davis <pgsql@j-davis.com>

From: Jeff Davis <pgsql@j-davis.com>
To: pgsql-hackers@postgresql.org
Date: 2025-10-31T21:30:19Z
Lists: pgsql-hackers


On Fri, 2025-10-10 at 17:48 -0700, Jeff Davis wrote:
> -------
> Summary
> -------
> 
> The libc collation provider is a bad default[1]. The builtin collation
> provider is a good default, so let's use that.

The attached patches implement a more modest proposal which does not
conflict with Peter's objection about the display order:

0001: If the encoding is unspecified, and cannot be determined from the
locale (i.e. the locale is C), then use UTF-8 rather than SQL_ASCII.

0002: If the provider is unspecified, and the locale is C or C.UTF-8,
then use the builtin provider.
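The two rules can be sketched in C as a decision function. This is a minimal illustration under stated assumptions, not the code in the attached patches. The enums, `Defaults`, `encoding_from_locale()` and `choose_defaults()` are hypothetical names, and locale-to-encoding detection is reduced to a string check on the locale name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for this sketch; not initdb's real data structures. */
typedef enum { ENC_UNSPECIFIED, ENC_SQL_ASCII, ENC_UTF8, ENC_OTHER } Encoding;
typedef enum { PROV_UNSPECIFIED, PROV_BUILTIN, PROV_LIBC, PROV_ICU } Provider;

typedef struct
{
    Encoding encoding;
    Provider provider;
} Defaults;

/* Hypothetical stand-in for deriving an encoding from a locale name:
 * "C" yields no usable encoding, a ".UTF-8" suffix yields UTF-8, and
 * everything else is lumped into ENC_OTHER. */
static Encoding encoding_from_locale(const char *locale)
{
    const char *suffix = ".UTF-8";
    size_t n = strlen(locale);
    size_t m = strlen(suffix);

    if (strcmp(locale, "C") == 0)
        return ENC_UNSPECIFIED;
    if (n >= m && strcmp(locale + n - m, suffix) == 0)
        return ENC_UTF8;
    return ENC_OTHER;
}

/* Apply the proposed defaults; explicitly specified values are kept. */
static Defaults choose_defaults(Encoding encoding, Provider provider,
                                const char *locale)
{
    Defaults d = { encoding, provider };

    assert(locale != NULL);

    /* 0001: encoding unspecified and not derivable from the locale
     * (i.e. the locale is "C"): pick UTF-8 instead of SQL_ASCII. */
    if (d.encoding == ENC_UNSPECIFIED)
    {
        d.encoding = encoding_from_locale(locale);
        if (d.encoding == ENC_UNSPECIFIED)
            d.encoding = ENC_UTF8;      /* previously ENC_SQL_ASCII */
    }

    /* 0002: provider unspecified and locale is "C" or "C.UTF-8":
     * pick the builtin provider; otherwise keep the libc default. */
    if (d.provider == PROV_UNSPECIFIED)
    {
        if (strcmp(locale, "C") == 0 || strcmp(locale, "C.UTF-8") == 0)
            d.provider = PROV_BUILTIN;
        else
            d.provider = PROV_LIBC;
    }
    return d;
}
```

Under this sketch, `initdb --no-locale` (locale "C", with no encoding or provider given) maps to UTF-8 and the builtin provider, while an explicit `--encoding` or `--locale-provider` is always left alone.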

Motivation:

* UTF-8 seems safer than SQL_ASCII when the locale is compatible with
either.

* Whether the "C" locale uses the builtin provider or the libc provider
is mostly about the catalog representation, because the implementation
is the same. I don't have a strong motivation for this change; it just
clarifies that libc is not actually being used when the locale is "C".

* I think most users of the "C.UTF-8" locale would be better off with
the builtin provider, which benefits from important optimizations.

Note:

This would mean that "initdb --no-locale" would select UTF-8 and the
builtin provider with locale "C", whereas previously it would have
selected SQL_ASCII and the libc provider (though it never actually used
libc internally). I'm not sure whether others want this behavior or
would find it surprising.

Regards,
	Jeff Davis