Thread
-
Re: Change initdb default to the builtin collation provider
Jeff Davis <pgsql@j-davis.com> — 2025-10-24T16:54:45Z
On Fri, 2025-10-17 at 15:02 -0700, Jeff Davis wrote:
> On Fri, 2025-10-17 at 17:23 +0200, Peter Eisentraut wrote:
> > I remain violently opposed to this idea. I don't understand how it
> > could be acceptable to just not provide a good display order by
> > default and have everyone rewrite their queries.
>
> I assume that you favor alternative 3 listed here[1], which is to use
> ICU "und" as the default. Is that correct? Or do you prefer to get
> the locale from the environment at initdb time?

Right now we're still stuck with the worst possible default: libc. Can
you make a more concrete counter-proposal here that sorts through some
of the details?

* Should we base the ICU locale on the environment, or just default
  everyone to the "und" locale?

* If ICU support is disabled, how does that affect the defaults?

* If using the environment, what happens if the locale is not
  supported by ICU (in particular "C" or "C.UTF-8")?

* What would be the default encoding, or should that come from the
  environment?

* The ICU provider has some weaknesses around non-UTF8 encodings
  because of casts from wchar_t and the use of tolower() in
  downcase_identifier(). Are those potential blockers, and if so, are
  they fixable?

* Can we try harder to find an acceptable way to use memcmp() for the
  indexes by default, at least primary keys, even if the database
  collation is ICU? I know that I've argued for this in the past and
  it's been soundly rejected[1], but some variation on this idea could
  be worthy of consideration.

Regards,
	Jeff Davis

[1] https://www.postgresql.org/message-id/b7a9f32eee8d24518f791168bc6fb653d1f95f4d.camel@j-davis.com
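
For readers following the thread, the tension between memcmp() ordering and "a good display order" can be sketched in a few lines. This is only an illustration, not PostgreSQL code: `memcmp_key` models a bytewise comparison of UTF-8 strings, and `rough_display_key` is a hypothetical stand-in that strips accents and folds case. It is not ICU collation and only approximates what a linguistic sort would do for this small sample.

```python
import unicodedata


def memcmp_key(s: str) -> bytes:
    # Bytewise comparison of the UTF-8 encoding, which is the order
    # memcmp() would produce on UTF-8 strings.
    return s.encode("utf-8")


def rough_display_key(s: str):
    # Hypothetical approximation of a display order (NOT ICU):
    # decompose, drop combining marks, fold case, then break ties
    # on the original string.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)


words = ["banana", "Zebra", "apple", "\u00c9clair", "eclair", "Apple"]

byte_order = sorted(words, key=memcmp_key)
display_order = sorted(words, key=rough_display_key)

print(byte_order)
print(display_order)
```

In byte order, every uppercase ASCII word sorts before every lowercase one and the accented word lands last, while the approximate display order keeps "Apple"/"apple" and "eclair"/"Éclair" adjacent. That difference is what a query would have to compensate for with an explicit COLLATE clause if indexes used memcmp() by default.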