Thread

  1. Re: [PATCH] Fix severe performance regression with gettext 0.20+ on Windows

    Bryan Green <dbryan.green@gmail.com> — 2025-12-11T17:22:11Z

    On 12/11/2025 8:43 AM, Peter Eisentraut wrote:
    > On 10.12.25 01:45, Bryan Green wrote:
    >> The attached patch takes a pragmatic approach: for gettext 0.20.1+, we
    >> avoid triggering the bug by using Windows locale format instead of
    >> calling IsoLocaleName(). This works because gettext 0.20.1+ internally
    >> converts the Windows format back to POSIX for catalog lookups, whereas
    >> 0.19.8 and earlier need POSIX format directly.
    > 
    > I can confirm that this patch fixes the performance deviation from
    > activating --enable-nls on Windows (tested with MSYS2/UCRT64).
    > 
    > I wonder, this change that gettext did with the locale naming, does that
    > also affect what guidance we need to provide to users about how to
    > configure locale names?  For example, on a Unix-ish system, a user can
    > do something like initdb ... --lc-messages=de_DE.  What locale name
    > format do you need to use on Windows to get the translations to
    > activate?  Does this also depend on the gettext version?
    > 
    If they use a POSIX name such as "de_DE" and the language catalog is
    installed, they will get translated messages as expected.  The downside
    is that, because they are passing a POSIX locale name, gettext will
    still do the locale enumeration every time, which hurts performance.
    The good news is that gettext has accepted my cache patch for their
    next release.  If a Windows system is configured with
    lc_messages="de_DE" and has the next release of gettext, they should be
    fine.  If they don't have the next release of gettext, they will notice
    the performance issue, but that can be fixed by changing "de_DE" to the
    correct Windows locale name.
    
    
    Walk-through:
    
    
    1. LCID Lookup: get_lcid("de_DE")
       - Enumerates Windows locales looking for "de_DE"
       - Fails: Windows locales are named "German_Germany", not "de_DE"
       - Returns: 0
       - BUG: Doesn't cache the failure, repeats on every call (patched in
         the next gettext release)
    
    2. Catalog Search: _nl_make_l10nflist()
       - Tries: locale/de_DE/LC_MESSAGES/postgres-19.mo (not found)
       - Tries: locale/de/LC_MESSAGES/postgres-19.mo (found!)
       - Loads German translations
       - Success!
    
    So, the user gets German messages (the catalog fallback works), but
    performance is poor (the LCID lookup repeats on every call) because
    gettext doesn't cache the failed locale search.
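
    To make the cost difference concrete, here is a toy model of the lookup
    with and without caching of failures.  It is only a sketch, not
    gettext's code: LcidLookup, WINDOWS_LOCALES and the table contents are
    made up for illustration, and I am assuming that successful lookups are
    already cached today.

```python
# Toy model of the behaviour described above; not gettext's actual code.
# WINDOWS_LOCALES and LcidLookup are hypothetical names for illustration.
WINDOWS_LOCALES = {"German_Germany": 0x0407, "English_United States": 0x0409}


class LcidLookup:
    def __init__(self, cache_failures):
        self.cache_failures = cache_failures
        self.cache = {}
        self.enumerations = 0  # how often the expensive scan ran

    def get_lcid(self, name):
        if name in self.cache:
            return self.cache[name]
        # Stand-in for enumerating every Windows locale on the system.
        self.enumerations += 1
        lcid = WINDOWS_LOCALES.get(name, 0)  # 0 means "not found"
        # Assumption: successes are cached; failures only when patched.
        if lcid != 0 or self.cache_failures:
            self.cache[name] = lcid
        return lcid


def enumerations_for(cache_failures, name, calls):
    """Return how many enumerations `calls` lookups of `name` trigger."""
    lookup = LcidLookup(cache_failures)
    for _ in range(calls):
        lookup.get_lcid(name)
    return lookup.enumerations
```

    In this model, 1000 lookups of "de_DE" cost 1000 enumerations without
    the failure cache and one enumeration with it, while "German_Germany"
    costs one enumeration either way.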
    
    
    
    More detailed information for the curious:
    
    Even though get_lcid() returned 0, gettext continues with catalog lookup:
    
      Function: _nl_find_domain() and _nl_make_l10nflist()
      Location: gettext-runtime/intl/dcigettext.c and l10nflist.c
    
      Process:
        1. Parse "de_DE" into components:
           language = "de"
           territory = "DE"
           codeset = NULL
           modifier = NULL
    
        2. Try catalog paths in order (most specific to least specific):
    
           Try #1: language + territory + codeset + modifier
             Path: /share/locale/de_DE.UTF-8@euro/LC_MESSAGES/postgres-19.mo
             stat(): File not found
    
           Try #2: language + territory + codeset
             Path: /share/locale/de_DE.UTF-8/LC_MESSAGES/postgres-19.mo
             stat(): File not found
    
           Try #3: language + territory
             Path: /share/locale/de_DE/LC_MESSAGES/postgres-19.mo
             stat(): File not found (PostgreSQL doesn't ship de_DE)
    
           Try #4: language + codeset
             Path: /share/locale/de.UTF-8/LC_MESSAGES/postgres-19.mo
             stat(): File not found
    
           Try #5: language only
             Path: /share/locale/de/LC_MESSAGES/postgres-19.mo
             stat(): SUCCESS! File exists ✓
    
        3. Load catalog: _nl_load_domain()
           Parse .mo file, load German translations
    
        4. Look up message: _nl_find_msg()
           Binary search for "division by zero"
           Find translation: "Teilung durch Null"
    
        5. Return translated message
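
    The search order in step 2 can be sketched roughly as follows.  This is
    a simplification that mirrors the five tries listed above, not the
    actual logic in l10nflist.c; catalog_candidates and find_catalog are
    hypothetical helpers, and the root and domain defaults are just the
    values from the example paths.

```python
# Simplified sketch of the "most specific to least specific" search order.
# Helper names are hypothetical; this is not how l10nflist.c is written.
def catalog_candidates(language, territory=None, codeset=None, modifier=None,
                       domain="postgres-19", root="/share/locale"):
    """Return candidate .mo paths, most specific first."""
    variants = []
    if territory and codeset and modifier:
        variants.append(f"{language}_{territory}.{codeset}@{modifier}")
    if territory and codeset:
        variants.append(f"{language}_{territory}.{codeset}")
    if territory:
        variants.append(f"{language}_{territory}")
    if codeset:
        variants.append(f"{language}.{codeset}")
    variants.append(language)  # language-only fallback is always tried last
    return [f"{root}/{v}/LC_MESSAGES/{domain}.mo" for v in variants]


def find_catalog(candidates, exists):
    """Return the first candidate for which exists(path) is true, else None."""
    for path in candidates:
        if exists(path):
            return path
    return None
```

    With only "de" installed, find_catalog() over the candidates for
    language "de" and territory "DE" skips the de_DE path and returns the
    language-only one, which is the fallback described above.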
    
    
    You might be wondering what happens if the "de" catalog doesn't exist.
    It depends on whether the user has set the environment variable LANGUAGE
    to their preferred ordered list of languages.  On Windows you can also
    set this in the registry; gettext figures this out.  If LANGUAGE is not
    set on Windows, then gettext uses GetUserDefaultUILanguage() to determine
    what locale to use.  If everything fails, you get back the msgid you
    passed in to start with, so the message is untranslated.
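
    As a rough sketch of that last-resort behaviour: LANGUAGE is a
    colon-separated priority list, and the msgid comes back unchanged when
    no catalog has a translation.  The helpers below are hypothetical and
    leave out the registry and GetUserDefaultUILanguage() steps; catalogs
    are modelled as plain dictionaries.

```python
# Sketch of the final fallback; helper names are hypothetical and the
# Windows registry / GetUserDefaultUILanguage() steps are not modelled.
def language_priority(env):
    """Split a LANGUAGE-style value such as "fr:de" into an ordered list."""
    value = env.get("LANGUAGE", "")
    return [lang for lang in value.split(":") if lang]


def translate(msgid, languages, catalogs):
    """Return the first available translation, else the msgid itself."""
    for lang in languages:
        translation = catalogs.get(lang, {}).get(msgid)
        if translation is not None:
            return translation
    return msgid  # nothing matched: the caller sees the untranslated text
```

    For example, with LANGUAGE="fr:de" and only a "de" catalog available,
    translate() returns the German string; with no usable catalog at all it
    returns "division by zero" as passed in.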
    
    -- 
    Bryan Green
    EDB: https://www.enterprisedb.com