Thread

  1. NLS: use gettext() to translate system error messages

    Jeff Davis <pgsql@j-davis.com> — 2025-10-23T22:53:51Z

    This is related to my effort to remove the global LC_CTYPE dependency,
    and set the global LC_CTYPE to C.
    
    The replacement of "%m" (e.g. with "Permission denied" if
    errno==EACCES) in a message is done using strerror_r(), which sometimes
    does translation. If it does translate, strerror uses LC_CTYPE to
    determine the target encoding, and LC_MESSAGES to determine the
    language/region. (It appears that strerror translation only happens on
    Linux -- correct me if I'm wrong.)
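
    For illustration only, here is a minimal hypothetical sketch of what "%m" expansion amounts to. It is not the code in Postgres's snprintf.c. The name expand_percent_m() is made up, only "%m" and "%%" are recognized, and the sketch assumes the XSI strerror_r(), which the feature-test macro requests on glibc. Because strerror_r() consults the calling thread's locale, the spliced-in text may already be translated on some platforms, whether or not NLS is enabled.

```c
#define _POSIX_C_SOURCE 200809L /* request the XSI strerror_r() on glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy fmt into buf, replacing each "%m" with the
 * message for errnum.  "%%" is copied through unchanged so a later
 * printf-style pass still sees it; nothing else is interpreted.
 * Output is silently truncated to fit buflen.  Returns the length written.
 */
size_t
expand_percent_m(char *buf, size_t buflen, const char *fmt, int errnum)
{
    size_t      len = 0;

    assert(buf != NULL && buflen > 0);
    for (const char *p = fmt; *p != '\0'; p++)
    {
        if (p[0] == '%' && p[1] == 'm')
        {
            char        errbuf[256];

            /*
             * strerror_r() consults the calling thread's locale, so
             * depending on the platform this text may be translated.
             */
            if (strerror_r(errnum, errbuf, sizeof(errbuf)) != 0)
                snprintf(errbuf, sizeof(errbuf), "errno %d", errnum);
            for (const char *e = errbuf; *e != '\0' && len + 1 < buflen; e++)
                buf[len++] = *e;
            p++;                /* also consume the 'm' */
        }
        else if (p[0] == '%' && p[1] == '%')
        {
            for (int k = 0; k < 2 && len + 1 < buflen; k++)
                buf[len++] = '%';
            p++;                /* also consume the second '%' */
        }
        else if (len + 1 < buflen)
            buf[len++] = *p;
    }
    buf[len] = '\0';
    return len;
}
```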
    
    Currently, strerror translation is orthogonal to our NLS system which
    translates Postgres messages (e.g. "division by zero") using gettext
    along with our own translations (.po files). The Postgres messages
    might be translated but not the "%m" replacements, or vice-versa,
    depending on whether NLS is enabled, the OS, etc.
    
    The attached patch changes "%m" replacements to use gettext for
    translation. That makes the overall translations more consistent,
    equally available on all platforms, and not dependent on LC_CTYPE
    (because gettext allows the encoding to be set separately
    with bind_textdomain_codeset()).
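
    The following is a minimal sketch of the gettext-based alternative, under stated assumptions. The text domain name "postgres-errstrings" and both function names are hypothetical. The msgids are taken to be the untranslated C-locale strerror() strings. The platform is assumed to provide POSIX 2008 newlocale()/uselocale() and GNU gettext's bind_textdomain_codeset(). The C locale is created per call to keep the sketch simple and thread-safe; caching it would be an optimization.

```c
#define _POSIX_C_SOURCE 200809L /* locale_t, newlocale(), uselocale() */
#include <assert.h>
#include <errno.h>
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical text domain holding errno msgids and their translations. */
#define ERRSTRINGS_DOMAIN "postgres-errstrings"

/*
 * Bind the domain once at startup.  The codeset chosen here decides the
 * encoding of translations returned by dgettext(), independently of the
 * process's LC_CTYPE.
 */
void
init_errstrings(const char *localedir, const char *codeset)
{
    bindtextdomain(ERRSTRINGS_DOMAIN, localedir);
    bind_textdomain_codeset(ERRSTRINGS_DOMAIN, codeset);
}

/*
 * Fill buf with the untranslated ("C" locale) message for errnum and
 * return its translation from ERRSTRINGS_DOMAIN, or buf itself when the
 * catalog has no entry.  The language follows the caller's LC_MESSAGES.
 */
const char *
translated_strerror(int errnum, char *buf, size_t buflen)
{
    locale_t    c_loc = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    locale_t    saved = (locale_t) 0;

    assert(buflen > 0);
    /* Switch only this thread to "C" so strerror() yields the msgid. */
    if (c_loc != (locale_t) 0)
        saved = uselocale(c_loc);
    /* Copy at once: strerror()'s buffer may be reused by later calls. */
    snprintf(buf, buflen, "%s", strerror(errnum));
    if (c_loc != (locale_t) 0)
    {
        uselocale(saved);       /* restore the caller's locale first */
        freelocale(c_loc);
    }
    return dgettext(ERRSTRINGS_DOMAIN, buf);
}
```

    With no catalog installed (or in the C locale), dgettext() simply returns the msgid, so the result degrades to the plain English strerror() text.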
    
    It also fixes an issue with translations when LC_CTYPE=C, where
    strerror can't find the target encoding, so it forces the translated
    message into ASCII even if the database encoding supports all of the
    resulting characters. For instance, if LC_CTYPE=C and
    LC_MESSAGES=fr_FR.UTF-8 and errno=EACCES and the database encoding is
    UTF-8, you get:
    
       Permission non accord?e
    
    instead of:
    
       Permission non accordée
    
    I also attached a C file for testing, which generates the messages and
    translations for a range of errnos, and outputs in .po format. As
    mentioned earlier, I think the only OS that does any translation of
    these messages is Linux, but corrections are welcome.
    
    One downside is that there are more messages to translate -- one per
    errno that Postgres might plausibly encounter, plus a few more for
    variations between platforms.
    
    Comments?
    
    Regards,
    	Jeff Davis
    
    
  2. Re: NLS: use gettext() to translate system error messages

    Álvaro Herrera <alvherre@kurilemu.de> — 2025-10-27T13:10:07Z

    On 2025-Oct-23, Jeff Davis wrote:
    
    > The attached patch changes "%m" replacements to use gettext for
    > translation. That makes the overall translations more consistent,
    > equally available on all platforms, and not dependent on LC_CTYPE
    > (because gettext allows the encoding to be set separately
    > with bind_textdomain_codeset()).
    
    Hmm, interesting idea.  I think the most difficult part is obtaining the
    source strings: we need to run your errno_translation.c program on _all_
    platforms, merge the output files together, and then create a single
    errstrings.po file with all the variations, to reside on our source
    tree, which would be given to translators.
    
    Also we need a separate step to create the final postgres.po by
    catenating the existing postgres.po with the new errstrings.po; this
    should not occur in the source tree but rather at install time, because
    of course pg_dump.po is going to have to do the same, and we don't need
    to make translators responsible for propagating translations from one
    file to others; that already occurs on a very small scale with the
    src/common files and I hate it, so I wouldn't want to see it happening
    with this much larger set of strings.
    
    BTW looking at the output of that program I realized that with
    _GNU_SOURCE, there's strerrorname_np() which can be helpful to generate
    the new file in a way that doesn't require you to have all these E
    constants in the program.  Not sure if other platforms have equivalent
    gadgets; but without that I get entries like
    
    	#. (null)
    	msgid "Object is remote"
    	msgstr "El objeto es remoto"
    
    the (null) bit should perhaps be avoided anyhow.
    
    FWIW the last valid errno I get after patching it to use strerrorname_np()
    is 133.
    
    	$ ./a.out 0 135
    	#. 0
    	msgid "Success"
    	msgstr "Conseguido"
    
    	...
    
    	#. EHWPOISON
    	msgid "Memory page has hardware error"
    	msgstr "La página de memoria tiene un error de hardware"
    
    	#. (null)
    	msgid "Unknown error 134"
    	msgstr "Error desconocido 134"
    
    (I think the exit condition of that loop should be "i <= max_err",
    otherwise it's confusing.)
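
    Here is a sketch of a generator loop along these lines. It assumes glibc 2.32 or later for strerrorname_np(), and write_errno_pot() is a hypothetical name, not the attached errno_translation.c. The range is inclusive (i <= max_err). Values without a symbolic name are skipped rather than printed as "#. (null)". msgstr is left empty, so the output is a template rather than a filled-in catalog.

```c
#define _GNU_SOURCE             /* strerrorname_np() needs glibc 2.32+ */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write s as the body of a .po string, escaping '"' and '\\'. */
static void
po_write_escaped(FILE *out, const char *s)
{
    for (; *s != '\0'; s++)
    {
        if (*s == '"' || *s == '\\')
            fputc('\\', out);
        fputc(*s, out);
    }
}

/*
 * Hypothetical generator: one template entry per errno in
 * [min_err, max_err], both ends inclusive.  Values that glibc has no
 * symbolic name for are skipped instead of being labeled "(null)".
 * Returns the number of entries written.
 */
int
write_errno_pot(FILE *out, int min_err, int max_err)
{
    int         written = 0;

    assert(out != NULL && min_err <= max_err);
    for (int i = min_err; i <= max_err; i++)
    {
        const char *name = strerrorname_np(i);

        if (name == NULL)
            continue;
        fprintf(out, "#. %s\nmsgid \"", name);
        po_write_escaped(out, strerror(i));
        fputs("\"\nmsgstr \"\"\n\n", out);
        written++;
    }
    return written;
}
```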
    
    > One downside is that there are more messages to translate -- one per
    > errno that Postgres might plausibly encounter,
    
    It's not all that many messages, and they only have to be translated
    once, so I think this shouldn't be too much of an issue.
    
    -- 
    Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
    
    
    
    
  3. Re: NLS: use gettext() to translate system error messages

    Jeff Davis <pgsql@j-davis.com> — 2025-10-27T20:06:31Z

    On Mon, 2025-10-27 at 15:10 +0200, Álvaro Herrera wrote:
    > Hmm, interesting idea.  I think the most difficult part is obtaining the
    > source strings: we need to run your errno_translation.c program on _all_
    > platforms,
    
    I have attached .po files for the standard set of errnos (those
    recognized by strerror.c:get_errno_symbol()) on linux+glibc,
    linux+musl, freebsd, and mac.
    
    Windows and solaris/illumos are missing, and perhaps some other
    variations too (e.g. the other BSDs).
    
    If we need to merge the files, let me know what format you have in mind
    for the resulting file. I was thinking something like:
    
       #. EIO (linux+glibc, freebsd, mac)
       msgid "Input/output error"
       msgstr "Input/output error"
    
       #. EIO (linux+musl)
       msgid "I/O error"
       msgstr "I/O error"
    
    We might not want to get too detailed with the comments, but it would
    be nice to have a hint of where it might have come from.
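
    One possible way to do the merge is sketched below; struct errstr_row and write_merged_po() are hypothetical names. Rows are grouped by message text rather than by symbol, because msgfmt rejects duplicate msgid definitions within one .po file. Each symbol carrying that text gets a "#." comment listing the platforms that produce it. The quadratic scans should be fine for a few hundred errnos, and the sketch assumes the messages contain no double quotes or backslashes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of a per-platform dump (hypothetical layout). */
struct errstr_row
{
    const char *symbol;         /* e.g. "EIO" */
    const char *platform;       /* e.g. "linux+musl" */
    const char *msgid;          /* text strerror() produced there */
};

static int
same_msgid(const struct errstr_row *a, const struct errstr_row *b)
{
    return strcmp(a->msgid, b->msgid) == 0;
}

/*
 * Emit one entry per distinct msgid, in order of first appearance.  Each
 * symbol carrying that msgid gets a comment listing its platforms.
 * No escaping is done; msgids must not contain '"' or '\\'.
 */
void
write_merged_po(FILE *out, const struct errstr_row *rows, size_t nrows)
{
    assert(out != NULL);
    for (size_t i = 0; i < nrows; i++)
    {
        size_t      k = 0;

        /* Skip a msgid that an earlier row already emitted. */
        while (k < i && !same_msgid(&rows[k], &rows[i]))
            k++;
        if (k < i)
            continue;

        for (size_t j = i; j < nrows; j++)
        {
            size_t      m = i;
            const char *sep = "";

            if (!same_msgid(&rows[j], &rows[i]))
                continue;
            /* Print each (msgid, symbol) comment only once. */
            while (m < j && !(same_msgid(&rows[m], &rows[j]) &&
                              strcmp(rows[m].symbol, rows[j].symbol) == 0))
                m++;
            if (m < j)
                continue;
            fprintf(out, "#. %s (", rows[j].symbol);
            for (size_t p = j; p < nrows; p++)
                if (same_msgid(&rows[p], &rows[j]) &&
                    strcmp(rows[p].symbol, rows[j].symbol) == 0)
                {
                    fprintf(out, "%s%s", sep, rows[p].platform);
                    sep = ", ";
                }
            fputs(")\n", out);
        }
        fprintf(out, "msgid \"%s\"\nmsgstr \"\"\n\n", rows[i].msgid);
    }
}
```

    Fed the four EIO rows from the example above, this produces the same two entries, except that msgstr is left empty for translators.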
    
    > merge the output files together, and then create a single
    > errstrings.po file with all the variations, to reside on our source
    > tree, which would be given to translators.
    > 
    > Also we need a separate step to create the final postgres.po by
    > catenating the existing postgres.po with the new errstrings.po; this
    > should not occur in the source tree but rather at install time, because
    > of course pg_dump.po is going to have to do the same, and we don't need
    > to make translators responsible for propagating translations from one
    > file to others; that already occurs on a very small scale with the
    > src/common files and I hate it, so I wouldn't want to see it happening
    > with this much larger set of strings.
    
    I'm not familiar with the tooling in this area, but I can take a look
    into it. Would it affect packagers?
    
    > BTW looking at the output of that program I realized that with
    > _GNU_SOURCE, there's strerrorname_np() which can be helpful to generate
    > the new file in a way that doesn't require you to have all these E
    > constants in the program.
    
    I just borrowed get_errno_symbol from strerror.c. It doesn't have the
    nonstandard errnos, though, so I used strerrorname_np() to generate
    only the nonstandard errnos and attached the result in
    errstrings-linux-glibc-np.po.
    
    I included that file so we can see if there are nonstandard errnos that
    we really want to translate.
    
    > the (null) bit should perhaps be avoided anyhow.
    
    Done.
    
    > (I think the exit condition of that loop should be "i <= max_err",
    > otherwise it's confusing.)
    
    Done. The new C file also uses tab-delimited lines, to make it
    easier to sort by the symbolic name before creating the .po file.
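
    The tab-delimited step might look roughly like the hypothetical converter below; tsv_to_po() and tsv_line_to_po() are illustrative names, not the attached program. Each "SYMBOL<TAB>message" line, presumably already sorted with sort(1), becomes one template entry, with quotes and backslashes escaped. Lines without a tab are skipped, and lines longer than the 1024-byte buffer are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical converter for one "SYMBOL<TAB>message" line.  Modifies
 * line in place.  Returns 0 on success, -1 if the line has no tab.
 */
int
tsv_line_to_po(FILE *out, char *line)
{
    char       *tab = strchr(line, '\t');
    char       *msg;

    if (tab == NULL)
        return -1;
    *tab = '\0';                /* line now holds just the symbol */
    msg = tab + 1;
    msg[strcspn(msg, "\r\n")] = '\0';   /* drop the line terminator */

    fprintf(out, "#. %s\nmsgid \"", line);
    for (const char *s = msg; *s != '\0'; s++)
    {
        if (*s == '"' || *s == '\\')
            fputc('\\', out);
        fputc(*s, out);
    }
    fputs("\"\nmsgstr \"\"\n\n", out);
    return 0;
}

/* Convert every line of in; returns the number of entries written. */
int
tsv_to_po(FILE *in, FILE *out)
{
    char        line[1024];
    int         written = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof(line), in) != NULL)
        if (tsv_line_to_po(out, line) == 0)
            written++;
    return written;
}
```

    Wrapping tsv_to_po(stdin, stdout) in a main() would give a filter that can follow sort(1) in a pipeline.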
    
    > > One downside is that there are more messages to translate -- one per
    > > errno that Postgres might plausibly encounter,
    > 
    > It's not all that many messages, and they only have to be translated
    > once, so I think this shouldn't be too much of an issue.
    
    Great, thank you.
    
    Also, do you think it's fine to use the static variable (as in the
    patch) for newlocale() in any NLS-enabled binary? I think it should be
    fine because it's only done for platforms with HAVE_USELOCALE.
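
    A lazily created locale_t cached in a static variable is only safe if two threads cannot race to create it, and client-side binaries can already be multithreaded. The sketch below makes the one-time initialization explicit with pthread_once(). It assumes POSIX threads, which Postgres's own portability layer may not expose this way, and get_c_locale() is a hypothetical name. The locale is intentionally never freed because it lives for the whole process.

```c
#define _POSIX_C_SOURCE 200809L /* locale_t, newlocale() */
#include <assert.h>
#include <locale.h>
#include <pthread.h>

/* Hypothetical process-wide cache; never freed by design. */
static locale_t c_locale = (locale_t) 0;
static pthread_once_t c_locale_once = PTHREAD_ONCE_INIT;

static void
init_c_locale(void)
{
    /* On failure this leaves c_locale as (locale_t) 0. */
    c_locale = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
}

/*
 * Return the shared "C" locale, creating it exactly once even if several
 * threads call this concurrently.  Returns (locale_t) 0 if creation failed.
 */
locale_t
get_c_locale(void)
{
    int         rc = pthread_once(&c_locale_once, init_c_locale);

    assert(rc == 0);
    (void) rc;                  /* unused when NDEBUG is defined */
    return c_locale;
}
```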
    
    Regards,
    	Jeff Davis
    
    
  4. Re: NLS: use gettext() to translate system error messages

    Jeff Davis <pgsql@j-davis.com> — 2025-12-23T18:46:08Z

    On Mon, 2025-10-27 at 13:06 -0700, Jeff Davis wrote:
    > On Mon, 2025-10-27 at 15:10 +0200, Álvaro Herrera wrote:
    > > Hmm, interesting idea.  I think the most difficult part is obtaining the
    > > source strings: we need to run your errno_translation.c program on _all_
    > > platforms,
    > 
    > I have attached .po files for the standard set of errnos (those
    > recognized by strerror.c:get_errno_symbol()) on linux+glibc,
    > linux+musl, freebsd, and mac.
    > 
    > Windows and solaris/illumos are missing, and perhaps some other
    > variations too (e.g. the other BSDs).
    
    Is this going in the right direction?
    
    And generally, is NLS translation of system messages wanted at all, or
    are ASCII messages more convenient anyway (given that it's just a
    simple text representation of errno)?
    
    If we don't actually want translation of the system messages, then do
    we want to take the part of this patch that switches to the C locale,
    so that it consistently uses ASCII messages across platforms?
    
    The status quo seems like an awkward middle ground, where the system
    messages are only translated on some platforms (perhaps only glibc?);
    and whether they are translated or not is independent of whether
    Postgres was compiled with NLS, which can lead to partially-translated
    messages.
    
    For instance, on linux/glibc if NLS is not enabled, you can end up with
    messages like:
    
      ERROR:  could not open file "/etc/shadow" for reading: Permission non accordée
    
    AFAICT it makes zero sense to translate the errno message but not
    translate the more interesting Postgres message.
    
    > > Also we need a separate step to create the final postgres.po by
    > > catenating the existing postgres.po with the new errstrings.po; this
    > > should not occur in the source tree but rather at install time, because
    > > of course pg_dump.po is going to have to do the same, and we don't need
    > > to make translators responsible for propagating translations from one
    > > file to others; that already occurs on a very small scale with the
    > > src/common files and I hate it, so I wouldn't want to see it happening
    > > with this much larger set of strings.
    > 
    > I'm not familiar with the tooling in this area, but I can take a look
    > into it. Would it affect packagers?
    
    Would someone be willing to help here?
    
    Attached new version; trivial rebase only.
    
    Regards,
    	Jeff Davis
    
    
  5. Re: NLS: use gettext() to translate system error messages

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-12-23T20:07:07Z

    Jeff Davis <pgsql@j-davis.com> writes:
    > Is this going in the right direction?
    
    > And generally, is NLS translation of system messages wanted at all, or
    > are ASCII messages more convenient anyway (given that it's just a
    > simple text representation of errno)?
    
    I do not like putting snprintf.c in charge of this, for certain.
    That seems just plain nasty from a modularity/layering standpoint.
    Also, the proposed implementation is not thread-safe, which is bad
    right now on client-side regardless of whether it will be bad in
    the future server-side.
    
    > The status quo seems like an awkward middle ground, where the system
    > messages are only translated on some platforms (perhaps only glibc?);
    
    Well, they're translated if strerror() responds to LC_MESSAGES [1].
    If it doesn't, then the users of that platform are unaccustomed to
    seeing translated errno strings, and they are unlikely to thank us
    for behaving differently from every other program on the platform.
    
    So I don't really see any reason to think this proposal is an
    improvement over what we have.
    
    			regards, tom lane
    
    [1] Or at least that's the intent ... but I don't see translation
    happening in HEAD on my Linux box:
    
    regression=# create table zed(f1 text);
    CREATE TABLE
    regression=# copy zed from '/etc/shadow';
    ERROR:  could not open file "/etc/shadow" for reading: Permission denied
    HINT:  COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
    regression=# set lc_messages = 'es_ES';
    SET
    regression=# copy zed from '/etc/shadow';
    ERROR:  no se pudo abrir archivo <</etc/shadow>> para lectura: Permission denied
    HINT:  COPY FROM indica al proceso servidor de PostgreSQL leer un archivo. Puede desear usar una facilidad del lado del cliente como \copy de psql.
    
    This surprises me, because pg_locale.c sets LC_MESSAGES "for real"
    precisely so that strerror() will see it.  We should look into
    what is happening there.
    
    
    
    
  6. Re: NLS: use gettext() to translate system error messages

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-12-23T20:21:10Z

    I wrote:
    > [1] Or at least that's the intent ... but I don't see translation
    > happening in HEAD on my Linux box:
    
    Huh ... it works fine on another nearby RHEL machine:
    
    regression=# copy zed from '/etc/shadow';
    ERROR:  no se pudo abrir archivo «/etc/shadow» para lectura: Permiso denegado
    HINT:  COPY FROM indica al proceso servidor de PostgreSQL leer un archivo. Puede desear usar una facilidad del lado del cliente como \copy de psql.
    
    But poking a little harder, the same behavior applies in other
    programs:
    
    RHEL8 box:
    
    $ LANG=es_ES.utf8 sed 's/x/y/' /etc/shadow
    sed: no se puede leer /etc/shadow: Permission denied
    
    RHEL9 box:
    
    $ LANG=es_ES.utf8 sed 's/x/y/' /etc/shadow
    sed: no se puede leer /etc/shadow: Permiso denegado
    
    Surely RHEL8 does not pre-date glibc's ability to translate messages.
    I suspect I have some system-wide setting for this, or maybe a
    missing package on that machine?  But anyway, I think this reinforces
    my point that we should (and do) act similarly to other programs.
    
    			regards, tom lane
    
    
    
    
  7. Re: NLS: use gettext() to translate system error messages

    Jeff Davis <pgsql@j-davis.com> — 2025-12-26T19:32:30Z

    On Tue, 2025-12-23 at 15:07 -0500, Tom Lane wrote:
    > This surprises me, because pg_locale.c sets LC_MESSAGES "for real"
    > precisely so that strerror() will see it.
    
    Isn't LC_MESSAGES also necessary for gettext()?
    
    If it's only strerror() we care about, then we could use uselocale()
    instead, because the platforms that don't support uselocale() also
    don't seem to do translation in strerror(). (I think only glibc
    translates through strerror(), though I've seen hints that Solaris may
    also.)
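
    The uselocale() approach can be sketched as follows: switch only the calling thread to a locale carrying the desired LC_MESSAGES, call strerror(), copy the result, and switch back, leaving the process-wide locale untouched. strerror_in_locale() is a hypothetical name, and the sketch assumes POSIX 2008 newlocale()/uselocale(). LC_CTYPE is included in the mask because, as described at the start of the thread, glibc takes the output encoding from LC_CTYPE and otherwise degrades non-ASCII characters to '?'.

```c
#define _POSIX_C_SOURCE 200809L /* locale_t, newlocale(), uselocale() */
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format errnum's message using the LC_MESSAGES and
 * LC_CTYPE of locale_name (e.g. "es_ES.UTF-8") for the calling thread
 * only.  The process-wide locale is never changed.  Returns 0 on success,
 * -1 if locale_name is not available.
 */
int
strerror_in_locale(int errnum, const char *locale_name,
                   char *buf, size_t buflen)
{
    /* LC_CTYPE too: glibc takes the output encoding from it. */
    locale_t    loc = newlocale(LC_CTYPE_MASK | LC_MESSAGES_MASK,
                                locale_name, (locale_t) 0);
    locale_t    saved;

    assert(buflen > 0);
    if (loc == (locale_t) 0)
        return -1;
    saved = uselocale(loc);
    /* Copy at once: strerror()'s buffer may be reused by later calls. */
    snprintf(buf, buflen, "%s", strerror(errnum));
    uselocale(saved);           /* restore before freeing loc */
    freelocale(loc);
    return 0;
}
```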
    
    Regards,
    	Jeff Davis
    
    
    
    
    
  8. Re: NLS: use gettext() to translate system error messages

    Jeff Davis <pgsql@j-davis.com> — 2025-12-26T19:33:21Z

    On Tue, 2025-12-23 at 15:21 -0500, Tom Lane wrote:
    > Surely RHEL8 does not pre-date glibc's ability to translate messages.
    > I suspect I have some system-wide setting for this, or maybe a
    > missing package on that machine?
    
    Probably a missing language package.
    
    >   But anyway, I think this reinforces
    > my point that we should (and do) act similarly to other programs.
    
    It depends on the perspective. For a system administrator, what you say
    makes sense. But for a Postgres user who is expecting consistent
    translation, it can be a bit mysterious. And from an engineering
    standpoint, translation through strerror() is not tested and -- as far
    as I can tell -- only works on glibc.
    
    Regards,
    	Jeff Davis