Thread: NLS: use gettext() to translate system error messages

NLS: use gettext() to translate system error messages

From
Jeff Davis
Date:
This is related to my effort to remove the global LC_CTYPE dependency,
and set the global LC_CTYPE to C.

The replacement of "%m" (e.g. with "Permission denied" if
errno==EACCES) in a message is done using strerror_r(), which sometimes
does translation. If it does translate, strerror uses LC_CTYPE to
determine the target encoding, and LC_MESSAGES to determine the
language/region. (It appears that strerror translation only happens on
Linux -- correct me if I'm wrong.)

Currently, strerror translation is orthogonal to our NLS system which
translates Postgres messages (e.g. "division by zero") using gettext
along with our own translations (.po files). The Postgres messages
might be translated but not the "%m" replacements, or vice-versa,
depending on whether NLS is enabled, the OS, etc.

The attached patch changes "%m" replacements to use gettext for
translation. That makes the overall translations more consistent,
equally available on all platforms, and not dependent on LC_CTYPE
(because gettext allows its output encoding to be set separately
with bind_textdomain_codeset()).

It also fixes an issue with translations when LC_CTYPE=C, where
strerror can't find the target encoding, so it forces the translated
message into ASCII even if the database encoding supports all of the
resulting characters. For instance, if LC_CTYPE=C and
LC_MESSAGES=fr_FR.UTF-8 and errno=EACCES and the database encoding is
UTF-8, you get:

   Permission non accord?e

instead of:

   Permission non accordée

I also attached a C file for testing, which generates the messages and
translations for a range of errnos, and outputs in .po format. As
mentioned earlier, I think the only OS that does any translation of
these messages is Linux, but corrections are welcome.

One downside is that there are more messages to translate -- one per
errno that Postgres might plausibly encounter, plus a few more for
variations between platforms.

Comments?

Regards,
    Jeff Davis


Attachments

Re: NLS: use gettext() to translate system error messages

From
Álvaro Herrera
Date:
On 2025-Oct-23, Jeff Davis wrote:

> The attached patch changes "%m" replacements to use gettext for
> translation. That makes the overall translations more consistent,
> equally available on all platforms, and not dependent on LC_CTYPE
> (because gettext allows its output encoding to be set separately
> with bind_textdomain_codeset()).

Hmm, interesting idea.  I think the most difficult part is obtaining the
source strings: we need to run your errno_translation.c program on _all_
platforms, merge the output files together, and then create a single
errstrings.po file with all the variations, to reside on our source
tree, which would be given to translators.

Also we need a separate step to create the final postgres.po by
catenating the existing postgres.po with the new errstrings.po; this
should not occur in the source tree but rather at install time, because
of course pg_dump.po is going to have to do the same, and we don't need
to make translators responsible for propagating translations from one
file to others; that occurs already to a very small scale with the
src/common files and I hate it, so I wouldn't want to see it happening
with this much larger set of strings.

BTW looking at the output of that program I realized that with
_GNU_SOURCE, there's strerrorname_np() which can be helpful to generate
the new file in a way that doesn't require you to have all these E
constants in the program.  Not sure if other platforms have equivalent
gadgets; but without that I get entries like

    #. (null)
    msgid "Object is remote"
    msgstr "El objeto es remoto"

the (null) bit should perhaps be avoided anyhow.

FWIW the last valid errno I get, having patched the program to use
strerrorname_np(), is 133.

    $ ./a.out 0 135
    #. 0
    msgid "Success"
    msgstr "Conseguido"

    ...

    #. EHWPOISON
    msgid "Memory page has hardware error"
    msgstr "La página de memoria tiene un error de hardware"

    #. (null)
    msgid "Unknown error 134"
    msgstr "Error desconocido 134"

(I think the exit condition of that loop should be "i <= max_err",
otherwise it's confusing.)
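
The two suggestions above (an inclusive upper bound and no "(null)" comments) can be sketched as follows. This is not the errno_translation.c program from the thread: demo_errno_symbol() is a hypothetical stand-in for a symbol lookup such as get_errno_symbol() or strerrorname_np(), covering only four errnos, and msgstr is left empty as in a translation template.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a real errno-to-symbol lookup. */
static const char *
demo_errno_symbol(int errnum)
{
    switch (errnum)
    {
        case EPERM:
            return "EPERM";
        case ENOENT:
            return "ENOENT";
        case EIO:
            return "EIO";
        case EACCES:
            return "EACCES";
        default:
            return NULL;
    }
}

/*
 * Write one .po entry per known errno in min_err..max_err, both bounds
 * inclusive. Errnos without a symbol are skipped, so no "#. (null)"
 * comment is ever emitted. Returns the number of entries written.
 *
 * Note: the msgid is written unescaped, which assumes strerror() output
 * contains no double quotes or backslashes.
 */
int
demo_dump_po(FILE *out, int min_err, int max_err)
{
    int         written = 0;

    assert(out != NULL);
    for (int i = min_err; i <= max_err; i++)
    {
        const char *sym = demo_errno_symbol(i);

        if (sym == NULL)
            continue;
        fprintf(out, "#. %s\nmsgid \"%s\"\nmsgstr \"\"\n\n", sym, strerror(i));
        written++;
    }
    return written;
}
```

With the inclusive bound, demo_dump_po(stdout, EACCES, EACCES) writes exactly one entry, which matches what a user passing the same number twice on the command line would expect.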

> One downside is that there are more messages to translate -- one per
> errno that Postgres might plausibly encounter,

It's not all that many messages, and they only have to be translated
once, so I think this shouldn't be too much of an issue.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: NLS: use gettext() to translate system error messages

From
Jeff Davis
Date:
On Mon, 2025-10-27 at 15:10 +0200, Álvaro Herrera wrote:
> Hmm, interesting idea.  I think the most difficult part is obtaining the
> source strings: we need to run your errno_translation.c program on _all_
> platforms,

I have attached .po files for the standard set of errnos (those
recognized by strerror.c:get_errno_symbol()) on linux+glibc,
linux+musl, freebsd, and mac.

Windows and solaris/illumos are missing, and perhaps some other
variations too (e.g. the other BSDs).

If we need to merge the files, let me know what format you have in mind
for the resulting file. I was thinking something like:

   #. EIO (linux+glibc, freebsd, mac)
   msgid "Input/output error"
   msgstr "Input/output error"

   #. EIO (linux+musl)
   msgid "I/O error"
   msgstr "I/O error"

We might not want to get too detailed with the comments, but it would
be nice to have a hint of where it might have come from.
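
A minimal sketch of how such a merge could work, assuming the per-platform results are already available as (platform, symbol, msgid) triples. The DemoEntry type and function names are hypothetical, and msgstr is left empty here rather than copied from the msgid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record: one errno message as reported on one platform. */
typedef struct
{
    const char *platform;
    const char *symbol;
    const char *msgid;
} DemoEntry;

/* Two entries belong to the same block if symbol and msgid both match. */
static int
demo_same(const DemoEntry *a, const DemoEntry *b)
{
    return strcmp(a->symbol, b->symbol) == 0 &&
        strcmp(a->msgid, b->msgid) == 0;
}

/*
 * Write merged .po blocks to out. Each distinct (symbol, msgid) pair is
 * written once, with every platform that reported it listed in the
 * comment. Returns the number of blocks written.
 */
int
demo_merge_po(FILE *out, const DemoEntry *e, int n)
{
    int         blocks = 0;

    assert(out != NULL);
    for (int i = 0; i < n; i++)
    {
        int         seen = 0;

        /* Skip entries already covered by an earlier block. */
        for (int j = 0; j < i && !seen; j++)
            seen = demo_same(&e[i], &e[j]);
        if (seen)
            continue;

        fprintf(out, "#. %s (%s", e[i].symbol, e[i].platform);
        for (int k = i + 1; k < n; k++)
        {
            if (demo_same(&e[i], &e[k]))
                fprintf(out, ", %s", e[k].platform);
        }
        fprintf(out, ")\nmsgid \"%s\"\nmsgstr \"\"\n\n", e[i].msgid);
        blocks++;
    }
    return blocks;
}
```

Fed the four EIO results mentioned above (glibc, musl, FreeBSD, macOS), this produces two blocks: one commented "EIO (linux+glibc, freebsd, mac)" and one commented "EIO (linux+musl)". The quadratic scan is fine for a few hundred errno strings.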

> merge the output files together, and then create a single
> errstrings.po file with all the variations, to reside on our source
> tree, which would be given to translators.
>
> Also we need a separate step to create the final postgres.po by
> catenating the existing postgres.po with the new errstrings.po; this
> should not occur in the source tree but rather at install time, because
> of course pg_dump.po is going to have to do the same, and we don't need
> to make translators responsible for propagating translations from one
> file to others; that occurs already to a very small scale with the
> src/common files and I hate it, so I wouldn't want to see it happening
> with this much larger set of strings.

I'm not familiar with the tooling in this area, but I can look into
it. Would it affect packagers?

> BTW looking at the output of that program I realized that with
> _GNU_SOURCE, there's strerrorname_np() which can be helpful to generate
> the new file in a way that doesn't require you to have all these E
> constants in the program.

I just borrowed get_errno_symbol from strerror.c. It doesn't have the
nonstandard errnos, though, so I used strerrorname_np() to generate
only the nonstandard errnos and attached the result in
errstrings-linux-glibc-np.po.

I included that file so we can see if there are nonstandard errnos that
we really want to translate.

> the (null) bit should perhaps be avoided anyhow.

Done.

> (I think the exit condition of that loop should be "i <= max_err",
> otherwise it's confusing.)

Done. The new C file also uses tab-delimited lines, to make it
easier to sort by the symbolic name before creating the .po file.

> > One downside is that there are more messages to translate -- one per
> > errno that Postgres might plausibly encounter,
>
> It's not all that many messages, and they only have to be translated
> once, so I think this shouldn't be too much of an issue.

Great, thank you.

Also, do you think it's fine to use the static variable (as in the
patch) for newlocale() in any NLS-enabled binary? I think it should be
fine because it's only done for platforms with HAVE_USELOCALE.
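
For readers without the patch at hand, the general pattern being asked about might look like the sketch below: a locale_t created lazily, kept in a static variable for the life of the process, and installed temporarily with uselocale(). How the patch actually uses its static locale is not shown in this thread, so the "C" locale choice, the names, and the surrounding logic are assumptions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#ifdef __APPLE__
#include <xlocale.h>            /* newlocale()/uselocale() on macOS */
#endif

/*
 * Hypothetical process-wide locale object: created on first use and never
 * freed. The lazy initialization below is not thread-safe; a real
 * multi-threaded program would need to guard it.
 */
static locale_t demo_locale = (locale_t) 0;

/*
 * Copy the message for errnum, as rendered under the "C" locale, into buf.
 * Returns buf, or NULL if the locale object could not be created.
 */
char *
demo_strerror_c(int errnum, char *buf, size_t buflen)
{
    locale_t    save;

    assert(buf != NULL && buflen > 0);

    if (demo_locale == (locale_t) 0)
        demo_locale = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    if (demo_locale == (locale_t) 0)
        return NULL;

    /* Switch this thread's locale, fetch the message, switch back. */
    save = uselocale(demo_locale);
    snprintf(buf, buflen, "%s", strerror(errnum));
    uselocale(save);

    return buf;
}
```

Because uselocale() only affects the calling thread, the global locale set with setlocale() is left untouched, which is the property that makes a shared static locale_t attractive in this kind of code.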

Regards,
    Jeff Davis


Attachments