Thread: NLS: use gettext() to translate system error messages

NLS: use gettext() to translate system error messages

From

Jeff Davis

Date:

October 24, 2025, 01:53:51

This is related to my effort to remove the global LC_CTYPE dependency,
and set the global LC_CTYPE to C.

The replacement of "%m" (e.g. with "Permission denied" if
errno==EACCES) in a message is done using strerror_r(), which sometimes
does translation. If it does translate, strerror uses LC_CTYPE to
determine the target encoding, and LC_MESSAGES to determine the
language/region. (It appears that strerror translation only happens on
Linux -- correct me if I'm wrong.)

Currently, strerror translation is orthogonal to our NLS system which
translates Postgres messages (e.g. "division by zero") using gettext
along with our own translations (.po files). The Postgres messages
might be translated but not the "%m" replacements, or vice-versa,
depending on whether NLS is enabled, the OS, etc.

The attached patch changes "%m" replacements to use gettext for
translation. That makes the overall translations more consistent,
equally available on all platforms, and not dependent on LC_CTYPE
(because the encoding for gettext can be set separately
with bind_textdomain_codeset()).

It also fixes an issue with translations when LC_CTYPE=C, where
strerror can't find the target encoding, so it forces the translated
message into ASCII even if the database encoding supports all of the
resulting characters. For instance, if LC_CTYPE=C and
LC_MESSAGES=fr_FR.UTF-8 and errno=EACCES and the database encoding is
UTF-8, you get:

   Permission non accord?e

instead of:

   Permission non accordée

I also attached a C file for testing, which generates the messages and
translations for a range of errnos, and outputs in .po format. As
mentioned earlier, I think the only OS that does any translation of
these messages is Linux, but corrections are welcome.

One downside is that there are more messages to translate -- one per
errno that Postgres might plausibly encounter, plus a few more for
variations between platforms.

Comments?

Regards,
    Jeff Davis

Attachments

Re: NLS: use gettext() to translate system error messages

From

Álvaro Herrera

Date:

October 27, 2025, 16:10:07

On 2025-Oct-23, Jeff Davis wrote:

> The attached patch changes "%m" replacements to use gettext for
> translation. That makes the overall translations more consistent,
> equally available on all platforms, and not dependent on LC_CTYPE
> (because the encoding for gettext can be set separately
> with bind_textdomain_codeset()).

Hmm, interesting idea.  I think the most difficult part is obtaining the
source strings: we need to run your errno_translation.c program on _all_
platforms, merge the output files together, and then create a single
errstrings.po file with all the variations, to reside on our source
tree, which would be given to translators.

Also we need a separate step to create the final postgres.po by
catenating the existing postgres.po with the new errstrings.po; this
should not occur in the source tree but rather at install time, because
of course pg_dump.po is going to have to do the same, and we don't need
to make translators responsible for propagating translations from one
file to others; that occurs already to a very small scale with the
src/common files and I hate it, so I wouldn't want to see it happening
with this much larger set of strings.

BTW looking at the output of that program I realized that with
_GNU_SOURCE, there's strerrorname_np() which can be helpful to generate
the new file in a way that doesn't require you to have all these E
constants in the program.  Not sure if other platforms have equivalent
gadgets; but without that I get entries like

    #. (null)
    msgid "Object is remote"
    msgstr "El objeto es remoto"

the (null) bit should perhaps be avoided anyhow.

FWIW the last valid errno I get having patched to use strerrorname_np() is
133.

    $ ./a.out 0 135
    #. 0
    msgid "Success"
    msgstr "Conseguido"

    ...

    #. EHWPOISON
    msgid "Memory page has hardware error"
    msgstr "La página de memoria tiene un error de hardware"

    #. (null)
    msgid "Unknown error 134"
    msgstr "Error desconocido 134"

(I think the exit condition of that loop should be "i <= max_err",
otherwise it's confusing.)

> One downside is that there are more messages to translate -- one per
> errno that Postgres might plausibly encounter,

It's not all that many messages, and they only have to be translated
once, so I think this shouldn't be too much of an issue.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Re: NLS: use gettext() to translate system error messages

From

Jeff Davis

Date:

October 27, 2025, 23:06:31

On Mon, 2025-10-27 at 15:10 +0200, Álvaro Herrera wrote:
> Hmm, interesting idea.  I think the most difficult part is obtaining
> the
> source strings: we need to run your errno_translation.c program on
> _all_
> platforms,

I have attached .po files for the standard set of errnos (those
recognized by strerror.c:get_errno_symbol()) on linux+glibc,
linux+musl, freebsd, and mac.

Windows and solaris/illumos are missing, and perhaps some other
variations too (e.g. the other BSDs).

If we need to merge the files, let me know what format you have in mind
for the resulting file. I was thinking something like:

   #. EIO (linux+glibc, freebsd, mac)
   msgid "Input/output error"
   msgstr "Input/output error"

   #. EIO (linux+musl)
   msgid "I/O error"
   msgstr "I/O error"

We might not want to get too detailed with the comments, but it would
be nice to have a hint of where it might have come from.

>  merge the output files together, and then create a single
> errstrings.po file with all the variations, to reside on our source
> tree, which would be given to translators.
>
> Also we need a separate step to create the final postgres.po by
> catenating the existing postgres.po with the new errstrings.po; this
> should not occur in the source tree but rather at install time,
> because
> of course pg_dump.po is going to have to do the same, and we don't
> need
> to make translators responsible for propagating translations from one
> file to others; that occurs already to a very small scale with the
> src/common files and I hate it, so I wouldn't want to see it
> happening
> with this much larger set of strings.

I'm not familiar with the tooling in this area, but I can take a look
into it. Would it affect packagers?

> BTW looking at the output of that program I realized that with
> _GNU_SOURCE, there's strerrorname_np() which can be helpful to
> generate
> the new file in a way that doesn't require you to have all these E
> constants in the program.

I just borrowed get_errno_symbol from strerror.c. It doesn't have the
nonstandard errnos, though, so I used strerrorname_np() to generate
only the nonstandard errnos and attached the result in
errstrings-linux-glibc-np.po.

I included that file so we can see if there are nonstandard errnos that
we really want to translate.

> the (null) bit should perhaps be avoided anyhow.

Done.

> (I think the exit condition of that loop should be "i <= max_err",
> otherwise it's confusing.)

Done. The new C file also uses tab-delimited lines, to make it
easier to sort by the symbolic name before creating the .po file.

> > One downside is that there are more messages to translate -- one
> > per
> > errno that Postgres might plausibly encounter,
>
> It's not all that many messages, and they only have to be translated
> once, so I think this shouldn't be too much of an issue.

Great, thank you.

Also, do you think it's fine to use the static variable (as in the
patch) for newlocale() in any NLS-enabled binary? I think it should be
fine because it's only done for platforms with HAVE_USELOCALE.

Regards,
    Jeff Davis

Attachments

Re: NLS: use gettext() to translate system error messages

From

Jeff Davis

Date:

December 23, 2025, 21:46:08

On Mon, 2025-10-27 at 13:06 -0700, Jeff Davis wrote:
> On Mon, 2025-10-27 at 15:10 +0200, Álvaro Herrera wrote:
> > Hmm, interesting idea.  I think the most difficult part is
> > obtaining
> > the
> > source strings: we need to run your errno_translation.c program on
> > _all_
> > platforms,
>
> I have attached .po files for the standard set of errnos (those
> recognized by strerror.c:get_errno_symbol()) on linux+glibc,
> linux+musl, freebsd, and mac.
>
> Windows and solaris/illumos are missing, and perhaps some other
> variations too (e.g. the other BSDs).

Is this going in the right direction?

And generally, is NLS translation of system messages wanted at all, or
are ASCII messages more convenient anyway (given that it's just a
simple text representation of errno)?

If we don't actually want translation of the system messages, then do
we want to take the part of this patch that switches to the C locale,
so that it consistently uses ASCII messages across platforms?

The status quo seems like an awkward middle ground, where the system
messages are only translated on some platforms (perhaps only glibc?);
and whether they are translated or not is independent of whether
Postgres was compiled with NLS, which can lead to partially-translated
messages.

For instance, on linux/glibc if NLS is not enabled, you can end up with
messages like:

  ERROR:  could not open file "/etc/shadow" for reading: Permission non accordée

AFAICT it makes zero sense to translate the errno message but not
translate the more interesting Postgres message.

> > Also we need a separate step to create the final postgres.po by
> > catenating the existing postgres.po with the new errstrings.po;
> > this
> > should not occur in the source tree but rather at install time,
> > because
> > of course pg_dump.po is going to have to do the same, and we don't
> > need
> > to make translators responsible for propagating translations from
> > one
> > file to others; that occurs already to a very small scale with the
> > src/common files and I hate it, so I wouldn't want to see it
> > happening
> > with this much larger set of strings.
>
> I'm not familiar with the tooling in this area, but I can take a look
> into it. Would it affect packagers?

Would someone be willing to help here?

Attached new version; trivial rebase only.

Regards,
    Jeff Davis

Attachments

v2-0001-NLS-use-gettext-to-translate-system-error-message.patch

Re: NLS: use gettext() to translate system error messages

From

Tom Lane

Date:

December 23, 2025, 23:07:07

Jeff Davis <pgsql@j-davis.com> writes:
> Is this going in the right direction?

> And generally, is NLS translation of system messages wanted at all, or
> are ASCII messages more convenient anyway (given that it's just a
> simple text representation of errno)?

I do not like putting snprintf.c in charge of this, for certain.
That seems just plain nasty from a modularity/layering standpoint.
Also, the proposed implementation is not thread-safe, which is bad
right now on client-side regardless of whether it will be bad in
the future server-side.

> The status quo seems like an awkward middle ground, where the system
> messages are only translated on some platforms (perhaps only glibc?);

Well, they're translated if strerror() responds to LC_MESSAGES [1].
If it doesn't, then the users of that platform are unaccustomed to
seeing translated errno strings, and they are unlikely to thank us
for behaving differently from every other program on the platform.

So I don't really see any reason to think this proposal is an
improvement over what we have.

            regards, tom lane

[1] Or at least that's the intent ... but I don't see translation
happening in HEAD on my Linux box:

regression=# create table zed(f1 text);
CREATE TABLE
regression=# copy zed from '/etc/shadow';
ERROR:  could not open file "/etc/shadow" for reading: Permission denied
HINT:  COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
regression=# set lc_messages = 'es_ES';
SET
regression=# copy zed from '/etc/shadow';
ERROR:  no se pudo abrir archivo <</etc/shadow>> para lectura: Permission denied
HINT:  COPY FROM indica al proceso servidor de PostgreSQL leer un archivo. Puede desear usar una facilidad del lado del cliente como \copy de psql.

This surprises me, because pg_locale.c sets LC_MESSAGES "for real"
precisely so that strerror() will see it.  We should look into
what is happening there.

Re: NLS: use gettext() to translate system error messages

From

Tom Lane

Date:

December 23, 2025, 23:21:10

I wrote:
> [1] Or at least that's the intent ... but I don't see translation
> happening in HEAD on my Linux box:

Huh ... it works fine on another nearby RHEL machine:

regression=# copy zed from '/etc/shadow';
ERROR:  no se pudo abrir archivo «/etc/shadow» para lectura: Permiso denegado
HINT:  COPY FROM indica al proceso servidor de PostgreSQL leer un archivo. Puede desear usar una facilidad del lado del cliente como \copy de psql.

But poking a little harder, the same behavior applies in other
programs:

RHEL8 box:

$ LANG=es_ES.utf8 sed 's/x/y/' /etc/shadow
sed: no se puede leer /etc/shadow: Permission denied

RHEL9 box:

$ LANG=es_ES.utf8 sed 's/x/y/' /etc/shadow
sed: no se puede leer /etc/shadow: Permiso denegado

Surely RHEL8 does not pre-date glibc's ability to translate messages.
I suspect I have some system-wide setting for this, or maybe a
missing package on that machine?  But anyway, I think this reinforces
my point that we should (and do) act similarly to other programs.

            regards, tom lane

Re: NLS: use gettext() to translate system error messages

From

Jeff Davis

Date:

December 26, 2025, 22:32:30

On Tue, 2025-12-23 at 15:07 -0500, Tom Lane wrote:
> This surprises me, because pg_locale.c sets LC_MESSAGES "for real"
> precisely so that strerror() will see it.

Isn't LC_MESSAGES also necessary for gettext()?

If it's only strerror() we care about, then we could use uselocale()
instead, because the platforms that don't support uselocale() also
don't seem to do translation in strerror(). (I think only glibc
translates through strerror(), though I've seen hints that Solaris may
also.)

Regards,
    Jeff Davis

Re: NLS: use gettext() to translate system error messages

From

Jeff Davis

Date:

December 26, 2025, 22:33:21

On Tue, 2025-12-23 at 15:21 -0500, Tom Lane wrote:
> Surely RHEL8 does not pre-date glibc's ability to translate messages.
> I suspect I have some system-wide setting for this, or maybe a
> missing package on that machine?

Probably a missing language package.

>   But anyway, I think this reinforces
> my point that we should (and do) act similarly to other programs.

It depends on the perspective. For a system administrator, what you say
makes sense. But for a Postgres user who is expecting consistent
translation, it can be a bit mysterious. And from an engineering
standpoint, translation through strerror() is not tested and -- as far
as I can tell -- only works on glibc.

Regards,
    Jeff Davis

Re: NLS: use gettext() to translate system error messages

From

Jeff Davis

Date:

January 6, 22:54:29

On Fri, 2025-12-26 at 11:32 -0800, Jeff Davis wrote:
> Isn't LC_MESSAGES also necessary for gettext()?
>
> If it's only strerror() we care about, then we could use uselocale()
> instead, because the platforms that don't support uselocale() also
> don't seem to do translation in strerror(). (I think only glibc
> translates through strerror(), though I've seen hints that Solaris
> may
> also.)

There seems to be no thread-safe way on NetBSD to use gettext() with a
specific LC_MESSAGES setting, which is unfortunate -- except maybe
wrapping it in a mutex and using setlocale().

I'll briefly summarize the constraints (as far as I can tell), which
may be useful if we are considering changes in this area:

* Windows and NetBSD don't support uselocale(); other platforms do
(though maybe not older versions?).
  - Windows supports _configthreadlocale(_ENABLE_PER_THREAD_LOCALE)
  - On NetBSD, I think the only thread-safe option is to wrap the
    function in a mutex and setlocale().

* strerror() on glibc (and maybe one or two other implementations?)
cares about LC_CTYPE and LC_MESSAGES, but on other platforms it just
returns ASCII. strerror_l() is supported on most platforms, but not
Windows. Translation is done regardless of NLS, but dependent on the
libc implementation and installed packages.

* gettext() cares about LC_MESSAGES but not LC_CTYPE (the encoding is
specified separately). Translation is done iff compiled with NLS.

Regards,
    Jeff Davis
