Discussion: PRI?64 vs Visual Studio (2022)
Hello,
If you're already aware of this and have taken it into account, please feel free to ignore this.
As described in the recent commit a0ed19e0a9e, many %ll? format specifiers are being replaced with %<PRI?64>.
I hadn’t paid much attention to this before, but I happened to check how this behaves on Windows, and it seems that with VS2022, PRId64 expands to "%lld". As a result, I suspect the gettext message catalog won't match these messages correctly.
I haven't been able to build with -Dnls=enabled myself, but I did check the strings embedded in a binary compiled with VS2022, and they indeed use %lld.
Just wanted to share this in case it’s helpful.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
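A small C sketch makes the compile-time nature of PRId64 concrete. The function names are hypothetical, not PostgreSQL code. The preprocessor splices the macro into the string literal, so the binary holds the platform's own spelling: per this thread "lld" for VS2019/VS2022, while glibc uses "ld" on 64-bit Linux and "lld" on 32-bit targets.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 1 if this C library spells the int64_t conversion in one of the two
 * common ways ("ld" on LP64 glibc, "lld" on MSVC and 32-bit targets). */
int is_known_int64_spelling(void)
{
    return strcmp(PRId64, "ld") == 0 || strcmp(PRId64, "lld") == 0;
}

/* The preprocessor splices PRId64 into the literal, so the compiled
 * binary holds "value = %ld" or "value = %lld", never "PRId64". */
int format_int64(char *buf, size_t len, int64_t value)
{
    return snprintf(buf, len, "value = %" PRId64, value);
}
```

This is why a catalog cannot simply store "%lld": the right spelling is only known on the machine that compiled the code.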
On 31.03.25 08:28, Kyotaro Horiguchi wrote:
> If you're already aware of this and have taken it into account, please
> feel free to ignore this.
>
> As described in the recent commit a0ed19e0a9e, many %ll? format
> specifiers are being replaced with %<PRI?64>.
>
> I hadn’t paid much attention to this before, but I happened to check
> how this behaves on Windows, and it seems that with VS2022, PRId64
> expands to "%lld". As a result, I suspect the gettext message catalog
> won't match these messages correctly.
I think this is working correctly. Gettext has a built-in mechanism to translate the %<PRI...> back to the appropriate %lld or %ld. See also <https://www.gnu.org/software/gettext/manual/html_node/c_002dformat.html>.
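The catalog keeps the macro name, not its expansion, and the runtime resolves it locally. The sketch below is a simplified, hypothetical `expand_pri_macros()` that rewrites each `%<PRIxNN>` placeholder to this platform's `<inttypes.h>` expansion. Real gettext stores these as system-dependent segments in the binary .mo file and expands them when the catalog is loaded. This sketch models only the string substitution, not the .mo format.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <string.h>

/* Placeholder names this sketch knows about, mapped to local expansions. */
static const struct
{
    const char *name;
    const char *expansion;
} pri_macros[] = {
    {"PRId64", PRId64}, {"PRIu64", PRIu64}, {"PRIx64", PRIx64},
    {"PRId32", PRId32}, {"PRIu32", PRIu32}, {"PRIx32", PRIx32},
};

/* Look up NAME (of length LEN); returns NULL if unknown. */
static const char *lookup_pri(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof pri_macros / sizeof pri_macros[0]; i++)
    {
        if (strlen(pri_macros[i].name) == len &&
            strncmp(pri_macros[i].name, name, len) == 0)
            return pri_macros[i].expansion;
    }
    return NULL;
}

/*
 * Copy IN to OUT, replacing every "<NAME>" that directly follows '%' with
 * the local expansion of NAME.  Returns 0 on success, -1 on an unknown
 * macro, an unterminated placeholder, or insufficient space in OUT.
 */
int expand_pri_macros(const char *in, char *out, size_t outlen)
{
    size_t o = 0;

    if (outlen == 0)
        return -1;
    while (*in != '\0')
    {
        const char *piece = in;
        size_t plen = 1;

        if (*in == '<' && o > 0 && out[o - 1] == '%')
        {
            const char *close = strchr(in, '>');

            if (close == NULL)
                return -1;
            piece = lookup_pri(in + 1, (size_t) (close - in - 1));
            if (piece == NULL)
                return -1;
            plen = strlen(piece);
            in = close + 1;
        }
        else
            in++;

        if (o + plen >= outlen)
            return -1;
        memcpy(out + o, piece, plen);
        o += plen;
    }
    out[o] = '\0';
    return 0;
}
```

After expansion, the translated format string matches what the compiler produced for the msgid, so it can be handed to the printf family with the same arguments.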
On Wed, Apr 2, 2025 at 2:04 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> On 31.03.25 08:28, Kyotaro Horiguchi wrote:
> > I hadn’t paid much attention to this before, but I happened to check
> > how this behaves on Windows, and it seems that with VS2022, PRId64
> > expands to "%lld". As a result, I suspect the gettext message catalog
> > won't match these messages correctly.
>
> I think this is working correctly. Gettext has a built-in mechanism to
> translate the %<PRI...> back to the appropriate %lld or %ld. See also
> <https://www.gnu.org/software/gettext/manual/html_node/c_002dformat.html>.
Interesting report though. Commit 962da900 assumed that our in-tree printf implementation still needed to understand that %I64 stuff in case it came to us from system headers, but it looks like it disappeared with MSVCRT:
1. I checked with CI (VS 2019). puts(PRId64) prints out "lld".
2. MinGW's inttypes.h[1] only uses "I64" et al if you build against MSVCRT.
So I think we should delete that stuff. Attached.
I worried that GNU gettext() might still know about %I64 somewhere, but it just expands the macros to whatever inttypes.h defines[2]. Good.
We don't even test -Dnls on the Windows CI task, so the fact that it passes there doesn't mean much (if our tests would even pick up <PRI*64> expansion failure, not sure). We should probably do something about that and/or its absence from the build farm. We're effectively counting on the EDB packaging team or end users to tell us if we break localisation on this platform.
I was also curious to know if the nearby floating point formatting kludge added by commit f1885386 was still needed today. CI passes without it, and the standard is pretty clear: "The exponent always contains at least two digits, and only as many more digits as necessary to represent the exponent". I didn't look too closely at the fine print, but that text was already present in C89 so I guess MSVCRT just failed to conform on that point.
[1] https://github.com/mingw-w64/mingw-w64/blob/master/mingw-w64-headers/crt/inttypes.h [2] https://github.com/autotools-mirror/gettext/blob/637b208fbe13f1c306f19d4f31c21fec7e9986d2/gettext-runtime/intl/loadmsgcat.c#L473
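The exponent rule quoted above can be checked with a quick runtime probe. This is a hypothetical check, not the f1885386 kludge itself, whose details are not shown in this thread. A conforming library prints `1.0` as "1.000000e+00". The three-digit "1.000000e+000" form is what older MSVCRT produced by default.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if "%e" formatting uses the minimum of two exponent digits that
 * C89 already required ("1.000000e+00"), 0 otherwise (for example the
 * three-digit "1.000000e+000" style of older MSVCRT).
 */
int printf_exponent_is_conforming(void)
{
    char buf[32];

    snprintf(buf, sizeof buf, "%e", 1.0);
    return strcmp(buf, "1.000000e+00") == 0;
}
```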
Attachments
Thomas Munro <thomas.munro@gmail.com> writes:
> We don't even test -Dnls on the Windows CI task, so the fact that it
> passes there doesn't mean much (if our tests would even pick up
> <PRI*64> expansion failure, not sure). We should probably do
> something about that and/or its absence from the build farm. We're
> effectively counting on the EDB packaging team or end users to tell us
> if we break localisation on this platform.
I'm pretty certain that we do not test NLS localization at all,
anywhere :-(. (There are no test cases checking enable_nls,
which would be a necessary thing to not fail on buildfarm critters
not using NLS.)
I agree that starting to rely on PRI?64 in translatable strings
is raising the bar a good deal, so maybe it's time to do something
about that.
regards, tom lane
On Wed, Nov 19, 2025 at 3:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that starting to rely on PRI?64 in translatable strings
> is raising the bar a good deal, so maybe it's time to do something
> about that.
Perhaps meson/configure should do a po -> mo -> gettext() check with a canned test message? That'd also make sure your msgfmt and libintl are compatible, something I worried about when I wrote about musl recently.
Thomas Munro <thomas.munro@gmail.com> writes:
> Perhaps meson/configure should do a po -> mo -> gettext() check with a
> canned test message? That'd also make sure your msgfmt and libintl
> are compatible, something I worried about when I wrote about musl
> recently.
No, I don't think that's a good approach. That is testing the library
available at configure time, not the one you are actually running
with (possibly years later and on a different machine, even without
considering cross-compilation cases). I think we should do it
honestly with a regression test. It doesn't need to be very
complicated --- I think checking one message in one translation is
sufficient, so long as it includes a PRI?64 usage.
regards, tom lane
On 19.11.25 04:15, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> Perhaps meson/configure should do a po -> mo -> gettext() check with a
>> canned test message? That'd also make sure your msgfmt and libintl
>> are compatible, something I worried about when I wrote about musl
>> recently.
>
> No, I don't think that's a good approach. That is testing the library
> available at configure time, not the one you are actually running
> with (possibly years later and on a different machine, even without
> considering cross-compilation cases). I think we should do it
> honestly with a regression test. It doesn't need to be very
> complicated --- I think checking one message in one translation is
> sufficient, so long as it includes a PRI?64 usage.
We could generate an English message catalog that translates all messages unchanged, and run the whole test suite with that. This would exercise the whole gettext run-time machinery.
Generating the message catalog is easy; gettext provides a tool for that. What's a little tricky is convincing all our testing infrastructure to *not* disable NLS-related locale settings. See attached for a rough, incomplete demo.
Attachments
On 19.11.25 03:13, Thomas Munro wrote:
> Interesting report though. Commit 962da900 assumed that our in-tree
> printf implementation still needed to understand that %I64 stuff in
> case it came to us from system headers, but it looks like it
> disappeared with MSVCRT:
>
> 1. I checked with CI (VS 2019). puts(PRId64) prints out "lld".
> 2. MinGW's inttypes.h[1] only uses "I64" et al if you build against MSVCRT.
>
> So I think we should delete that stuff. Attached.
Looks good to me.
Peter Eisentraut <peter@eisentraut.org> writes:
> On 19.11.25 04:15, Tom Lane wrote:
>> I think we should do it
>> honestly with a regression test. It doesn't need to be very
>> complicated --- I think checking one message in one translation is
>> sufficient, so long as it includes a PRI?64 usage.
> We could generate an English message catalog that translates all
> messages unchanged, and run the whole test suite with that. This would
> exercise the whole gettext run-time machinery.
... except that if it were actually doing nothing whatsoever, you
could not tell. This seems particularly troublesome for gettext,
since its fallback behavior is exactly to return the given string.
I'd prefer a test that fails in a visible way.
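The fallback Tom describes is easy to demonstrate. This sketch assumes a `<libintl.h>` that provides `dgettext()`: glibc has one built in, and elsewhere GNU libintl supplies it. The domain name in the usage is deliberately nonexistent. With no catalog, or in the "C" locale, the msgid comes back unchanged, so an identity catalog looks exactly like no NLS at all.

```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

/*
 * Reports whether dgettext() changed MSGID.  With no catalog for DOMAIN,
 * or in the "C" locale, gettext returns the msgid as-is, so a catalog
 * that maps every string to itself is indistinguishable from no NLS.
 */
int dgettext_changed(const char *domain, const char *msgid)
{
    return strcmp(dgettext(domain, msgid), msgid) != 0;
}
```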
regards, tom lane
On 2025-Nov-19, Tom Lane wrote:
> Peter Eisentraut <peter@eisentraut.org> writes:
> > On 19.11.25 04:15, Tom Lane wrote:
> >> I think we should do it
> >> honestly with a regression test. It doesn't need to be very
> >> complicated --- I think checking one message in one translation is
> >> sufficient, so long as it includes a PRI?64 usage.
> >
> > We could generate an English message catalog that translates all
> > messages unchanged, and run the whole test suite with that. This would
> > exercise the whole gettext run-time machinery.
>
> ... except that if it were actually doing nothing whatsoever, you
> could not tell.
You could feed the message catalog a translated string that differs from the original in some simple way, say, by adding a constant prefix "[translated]" or something like that.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
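Here is the shape of a prefixed catalog entry. The `format_prefixed_entry()` helper is hypothetical and hand-rolled only to show the layout; the thread notes that `xgettext -m` can produce such prefixes automatically. The `#, c-format` flag mirrors the entries in the patches later in this thread. The helper assumes the msgid needs no PO escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build one .po entry whose msgstr is MSGID with PREFIX prepended, so a
 * successful lookup is visibly different from gettext's passthrough.
 * Assumes MSGID needs no PO escaping (no '"', '\\' or newlines).
 */
int format_prefixed_entry(char *buf, size_t len, const char *msgid,
                          const char *prefix)
{
    return snprintf(buf, len, "#, c-format\nmsgid \"%s\"\nmsgstr \"%s%s\"\n",
                    msgid, prefix, msgid);
}
```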
On Wed, Nov 19, 2025 at 9:07 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> You could feed the message catalog a translated string that differs from
> the original in some simple way, say, by adding a constant prefix
> "[translated]" or something like that.
`xgettext -m` can do that. (But I wish I'd known about msgen earlier...)
We could additionally use preloadable_libintl.so, in combination with GETTEXT_LOG_UNTRANSLATED, and check if the log contains entries from our domains. I was doing that just last week. But beware that the log file can grow very quickly. And we'd probably have to differentiate the "no domain" text belonging to other software from accidental no-domain strings in our own code, like what I described in [1].
--Jacob
[1] https://postgr.es/m/CAOYmi+kQQ8vpRcoSrA5EQ98Wa3G6jFj1yRHs6mh1V7ohkTC7JA@mail.gmail.com
On Thu, Nov 20, 2025 at 4:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter@eisentraut.org> writes:
> > On 19.11.25 04:15, Tom Lane wrote:
> >> I think we should do it
> >> honestly with a regression test. It doesn't need to be very
> >> complicated --- I think checking one message in one translation is
> >> sufficient, so long as it includes a PRI?64 usage.
> >
> > We could generate an English message catalog that translates all
> > messages unchanged, and run the whole test suite with that. This would
> > exercise the whole gettext run-time machinery.
>
> ... except that if it were actually doing nothing whatsoever, you
> could not tell. This seems particularly troublesome for gettext,
> since its fallback behavior is exactly to return the given string.
> I'd prefer a test that fails in a visible way.
How about a test module with a test_nls() function that just raises an error containing "hello %" PRId64 ", ..." with all the macros we care about, a regression test that calls it, and two alternative expected files with "hello ...", "hola ...", matching en.po and es.po (or choose some other second language that we think is likely to be tested by a subset of BF animals and/or a CI task)? Then if you didn't enable -Dnls it'd still pass with English, and if you did it'd pass for any language. Since there are no other .po files, if you had some third language it'd fall back to English, and the .po would have a "do not translate" comment or even prefix in the message to avoid confusing the translation team. That assumes that modules are allowed to supply .po files; I didn't check, and if that's not true then maybe it'd have to be in core instead.
That'd test quite a lot of moving parts at once. The reason I thought about a contrived message with lots of macros is that I'd stumbled across a partial implementation[1] in Alpine's alternative non-GNU msgfmt program, which appears to have PRIu64 but not PRIx64 and others.
It also has some other way of encoding this stuff in the .mo that musl's alternative built-in libintl implementation can understand (it looks like they have arranged to be able to mmap the .mo and use it directly as shared read-only memory, while GNU's implementation has to allocate memory to translate them to %lld etc in every process, clever but (I assume) broken if msgfmt/libintl implementations are mixed), so I figured it'd be a good idea to make sure that we test that all the macros actually work. I didn't try to understand the implications of Wolfgang's reply, but I guess that to have any chance of Alpine's libintl pickle being straightened out, we'd ideally want a test case that someone interested in that could use to validate the whole localisation pipeline conclusively. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2Bpp%3D%3Dd-3LVhdNOvOAzwQN0vP4gBSxtHkmxnmfQD3NY%3Dw%40mail.gmail.com#167da8f2fb3093bc0fa0a8335c054c19
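A contrived "kitchen sink" message of the kind Thomas describes might look like the sketch below. The function and the values are hypothetical. The matching msgid in a .po file would spell each macro as `%<PRId64>`, `%<PRIx64>` and so on, following the convention in the patches below. A msgfmt/libintl pair that handles only some of the macros (say PRIu64 but not PRIx64) would then fail to match the catalog entry.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical test message: one format string that uses several PRI*
 * macros at once, so partial macro support cannot pass by accident.
 */
int format_all_pri(char *buf, size_t len)
{
    return snprintf(buf, len,
                    "hello %" PRId64 " %" PRIu64 " %" PRIx64
                    " %" PRId32 " %" PRIu32 " %" PRIx32,
                    (int64_t) -42, (uint64_t) 42, (uint64_t) 42,
                    (int32_t) -7, (uint32_t) 7, (uint32_t) 255);
}
```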
On Thu, Nov 20, 2025 at 6:07 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> You could feed the message catalog a translated string that differs from
> the original in some simple way, say, by adding a constant prefix
> "[translated]" or something like that.
Oh, that's probably better than my nearby en.po + es.po suggestion. Combining the ideas, you could have just an en.po translation, but expected files to match "hello ..." and "[translated] hello ...". Though, hmm, I suppose that fails to fail if it didn't translate when it should have, so maybe a TAP test or a test_nls() function that internally checks the translation rather than using error() and expected files...
Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Nov 20, 2025 at 6:07 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>> You could feed the message catalog a translated string that differs from
>> the original in some simple way, say, by adding a constant prefix
>> "[translated]" or something like that.
> Oh, that's probably better than my nearby en.po + es.po suggestion.
> Combining the ideas, you could have just an en.po translation, but
> expected files to match "hello ..." and "[translated] hello ...".
> Though, hmm, I suppose that fails to fail if it didn't translate when
> it should have,
Yeah. I think it's critical that the test be set up so that
failure-to-translate cannot look like a success.
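Tom's requirement that failure-to-translate cannot look like a success can be sketched as a three-way classifier. Everything here is hypothetical, including `check_translation()` and the two stand-in translators. The key point is that getting the msgid back is reported as its own failure, never folded into "OK".

```c
#include <assert.h>
#include <string.h>

enum nls_result
{
    NLS_OK,             /* got exactly the expected translation */
    NLS_UNTRANSLATED,   /* got the msgid back: gettext's silent fallback */
    NLS_WRONG           /* got some other string */
};

typedef const char *(*translate_fn) (const char *msgid);

/* Classify a lookup so that a passthrough can never count as success. */
enum nls_result
check_translation(translate_fn translate, const char *msgid,
                  const char *expected)
{
    const char *got = translate(msgid);

    if (strcmp(got, msgid) == 0)
        return NLS_UNTRANSLATED;
    if (strcmp(got, expected) != 0)
        return NLS_WRONG;
    return NLS_OK;
}

/* Stand-in that behaves like gettext with no catalog loaded. */
const char *passthrough(const char *msgid)
{
    return msgid;
}

/* Stand-in for a one-entry "Spanish" catalog. */
const char *toy_spanish(const char *msgid)
{
    return strcmp(msgid, "translated") == 0 ? "traducido" : msgid;
}
```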
I agree with the idea of just using a single test message that checks
all the PRI* macros we care about. I don't think we need to invent a
whole new translation for this. I'd be inclined to just get the
desired translated string pushed into one or two .po files in HEAD,
then we can start testing with those specific languages, and we're
good. Over time the translators would presumably get translations
into other .po files, and then maybe we'd want to expand the set of
tested languages, or maybe that wouldn't really buy much. (Managing
the encoding of the expected-file might be tricky if you got too
ambitious about that.)
regards, tom lane
I wrote:
> I agree with the idea of just using a single test message that checks
> all the PRI* macros we care about. I don't think we need to invent a
> whole new translation for this. I'd be inclined to just get the
> desired translated string pushed into one or two .po files in HEAD,
> then we can start testing with those specific languages, and we're
> good. Over time the translators would presumably get translations
> into other .po files, and then maybe we'd want to expand the set of
> tested languages, or maybe that wouldn't really buy much. (Managing
> the encoding of the expected-file might be tricky if you got too
> ambitious about that.)
Just as proof-of-concept, this is approximately what I think we
should do to begin with.
The main thing that's likely wrong here is that I just manually
shoved a new entry into src/backend/po/es.po. I suspect that
the .po-extraction machinery would fail to pick up that string
because it's in src/test/regress/regress.c. We could hack it
to do that, or we could put the test function into some backend
file. I don't have much sense of which would be cleaner.
Lesser loose ends: I didn't bother fleshing out the test message
to cover all of the likely PRI* cases, and my Spanish probably
sucks. I'm also unsure if this will work as-is on Windows;
are the LC_MESSAGES settings the same there?
regards, tom lane
From 3f89fb8f0070a35e26e35eb63fe54caea647a4ec Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed, 19 Nov 2025 17:16:04 -0500
Subject: [PATCH v1] Simple test of NLS translation.
This is just intended to verify minimal functionality of the
NLS message-translation system, and in particular to check that
the PRI* macros work.
---
src/backend/po/es.po | 5 +++++
src/test/regress/expected/nls.out | 17 +++++++++++++++++
src/test/regress/expected/nls_1.out | 17 +++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/regress.c | 18 ++++++++++++++++++
src/test/regress/sql/nls.sql | 16 ++++++++++++++++
6 files changed, 74 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/nls.out
create mode 100644 src/test/regress/expected/nls_1.out
create mode 100644 src/test/regress/sql/nls.sql
diff --git a/src/backend/po/es.po b/src/backend/po/es.po
index e2593b52271..861aea61b68 100644
--- a/src/backend/po/es.po
+++ b/src/backend/po/es.po
@@ -31143,3 +31143,8 @@ msgstr "uso no estandar de escape en un literal de cadena"
#, c-format
msgid "Use the escape string syntax for escapes, e.g., E'\\r\\n'."
msgstr "Use la sintaxis de escape para cadenas, por ej. E'\\r\\n'."
+
+#: regress.c:1041
+#, c-format
+msgid "translated PRId64 = %<PRId64>, PRId32 = %<PRId32>"
+msgstr "traducido PRId64 = %<PRId64>, PRId32 = %<PRId32>"
diff --git a/src/test/regress/expected/nls.out b/src/test/regress/expected/nls.out
new file mode 100644
index 00000000000..b97802aeee8
--- /dev/null
+++ b/src/test/regress/expected/nls.out
@@ -0,0 +1,17 @@
+-- directory paths and dlsuffix are passed to us in environment variables
+\getenv libdir PG_LIBDIR
+\getenv dlsuffix PG_DLSUFFIX
+\set regresslib :libdir '/regress' :dlsuffix
+CREATE FUNCTION test_translation()
+ RETURNS void
+ AS :'regresslib'
+ LANGUAGE C;
+SET lc_messages = 'es_ES';
+SELECT test_translation();
+NOTICE: traducido PRId64 = 4242, PRId32 = -1234
+ test_translation
+------------------
+
+(1 row)
+
+RESET lc_messages;
diff --git a/src/test/regress/expected/nls_1.out b/src/test/regress/expected/nls_1.out
new file mode 100644
index 00000000000..4b707e9dad4
--- /dev/null
+++ b/src/test/regress/expected/nls_1.out
@@ -0,0 +1,17 @@
+-- directory paths and dlsuffix are passed to us in environment variables
+\getenv libdir PG_LIBDIR
+\getenv dlsuffix PG_DLSUFFIX
+\set regresslib :libdir '/regress' :dlsuffix
+CREATE FUNCTION test_translation()
+ RETURNS void
+ AS :'regresslib'
+ LANGUAGE C;
+SET lc_messages = 'es_ES';
+SELECT test_translation();
+NOTICE: NLS is not enabled
+ test_translation
+------------------
+
+(1 row)
+
+RESET lc_messages;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index f56482fb9f1..66ce1b7d9cd 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -76,7 +76,7 @@ test: brin_bloom brin_multi
# ----------
# Another group of parallel tests
# ----------
-test: create_table_like alter_generic alter_operator misc async dbsize merge misc_functions sysviews tsrf tid tidscan tidrangescan collate.utf8 collate.icu.utf8 incremental_sort create_role without_overlaps generated_virtual
+test: create_table_like alter_generic alter_operator misc async dbsize merge misc_functions nls sysviews tsrf tid tidscan tidrangescan collate.utf8 collate.icu.utf8 incremental_sort create_role without_overlaps generated_virtual
# collate.linux.utf8 and collate.icu.utf8 tests cannot be run in parallel with each other
# psql depends on create_am
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index a2db6080876..7d939565e2e 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -1028,3 +1028,21 @@ test_relpath(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+/*
+ * Simple test to verify NLS support, particularly that the PRI* macros work.
+ */
+PG_FUNCTION_INFO_V1(test_translation);
+Datum
+test_translation(PG_FUNCTION_ARGS)
+{
+#ifdef ENABLE_NLS
+ ereport(NOTICE,
+ (errmsg("translated PRId64 = %" PRId64 ", PRId32 = %" PRId32,
+ (int64) 4242, (int32) -1234)));
+#else
+ elog(NOTICE, "NLS is not enabled");
+#endif
+
+ PG_RETURN_VOID();
+}
diff --git a/src/test/regress/sql/nls.sql b/src/test/regress/sql/nls.sql
new file mode 100644
index 00000000000..53b4add86eb
--- /dev/null
+++ b/src/test/regress/sql/nls.sql
@@ -0,0 +1,16 @@
+-- directory paths and dlsuffix are passed to us in environment variables
+\getenv libdir PG_LIBDIR
+\getenv dlsuffix PG_DLSUFFIX
+
+\set regresslib :libdir '/regress' :dlsuffix
+
+CREATE FUNCTION test_translation()
+ RETURNS void
+ AS :'regresslib'
+ LANGUAGE C;
+
+SET lc_messages = 'es_ES';
+
+SELECT test_translation();
+
+RESET lc_messages;
--
2.43.7
On Thu, Nov 20, 2025 at 11:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm also unsure if this will work as-is on Windows;
> are the LC_MESSAGES settings the same there?
Bilal [CC'd], have you ever looked into gettext support for Windows
CI? I think we'd need at least msgfmt.exe, libintl.{dll,lib,h}
installed on the image, though I have no clue which
distribution/package/whatever would be appropriate. I assume a script
in pg-vm-images[1] would need to install that, once we pick one. Does
anyone happen to know where EDB's installer pipeline pulls gettext
from?
[1] https://github.com/anarazel/pg-vm-images/tree/main/scripts
I wrote:
> The main thing that's likely wrong here is that I just manually
> shoved a new entry into src/backend/po/es.po. I suspect that
> the .po-extraction machinery would fail to pick up that string
> because it's in src/test/regress/regress.c. We could hack it
> to do that, or we could put the test function into some backend
> file. I don't have much sense of which would be cleaner.
Oh, better idea about that: let's make regress.so have its own
translation domain. This allows testing the TEXTDOMAIN mechanism
as well as the basics, and it keeps the patch pretty self-contained.
I was amused to see that "make update-po" was able to fill in
translations for all of the pre-existing ereport's in regress.c.
I guess they all had duplicates somewhere else? But I take no
credit or blame for any of those translations.
The other loose ends remain.
regards, tom lane
From 4cc78e9deea5cd69d711bdf15d20d9b8e80d363f Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed, 19 Nov 2025 20:16:57 -0500
Subject: [PATCH v2] Simple test of NLS translation.
This is just intended to verify minimal functionality of the
NLS message-translation system, and in particular to check that
the PRI* macros work.
---
src/test/regress/expected/nls.out | 18 +++++++++
src/test/regress/expected/nls_1.out | 17 +++++++++
src/test/regress/meson.build | 2 +
src/test/regress/nls.mk | 5 +++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/po/LINGUAS | 1 +
src/test/regress/po/es.po | 59 +++++++++++++++++++++++++++++
src/test/regress/po/meson.build | 3 ++
src/test/regress/regress.c | 32 ++++++++++++++++
src/test/regress/sql/nls.sql | 16 ++++++++
10 files changed, 154 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/nls.out
create mode 100644 src/test/regress/expected/nls_1.out
create mode 100644 src/test/regress/nls.mk
create mode 100644 src/test/regress/po/LINGUAS
create mode 100644 src/test/regress/po/es.po
create mode 100644 src/test/regress/po/meson.build
create mode 100644 src/test/regress/sql/nls.sql
diff --git a/src/test/regress/expected/nls.out b/src/test/regress/expected/nls.out
new file mode 100644
index 00000000000..d16c29741db
--- /dev/null
+++ b/src/test/regress/expected/nls.out
@@ -0,0 +1,18 @@
+-- directory paths and dlsuffix are passed to us in environment variables
+\getenv libdir PG_LIBDIR
+\getenv dlsuffix PG_DLSUFFIX
+\set regresslib :libdir '/regress' :dlsuffix
+CREATE FUNCTION test_translation()
+ RETURNS void
+ AS :'regresslib'
+ LANGUAGE C;
+SET lc_messages = 'es_ES';
+SELECT test_translation();
+NOTICE: traducido PRId64 = 4242
+NOTICE: traducido PRId32 = -1234
+ test_translation
+------------------
+
+(1 row)
+
+RESET lc_messages;
diff --git a/src/test/regress/expected/nls_1.out b/src/test/regress/expected/nls_1.out
new file mode 100644
index 00000000000..4b707e9dad4
--- /dev/null
+++ b/src/test/regress/expected/nls_1.out
@@ -0,0 +1,17 @@
+-- directory paths and dlsuffix are passed to us in environment variables
+\getenv libdir PG_LIBDIR
+\getenv dlsuffix PG_DLSUFFIX
+\set regresslib :libdir '/regress' :dlsuffix
+CREATE FUNCTION test_translation()
+ RETURNS void
+ AS :'regresslib'
+ LANGUAGE C;
+SET lc_messages = 'es_ES';
+SELECT test_translation();
+NOTICE: NLS is not enabled
+ test_translation
+------------------
+
+(1 row)
+
+RESET lc_messages;
diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
index 1da9e9462a9..4001a81ffe5 100644
--- a/src/test/regress/meson.build
+++ b/src/test/regress/meson.build
@@ -57,3 +57,5 @@ tests += {
'dbname': 'regression',
},
}
+
+subdir('po', if_found: libintl)
diff --git a/src/test/regress/nls.mk b/src/test/regress/nls.mk
new file mode 100644
index 00000000000..43227c64f09
--- /dev/null
+++ b/src/test/regress/nls.mk
@@ -0,0 +1,5 @@
+# src/test/regress/nls.mk
+CATALOG_NAME = regress
+GETTEXT_FILES = regress.c
+GETTEXT_TRIGGERS = $(BACKEND_COMMON_GETTEXT_TRIGGERS)
+GETTEXT_FLAGS = $(BACKEND_COMMON_GETTEXT_FLAGS)
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index f56482fb9f1..66ce1b7d9cd 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -76,7 +76,7 @@ test: brin_bloom brin_multi
# ----------
# Another group of parallel tests
# ----------
-test: create_table_like alter_generic alter_operator misc async dbsize merge misc_functions sysviews tsrf tid tidscan tidrangescan collate.utf8 collate.icu.utf8 incremental_sort create_role without_overlaps generated_virtual
+test: create_table_like alter_generic alter_operator misc async dbsize merge misc_functions nls sysviews tsrf tid tidscan tidrangescan collate.utf8 collate.icu.utf8 incremental_sort create_role without_overlaps generated_virtual
# collate.linux.utf8 and collate.icu.utf8 tests cannot be run in parallel with each other
# psql depends on create_am
diff --git a/src/test/regress/po/LINGUAS b/src/test/regress/po/LINGUAS
new file mode 100644
index 00000000000..8357fcaaed4
--- /dev/null
+++ b/src/test/regress/po/LINGUAS
@@ -0,0 +1 @@
+es
diff --git a/src/test/regress/po/es.po b/src/test/regress/po/es.po
new file mode 100644
index 00000000000..3049b73f9f9
--- /dev/null
+++ b/src/test/regress/po/es.po
@@ -0,0 +1,59 @@
+# Spanish message translation file for regress test library
+#
+# Copyright (C) 2025 PostgreSQL Global Development Group
+# This file is distributed under the same license as the regress (PostgreSQL) package.
+#
+# Tom Lane <tgl@sss.pgh.pa.us>, 2025.
+#
+msgid ""
+msgstr ""
+"Project-Id-Version: regress (PostgreSQL) 19\n"
+"Report-Msgid-Bugs-To: pgsql-bugs@lists.postgresql.org\n"
+"POT-Creation-Date: 2025-11-19 19:01-0500\n"
+"PO-Revision-Date: 2025-11-19 19:01-0500\n"
+"Last-Translator: Tom Lane <tgl@sss.pgh.pa.us>\n"
+"Language-Team: PgSQL-es-Ayuda <pgsql-es-ayuda@lists.postgresql.org>\n"
+"Language: es\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=UTF-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+
+#: regress.c:200
+#, c-format
+msgid "invalid input syntax for type %s: \"%s\""
+msgstr "la sintaxis de entrada no es válida para tipo %s: «%s»"
+
+#: regress.c:907
+#, c-format
+msgid "invalid source encoding name \"%s\""
+msgstr "la codificación de origen «%s» no es válida"
+
+#: regress.c:912
+#, c-format
+msgid "invalid destination encoding name \"%s\""
+msgstr "la codificación de destino «%s» no es válida"
+
+#: regress.c:957
+#, c-format
+msgid "default conversion function for encoding \"%s\" to \"%s\" does not exist"
+msgstr "no existe el procedimiento por omisión de conversión desde la codificación «%s» a «%s»"
+
+#: regress.c:964
+#, c-format
+msgid "out of memory"
+msgstr "memoria agotada"
+
+#: regress.c:965
+#, c-format
+msgid "String of %d bytes is too long for encoding conversion."
+msgstr "La cadena de %d bytes es demasiado larga para la recodificación."
+
+#: regress.c:1054
+#, c-format
+msgid "translated PRId64 = %<PRId64>"
+msgstr "traducido PRId64 = %<PRId64>"
+
+#: regress.c:1056
+#, c-format
+msgid "translated PRId32 = %<PRId32>"
+msgstr "traducido PRId32 = %<PRId32>"
diff --git a/src/test/regress/po/meson.build b/src/test/regress/po/meson.build
new file mode 100644
index 00000000000..e9bd964aa7f
--- /dev/null
+++ b/src/test/regress/po/meson.build
@@ -0,0 +1,3 @@
+# Copyright (c) 2022-2025, PostgreSQL Global Development Group
+
+nls_targets += [i18n.gettext('regress-' + pg_version_major.to_string())]
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index a2db6080876..4a584ca88ae 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -46,6 +46,10 @@
#include "utils/rel.h"
#include "utils/typcache.h"
+/* define our text domain for translations */
+#undef TEXTDOMAIN
+#define TEXTDOMAIN PG_TEXTDOMAIN("regress")
+
#define EXPECT_TRUE(expr) \
do { \
if (!(expr)) \
@@ -1028,3 +1032,31 @@ test_relpath(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+/*
+ * Simple test to verify NLS support, particularly that the PRI* macros work.
+ */
+PG_FUNCTION_INFO_V1(test_translation);
+Datum
+test_translation(PG_FUNCTION_ARGS)
+{
+#ifdef ENABLE_NLS
+ /* This would be better done in _PG_init(), if this module had one */
+ static bool inited = false;
+
+ if (!inited)
+ {
+ pg_bindtextdomain(TEXTDOMAIN);
+ inited = true;
+ }
+
+ ereport(NOTICE,
+ errmsg("translated PRId64 = %" PRId64, (int64) 4242));
+ ereport(NOTICE,
+ errmsg("translated PRId32 = %" PRId32, (int32) -1234));
+#else
+ elog(NOTICE, "NLS is not enabled");
+#endif
+
+ PG_RETURN_VOID();
+}
diff --git a/src/test/regress/sql/nls.sql b/src/test/regress/sql/nls.sql
new file mode 100644
index 00000000000..53b4add86eb
--- /dev/null
+++ b/src/test/regress/sql/nls.sql
@@ -0,0 +1,16 @@
+-- directory paths and dlsuffix are passed to us in environment variables
+\getenv libdir PG_LIBDIR
+\getenv dlsuffix PG_DLSUFFIX
+
+\set regresslib :libdir '/regress' :dlsuffix
+
+CREATE FUNCTION test_translation()
+ RETURNS void
+ AS :'regresslib'
+ LANGUAGE C;
+
+SET lc_messages = 'es_ES';
+
+SELECT test_translation();
+
+RESET lc_messages;
--
2.43.7
Hi,
On Thu, 20 Nov 2025 at 01:44, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Thu, Nov 20, 2025 at 11:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'm also unsure if this will work as-is on Windows;
> > are the LC_MESSAGES settings the same there?
>
> Bilal [CC'd], have you ever looked into gettext support for Windows
> CI? I think we'd need at least msgfmt.exe, libintl.{dll,lib,h}
> installed on the image, though I have no clue which
> distribution/package/whatever would be appropriate. I assume a script
> in pg-vm-images[1] would need to install that, once we pick one. Does
> anyone happen to know where EDB's installer pipeline pulls gettext
> from?
>
> [1] https://github.com/anarazel/pg-vm-images/tree/main/scripts
Yes, I was working on that some time ago, and I was able to enable NLS
(and many other dependencies) on the Windows CI image by using the
dependencies from Dave Page's winpgbuild repository [1]. I was going
to share these changes but some other things came up and this one got
delayed. I am planning to return to this again soon.
As an example, I re-generated the Windows CI image and tested it with
VS 2019 [2] and VS 2022 [3]. Three tests failed on both, but the failures
are not related to NLS. Here is a portion of the configure output:
[07:40:23.149] External libraries
[07:40:23.149] bonjour : NO
[07:40:23.149] bsd_auth : NO
[07:40:23.149] docs : YES
[07:40:23.149] docs_pdf : NO
[07:40:23.149] gss : YES 1.22.1
[07:40:23.149] icu : YES 77.1
[07:40:23.149] ldap : YES
[07:40:23.149] libcurl : NO
[07:40:23.149] libnuma : NO
[07:40:23.149] liburing : NO
[07:40:23.149] libxml : YES 2.13.9
[07:40:23.149] libxslt : YES 1.1.43
[07:40:23.149] llvm : NO
[07:40:23.149] lz4 : YES 1.10.0
[07:40:23.149] nls : YES
[07:40:23.149] openssl : YES 3.0.18
[07:40:23.149] pam : NO
[07:40:23.149] plperl : YES 5.42.0
[07:40:23.149] plpython : YES 3.10
[07:40:23.149] pltcl : NO
[07:40:23.149] readline : NO
[07:40:23.149] selinux : NO
[07:40:23.149] systemd : NO
[07:40:23.149] uuid : YES 1.6.2
[07:40:23.149] zlib : YES 1.3.1
[07:40:23.149] zstd : YES 1.5.7
[1] https://github.com/dpage/winpgbuild
[2] https://cirrus-ci.com/task/4655787001249792
[3] https://cirrus-ci.com/task/5786281818456064
--
Regards,
Nazir Bilal Yavuz
Microsoft
On Wed, Nov 19, 2025 at 3:13 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I was also curious to know if the nearby floating point formatting
> kludge added by commit f1885386 was still needed today. CI passes
> without it, and the standard is pretty clear: "The exponent always
> contains at least two digits, and only as many more digits as
> necessary to represent the exponent". I didn't look too closely at
> the fine print, but that text was already present in C89 so I guess
> MSVCRT just failed to conform on that point.
We can also drop HAVE_BUGGY_STRTOF for MinGW. This passes on CI.
That'd leave only Cygwin with HAVE_BUGGY_STRTOF. Perhaps they have
fixed their implementation[1]? Here's an experimental patch to drop
all remnants, which could be used to find out. No Windows/Cygwin
here. Hmm, what if we just commit it anyway? If their strtof() is
still broken and someone out there is running the tests and sees this
test fail, why shouldn't they take that up with libc at this stage?
[1] https://github.com/cygwin/cygwin/commit/fb01286fab9b370c86323f84a46285cfbebfe4ff
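The exponent rule quoted above is easy to check directly against a C library. The following is a minimal sketch that assumes only a hosted C99 library; the function name exponent_digits_conform is hypothetical and is not part of PostgreSQL. A library that always printed three exponent digits (as the f1885386 kludge presumably compensated for) would produce "1.000000e+000" for 1.0 and make it return false.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if this C library's "%e" follows the
 * C89/C99 rule of at least two exponent digits, and only as many more as
 * needed to represent the exponent. */
static bool exponent_digits_conform(void)
{
    char buf[32];

    /* Two-digit exponent expected, not "1.000000e+000". */
    snprintf(buf, sizeof buf, "%e", 1.0);
    if (strcmp(buf, "1.000000e+00") != 0)
        return false;

    /* A three-digit exponent is allowed only when it is actually needed. */
    snprintf(buf, sizeof buf, "%e", 1e100);
    return strcmp(buf, "1.000000e+100") == 0;
}
```

A configure-style probe could call this once and decide whether any output fixup is still needed; on a conforming library it simply returns true.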
Thomas Munro <thomas.munro@gmail.com> writes:
> That'd leave only Cygwin with HAVE_BUGGY_STRTOF. Perhaps they have
> fixed their implementation[1]? Here's an experimental patch to drop
> all remnants, which could be used to find out. No Windows/Cygwin
> here. Hmm, what if we just commit it anyway? If their strtof() is
> still broken and someone out there is running the tests and sees this
> test fail, why shouldn't they take that up with libc at this stage?
Hmm, we could get rid of the whole resultmap mechanism ...
regards, tom lane
On Sun, Nov 23, 2025 at 4:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > That'd leave only Cygwin with HAVE_BUGGY_STRTOF. Perhaps they have
> > fixed their implementation[1]? Here's an experimental patch to drop
> > all remnants, which could be used to find out. No Windows/Cygwin
> > here. Hmm, what if we just commit it anyway? If their strtof() is
> > still broken and someone out there is running the tests and sees this
> > test fail, why shouldn't they take that up with libc at this stage?
>
> Hmm, we could get rid of the whole resultmap mechanism ...
Yeah. I thought I'd see what blowback my
if-Cygwin-strtof()-really-is-still-broken-they-should-fix-it argument
attracted before spending the time to nuke all those lines too.
Here's that patch. We could always revert the resultmap removal if we
found a new reason to need it, but I hope we wouldn't.
On 24.11.25 00:03, Thomas Munro wrote:
> On Sun, Nov 23, 2025 at 4:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>>> That'd leave only Cygwin with HAVE_BUGGY_STRTOF. Perhaps they have
>>> fixed their implementation[1]? Here's an experimental patch to drop
>>> all remnants, which could be used to find out. No Windows/Cygwin
>>> here. Hmm, what if we just commit it anyway? If their strtof() is
>>> still broken and someone out there is running the tests and sees this
>>> test fail, why shouldn't they take that up with libc at this stage?
>>
>> Hmm, we could get rid of the whole resultmap mechanism ...
>
> Yeah. I thought I'd see what blowback my
> if-Cygwin-strtof()-really-is-still-broken-they-should-fix-it argument
> attracted before spending the time to nuke all those lines too.
> Here's that patch. We could always revert the resultmap removal if we
> found a new reason to need it, but I hope we wouldn't.
These patches look sensible to me. Maybe wait a bit to see if Andrew
can manually reproduce the issue one way or the other on Cygwin.
Otherwise, I'd say go for it.
Thomas Munro <thomas.munro@gmail.com> writes:
> The reason I thought about a contrived message with lots of macros is
> that I'd stumbled across a partial implementation[1] in Alpine's
> alternative non-GNU msgfmt program, which appears to have PRIu64 but
> not PRIx64 and others. It also has some other way of encoding this
> stuff in the .mo that musl's alternative built-in libintl
> implementation can understand (it looks like they have arranged to be
> able to mmap the .mo and use it directly as shared read-only memory,
> while GNU's implementation has to allocate memory to translate them to
> %lld etc in every process, clever but (I assume) broken if
> msgfmt/libintl implementations are mixed), so I figured it'd be a good
> idea to make sure that we test that all the macros actually work. I
> didn't try to understand the implications of Wolfgang's reply, but I
> guess that to have any chance of Alpine's libintl pickle being
> straightened out, we'd ideally want a test case that someone
> interested in that could use to validate the whole localisation
> pipeline conclusively.
So now we have that, and sure enough our two Alpine buildfarm members
are failing like this [1][2]:
diff -U3 /mnt/build/HEAD/pgsql/src/test/regress/expected/nls_1.out
/mnt/build/HEAD/pgsql.build/testrun/regress/regress/results/nls.out
--- /mnt/build/HEAD/pgsql/src/test/regress/expected/nls_1.out
+++ /mnt/build/HEAD/pgsql.build/testrun/regress/regress/results/nls.out
@@ -34,7 +34,22 @@
\\quit
\\endif
SELECT test_translation();
-NOTICE: NLS is not enabled
+NOTICE: translated PRId64 = 424242424242
+NOTICE: translated PRId32 = -1234
+NOTICE: translated PRIdMAX = -5678
+NOTICE: translated PRIdPTR = 9999
+NOTICE: traducido PRIu64 = 424242424242
+NOTICE: traducido PRIu32 = 1234
+NOTICE: translated PRIuMAX = 5678
+NOTICE: translated PRIuPTR = 9999
+NOTICE: translated PRIx64 = 62c6d1a9b2
+NOTICE: translated PRIx32 = 4d2
+NOTICE: translated PRIxMAX = 162e
+NOTICE: translated PRIxPTR = 270f
+NOTICE: translated PRIX64 = 62C6D1A9B2
+NOTICE: translated PRIX32 = 4D2
+NOTICE: translated PRIXMAX = 162E
+NOTICE: translated PRIXPTR = 270F
test_translation
------------------
So their gettext handles PRIu64 and PRIu32 and nothing else.
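To make the failure mode concrete, here is a deliberately simplified model of the <PRI...> expansion step. It is a sketch, not any gettext implementation's code: real .mo files store these placeholders as separate "system-dependent" segments rather than as literal <...> text, and the names expand_sysdep and sysdeps are hypothetical. The point is only that each placeholder has to be looked up and replaced with this platform's <inttypes.h> string; an implementation that does not know a given placeholder cannot produce a usable format string for it, which is consistent with the output above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table mapping placeholder names (as xgettext writes them into
 * .po files) to this platform's <inttypes.h> expansions. Subset only. */
struct sysdep { const char *name; const char *expansion; };

static const struct sysdep sysdeps[] = {
    {"PRId64", PRId64}, {"PRIu64", PRIu64}, {"PRIx64", PRIx64}, {"PRIX64", PRIX64},
    {"PRId32", PRId32}, {"PRIu32", PRIu32}, {"PRIx32", PRIx32}, {"PRIX32", PRIX32},
    {"PRIdMAX", PRIdMAX}, {"PRIuMAX", PRIuMAX},
    {"PRIdPTR", PRIdPTR}, {"PRIuPTR", PRIuPTR},
};

/* Copy src to dst, replacing each "<NAME>" with its expansion. Simplification:
 * every '<' is treated as the start of a placeholder. Returns 0 on success,
 * -1 for an unknown placeholder or if dst is too small. */
static int expand_sysdep(const char *src, char *dst, size_t dstlen)
{
    size_t out = 0;

    while (*src != '\0') {
        const char *piece = src;   /* text to append: one char or an expansion */
        size_t piecelen = 1;

        if (*src == '<') {
            const char *close = strchr(src, '>');
            if (close == NULL)
                return -1;
            size_t namelen = (size_t) (close - src - 1);

            piece = NULL;
            for (size_t i = 0; i < sizeof sysdeps / sizeof sysdeps[0]; i++) {
                if (strlen(sysdeps[i].name) == namelen &&
                    strncmp(sysdeps[i].name, src + 1, namelen) == 0) {
                    piece = sysdeps[i].expansion;
                    break;
                }
            }
            if (piece == NULL)
                return -1;         /* unknown macro: no usable format string */
            piecelen = strlen(piece);
            src = close + 1;
        } else {
            src++;
        }
        if (out + piecelen >= dstlen)
            return -1;
        memcpy(dst + out, piece, piecelen);
        out += piecelen;
    }
    dst[out] = '\0';
    return 0;
}
```

For example, expand_sysdep("translated PRId64 = %<PRId64>", fmt, sizeof fmt) yields "translated PRId64 = %ld" on 64-bit glibc, and "%lld" where PRId64 is "lld"; the result can then be handed to snprintf() with an int64_t argument.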
What to do now? I could revert 8c498479d and followups, but
I sure don't want to. A stopgap measure to make the farm look
green would be to add a variant expected-file that accepts
this output, but yech. Thoughts?
regards, tom lane
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=basilisk&dt=2025-12-15%2004%3A35%3A44
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dogfish&dt=2025-12-15%2005%3A19%3A50
On Mon, Dec 15, 2025 at 8:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So their gettext handles PRIu64 and PRIu32 and nothing else.
Hah, I had predicted that three would work. Off by one.
> What to do now? I could revert 8c498479d and followups, but
> I sure don't want to. A stopgap measure to make the farm look
> green would be to add a variant expected-file that accepts
> this output, but yech. Thoughts?
So close yet so far... I tried asking if it's easy to fix:
https://github.com/sabotage-linux/gettext-tiny/issues/76
Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, Dec 15, 2025 at 8:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What to do now? I could revert 8c498479d and followups, but
>> I sure don't want to. A stopgap measure to make the farm look
>> green would be to add a variant expected-file that accepts
>> this output, but yech. Thoughts?
> So close yet so far... I tried asking if it's easy to fix:
> https://github.com/sabotage-linux/gettext-tiny/issues/76
Hmm, not sure if you found the live upstream for that project, but if
you did, this code hasn't been touched since 2019. Think we shouldn't
hold our breath for a fix :-(. I will go add another expected-file.
I'm also thinking that maybe we should expand the ambition of that
test script a little. Instead of only checking the behavior of PRI*
when we can test translation, why not run the ereports all the time?
This would at least test that <inttypes.h> is sane and snprintf.c
agrees with it, which we now know is something worth checking. That's
colored by seeing that less than half of the buildfarm is finding any
variant of es_ES to test in. That's not great, but I'm not seeing
anything to be done about it. The only locale names we can be sure
will be accepted are C/POSIX, and I'd expect gettext() to
short-circuit that case and not look for a translation. I'm thinking
though that it's still worth checking that the untranslated string is
processed correctly.
regards, tom lane
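A standalone approximation of the always-on check proposed above might look like the following. It is only a sketch: it exercises the C library's own snprintf rather than PostgreSQL's snprintf.c and ereport(), and check_pri_macros is a hypothetical name. The values and expected strings are taken from the test_translation() output quoted earlier in the thread, so a nonzero return would mean that <inttypes.h> and the printf implementation disagree.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format each value with its untranslated "%" PRI... format and compare the
 * result with the expected text. Returns the number of mismatches. */
static int check_pri_macros(void)
{
    char buf[64];
    int failures = 0;

#define CHECK(fmt, value, expected) \
    do { \
        snprintf(buf, sizeof buf, fmt, value); \
        if (strcmp(buf, expected) != 0) { \
            fprintf(stderr, "mismatch: got \"%s\", want \"%s\"\n", buf, expected); \
            failures++; \
        } \
    } while (0)

    CHECK("%" PRId64, (int64_t) 424242424242, "424242424242");
    CHECK("%" PRId32, (int32_t) -1234, "-1234");
    CHECK("%" PRIdMAX, (intmax_t) -5678, "-5678");
    CHECK("%" PRIdPTR, (intptr_t) 9999, "9999");
    CHECK("%" PRIu64, (uint64_t) 424242424242, "424242424242");
    CHECK("%" PRIu32, (uint32_t) 1234, "1234");
    CHECK("%" PRIuMAX, (uintmax_t) 5678, "5678");
    CHECK("%" PRIuPTR, (uintptr_t) 9999, "9999");
    CHECK("%" PRIx64, (uint64_t) 424242424242, "62c6d1a9b2");
    CHECK("%" PRIx32, (uint32_t) 1234, "4d2");
    CHECK("%" PRIxMAX, (uintmax_t) 5678, "162e");
    CHECK("%" PRIxPTR, (uintptr_t) 9999, "270f");
    CHECK("%" PRIX64, (uint64_t) 424242424242, "62C6D1A9B2");
    CHECK("%" PRIX32, (uint32_t) 1234, "4D2");
    CHECK("%" PRIXMAX, (uintmax_t) 5678, "162E");
    CHECK("%" PRIXPTR, (uintptr_t) 9999, "270F");

#undef CHECK
    return failures;
}
```

Because no translation is involved, a check like this can run on every machine, including the ones that have no es_ES locale installed.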
On 12/15/2025 11:05 AM, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> On Mon, Dec 15, 2025 at 8:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> What to do now? I could revert 8c498479d and followups, but
>>> I sure don't want to. A stopgap measure to make the farm look
>>> green would be to add a variant expected-file that accepts
>>> this output, but yech. Thoughts?
>
>> So close yet so far... I tried asking if it's easy to fix:
>
>> https://github.com/sabotage-linux/gettext-tiny/issues/76
>
> Hmm, not sure if you found the live upstream for that project, but if
> you did, this code hasn't been touched since 2019. Think we shouldn't
> hold our breath for a fix :-(. I will go add another expected-file.
>
> I'm also thinking that maybe we should expand the ambition of that
> test script a little. Instead of only checking the behavior of PRI*
> when we can test translation, why not run the ereports all the time?
> This would at least test that <inttypes.h> is sane and snprintf.c
> agrees with it, which we now know is something worth checking. That's
> colored by seeing that less than half of the buildfarm is finding any
> variant of es_ES to test in. That's not great, but I'm not seeing
> anything to be done about it. The only locale names we can be sure
> will be accepted are C/POSIX, and I'd expect gettext() to
> short-circuit that case and not look for a translation. I'm thinking
> though that it's still worth checking that the untranslated string is
> processed correctly.
>
> regards, tom lane
The GNU gettext implementation does not short-circuit that. It still
goes through the path of trying to find the message catalogue; when that
fails there is no fallback, so the messages are simply left untranslated.
This is true on Windows as well as Linux. Windows just has the curse of
an expensive call to enumerate the locales to find the passed-in locale
every single time, because the not-found case is never cached.
--
Bryan Green
EDB: https://www.enterprisedb.com
Bryan Green <dbryan.green@gmail.com> writes:
> On 12/15/2025 11:05 AM, Tom Lane wrote:
>> ... That's
>> colored by seeing that less than half of the buildfarm is finding any
>> variant of es_ES to test in. That's not great, but I'm not seeing
>> anything to be done about it. The only locale names we can be sure
>> will be accepted are C/POSIX, and I'd expect gettext() to
>> short-circuit that case and not look for a translation. I'm thinking
>> though that it's still worth checking that the untranslated string is
>> processed correctly.
> The GNU gettext implementation does not short-circuit that. It still
> goes through the path of trying to find the message catalogue, it fails,
> there is no fallback, messages are untranslated. This is true on Windows
> as well as Linux.
It'd be great to not need the assumption of es_ES being installed.
However, I tried making a POSIX.po file and setting lc_messages to
POSIX, and it didn't work. The msgfmt infrastructure seemed unfazed
and installed a .mo file under $sharedir/locale/POSIX/LC_MESSAGES as
I'd expect, but no translation happened (this on a Linux box). Same
with 'C'. It did work if I set lc_messages to 'C.utf8', which is a
known name according to this box's "locale -a", but this doesn't give
me a warm feeling about this approach being a lot more portable than
what we have. Any ideas?
regards, tom lane
On 12/15/2025 12:28 PM, Tom Lane wrote:
> It'd be great to not need the assumption of es_ES being installed.
> However, I tried making a POSIX.po file and setting lc_messages to
> POSIX, and it didn't work. The msgfmt infrastructure seemed unfazed
> and installed a .mo file under $sharedir/locale/POSIX/LC_MESSAGES as
> I'd expect, but no translation happened (this on a Linux box). Same
> with 'C'. It did work if I set lc_messages to 'C.utf8', which is a
> known name according to this box's "locale -a", but this doesn't give
> me a warm feeling about this approach being a lot more portable than
> what we have. Any ideas?
My answer did not feel like it was right, so I checked multiple versions
and realized there is a check.
char *
DCIGETTEXT (const char *domainname, const char *msgid, ...)
{
  // Get the locale name
  categoryvalue = guess_category_value (category, categoryname);
  if (categoryvalue != NULL
      && !(categoryvalue[0] == 'C' && categoryvalue[1] == '\0')
      && strcmp (categoryvalue, "POSIX") != 0)
    {
      // Only do translation if NOT "C" and NOT "POSIX"
      retval = _nl_find_msg (...);
      if (retval != NULL)
        return (char *) retval;   // a translation was found
    }
  // For "C" and "POSIX" (or when no translation was found), return msgid
  return (char *) msgid;
}
C.utf8 works because it is not "C" so is treated as a real locale. Now
that I'm back into that code...looking it over in more detail to see
what might work...
--
Bryan Green
EDB: https://www.enterprisedb.com
Bryan Green <dbryan.green@gmail.com> writes:
> On 12/15/2025 12:28 PM, Tom Lane wrote:
>> ... It did work if I set lc_messages to 'C.utf8', which is a
>> known name according to this box's "locale -a", but this doesn't give
>> me a warm feeling about this approach being a lot more portable than
>> what we have. Any ideas?
> My answer did not feel like it was right, so I checked multiple versions
> and realized there is a check.
> ...
> C.utf8 works because it is not "C" so is treated as a real locale.
Ah-hah. I didn't think they hadn't optimized the case at all.
Experimenting here, it looks like 'C.UTF-8' might be accepted
everywhere. I even got it to pass on Solaris's not-GNU gettext,
which I thought for sure would be the weak spot in the idea.
I'll press forward with that.
regards, tom lane
On 15.12.25 08:01, Tom Lane wrote:
> So their gettext handles PRIu64 and PRIu32 and nothing else.
>
> What to do now? I could revert 8c498479d and followups, but
> I sure don't want to. A stopgap measure to make the farm look
> green would be to add a variant expected-file that accepts
> this output, but yech. Thoughts?
I think that means that that gettext implementation is not currently
supportable. So either we revert our PRI* use except those two
(unlikely), or those buildfarm members should disable NLS.
On Tue, Dec 16, 2025 at 8:29 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> On 15.12.25 08:01, Tom Lane wrote:
> > So their gettext handles PRIu64 and PRIu32 and nothing else.
> >
> > What to do now? I could revert 8c498479d and followups, but
> > I sure don't want to. A stopgap measure to make the farm look
> > green would be to add a variant expected-file that accepts
> > this output, but yech. Thoughts?
>
> I think that means that that gettext implementation is not currently
> supportable. So either we revert our PRI* use except those two
> (unlikely), or those buildfarm members should disable NLS.
Yeah. My goal in mentioning the problem back when it was just a
problem in theory (we had no test, the Alpine packages disable nls
(perhaps it used to be *more* broken, if they did that before we used
PRI?)) was to try to see if someone closer to these musl distros
wanted to have a crack at fixing it, since it looks pretty close to
being usable. But now that it's a problem in practice, it's hard to
disagree with Peter's take. It could be reenabled any time it works
enough to pass the test.
Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Dec 16, 2025 at 8:29 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>> I think that means that that gettext implementation is not currently
>> supportable. So either we revert our PRI* use except those two
>> (unlikely), or those buildfarm members should disable NLS.
> Yeah. My goal in mentioning the problem back when it was just a
> problem in theory (we had no test, the Alpine packages disable nls
> (perhaps it used to be *more* broken, if they did that before we used
> PRI?)) was to try to see if someone closer to these musl distros
> wanted to have a crack at fixing it, since it looks pretty close to
> being usable. But now that it's a problem in practice, it's hard to
> disagree with Peter's take. It could be reenabled any time it works
> enough to pass the test.
Fair enough. I've revised the test mechanism per discussion with
Bryan Green, in hopes of being able to test on more BF animals than
we could yesterday. But I won't put in an expected-file for this
Alpine misbehavior.
regards, tom lane
I wrote:
> Experimenting here, it looks like 'C.UTF-8' might be accepted
> everywhere. I even got it to pass on Solaris's not-GNU gettext,
> which I thought for sure would be the weak spot in the idea.
> I'll press forward with that.
Hmmm ... the first batch of BF reports show that on some Linux
machines, it works to set lc_messages to 'C.UTF-8', but nonetheless
no translation happens. Did you notice any other gating factors?
regards, tom lane
Tom Lane:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> On Tue, Dec 16, 2025 at 8:29 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>>> I think that means that that gettext implementation is not currently
>>> supportable. So either we revert our PRI* use except those two
>>> (unlikely), or those buildfarm members should disable NLS.
>
>> Yeah. My goal in mentioning the problem back when it was just a
>> problem in theory (we had no test, the Alpine packages disable nls
>> (perhaps it used to be *more* broken, if they did that before we used
>> PRI?)) was to try to see if someone closer to these musl distros
>> wanted to have a crack at fixing it, since it looks pretty close to
>> being usable. But now that it's a problem in practice, it's hard to
>> disagree with Peter's take. It could be reenabled any time it works
>> enough to pass the test.
>
> Fair enough. I've revised the test mechanism per discussion with
> Bryan Green, in hopes of being able to test on more BF animals than
> we could yesterday. But I won't put in an expected-file for this
> Alpine misbehavior.
Both Alpine animals now have NLS disabled.
Best,
Wolfgang
On 12/15/2025 2:39 PM, Tom Lane wrote:
> I wrote:
>> Experimenting here, it looks like 'C.UTF-8' might be accepted
>> everywhere. I even got it to pass on Solaris's not-GNU gettext,
>> which I thought for sure would be the weak spot in the idea.
>> I'll press forward with that.
>
> Hmmm ... the first batch of BF reports show that on some Linux
> machines, it works to set lc_messages to 'C.UTF-8', but nonetheless
> no translation happens. Did you notice any other gating factors?
>
> regards, tom lane
Yes - the LANGUAGE environment variable.
gettext has a priority order for locale selection that's different from
what most people expect. Here's what guess_category_value() does:
Environment Variable Priority (from dcigettext.c):
1. LANGUAGE - GNU extension, colon-separated list (e.g., "en_US:en:C")
2. setlocale(category, NULL) result - the actual locale set
3. LC_ALL - POSIX override for all categories
4. LC_MESSAGES (or other LC_* for that category)
5. LANG - fallback default
LANGUAGE has the highest priority and will override LC_MESSAGES completely.
I am not sure this is the problem, but you probably should unset
LANGUAGE before doing almost anything else in the test script. I
wouldn't be surprised if the CI/BF environments have it set.
Do we know what version of libintl is being used on those BF machines?
There are some marked differences between some versions, which makes
this a little more guesswork than it should be.
---------------------------------------------------------------------
What follows is a walkthrough that just shows that language overrides
lc_messages and how that can impact things. No need to read this unless
you just want more detail.
Assume,
export LANGUAGE=en_US:en
export LC_MESSAGES=C.UTF-8
System has catalogs for C.UTF-8 and es.
# postgresql.conf
lc_messages = 'C.UTF-8'
InitPostgres calls pg_perm_setlocale with C.UTF-8.
pg_perm_setlocale calls setlocale(LC_MESSAGES, "C.UTF-8") and succeeds.
pg_perm_setlocale then also uses setenv to set LC_MESSAGES=C.UTF-8.
Now assume an error occurs and gettext is called. A couple of wrappers
down we get to DCIGETTEXT() with a category of LC_MESSAGES. We call
guess_category_value with LC_MESSAGES.
guess_category_value implements the priority as discussed above. The
very first thing it checks is getenv("LANGUAGE"). If that is neither NULL
nor an empty string, it returns whatever is in LANGUAGE, which in this
case is en_US:en.
Then back in DCIGETTEXT() we will loop through en_US:en. We try to find
the message catalog with 'en_US' first...and fail because we don't have
that catalog. Then we loop back and try 'en'...and fail again because
we don't have that catalog. On one more pass through the loop there is
nothing left in our list of languages, so we set the locale to 'C'. Then
the check that skips translation when the locale is exactly "C" applies,
and we break. Nothing is translated.
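The catalog search in this walkthrough can be modeled in a few lines. This is a simplified sketch of the behavior as described above, not gettext's actual code; installed_catalogs, have_catalog and pick_catalog are hypothetical names, and the catalog list mirrors the assumed "C.UTF-8 and es" setup. A return value of "C" stands for "do not translate".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: locales for which a message catalog is installed. */
static const char *const installed_catalogs[] = {"C.UTF-8", "es"};

/* True if a catalog exists for the first len characters of lang. */
static bool have_catalog(const char *lang, size_t len)
{
    for (size_t i = 0; i < sizeof installed_catalogs / sizeof installed_catalogs[0]; i++)
        if (strlen(installed_catalogs[i]) == len &&
            strncmp(installed_catalogs[i], lang, len) == 0)
            return true;
    return false;
}

/* Walk a colon-separated LANGUAGE-style list and return the first entry that
 * has a catalog (copied into out). If nothing matches, fall back to "C". */
static const char *pick_catalog(const char *language, char *out, size_t outlen)
{
    const char *p = language;

    while (*p != '\0') {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t) (colon - p) : strlen(p);

        if (len > 0 && len < outlen && have_catalog(p, len)) {
            memcpy(out, p, len);
            out[len] = '\0';
            return out;
        }
        if (colon == NULL)
            break;
        p = colon + 1;
    }
    return "C";   /* no catalog found: leave messages untranslated */
}
```

With LANGUAGE=en_US:en, pick_catalog returns "C" even though a C.UTF-8 catalog exists, matching the walkthrough; a list such as en_US:es would pick "es" instead.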
--
Bryan Green
EDB: https://www.enterprisedb.com
On 12/15/2025 2:39 PM, Tom Lane wrote:
> I wrote:
>> Experimenting here, it looks like 'C.UTF-8' might be accepted
>> everywhere. I even got it to pass on Solaris's not-GNU gettext,
>> which I thought for sure would be the weak spot in the idea.
>> I'll press forward with that.
>
> Hmmm ... the first batch of BF reports show that on some Linux
> machines, it works to set lc_messages to 'C.UTF-8', but nonetheless
> no translation happens. Did you notice any other gating factors?
>
> regards, tom lane
I should have asked you which version of libintl is being used. I went
ahead and jumped to 0.26 and they now gate like this:
/* If the current locale value is "C" or "C.<encoding>" or "POSIX",
we don't load a domain. Return the MSGID. */
if ((single_locale[0] == 'C'
&& (single_locale[1] == '\0' || single_locale[1] == '.'))
|| strcmp (single_locale, "POSIX") == 0)
break;
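The difference between the two gates can be written as a pair of predicates. This sketch simply transcribes the older check quoted earlier in the thread and the 0.26 check above into standalone functions; the function names are hypothetical. Under the older rule "C.UTF-8" is treated as a real locale, while the 0.26 rule skips it, which is consistent with a C.UTF-8-based translation test passing with some gettext versions and not others.

```c
#include <stdbool.h>
#include <string.h>

/* Older rule (per the earlier excerpt): skip only exactly "C" or "POSIX". */
static bool skips_translation_old(const char *locale)
{
    return (locale[0] == 'C' && locale[1] == '\0')
        || strcmp(locale, "POSIX") == 0;
}

/* gettext 0.26 rule (per the snippet above): also skip "C.<encoding>". */
static bool skips_translation_026(const char *locale)
{
    return (locale[0] == 'C' && (locale[1] == '\0' || locale[1] == '.'))
        || strcmp(locale, "POSIX") == 0;
}
```

For "C.UTF-8", skips_translation_old() returns false and skips_translation_026() returns true; both return false for an ordinary locale such as "es_ES".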
--
Bryan Green
EDB: https://www.enterprisedb.com
Bryan Green <dbryan.green@gmail.com> writes:
> On 12/15/2025 2:39 PM, Tom Lane wrote:
>> Hmmm ... the first batch of BF reports show that on some Linux
>> machines, it works to set lc_messages to 'C.UTF-8', but nonetheless
>> no translation happens. Did you notice any other gating factors?
> Yes - the LANGUAGE environment variable.
Not it, I think. pg_regress unsets that. Also, I've been able to
reproduce the failure here using a Fedora 42 image, and LANGUAGE
is definitely not set in that environment.
> Do we know what version of libintl is being used on those BF machines?
> There are some marked differences between some versions, which makes
> this a little more guesswork than it should be.
On my Fedora image, there seems to be no libintl.so anywhere; it's
certainly not getting linked into Postgres. The routines must be
built into libc. rpm says the glibc version is
glibc-2.41-11.fc42.x86_64. Judging by buildfarm reports, the
behavior changed sometime between Fedora 39 and Fedora 42.
strace'ing shows that during "SET lc_messages = 'C.UTF-8'",
it successfully finds system locale data about C.utf8:
recvfrom(10, "Q\0\0\0!SET lc_messages = 'C.UTF-8'"..., 8192, 0, NULL, NULL) = 34
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 21
fstat(21, {st_mode=S_IFREG|0644, st_size=217804320, ...}) = 0
mmap(NULL, 217804320, PROT_READ, MAP_PRIVATE, 21, 0) = 0x7f935c5f6000
close(21) = 0
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 21
fstat(21, {st_mode=S_IFREG|0644, st_size=2997, ...}) = 0
read(21, "# Locale name alias data base.\n#"..., 4096) = 2997
read(21, "", 4096) = 0
close(21) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 21
fstat(21, {st_mode=S_IFDIR|0755, st_size=29, ...}) = 0
close(21) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MESSAGES/SYS_LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 21
fstat(21, {st_mode=S_IFREG|0644, st_size=53, ...}) = 0
mmap(NULL, 53, PROT_READ, MAP_PRIVATE, 21, 0) = 0x7f937690a000
close(21) = 0
openat(AT_FDCWD, "/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 21
fstat(21, {st_mode=S_IFREG|0644, st_size=26998, ...}) = 0
mmap(NULL, 26998, PROT_READ, MAP_SHARED, 21, 0) = 0x7f9376903000
close(21) = 0
futex(0x7f93752999a8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
sendto(10, "C\0\0\0\10SET\0Z\0\0\0\5I", 15, 0, NULL, 0) = 15
This part of the trace seems indistinguishable between an older
RHEL system and Fedora 42. But on the older system, when we
invoke pg_bindtextdomain, it does open the regress-19.mo file:
recvfrom(10, "Q\0\0\0\37SELECT test_translation();\0", 8192, 0, NULL, NULL) = 32
stat("/home/postgres/pgsql/src/test/regress/regress.so", {st_mode=S_IFREG|0775, st_size=264904, ...}) = 0
openat(AT_FDCWD, "/home/postgres/testversion/share/locale/C.UTF-8/LC_MESSAGES/regress-19.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/postgres/testversion/share/locale/C.utf8/LC_MESSAGES/regress-19.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/postgres/testversion/share/locale/C/LC_MESSAGES/regress-19.mo", O_RDONLY) = 21
fstat(21, {st_mode=S_IFREG|0644, st_size=2316, ...}) = 0
mmap(NULL, 2316, PROT_READ, MAP_PRIVATE, 21, 0) = 0x7f9376902000
close(21) = 0
brk(NULL) = 0x1a52000
brk(0x1a75000) = 0x1a75000
openat(AT_FDCWD, "/home/postgres/testversion/share/locale/C.UTF-8/LC_MESSAGES/postgres-19.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/postgres/testversion/share/locale/C.utf8/LC_MESSAGES/postgres-19.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/postgres/testversion/share/locale/C/LC_MESSAGES/postgres-19.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
and away we go. On the newer system, there is no attempt to access
the .mo file at all. It looks like it's decided that C.UTF-8 isn't
really a valid locale and it's just going to ignore everything.
regards, tom lane
Bryan Green <dbryan.green@gmail.com> writes:
> I should have asked you which version of libintl is being used. I went
> ahead and jumped to 0.26 and they now gate like this:
> /* If the current locale value is "C" or "C.<encoding>" or "POSIX",
> we don't load a domain. Return the MSGID. */
> if ((single_locale[0] == 'C'
> && (single_locale[1] == '\0' || single_locale[1] == '.'))
> || strcmp (single_locale, "POSIX") == 0)
> break;
Bleah. I wonder if "POSIX.UTF-8" would work?
regression=# set lc_messages TO 'POSIX.UTF-8';
ERROR: invalid value for parameter "lc_messages": "POSIX.UTF-8"
... nope. Back to the drawing board I guess.
I've reverted the latest patch for now.
regards, tom lane