Обсуждение: pgbench error: (setshell) of script 0; execution of meta-command failed

Поиск
Список
Период
Сортировка

pgbench error: (setshell) of script 0; execution of meta-command failed

От
Andy Fan
Дата:
Hi:

I run into the {subject} issue with the below setup.

cat foo.sql

\setshell txn_mode echo ${TXN_MODE}
\setshell speed echo ${SPEED}
\setshell sleep_ms echo ${SLEEP_MS}
\setshell subtxn_mode echo ${SUBTXN_MODE}

select 1;

$ TXN_MODE=-1 SPEED=1 SLEEP_MS=0 SUBTXN_MODE=-1 pgbench -n -ffoo.sql postgres -T5 -c4 --exit-on-abort

I *randomly*(7/8) get errors like:

pgbench (18devel)
pgbench: error: client 2 aborted in command 0 (setshell) of script 0; execution of meta-command failed
pgbench: error: Run was aborted due to an error in thread 0

I debug this for 1+ hour and didn't find anything useful, so I'd like
have a ask if there is any known issue or the way I use \setshell is
wrong?

Thanks

-- 
Best Regards
Andy Fan




Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Fujii Masao
Дата:

On 2025/01/10 16:09, Andy Fan wrote:
> Andy Fan <zhihuifan1213@163.com> writes:
> 
>> Hi:
>>
>> I run into the {subject} issue with the below setup.
>>
>> cat foo.sql
>>
>> \setshell txn_mode echo ${TXN_MODE}
>> \setshell speed echo ${SPEED}
>> \setshell sleep_ms echo ${SLEEP_MS}
>> \setshell subtxn_mode echo ${SUBTXN_MODE}
>>
>> select 1;
>>
>> $ TXN_MODE=-1 SPEED=1 SLEEP_MS=0 SUBTXN_MODE=-1 pgbench -n -ffoo.sql postgres -T5 -c4 --exit-on-abort
>>
>> I *randomly*(7/8) get errors like:
>>
>> pgbench (18devel)
>> pgbench: error: client 2 aborted in command 0 (setshell) of script 0; execution of meta-command failed
>> pgbench: error: Run was aborted due to an error in thread 0

Interestingly, my git bisect pointed to the following commit
as the cause of this issue, even though it seems unrelated to
the pgbench problem at all. It’s possible my git bisect result
is incorrect, but when I reverted this commit on HEAD,
the pgbench issue didn’t occur during my tests.

----------------------
06843df4abc5a0c24e4bd154a8a1327e074fa3ae is the first bad commit
commit 06843df4abc5a0c24e4bd154a8a1327e074fa3ae
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Fri Sep 29 14:07:30 2023 -0400

     Suppress macOS warnings about duplicate libraries in link commands.
----------------------

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Nazir Bilal Yavuz
Дата:
Hi,

On Fri, 10 Jan 2025 at 10:10, Andy Fan <zhihuifan1213@163.com> wrote:
>
> Andy Fan <zhihuifan1213@163.com> writes:
>
> > Hi:
> >
> > I run into the {subject} issue with the below setup.
> >
> > cat foo.sql
> >
> > \setshell txn_mode echo ${TXN_MODE}
> > \setshell speed echo ${SPEED}
> > \setshell sleep_ms echo ${SLEEP_MS}
> > \setshell subtxn_mode echo ${SUBTXN_MODE}
> >
> > select 1;
> >
> > $ TXN_MODE=-1 SPEED=1 SLEEP_MS=0 SUBTXN_MODE=-1 pgbench -n -ffoo.sql postgres -T5 -c4 --exit-on-abort
> >
> > I *randomly*(7/8) get errors like:
> >
> > pgbench (18devel)
> > pgbench: error: client 2 aborted in command 0 (setshell) of script 0; execution of meta-command failed
> > pgbench: error: Run was aborted due to an error in thread 0
>
> I think I have figured out the issue, if you want reproduce it quicker,
> you can change the '-T5' to '-T1' in the pgbench command and run many times.

I ran it ~500 times on HEAD but the issue did not occur on my machine.
Is 500 times not enough?

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Fujii Masao
Дата:

On 2025/01/10 21:41, Fujii Masao wrote:
> 
> 
> On 2025/01/10 16:09, Andy Fan wrote:
>> Andy Fan <zhihuifan1213@163.com> writes:
>>
>>> Hi:
>>>
>>> I run into the {subject} issue with the below setup.
>>>
>>> cat foo.sql
>>>
>>> \setshell txn_mode echo ${TXN_MODE}
>>> \setshell speed echo ${SPEED}
>>> \setshell sleep_ms echo ${SLEEP_MS}
>>> \setshell subtxn_mode echo ${SUBTXN_MODE}
>>>
>>> select 1;
>>>
>>> $ TXN_MODE=-1 SPEED=1 SLEEP_MS=0 SUBTXN_MODE=-1 pgbench -n -ffoo.sql postgres -T5 -c4 --exit-on-abort
>>>
>>> I *randomly*(7/8) get errors like:
>>>
>>> pgbench (18devel)
>>> pgbench: error: client 2 aborted in command 0 (setshell) of script 0; execution of meta-command failed
>>> pgbench: error: Run was aborted due to an error in thread 0
> 
> Interestingly, my git bisect pointed to the following commit
> as the cause of this issue, even though it seems unrelated to
> the pgbench problem at all. It’s possible my git bisect result
> is incorrect, but when I reverted this commit on HEAD,
> the pgbench issue didn’t occur during my tests.
> 
> ----------------------
> 06843df4abc5a0c24e4bd154a8a1327e074fa3ae is the first bad commit
> commit 06843df4abc5a0c24e4bd154a8a1327e074fa3ae
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Fri Sep 29 14:07:30 2023 -0400
> 
>      Suppress macOS warnings about duplicate libraries in link commands.
> ----------------------

Before this commit, pgbench used pqsignal() from port/pqsignal.c
to set the signal handler for SIGALRM. This version of pqsignal()
sets SA_RESTART for frontend code, so fgets() in runShellCommand()
wouldn't return NULL even if SIGALRM arrived during fgets(),
preventing the reported error.

On the other hand, currently, pgbench seems to use pqsignal()
from legacy-pqsignal.c, which doesn't set SA_RESTART for SIGALRM.
As a result, SIGALRM can interrupt fgets() in runShellCommand()
and make it return NULL, leading to the reported error.

I'm not sure if this change was an intentional result of that commit...

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Tom Lane
Дата:
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
> Before this commit, pgbench used pqsignal() from port/pqsignal.c
> to set the signal handler for SIGALRM. This version of pqsignal()
> sets SA_RESTART for frontend code, so fgets() in runShellCommand()
> wouldn't return NULL even if SIGALRM arrived during fgets(),
> preventing the reported error.

> On the other hand, currently, pgbench seems to use pqsignal()
> from legacy-pqsignal.c, which doesn't set SA_RESTART for SIGALRM.
> As a result, SIGALRM can interrupt fgets() in runShellCommand()
> and make it return NULL, leading to the reported error.

Oh, interesting.

> I'm not sure if this change was an intentional result of that commit...

Definitely not.  I think we'd better look into how to undo that
effect.

Since legacy-pqsignal is really not supposed to be used by clients
anymore, maybe we could just adjust it to set SA_RESTART for SIGALRM?
The other alternatives I can think of amount to re-introducing
link order dependencies, which would be horrid.

            regards, tom lane



Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Tom Lane
Дата:
Nazir Bilal Yavuz <byavuz81@gmail.com> writes:
> I ran it ~500 times on HEAD but the issue did not occur on my machine.

What platform are you testing on?

            regards, tom lane



Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Nazir Bilal Yavuz
Дата:
Hi,

On Fri, 10 Jan 2025 at 17:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Nazir Bilal Yavuz <byavuz81@gmail.com> writes:
> > I ran it ~500 times on HEAD but the issue did not occur on my machine.
>
> What platform are you testing on?

My local machine is: Linux 6.12.8-amd64 #1 SMP PREEMPT_DYNAMIC Debian
6.12.8-1 (2025-01-02) x86_64 GNU/Linux

Also, trying on macOS CI VM (by re-running with terminal access) but
no luck so far.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Tom Lane
Дата:
I wrote:
> Since legacy-pqsignal is really not supposed to be used by clients
> anymore, maybe we could just adjust it to set SA_RESTART for SIGALRM?
> The other alternatives I can think of amount to re-introducing
> link order dependencies, which would be horrid.

Actually, after re-reading the thread that led to 06843df4a [1],
I think a better idea is to introduce some macro magic to force
frontend clients to use libpgport's version of pqsignal() instead
of the one from libpq.  We mustn't change the real name of libpq's
version, but I think we could get away with that for libpgport.

I'm a bit tied up today, but can look at this over the weekend if
nobody beats me to it.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/467042.1695766998%40sss.pgh.pa.us



Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Tom Lane
Дата:
I wrote:
> Actually, after re-reading the thread that led to 06843df4a [1],
> I think a better idea is to introduce some macro magic to force
> frontend clients to use libpgport's version of pqsignal() instead
> of the one from libpq.  We mustn't change the real name of libpq's
> version, but I think we could get away with that for libpgport.

After studying this more, I think what we should do in HEAD is
even more aggressive: let's make the real name of port/pqsignal.c's
function be either pqsignal_fe in frontend, or pqsignal_be in backend.
This positively ensures that there's no collision with libpq's
export, and it seems like a good idea for the additional reason that
the frontend and backend versions are not really interchangeable.

However, we can't get away with that in released branches because
it'd be an ABI break for any outside code that calls pqsignal.
What I propose doing in the released branches is what's shown in
the attached patch for v17: rename port/pqsignal.c's function to
pqsignal_fe in frontend, but leave it as pqsignal in the backend.
Leaving it as pqsignal in the backend preserves ABI for extension
modules, at the cost that it's not entirely clear which pqsignal
an extension that's also linked with libpq will get.  But that
hazard affects none of our code, and it existed already for any
third-party extensions that call pqsignal.  In principle using
"pqsignal_fe" in frontend creates an ABI hazard for outside users of
libpgport.a.  But I think it's not really a problem, because we don't
support that as a shared library.  As long as people build with
headers and .a files from the same minor release, they'll be OK.

BTW, this decouples legacy-pqsignal.c from pqsignal.c enough
that we could now do what's contemplated in the comments from
3b00fdba9: simplify that version by making it return void,
or perhaps better just a true/false success report.
I've not touched that point here, though.

            regards, tom lane

diff --git a/src/include/port.h b/src/include/port.h
index 818b7c7bae..f0e28ce5c5 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -513,7 +513,12 @@ extern int    pg_check_dir(const char *dir);
 /* port/pgmkdirp.c */
 extern int    pg_mkdir_p(char *path, int omode);

-/* port/pqsignal.c */
+/* port/pqsignal.c (see also interfaces/libpq/legacy-pqsignal.c) */
+#ifdef FRONTEND
+#define pqsignal pqsignal_fe
+#else
+#define pqsignal pqsignal_be
+#endif
 typedef void (*pqsigfunc) (SIGNAL_ARGS);
 extern pqsigfunc pqsignal(int signo, pqsigfunc func);

diff --git a/src/interfaces/libpq/legacy-pqsignal.c b/src/interfaces/libpq/legacy-pqsignal.c
index 0864888562..01977b4c81 100644
--- a/src/interfaces/libpq/legacy-pqsignal.c
+++ b/src/interfaces/libpq/legacy-pqsignal.c
@@ -28,8 +28,16 @@
  * with the semantics it had in 9.2; in particular, this has different
  * behavior for SIGALRM than the version in src/port/pqsignal.c.
  *
- * libpq itself does not use this.
+ * libpq itself does not use this, nor does anything else in our code.
+ *
+ * src/include/port.h #define's pqsignal as pqsignal_fe or pqsignal_be,
+ * but here we want to export just plain "pqsignal".  We can't rely on
+ * port.h's extern declaration either.  (The point of the #define's
+ * is to ensure that no in-tree code accidentally calls this version.)
  */
+#undef pqsignal
+extern pqsigfunc pqsignal(int signo, pqsigfunc func);
+
 pqsigfunc
 pqsignal(int signo, pqsigfunc func)
 {
diff --git a/src/port/pqsignal.c b/src/port/pqsignal.c
index d328a27279..f707b1c54e 100644
--- a/src/port/pqsignal.c
+++ b/src/port/pqsignal.c
@@ -123,6 +123,10 @@ wrapper_handler(SIGNAL_ARGS)
  * function instead of providing potentially-bogus return values.
  * Unfortunately, that requires modifying the pqsignal() in legacy-pqsignal.c,
  * which in turn requires an SONAME bump, which is probably not worth it.
+ *
+ * Note: the actual name of this function is either pqsignal_fe when
+ * compiled with -DFRONTEND, or pqsignal_be when compiled without.
+ * This is to avoid a name collision with libpq's legacy-pqsignal.c.
  */
 pqsigfunc
 pqsignal(int signo, pqsigfunc func)
diff --git a/src/include/port.h b/src/include/port.h
index 4d2d911cb6..acc39deac4 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -515,7 +515,10 @@ extern int    pg_check_dir(const char *dir);
 /* port/pgmkdirp.c */
 extern int    pg_mkdir_p(char *path, int omode);

-/* port/pqsignal.c */
+/* port/pqsignal.c (see also interfaces/libpq/legacy-pqsignal.c) */
+#ifdef FRONTEND
+#define pqsignal pqsignal_fe
+#endif
 typedef void (*pqsigfunc) (SIGNAL_ARGS);
 extern pqsigfunc pqsignal(int signo, pqsigfunc func);

diff --git a/src/interfaces/libpq/legacy-pqsignal.c b/src/interfaces/libpq/legacy-pqsignal.c
index e8c716ad0f..49755e8acd 100644
--- a/src/interfaces/libpq/legacy-pqsignal.c
+++ b/src/interfaces/libpq/legacy-pqsignal.c
@@ -28,8 +28,16 @@
  * with the semantics it had in 9.2; in particular, this has different
  * behavior for SIGALRM than the version in src/port/pqsignal.c.
  *
- * libpq itself does not use this.
+ * libpq itself does not use this, nor does anything else in our code.
+ *
+ * src/include/port.h will #define pqsignal as pqsignal_fe,
+ * but here we want to export just plain "pqsignal".  We can't rely on
+ * port.h's extern declaration either.  (The point of the #define
+ * is to ensure that no in-tree code accidentally calls this version.)
  */
+#undef pqsignal
+extern pqsigfunc pqsignal(int signo, pqsigfunc func);
+
 pqsigfunc
 pqsignal(int signo, pqsigfunc func)
 {
diff --git a/src/port/pqsignal.c b/src/port/pqsignal.c
index bdaa9f10c8..844e37c827 100644
--- a/src/port/pqsignal.c
+++ b/src/port/pqsignal.c
@@ -123,6 +123,10 @@ wrapper_handler(SIGNAL_ARGS)
  * function instead of providing potentially-bogus return values.
  * Unfortunately, that requires modifying the pqsignal() in legacy-pqsignal.c,
  * which in turn requires an SONAME bump, which is probably not worth it.
+ *
+ * Note: the actual name of this function is either pqsignal_fe when
+ * compiled with -DFRONTEND, or pqsignal when compiled without.
+ * This is to avoid a name collision with libpq's legacy-pqsignal.c.
  */
 pqsigfunc
 pqsignal(int signo, pqsigfunc func)

Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Tom Lane
Дата:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Sat, Jan 11, 2025 at 02:04:13PM -0500, Tom Lane wrote:
>> What I propose doing in the released branches is what's shown in
>> the attached patch for v17: rename port/pqsignal.c's function to
>> pqsignal_fe in frontend, but leave it as pqsignal in the backend.
>> Leaving it as pqsignal in the backend preserves ABI for extension
>> modules, at the cost that it's not entirely clear which pqsignal
>> an extension that's also linked with libpq will get.  But that
>> hazard affects none of our code, and it existed already for any
>> third-party extensions that call pqsignal.  In principle using
>> "pqsignal_fe" in frontend creates an ABI hazard for outside users of
>> libpgport.a.  But I think it's not really a problem, because we don't
>> support that as a shared library.  As long as people build with
>> headers and .a files from the same minor release, they'll be OK.

> I don't have any concrete reasons to think you are wrong about this, but it
> does feel somewhat risky to me.  It might be worth testing it with a couple
> of third-party projects that use the frontend version of pqsignal().
> codesearch.debian.net shows a couple that may be doing so [0] [1] [2].

It's fair to worry about this, but I don't think my testing that would
prove a lot.  AFAICS, whether somebody runs into trouble would depend
on many factors like their specific build process and what versions of
which packages they have installed.

In any case, I think we have a very limited amount of wiggle room.
We definitely cannot change libpq's ABI without provoking howls of
anguish.

I have been wondering whether it would help to add something like
this at the end of port/pqsignal.c in the back branches:

    #ifdef FRONTEND
    #undef pqsignal

    /* ABI-compatibility wrapper */
    pqsigfunc
    pqsignal(int signo, pqsigfunc func)
    {
        return pqsignal_fe(signo, func);
    }
    #endif

(plus or minus an extern or so, but you get the idea).  The point of
this is that compiling against old headers and then linking against
newer libpgport.a would still work.  It does nothing however for the
reverse scenario of compiling against new headers and then linking
against old libpgport.a.  So I haven't persuaded myself that it's
worth the trouble -- but I'm happy to include it if others think
it would help.

            regards, tom lane



Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Nathan Bossart
Дата:
On Mon, Jan 13, 2025 at 05:51:54PM -0500, Tom Lane wrote:
> It's fair to worry about this, but I don't think my testing that would
> prove a lot.  AFAICS, whether somebody runs into trouble would depend
> on many factors like their specific build process and what versions of
> which packages they have installed.
> 
> In any case, I think we have a very limited amount of wiggle room.
> We definitely cannot change libpq's ABI without provoking howls of
> anguish.

Fair point.

> I have been wondering whether it would help to add something like
> this at the end of port/pqsignal.c in the back branches:
> 
>     #ifdef FRONTEND
>     #undef pqsignal
> 
>     /* ABI-compatibility wrapper */
>     pqsigfunc
>     pqsignal(int signo, pqsigfunc func)
>     {
>         return pqsignal_fe(signo, func);
>     }
>     #endif
> 
> (plus or minus an extern or so, but you get the idea).  The point of
> this is that compiling against old headers and then linking against
> newer libpgport.a would still work.  It does nothing however for the
> reverse scenario of compiling against new headers and then linking
> against old libpgport.a.  So I haven't persuaded myself that it's
> worth the trouble -- but I'm happy to include it if others think
> it would help.

After sleeping on it, I think I agree that the extra gymnastics aren't
worth it to partially fix something that wasn't supported anyway.  But I'm
not mortally opposed to it if someone feels strongly that it should be
included.

-- 
nathan



Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Tom Lane
Дата:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Mon, Jan 13, 2025 at 05:51:54PM -0500, Tom Lane wrote:
>> (plus or minus an extern or so, but you get the idea).  The point of
>> this is that compiling against old headers and then linking against
>> newer libpgport.a would still work.  It does nothing however for the
>> reverse scenario of compiling against new headers and then linking
>> against old libpgport.a.  So I haven't persuaded myself that it's
>> worth the trouble -- but I'm happy to include it if others think
>> it would help.

> After sleeping on it, I think I agree that the extra gymnastics aren't
> worth it to partially fix something that wasn't supported anyway.  But I'm
> not mortally opposed to it if someone feels strongly that it should be
> included.

After more thought I've realized that the asymmetrical detection
here isn't all that bad, because the outcomes are different.
If we fail to catch old-headers-and-new-library, the result will
either be a link failure or (if the extension uses libpq) silently
linking to libpq's pqsignal, which was likely not what was intended.
If we fail to catch the other case, the result is always a link
failure, and that will happen at build time not in the field.

So now I'm inclined to include the ABI-compatible wrapper, which
will ensure that extensions continue to link to libpgport's pqsignal.

            regards, tom lane



Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Nathan Bossart
Дата:
On Tue, Jan 14, 2025 at 02:52:29PM -0500, Tom Lane wrote:
> After more thought I've realized that the asymmetrical detection
> here isn't all that bad, because the outcomes are different.
> If we fail to catch old-headers-and-new-library, the result will
> either be a link failure or (if the extension uses libpq) silently
> linking to libpq's pqsignal, which was likely not what was intended.
> If we fail to catch the other case, the result is always a link
> failure, and that will happen at build time not in the field.

Assuming libpgport is only used as a static library, I think that makes
sense.  My web searches indicate that it is possible to do things like
convert a static library to a dynamic one, but perhaps that's far enough
beyond the realm of things we care about.

> So now I'm inclined to include the ABI-compatible wrapper, which
> will ensure that extensions continue to link to libpgport's pqsignal.

Fine by me.

-- 
nathan



Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Tom Lane
Дата:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Tue, Jan 14, 2025 at 02:52:29PM -0500, Tom Lane wrote:
>> So now I'm inclined to include the ABI-compatible wrapper, which
>> will ensure that extensions continue to link to libpgport's pqsignal.

> Fine by me.

OK, I'll make it so.  Thanks for looking at it!

            regards, tom lane



Re: pgbench error: (setshell) of script 0; execution of meta-command failed

От
Tom Lane
Дата:
I wrote:
> OK, I'll make it so.  Thanks for looking at it!

Or not.  My idea worked okay in v17, but not in older branches.
Pre-v17, libpq itself can call pqsignal (though only in non-
thread-safe builds).  With this patch, that would have resulted
in pulling src/port/pqsignal.o into libpq.  libpq itself is fine
with calling that version, but if a stub version of pqsignal()
comes along for the ride then we have two candidates for which
function will be exported.

It would probably be possible to fix that, say by making the
wrapper version be a separate .o module within libpgport.
But it would be more work and complication, and I couldn't
get excited about investing such effort for a hypothetical
build problem.  So I've pushed the patches as I originally
proposed them.  If anyone else is excited about doing something
more, step right up.

            regards, tom lane