Обсуждение: pgbench error: (setshell) of script 0; execution of meta-command failed
Hi: I run into the {subject} issue with the below setup. cat foo.sql \setshell txn_mode echo ${TXN_MODE} \setshell speed echo ${SPEED} \setshell sleep_ms echo ${SLEEP_MS} \setshell subtxn_mode echo ${SUBTXN_MODE} select 1; $ TXN_MODE=-1 SPEED=1 SLEEP_MS=0 SUBTXN_MODE=-1 pgbench -n -ffoo.sql postgres -T5 -c4 --exit-on-abort I *randomly*(7/8) get errors like: pgbench (18devel) pgbench: error: client 2 aborted in command 0 (setshell) of script 0; execution of meta-command failed pgbench: error: Run was aborted due to an error in thread 0 I debug this for 1+ hour and didn't find anything useful, so I'd like have a ask if there is any known issue or the way I use \setshell is wrong? Thanks -- Best Regards Andy Fan
On 2025/01/10 16:09, Andy Fan wrote: > Andy Fan <zhihuifan1213@163.com> writes: > >> Hi: >> >> I run into the {subject} issue with the below setup. >> >> cat foo.sql >> >> \setshell txn_mode echo ${TXN_MODE} >> \setshell speed echo ${SPEED} >> \setshell sleep_ms echo ${SLEEP_MS} >> \setshell subtxn_mode echo ${SUBTXN_MODE} >> >> select 1; >> >> $ TXN_MODE=-1 SPEED=1 SLEEP_MS=0 SUBTXN_MODE=-1 pgbench -n -ffoo.sql postgres -T5 -c4 --exit-on-abort >> >> I *randomly*(7/8) get errors like: >> >> pgbench (18devel) >> pgbench: error: client 2 aborted in command 0 (setshell) of script 0; execution of meta-command failed >> pgbench: error: Run was aborted due to an error in thread 0 Interestingly, my git bisect pointed to the following commit as the cause of this issue, even though it seems unrelated to the pgbench problem at all. It’s possible my git bisect result is incorrect, but when I reverted this commit on HEAD, the pgbench issue didn’t occur during my tests. ---------------------- 06843df4abc5a0c24e4bd154a8a1327e074fa3ae is the first bad commit commit 06843df4abc5a0c24e4bd154a8a1327e074fa3ae Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Fri Sep 29 14:07:30 2023 -0400 Suppress macOS warnings about duplicate libraries in link commands. ---------------------- Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: pgbench error: (setshell) of script 0; execution of meta-command failed
От
Nazir Bilal Yavuz
Дата:
Hi, On Fri, 10 Jan 2025 at 10:10, Andy Fan <zhihuifan1213@163.com> wrote: > > Andy Fan <zhihuifan1213@163.com> writes: > > > Hi: > > > > I run into the {subject} issue with the below setup. > > > > cat foo.sql > > > > \setshell txn_mode echo ${TXN_MODE} > > \setshell speed echo ${SPEED} > > \setshell sleep_ms echo ${SLEEP_MS} > > \setshell subtxn_mode echo ${SUBTXN_MODE} > > > > select 1; > > > > $ TXN_MODE=-1 SPEED=1 SLEEP_MS=0 SUBTXN_MODE=-1 pgbench -n -ffoo.sql postgres -T5 -c4 --exit-on-abort > > > > I *randomly*(7/8) get errors like: > > > > pgbench (18devel) > > pgbench: error: client 2 aborted in command 0 (setshell) of script 0; execution of meta-command failed > > pgbench: error: Run was aborted due to an error in thread 0 > > I think I have figured out the issue, if you want reproduce it quicker, > you can change the '-T5' to '-T1' in the pgbench command and run many times. I ran it ~500 times on HEAD but the issue did not occur on my machine. Is 500 times not enough? -- Regards, Nazir Bilal Yavuz Microsoft
On 2025/01/10 21:41, Fujii Masao wrote: > > > On 2025/01/10 16:09, Andy Fan wrote: >> Andy Fan <zhihuifan1213@163.com> writes: >> >>> Hi: >>> >>> I run into the {subject} issue with the below setup. >>> >>> cat foo.sql >>> >>> \setshell txn_mode echo ${TXN_MODE} >>> \setshell speed echo ${SPEED} >>> \setshell sleep_ms echo ${SLEEP_MS} >>> \setshell subtxn_mode echo ${SUBTXN_MODE} >>> >>> select 1; >>> >>> $ TXN_MODE=-1 SPEED=1 SLEEP_MS=0 SUBTXN_MODE=-1 pgbench -n -ffoo.sql postgres -T5 -c4 --exit-on-abort >>> >>> I *randomly*(7/8) get errors like: >>> >>> pgbench (18devel) >>> pgbench: error: client 2 aborted in command 0 (setshell) of script 0; execution of meta-command failed >>> pgbench: error: Run was aborted due to an error in thread 0 > > Interestingly, my git bisect pointed to the following commit > as the cause of this issue, even though it seems unrelated to > the pgbench problem at all. It’s possible my git bisect result > is incorrect, but when I reverted this commit on HEAD, > the pgbench issue didn’t occur during my tests. > > ---------------------- > 06843df4abc5a0c24e4bd154a8a1327e074fa3ae is the first bad commit > commit 06843df4abc5a0c24e4bd154a8a1327e074fa3ae > Author: Tom Lane <tgl@sss.pgh.pa.us> > Date: Fri Sep 29 14:07:30 2023 -0400 > > Suppress macOS warnings about duplicate libraries in link commands. > ---------------------- Before this commit, pgbench used pqsignal() from port/pqsignal.c to set the signal handler for SIGALRM. This version of pqsignal() sets SA_RESTART for frontend code, so fgets() in runShellCommand() wouldn't return NULL even if SIGALRM arrived during fgets(), preventing the reported error. On the other hand, currently, pgbench seems to use pqsignal() from legacy-pqsignal.c, which doesn't set SA_RESTART for SIGALRM. As a result, SIGALRM can interrupt fgets() in runShellCommand() and make it return NULL, leading to the reported error. I'm not sure if this change was an intentional result of that commit... Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Fujii Masao <masao.fujii@oss.nttdata.com> writes: > Before this commit, pgbench used pqsignal() from port/pqsignal.c > to set the signal handler for SIGALRM. This version of pqsignal() > sets SA_RESTART for frontend code, so fgets() in runShellCommand() > wouldn't return NULL even if SIGALRM arrived during fgets(), > preventing the reported error. > On the other hand, currently, pgbench seems to use pqsignal() > from legacy-pqsignal.c, which doesn't set SA_RESTART for SIGALRM. > As a result, SIGALRM can interrupt fgets() in runShellCommand() > and make it return NULL, leading to the reported error. Oh, interesting. > I'm not sure if this change was an intentional result of that commit... Definitely not. I think we'd better look into how to undo that effect. Since legacy-pqsignal is really not supposed to be used by clients anymore, maybe we could just adjust it to set SA_RESTART for SIGALRM? The other alternatives I can think of amount to re-introducing link order dependencies, which would be horrid. regards, tom lane
Nazir Bilal Yavuz <byavuz81@gmail.com> writes: > I ran it ~500 times on HEAD but the issue did not occur on my machine. What platform are you testing on? regards, tom lane
Re: pgbench error: (setshell) of script 0; execution of meta-command failed
От
Nazir Bilal Yavuz
Дата:
Hi, On Fri, 10 Jan 2025 at 17:58, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Nazir Bilal Yavuz <byavuz81@gmail.com> writes: > > I ran it ~500 times on HEAD but the issue did not occur on my machine. > > What platform are you testing on? My local machine is: Linux 6.12.8-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.8-1 (2025-01-02) x86_64 GNU/Linux Also, trying on macOS CI VM (by re-running with terminal access) but no luck so far. -- Regards, Nazir Bilal Yavuz Microsoft
I wrote: > Since legacy-pqsignal is really not supposed to be used by clients > anymore, maybe we could just adjust it to set SA_RESTART for SIGALRM? > The other alternatives I can think of amount to re-introducing > link order dependencies, which would be horrid. Actually, after re-reading the thread that led to 06843df4a [1], I think a better idea is to introduce some macro magic to force frontend clients to use libpgport's version of pqsignal() instead of the one from libpq. We mustn't change the real name of libpq's version, but I think we could get away with that for libpgport. I'm a bit tied up today, but can look at this over the weekend if nobody beats me to it. regards, tom lane [1] https://www.postgresql.org/message-id/flat/467042.1695766998%40sss.pgh.pa.us
I wrote: > Actually, after re-reading the thread that led to 06843df4a [1], > I think a better idea is to introduce some macro magic to force > frontend clients to use libpgport's version of pqsignal() instead > of the one from libpq. We mustn't change the real name of libpq's > version, but I think we could get away with that for libpgport. After studying this more, I think what we should do in HEAD is even more aggressive: let's make the real name of port/pqsignal.c's function be either pqsignal_fe in frontend, or pqsignal_be in backend. This positively ensures that there's no collision with libpq's export, and it seems like a good idea for the additional reason that the frontend and backend versions are not really interchangeable. However, we can't get away with that in released branches because it'd be an ABI break for any outside code that calls pqsignal. What I propose doing in the released branches is what's shown in the attached patch for v17: rename port/pqsignal.c's function to pqsignal_fe in frontend, but leave it as pqsignal in the backend. Leaving it as pqsignal in the backend preserves ABI for extension modules, at the cost that it's not entirely clear which pqsignal an extension that's also linked with libpq will get. But that hazard affects none of our code, and it existed already for any third-party extensions that call pqsignal. In principle using "pqsignal_fe" in frontend creates an ABI hazard for outside users of libpgport.a. But I think it's not really a problem, because we don't support that as a shared library. As long as people build with headers and .a files from the same minor release, they'll be OK. BTW, this decouples legacy-pqsignal.c from pqsignal.c enough that we could now do what's contemplated in the comments from 3b00fdba9: simplify that version by making it return void, or perhaps better just a true/false success report. I've not touched that point here, though. regards, tom lane diff --git a/src/include/port.h b/src/include/port.h index 818b7c7bae..f0e28ce5c5 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -513,7 +513,12 @@ extern int pg_check_dir(const char *dir); /* port/pgmkdirp.c */ extern int pg_mkdir_p(char *path, int omode); -/* port/pqsignal.c */ +/* port/pqsignal.c (see also interfaces/libpq/legacy-pqsignal.c) */ +#ifdef FRONTEND +#define pqsignal pqsignal_fe +#else +#define pqsignal pqsignal_be +#endif typedef void (*pqsigfunc) (SIGNAL_ARGS); extern pqsigfunc pqsignal(int signo, pqsigfunc func); diff --git a/src/interfaces/libpq/legacy-pqsignal.c b/src/interfaces/libpq/legacy-pqsignal.c index 0864888562..01977b4c81 100644 --- a/src/interfaces/libpq/legacy-pqsignal.c +++ b/src/interfaces/libpq/legacy-pqsignal.c @@ -28,8 +28,16 @@ * with the semantics it had in 9.2; in particular, this has different * behavior for SIGALRM than the version in src/port/pqsignal.c. * - * libpq itself does not use this. + * libpq itself does not use this, nor does anything else in our code. + * + * src/include/port.h #define's pqsignal as pqsignal_fe or pqsignal_be, + * but here we want to export just plain "pqsignal". We can't rely on + * port.h's extern declaration either. (The point of the #define's + * is to ensure that no in-tree code accidentally calls this version.) */ +#undef pqsignal +extern pqsigfunc pqsignal(int signo, pqsigfunc func); + pqsigfunc pqsignal(int signo, pqsigfunc func) { diff --git a/src/port/pqsignal.c b/src/port/pqsignal.c index d328a27279..f707b1c54e 100644 --- a/src/port/pqsignal.c +++ b/src/port/pqsignal.c @@ -123,6 +123,10 @@ wrapper_handler(SIGNAL_ARGS) * function instead of providing potentially-bogus return values. * Unfortunately, that requires modifying the pqsignal() in legacy-pqsignal.c, * which in turn requires an SONAME bump, which is probably not worth it. + * + * Note: the actual name of this function is either pqsignal_fe when + * compiled with -DFRONTEND, or pqsignal_be when compiled without. + * This is to avoid a name collision with libpq's legacy-pqsignal.c. */ pqsigfunc pqsignal(int signo, pqsigfunc func) diff --git a/src/include/port.h b/src/include/port.h index 4d2d911cb6..acc39deac4 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -515,7 +515,10 @@ extern int pg_check_dir(const char *dir); /* port/pgmkdirp.c */ extern int pg_mkdir_p(char *path, int omode); -/* port/pqsignal.c */ +/* port/pqsignal.c (see also interfaces/libpq/legacy-pqsignal.c) */ +#ifdef FRONTEND +#define pqsignal pqsignal_fe +#endif typedef void (*pqsigfunc) (SIGNAL_ARGS); extern pqsigfunc pqsignal(int signo, pqsigfunc func); diff --git a/src/interfaces/libpq/legacy-pqsignal.c b/src/interfaces/libpq/legacy-pqsignal.c index e8c716ad0f..49755e8acd 100644 --- a/src/interfaces/libpq/legacy-pqsignal.c +++ b/src/interfaces/libpq/legacy-pqsignal.c @@ -28,8 +28,16 @@ * with the semantics it had in 9.2; in particular, this has different * behavior for SIGALRM than the version in src/port/pqsignal.c. * - * libpq itself does not use this. + * libpq itself does not use this, nor does anything else in our code. + * + * src/include/port.h will #define pqsignal as pqsignal_fe, + * but here we want to export just plain "pqsignal". We can't rely on + * port.h's extern declaration either. (The point of the #define + * is to ensure that no in-tree code accidentally calls this version.) */ +#undef pqsignal +extern pqsigfunc pqsignal(int signo, pqsigfunc func); + pqsigfunc pqsignal(int signo, pqsigfunc func) { diff --git a/src/port/pqsignal.c b/src/port/pqsignal.c index bdaa9f10c8..844e37c827 100644 --- a/src/port/pqsignal.c +++ b/src/port/pqsignal.c @@ -123,6 +123,10 @@ wrapper_handler(SIGNAL_ARGS) * function instead of providing potentially-bogus return values. * Unfortunately, that requires modifying the pqsignal() in legacy-pqsignal.c, * which in turn requires an SONAME bump, which is probably not worth it. + * + * Note: the actual name of this function is either pqsignal_fe when + * compiled with -DFRONTEND, or pqsignal when compiled without. + * This is to avoid a name collision with libpq's legacy-pqsignal.c. */ pqsigfunc pqsignal(int signo, pqsigfunc func)
Nathan Bossart <nathandbossart@gmail.com> writes: > On Sat, Jan 11, 2025 at 02:04:13PM -0500, Tom Lane wrote: >> What I propose doing in the released branches is what's shown in >> the attached patch for v17: rename port/pqsignal.c's function to >> pqsignal_fe in frontend, but leave it as pqsignal in the backend. >> Leaving it as pqsignal in the backend preserves ABI for extension >> modules, at the cost that it's not entirely clear which pqsignal >> an extension that's also linked with libpq will get. But that >> hazard affects none of our code, and it existed already for any >> third-party extensions that call pqsignal. In principle using >> "pqsignal_fe" in frontend creates an ABI hazard for outside users of >> libpgport.a. But I think it's not really a problem, because we don't >> support that as a shared library. As long as people build with >> headers and .a files from the same minor release, they'll be OK. > I don't have any concrete reasons to think you are wrong about this, but it > does feel somewhat risky to me. It might be worth testing it with a couple > of third-party projects that use the frontend version of pqsignal(). > codesearch.debian.net shows a couple that may be doing so [0] [1] [2]. It's fair to worry about this, but I don't think my testing that would prove a lot. AFAICS, whether somebody runs into trouble would depend on many factors like their specific build process and what versions of which packages they have installed. In any case, I think we have a very limited amount of wiggle room. We definitely cannot change libpq's ABI without provoking howls of anguish. I have been wondering whether it would help to add something like this at the end of port/pqsignal.c in the back branches: #ifdef FRONTEND #undef pqsignal /* ABI-compatibility wrapper */ pqsigfunc pqsignal(int signo, pqsigfunc func) { return pqsignal_fe(signo, func); } #endif (plus or minus an extern or so, but you get the idea). The point of this is that compiling against old headers and then linking against newer libpgport.a would still work. It does nothing however for the reverse scenario of compiling against new headers and then linking against old libpgport.a. So I haven't persuaded myself that it's worth the trouble -- but I'm happy to include it if others think it would help. regards, tom lane
On Mon, Jan 13, 2025 at 05:51:54PM -0500, Tom Lane wrote: > It's fair to worry about this, but I don't think my testing that would > prove a lot. AFAICS, whether somebody runs into trouble would depend > on many factors like their specific build process and what versions of > which packages they have installed. > > In any case, I think we have a very limited amount of wiggle room. > We definitely cannot change libpq's ABI without provoking howls of > anguish. Fair point. > I have been wondering whether it would help to add something like > this at the end of port/pqsignal.c in the back branches: > > #ifdef FRONTEND > #undef pqsignal > > /* ABI-compatibility wrapper */ > pqsigfunc > pqsignal(int signo, pqsigfunc func) > { > return pqsignal_fe(signo, func); > } > #endif > > (plus or minus an extern or so, but you get the idea). The point of > this is that compiling against old headers and then linking against > newer libpgport.a would still work. It does nothing however for the > reverse scenario of compiling against new headers and then linking > against old libpgport.a. So I haven't persuaded myself that it's > worth the trouble -- but I'm happy to include it if others think > it would help. After sleeping on it, I think I agree that the extra gymnastics aren't worth it to partially fix something that wasn't supported anyway. But I'm not mortally opposed to it if someone feels strongly that it should be included. -- nathan
Nathan Bossart <nathandbossart@gmail.com> writes: > On Mon, Jan 13, 2025 at 05:51:54PM -0500, Tom Lane wrote: >> (plus or minus an extern or so, but you get the idea). The point of >> this is that compiling against old headers and then linking against >> newer libpgport.a would still work. It does nothing however for the >> reverse scenario of compiling against new headers and then linking >> against old libpgport.a. So I haven't persuaded myself that it's >> worth the trouble -- but I'm happy to include it if others think >> it would help. > After sleeping on it, I think I agree that the extra gymnastics aren't > worth it to partially fix something that wasn't supported anyway. But I'm > not mortally opposed to it if someone feels strongly that it should be > included. After more thought I've realized that the asymmetrical detection here isn't all that bad, because the outcomes are different. If we fail to catch old-headers-and-new-library, the result will either be a link failure or (if the extension uses libpq) silently linking to libpq's pqsignal, which was likely not what was intended. If we fail to catch the other case, the result is always a link failure, and that will happen at build time not in the field. So now I'm inclined to include the ABI-compatible wrapper, which will ensure that extensions continue to link to libpgport's pqsignal. regards, tom lane
On Tue, Jan 14, 2025 at 02:52:29PM -0500, Tom Lane wrote: > After more thought I've realized that the asymmetrical detection > here isn't all that bad, because the outcomes are different. > If we fail to catch old-headers-and-new-library, the result will > either be a link failure or (if the extension uses libpq) silently > linking to libpq's pqsignal, which was likely not what was intended. > If we fail to catch the other case, the result is always a link > failure, and that will happen at build time not in the field. Assuming libpgport is only used as a static library, I think that makes sense. My web searches indicate that it is possible to do things like convert a static library to a dynamic one, but perhaps that's far enough beyond the realm of things we care about. > So now I'm inclined to include the ABI-compatible wrapper, which > will ensure that extensions continue to link to libpgport's pqsignal. Fine by me. -- nathan
Nathan Bossart <nathandbossart@gmail.com> writes: > On Tue, Jan 14, 2025 at 02:52:29PM -0500, Tom Lane wrote: >> So now I'm inclined to include the ABI-compatible wrapper, which >> will ensure that extensions continue to link to libpgport's pqsignal. > Fine by me. OK, I'll make it so. Thanks for looking at it! regards, tom lane
I wrote: > OK, I'll make it so. Thanks for looking at it! Or not. My idea worked okay in v17, but not in older branches. Pre-v17, libpq itself can call pqsignal (though only in non- thread-safe builds). With this patch, that would have resulted in pulling src/port/pqsignal.o into libpq. libpq itself is fine with calling that version, but if a stub version of pqsignal() comes along for the ride then we have two candidates for which function will be exported. It would probably be possible to fix that, say by making the wrapper version be a separate .o module within libpgport. But it would be more work and complication, and I couldn't get excited about investing such effort for a hypothetical build problem. So I've pushed the patches as I originally proposed them. If anyone else is excited about doing something more, step right up. regards, tom lane