Re: Suggestion to add --continue-client-on-abort option to pgbench
От | Rintaro Ikeda |
---|---|
Тема | Re: Suggestion to add --continue-client-on-abort option to pgbench |
Дата | |
Msg-id | 2347f414-ca32-4ab1-9de1-fd719a0f7e0e@oss.nttdata.com обсуждение исходный текст |
Ответ на | Re: Suggestion to add --continue-client-on-abort option to pgbench (Fujii Masao <masao.fujii@gmail.com>) |
Ответы |
Re: Suggestion to add --continue-client-on-abort option to pgbench
|
Список | pgsql-hackers |
Hi, On 2025/09/22 11:56, Fujii Masao wrote: > On Sat, Sep 20, 2025 at 12:21 AM Yugo Nagata <nagata@sraoss.co.jp> wrote: >>> While testing, I found that running pgbench with --continue-on-error and >>> pipeline mode triggers the following assertion failure. Could this be >>> a bug in the patch? >>> >>> --------------------------------------------------- >>> $ cat pipeline.pgbench >>> \startpipeline >>> DO $$ >>> BEGIN >>> PERFORM pg_sleep(3); >>> PERFORM pg_terminate_backend(pg_backend_pid()); >>> END $$; >>> \endpipeline >>> >>> $ pgbench -n --debug --verbose-errors -f pipeline.pgbench -c 2 -t 4 -M >>> extended --continue-on-error >>> ... >>> Assertion failed: >>> (sql_script[st->use_file].commands[st->command]->type == 1), function >>> commandError, file pgbench.c, line 3081. >>> Abort trap: 6 >>> --------------------------------------------------- >>> >>> When I ran the same command without --continue-on-error, >>> the assertion failure did not occur. >> >> I think this bug was introduced by commit 4a39f87acd6e, which enabled pgbench >> to retry and added the --verbose-errors option, rather than by this patch itself. >> >> The assertion failure occurs in commandError(), which is called to report an error when >> it can be retried (i.e., serializable failure or deadlock), or when --continue-on-error >> is used after this patch. >> >> Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND); >> >> This assumes the error is always detected during SQL command execution, but >> that’s not correct, since in pipeline mode, the error can be detected when >> a \endpipeline meta-command is executed. >> >> $ cat deadlock.sql >> \startpipeline >> begin; >> lock b; >> lock a; >> end; >> \endpipeline >> >> $ cat deadlock2.sql >> \startpipeline >> begin; >> lock a; >> lock b; >> end; >> \endpipeline >> >> $ pgbench --verbose-errors -f deadlock.sql -f deadlock2.sql -c 2 -T 3 -M extended >> pgbench (19devel) >> starting vacuum...end. >> pgbench: pgbench.c:3062: commandError: Assertion `sql_script[st->use_file].commands[st->command]->type == 1' failed. >> >> Although one option would be to remove this assertion, if we prefer to keep it, >> the attached patch fixes the issue. > > Thanks for the analysis and the patch! > > I think we should fix the issue rather than just removing the assertion. > I'd like to apply your patch with the following source comment: > > --------------------------- > Errors should only be detected during an SQL command or the \endpipeline > meta command. Any other case triggers an assertion failure. > -------------------------- > > > With your patch and the continue-on-error patches, running the same pgbench > command I used to reproduce the assertion failure upthread causes pgbench > to hang. From my analysis, it enters an infinite loop in discardUntilSync(). > That loop waits for PGRES_PIPELINE_SYNC, but since the connection has already > been closed, it never arrives, leaving pgbench stuck. > > Could this also happen without the continue-on-error patch, or is it a new bug > introduced by it? Either way, it seems pgbench needs to exit the loop when > the result status is PGRES_FATAL_ERROR. > Thank you for the analysis and the patches. I think the issue is a new bug because we have transitioned to CSTATE_ABORT immediately after queries failed, without executing discardUntilSync(). I've attached a patch that fixes the assertion error. The content of v1 patch by Mr. Nagata is also included. I would appreciate it if you review my patch. Regards, Rintaro Ikeda
Вложения
В списке pgsql-hackers по дате отправления: