Re: Parallel worker hangs while handling errors.
From | vignesh C |
---|---|
Subject | Re: Parallel worker hangs while handling errors. |
Date | |
Msg-id | CALDaNm2g5GgBpzGgdwhFgxD=ur42O+Fh+prp-xiyHp+feB+q=w@mail.gmail.com |
In response to | Re: Parallel worker hangs while handling errors. (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses | Re: Parallel worker hangs while handling errors. |
List | pgsql-hackers |
Thanks for reviewing and adding your thoughts. My comments are inline.

On Fri, Jul 17, 2020 at 1:21 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> The same hang issue can occur(though I'm not able to back it up with a
> use case), in the cases from wherever the EmitErrorReport() is called
> from "if (sigsetjmp(local_sigjmp_buf, 1) != 0)" block, such as
> autovacuum.c, bgwriter.c, bgworker.c, checkpointer.c, walwriter.c, and
> postgres.c.
>

I'm not sure if this can occur in other cases.

> >
> > One of the fixes could be to call BackgroundWorkerUnblockSignals just
> > after sigsetjmp. I'm not sure if this is the best solution.
> > Robert & myself had a discussion about the problem yesterday. We felt
> > this is a genuine problem with the parallel worker error handling and
> > need to be fixed.
> >
>
> Note that, in all sigsetjmp blocks, we intentionally
> HOLD_INTERRUPTS(), to not cause any issues while performing error
> handling, I'm concerned here that now, if we directly call
> BackgroundWorkerUnblockSignals() which will open up all the signals
> and our main intention of holding interrupts or block signals may go
> away.
>
> Since the main problem for this hang issue is because of blocking
> SIGUSR1, in sigsetjmp, can't we just only unblock only the SIGUSR1,
> instead of unblocking all signals? I tried this with parallel copy
> hang, the issue is resolved.
>

On further thought, I feel that just unblocking SIGUSR1 would be the right approach in this case. I'm attaching a new patch which unblocks the SIGUSR1 signal. I have verified that the original issue with the WIP parallel copy patch gets fixed. I have made changes only in bgworker.c, as we require the parallel worker to receive this signal and continue processing. I have not included changes for other processes, as I'm not sure whether this scenario applies to them.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com