conchuela timeouts since 2021-10-09 system upgrade
From | Noah Misch |
---|---|
Subject | conchuela timeouts since 2021-10-09 system upgrade |
Date | |
Msg-id | 20211024161942.GB3945842@rfd.leadboat.com |
In response to | Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data (Andrey Borodin <x4mmm@yandex-team.ru>) |
Responses | Re: conchuela timeouts since 2021-10-09 system upgrade |
List | pgsql-bugs |
On Sun, Oct 24, 2021 at 02:45:38PM +0300, Andrey Borodin wrote:
> > On 24 Oct 2021, at 08:00, Noah Misch <noah@leadboat.com> wrote:
> > Buildfarm member conchuela (DragonFly BSD 6.0) has gotten multiple
> > "IPC::Run: timeout on timer" in the new tests. No other animal has.
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-24%2003%3A05%3A09
> > is an example run. The pgbench queries finished quickly, but the
> > $pgbench_h->finish() apparently timed out after 180s. I guess this would be
> > consistent with pgbench blocking in write(), waiting for something to empty a
> > pipe buffer so it can write more. I thought finish() will drain any incoming
> > I/O, though. This phenomenon has been appearing regularly via
> > src/test/recovery/t/017_shm.pl[1], so this thread doesn't have a duty to
> > resolve it. A stack trace of the stuck pgbench should be informative, though.
>
> Some thoughts:
> 0. I doubt that psql\pgbench is stuck in these failures.

Got it. If pgbench is a zombie, the fault does lie in IPC::Run or the kernel.

> 1. All observed similar failures seem to be related to finish() sub of IPC::Run harness
> 2. Finish must pump any pending data from process [0]. But it can hang if process is waiting for something.
> 3. There is reported bug of finish [1]. But the description is slightly different.

Since that report is about a Perl-process child on Linux, I think we can treat
it as unrelated.

These failures started on 2021-10-09, the day conchuela updated from DragonFly
v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE. It smells like a kernel bug.
Since the theorized kernel bug seems not to affect
src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
from other tests. One thing in common with src/test/recovery/t/017_shm.pl and
the newest failure sites is that they don't write anything to the child stdin.
Does writing e.g. a single byte (that the child doesn't use) work around the
problem? If not, does passing the script via stdin, like "pgbench -f-
<script.sql", work around the problem?

> [0] https://metacpan.org/dist/IPC-Run/source/lib/IPC/Run.pm#L3481
> [1] https://github.com/toddr/IPC-Run/issues/57
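
For readers unfamiliar with the pipe-buffer hypothesis and the proposed one-byte workaround, here is a minimal sketch in Python rather than the Perl IPC::Run harness the tests actually use. Everything in it is an assumption made for illustration: the child is a hypothetical stand-in for pgbench that ignores stdin and writes 4 MiB to stdout, and `run_child` is an invented helper name. It shows only the general mechanism (a child blocks in write() until the parent drains the pipe, and a single unused byte can be sent to its stdin). It does not reproduce the conchuela failure or the suspected DragonFly kernel behavior.

```python
import subprocess
import sys

# Hypothetical stand-in for pgbench: ignores stdin and writes 4 MiB to
# stdout, which is more than a default pipe buffer usually holds.
CHILD = "import sys; sys.stdout.write('x' * (4 << 20)); sys.stdout.flush()"


def run_child(stdin_byte=b"\n", stall=1.0):
    """Start the child, leave its stdout unread for `stall` seconds,
    then send one unused byte to its stdin and drain its output."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    try:
        # Nothing reads the pipe during this wait, so the child is
        # expected to block in write() once the buffer fills.
        proc.wait(timeout=stall)
        blocked = False
    except subprocess.TimeoutExpired:
        blocked = True
    # communicate() writes the byte, closes stdin, and reads stdout to
    # EOF, which is roughly what finish() is expected to do.
    out, _ = proc.communicate(input=stdin_byte)
    return blocked, len(out), proc.returncode


if __name__ == "__main__":
    blocked, nbytes, rc = run_child()
    print(f"blocked before drain: {blocked}, bytes read: {nbytes}, exit: {rc}")
```

If the child still hung after the drain in a setup like this, that would point away from a full pipe buffer and toward the harness or the kernel, which is the distinction the questions above are trying to draw.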