Re: conchuela timeouts since 2021-10-09 system upgrade
| From | Noah Misch |
|---|---|
| Subject | Re: conchuela timeouts since 2021-10-09 system upgrade |
| Date | |
| Msg-id | 20211026015157.GA113335@rfd.leadboat.com |
| In response to | Re: conchuela timeouts since 2021-10-09 system upgrade (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: conchuela timeouts since 2021-10-09 system upgrade |
| List | pgsql-bugs |
On Mon, Oct 25, 2021 at 04:59:42PM -0400, Tom Lane wrote:
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
> > FWIW it's easy to make the issue reproduce faster with following diff
> > - '--no-vacuum --client=1 --transactions=100',
> > + '--no-vacuum --client=1 --transactions=1',
>
> Hmm, didn't help here.  It seems that even though prairiedog managed to
> fail on its first attempt, it's not terribly reproducible there; I've
> seen only one failure in about 30 manual attempts.  In the one failure,
> the non-background pgbench completed fine (as determined by counting
> statements in the postmaster's log); but the background one had only
> finished about 90 transactions before seemingly getting stuck.  No new
> SQL commands had been issued after about 10 seconds.

Interesting.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2021-10-24%2016%3A05%3A58
also shows a short command count, just 131/200 completed.  However,
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-25%2000%3A35%3A27
shows the full 200/200.  I'm starting to think the prairiedog failures have
only superficial similarity to the conchuela failures.

> Nonetheless, I have a theory and a proposal.  This coding pattern
> seems pretty silly:
>
>     $pgbench_h->pump_nb;
>     $pgbench_h->finish();
>
> ISTM that if you need to call pump at all, you need a loop not just
> one call.  So I'm guessing that when it fails, it's for lack of
> pumping.

The pump_nb() is just unnecessary.  We've not added anything destined for
stdin, and finish() takes care of pumping outputs.

> The other thing I noticed is that at least on prairiedog's host, the
> number of invocations of the DROP/CREATE/bt_index_check transaction
> is ridiculously out of proportion to the number of invocations of the
> other transactions.  It can only get through seven or eight iterations
> of the index transaction before the other transactions are all done,
> which means the last 190 iterations of that transaction are a complete
> waste of cycles.

That makes sense.

> What I think we should do in these two tests is nuke the use of
> background_pgbench entirely; that looks like a solution in search
> of a problem, and it seems unnecessary here.  Why not run
> the DROP/CREATE/bt_index_check transaction as one of three script
> options in the main pgbench run?

The author tried that and got deadlocks:
https://postgr.es/m/5E041A70-4946-489C-9B6D-764DF627A92D@yandex-team.ru

On prairiedog, the proximate trouble is pgbench getting stuck.  IPC::Run is
behaving normally given a stuck pgbench.  When pgbench stops sending queries,
does pg_stat_activity show anything at all running?  If so, are those backends
waiting on locks?  If not, what's the pgbench stack trace at that time?
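
For illustration, the point about finish() can be sketched as follows. This is a minimal example, assuming IPC::Run is installed; the child command is a hypothetical stand-in for the background pgbench, not the test's actual invocation, and the variable names are made up for the sketch.

```perl
use strict;
use warnings;
use IPC::Run qw(start);

# Hypothetical stand-in for the background pgbench: a child Perl process
# that prints 200 lines to stdout and exits.
my @cmd = ($^X, '-e', 'print "line $_\n" for 1 .. 200');

# Nothing is ever queued for the child's stdin, matching the case above.
my ($stdin, $stdout, $stderr) = ('', '', '');

# start() launches the child and returns a harness without waiting.
my $h = start(\@cmd, \$stdin, \$stdout, \$stderr);

# ... the foreground work would run here ...

# finish() pumps the remaining output and waits for the child to exit;
# it returns true when the child exited with status zero.  No separate
# pump_nb() call is made before it.
my $ok = $h->finish;

# Count the lines captured from the child's stdout.
my $lines = () = $stdout =~ /^line \d+$/mg;
print "ok=", ($ok ? 1 : 0), " lines=$lines\n";
```

If the child itself hangs, as a stuck pgbench would, finish() blocks as well, which is consistent with IPC::Run behaving normally in that situation.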