Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
From | Tomas Vondra |
---|---|
Subject | Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) |
Date | |
Msg-id | 2b14a5ae-3bea-1b19-c685-e00ecb245938@enterprisedb.com |
In response to | Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) (Andres Freund <andres@anarazel.de>) |
Responses |
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) |
List | pgsql-hackers |
On 1/29/23 18:53, Andres Freund wrote:
> Hi,
>
> On 2023-01-29 18:39:05 +0100, Tomas Vondra wrote:
>> Will do, but I'll wait for another lockup to see how frequent it
>> actually is. I'm now at ~90 runs total, and it didn't happen again yet.
>> So hitting it after 15 runs might have been a bit of a luck.
>
> Was there a difference in how much load there was on the machine between
> "reproduced in 15 runs" and "not reproed in 90"? If indeed lack of barriers
> is related to the issue, an increase in context switches could substantially
> change the behaviour (in both directions). More intra-process context
> switches can amount to "probabilistic barriers" because that'll be a
> barrier. At the same time it can make it more likely that the relatively
> narrow window in WaitEventSetWait() is hit, or lead to larger delays
> processing signals.
>

No. The only thing the machine is doing is

    while /usr/bin/true; do
      make check
    done

I can't reduce the workload further, because the "join" test is in a
separate parallel group (I cut down parallel_schedule). I could make the
machine busier, of course.

However, the other lockup I saw was when using serial_schedule, so I
guess lower concurrency makes it more likely.

But who knows ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company