Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
From | Alexander Lakhin |
---|---|
Subject | Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) |
Date | |
Msg-id | 60bb34ad-a696-c43d-3f7c-1696796e86ce@gmail.com |
In reply to | Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) (Thomas Munro <thomas.munro@gmail.com>) |
Replies |
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) |
List | pgsql-hackers |
Hello Thomas,

31.08.2023 14:15, Thomas Munro wrote:
> We have a signal that is pending and not blocked, so I don't
> immediately know why poll() hasn't returned control.

When I worked at the Postgres Pro company, we observed a similar lockup under rather specific conditions (we used an Elbrus CPU and the specific Elbrus compiler (lcc), which is based on EDG).
I managed to reproduce that lockup and Anton Voloshin investigated it.
The issue was caused by a compiler optimization in WaitEventSetWait():

    waiting = true;
    ...
    while (returned_events == 0)
    {
    ...
        if (set->latch && set->latch->is_set)
        {
    ...
            break;
        }

In that case, the compiler decided that it may place the read "set->latch->is_set" before the write "waiting = true".
(Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the issue for us.)

I can't provide more details for now, but maybe you could look at the binary code generated on the target platform to confirm or reject my guess.

Best regards,
Alexander
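For readers who want to see the shape of the workaround described above, here is a minimal sketch, not the actual PostgreSQL code. `DemoLatch`, `DemoWaitSet`, `demo_wait()` and the `compiler_barrier()` macro are hypothetical stand-ins invented for illustration, and the macro assumes a GCC/Clang-style inline-asm memory clobber. The sketch only shows where a compiler barrier would sit between the write to `waiting` and the read of `is_set`.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumption: GCC/Clang-style inline asm. An empty asm statement with a
 * "memory" clobber tells the compiler not to move memory accesses across
 * this point. It emits no instructions and does not order anything at the
 * CPU level.
 */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical, heavily simplified stand-ins for the real structures. */
typedef struct DemoLatch
{
    sig_atomic_t is_set;
} DemoLatch;

typedef struct DemoWaitSet
{
    DemoLatch *latch;
} DemoWaitSet;

/* Flag announcing that this process is about to sleep. */
static volatile sig_atomic_t waiting = false;

/*
 * Returns 1 if the latch was already set (no need to sleep),
 * 0 if the caller would have to go on and block.
 */
static int
demo_wait(DemoWaitSet *set)
{
    int returned_events = 0;

    assert(set != NULL);

    waiting = true;

    /* Keep the compiler from hoisting the is_set read above the write. */
    compiler_barrier();

    if (set->latch && set->latch->is_set)
        returned_events = 1;

    waiting = false;
    return returned_events;
}
```

Without the barrier, nothing in this sketch forces the compiler to keep the plain read of `is_set` after the write to `waiting`, which is the reordering suspected in the message. Whether that is what happened on dikkop can only be settled by inspecting the generated machine code on that platform.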