Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

From	Alexander Lakhin
Subject	Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date	1 September 2023, 08:00:00
Msg-id	60bb34ad-a696-c43d-3f7c-1696796e86ce@gmail.com
In reply to	Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: lockup in parallel hash join on dikkop (freebsd 14.0-current); Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
List	pgsql-hackers

Hello Thomas,

31.08.2023 14:15, Thomas Munro wrote:

> We have a signal that is pending and not blocked, so I don't
> immediately know why poll() hasn't returned control.

When I worked at Postgres Pro, we observed a similar lockup under rather
specific conditions: we used an Elbrus CPU and the Elbrus-specific
compiler (lcc), which is based on EDG.
I managed to reproduce that lockup, and Anton Voloshin investigated it.
The issue was caused by a compiler optimization in WaitEventSetWait():
     waiting = true;
...
     while (returned_events == 0)
     {
...
         if (set->latch && set->latch->is_set)
         {
...
             break;
         }

In that case, the compiler decided that it could move the read of
"set->latch->is_set" ahead of the write "waiting = true".
(Placing "pg_compiler_barrier();" right after "waiting = true;" fixed the
issue for us.)
I can't provide more details right now, but maybe you could look at the
machine code generated on the target platform to confirm or rule out my guess.
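
To make the ordering requirement concrete, here is a minimal standalone sketch
of the pattern. It is not the PostgreSQL code: DemoLatch,
demo_compiler_barrier() and demo_latch_already_set() are hypothetical names,
the latch flag is assumed to be a plain (non-volatile) field, and the C11
atomic_signal_fence() call is only a stand-in for pg_compiler_barrier().

```c
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the latch; is_set is assumed to be non-volatile. */
typedef struct DemoLatch
{
    sig_atomic_t is_set;
} DemoLatch;

/* Flag read by a signal handler to decide whether a wakeup is needed. */
static volatile sig_atomic_t waiting = false;

/*
 * Stand-in for pg_compiler_barrier(): a compiler-only fence. It emits no
 * CPU fence instruction; it only forbids the compiler from moving memory
 * accesses across this point.
 */
#define demo_compiler_barrier() atomic_signal_fence(memory_order_seq_cst)

/*
 * Announce that we are about to sleep, then check the latch.
 * Returns true if the latch is already set, i.e. the caller must not sleep.
 */
static bool
demo_latch_already_set(DemoLatch *latch)
{
    waiting = true;

    /*
     * Without this barrier the compiler may load latch->is_set before the
     * store to "waiting". A signal arriving between that early load and the
     * store would see waiting == false and skip the wakeup, while this code
     * would go on to sleep with a stale is_set == 0.
     */
    demo_compiler_barrier();

    if (latch != NULL && latch->is_set)
    {
        waiting = false;
        return true;
    }
    return false;
}
```

A caller would sleep in poll() only when demo_latch_already_set() returns
false. Whether the barrier changes anything is visible only in the generated
code, so comparing the disassembly with and without it is the way to check.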

Best regards,
Alexander
