Potential deadlock in pgaio_io_wait()
От | Thomas Munro |
---|---|
Тема | Potential deadlock in pgaio_io_wait() |
Дата | |
Msg-id | CA+hUKG+mZYrSdnhk-XrBYO18H829K77S9gMKUsykOiTJtqB43g@mail.gmail.com обсуждение исходный текст |
Список | pgsql-bugs |
Hi, I think this sequence can deadlock if there is a ->wait_one() function needed for progress: P1 P2 acquire lock stage io wait for io cv... submit io acquire lock... It's OK to assume that the state will advance to SUBMITTED, but not any further as the current coding does. And even without a deadlock, P2 might just take a long time to get around to completing, when P1 should ideally handle it. A naive solution would be to broadcast on ioh->cv after pgaio_io_prepare_submit()'s state change, with a case to wait for that. But that's expensive. It might be slightly better to have a dedicated per-backend submit_cv, so the owner only has to broadcast once per batch. If ->submit() does significant per-IO work then maybe that's a bit later than it needs to be, but it's really just one system call anyway, so I doubt it matters. Sketch attached for illustration only. A real improvement would be to avoid the broadcast when there are no waiters, but I got a bit too confused about flags and memory barriers so I'm just sharing this problem report and illustration-grade patch for now. I doubt it's very easy to reproduce with simple queries, but I assume if you had a SQL function that acquires a central LWLock and you ran concurrent queries SELECT COUNT(*) FROM t WHERE locking_function(x) it in a tight loop in two sessions given t big enough to require a BufferAccessStrategy so that we keep evicting the buffers and streaming them back in, you must be able to reach it. It might be easier to write test functions that submit and wait for IO handles directly across two sessions or something.
Вложения
В списке pgsql-bugs по дате отправления: