LWLock deadlock in brinRevmapDesummarizeRange
От | Tomas Vondra |
---|---|
Тема | LWLock deadlock in brinRevmapDesummarizeRange |
Дата | |
Msg-id | 261e68bc-f5f5-5234-fb2c-af4f583513c0@enterprisedb.com обсуждение исходный текст |
Ответы |
Re: LWLock deadlock in brinRevmapDesummarizeRange
|
Список | pgsql-hackers |
Hi, While finalizing some fixes in BRIN, I decided to stress-test the relevant part of the code to check if I missed something. Imagine a simple script that builds BRIN indexes on random data, does random changes and cross-checks the results with/without the index. But instead of I almost immediately ran into a LWLock deadlock :-( I've managed to reproduce this on PG13+, but I believe it's there since the brinRevmapDesummarizeRange was introduced in PG10. I just haven't tried on pre-13 releases. The stress-test-2.sh script (attached .tgz) builds a table, fills it with random data and then runs a mix of updates and (de)summarization DDL of random fraction of the index. The lockup is usually triggered within a couple minutes, but might take longer (I guess it depends on parameters used to generate the random data, so it may take a couple runs to hit the right combination). The root cause is that brin_doupdate and brinRevmapDesummarizeRange end up locking buffers in different orders. Attached is also a .patch that adds a bunch of LOG messages for buffer locking in BRIN code (it's for PG13, but should work on newer releases too). Here's a fairly typical example of the interaction between brin_doupdate and brinRevmapDesummarizeRange: brin_doupdate (from UPDATE query): LOG: brin_doupdate: samepage 0 LOG: number of LWLocks held: 0 LOG: brin_getinsertbuffer: locking 898 lock 0x7f9a99a5af64 LOG: brin_getinsertbuffer: buffer locked LOG: brin_getinsertbuffer B: locking 899 lock 0x7f9a99a5afa4 LOG: brin_getinsertbuffer B: buffer locked LOG: number of LWLocks held: 2 LOG: lock 0 => 0x7f9a99a5af64 LOG: lock 1 => 0x7f9a99a5afa4 LOG: brin_doupdate: locking buffer for update LOG: brinLockRevmapPageForUpdate: locking 158 lock 0x7f9a99a4f664 brinRevmapDesummarizeRange (from brin_desummarize_range): LOG: starting brinRevmapDesummarizeRange LOG: number of LWLocks held: 0 LOG: brinLockRevmapPageForUpdate: locking 158 lock 0x7f9a99a4f664 LOG: brinLockRevmapPageForUpdate: buffer locked LOG: number of LWLocks held: 1 LOG: lock 0 => 0x7f9a99a4f664 LOG: brinRevmapDesummarizeRange: locking 898 lock 0x7f9a99a5af64 So, brin_doupdate starts with no LWLocks, and locks buffers 898, 899 (through getinsertbuffer). And then tries to lock 158. Meanwhile brinRevmapDesummarizeRange locks 158 first, and then tries to lock 898. So, a LWLock deadlock :-( I've now seen a bunch of these traces, with only minor differences. For example brinRevmapDesummarizeRange might gets stuck on the second buffer locked by getinsertbuffer (not the first one like here). I don't have a great idea how to fix this - I guess we need to ensure the buffers are locked in the same order, but that seems tricky. Obviously, people don't call brin_desummarize_range() very often, which likely explains the lack of reports. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: