VM corruption on standby
From | Andrey Borodin |
---|---|
Subject | VM corruption on standby |
Date | |
Msg-id | B3C69B86-7F82-4111-B97F-0005497BB745@yandex-team.ru |
List | pgsql-hackers |
Hi hackers!

I was reviewing the patch about removing xl_heap_visible and found the VM/WAL machinery very interesting. At Yandex we had several incidents with a corrupted VM, and at pgconf.dev colleagues from AWS confirmed that they had seen something similar too. So I toyed around and accidentally wrote a test that reproduces $subj.

I think the corruption happens as follows:

0. We create a table with one frozen tuple.
1. The next heap_insert() clears the VM bit and hangs immediately; nothing has been logged yet.
2. The VM buffer is flushed to disk by the checkpointer or bgwriter.
3. The primary is killed with -9. Now we have a page that is ALL_VISIBLE/ALL_FROZEN on the standby, but has clear VM bits on the primary.
4. A subsequent insert does not set XLH_LOCK_ALL_FROZEN_CLEARED in its WAL record.
5. pg_visibility detects corruption.

Interestingly, in an off-list conversation Melanie explained to me how ALL_VISIBLE is protected from this: WAL-logging depends on the PD_ALL_VISIBLE heap page bit, not on the state of the VM. But for ALL_FROZEN this is not the case:

```c
/* Clear only the all-frozen bit on visibility map if needed */
if (PageIsAllVisible(page) &&
    visibilitymap_clear(relation, block, vmbuffer,
                        VISIBILITYMAP_ALL_FROZEN))
    cleared_all_frozen = true; // this won't happen due to flushed VM buffer before a crash
```

Anyway, the test reproduces corruption of both bits. It also reproduces selecting deleted data on the standby.

The test is not intended to be committed when we fix the problem, so some waits are simulated with sleep(1) and the test is placed in modules/test_slru, where it was easier to write. But if we ever want something like this, I can design a less hacky and probably more generic version.

Thanks!

Best regards, Andrey Borodin.
Attachments