Lack of PageSetLSN in heap_xlog_visible
From | Konstantin Knizhnik |
---|---|
Subject | Lack of PageSetLSN in heap_xlog_visible |
Date | |
Msg-id | fed17dac-8cb8-4f5b-d462-1bb4908c029e@garret.ru |
Responses |
Re: Lack of PageSetLSN in heap_xlog_visible
|
List | pgsql-hackers |
Hi hackers!

heap_xlog_visible does not bump the heap page LSN when setting the all-visible flag in it. There is a long comment explaining why:

    /*
     * We don't bump the LSN of the heap page when setting the visibility
     * map bit (unless checksums or wal_hint_bits is enabled, in which
     * case we must), because that would generate an unworkable volume of
     * full-page writes.  This exposes us to torn page hazards, but since
     * we're not inspecting the existing page contents in any way, we
     * don't care.
     *
     * However, all operations that clear the visibility map bit *do* bump
     * the LSN, and those operations will only be replayed if the XLOG LSN
     * follows the page LSN.  Thus, if the page LSN has advanced past our
     * XLOG record's LSN, we mustn't mark the page all-visible, because
     * the subsequent update won't be replayed to clear the flag.
     */

But it is still not clear to me that not bumping the LSN here is correct when wal_log_hints is set. In that case the VM page will have a larger LSN than the heap page, because visibilitymap_set bumps the LSN of the VM page. This means that, in theory, after recovery we may have a page that is marked all-visible in the VM but does not have PD_ALL_VISIBLE set in its page header. That would violate the VM constraint:

     * When we *set* a visibility map during VACUUM, we must write WAL.  This may
     * seem counterintuitive, since the bit is basically a hint: if it is clear,
     * it may still be the case that every tuple on the page is visible to all
     * transactions; we just don't know that for certain.  The difficulty is that
     * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
     * on the page itself, and the visibility map bit.  If a crash occurs after the
     * visibility map page makes it to disk and before the updated heap page makes
     * it to disk, redo must set the bit on the heap page.  Otherwise, the next
     * insert, update, or delete on the heap page will fail to realize that the
     * visibility map bit must be cleared, possibly causing index-only scans to
     * return wrong answers.
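To make the LSN bookkeeping above easier to follow, here is a toy model in C. It is only a sketch and not PostgreSQL code: `ToyPage`, `toy_needs_redo`, `toy_redo_visible`, `toy_redo_clear` and the `bump_heap_lsn` flag are all hypothetical names invented for illustration. The only assumption taken from the quoted comments is that a record is replayed against a page only when the record LSN follows the page LSN, and that operations clearing the bit always bump the heap page LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLsn;

/* Hypothetical stand-in for a heap page (flag = PD_ALL_VISIBLE)
 * or a VM page (flag = the visibility map bit). */
typedef struct
{
    ToyLsn lsn;
    bool   all_visible;
} ToyPage;

/* Redo rule from the quoted comment: replay only if the record LSN
 * follows the page LSN. */
static inline bool
toy_needs_redo(const ToyPage *page, ToyLsn rec_lsn)
{
    return rec_lsn > page->lsn;
}

/* Toy replay of a "set all-visible" record.  With bump_heap_lsn == false
 * the heap page keeps its old LSN, while the VM page LSN always advances. */
static inline void
toy_redo_visible(ToyPage *heap, ToyPage *vm, ToyLsn rec_lsn, bool bump_heap_lsn)
{
    if (toy_needs_redo(heap, rec_lsn))
    {
        heap->all_visible = true;
        if (bump_heap_lsn)
            heap->lsn = rec_lsn;
    }
    if (toy_needs_redo(vm, rec_lsn))
    {
        vm->all_visible = true;
        vm->lsn = rec_lsn;
    }
}

/* Toy replay of a record that clears the bit: it always bumps the heap LSN. */
static inline void
toy_redo_clear(ToyPage *heap, ToyPage *vm, ToyLsn rec_lsn)
{
    if (toy_needs_redo(heap, rec_lsn))
    {
        heap->all_visible = false;
        heap->lsn = rec_lsn;
    }
    if (toy_needs_redo(vm, rec_lsn))
    {
        vm->all_visible = false;
        vm->lsn = rec_lsn;
    }
}
```

Calling `toy_redo_visible(&heap, &vm, 20, false)` on two pages that both start at LSN 10 leaves the heap page at LSN 10 and the VM page at LSN 20, which is the divergence described above. Whether that divergence can actually produce a VM bit without PD_ALL_VISIBLE after recovery is the open question of this message; the model does not settle it.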