buffer locking fix for lazy_scan_heap
От | Robert Haas |
---|---|
Тема | buffer locking fix for lazy_scan_heap |
Дата | |
Msg-id | CA+TgmoajK42vjj-TZK64Vrxq9qf_cJXa9miZ1GNSteZx7VfXkg@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: buffer locking fix for lazy_scan_heap
|
Список | pgsql-hackers |
I discovered when researching the issue of index-only scans and Hot Standby that there's a bug (for which I'm responsible) in lazy_scan_heap[1]. Actually, the code has been like this for a long time, but I needed to change it when I did the crash-safe visibility map work, and I didn't. The problem is that lazy_scan_heap() releases the lock on the buffer it's whacking around before it sets the visibility map bit. Thus, it's possible for the page-level bit to be cleared by some other backend before the visibility map bit gets set. In previous releases that was arguably OK, since the worst thing that could happen is a postponement of vacuuming on that page until the next anti-wraparound cycle; but now that we have index-only scans, it can cause queries to return wrong answers. Attached is a patch to fix the problem, which rearranges things so that we set the bit in the visibility map before releasing the buffer lock. Similar work has already been done for inserts, updates, and deletes, which in previous releases would *clear* the visibility map bit after releasing the page lock, and no longer done. But the vacuum case, which *sets* the bit, was overlooked. Our convention in those related cases is that it's acceptable to hold the buffer lock while setting the visibility map bit and issuing the WAL log record, but there must be no possibility of an I/O to read in the visibility map page while the buffer lock is held. This turned out to be pretty simple because, as it turns out, lazy_scan_heap() is almost always holding a pin on the correct page anyway, so I only had to tweak things slightly to guarantee it. As a pleasant side effect, the new code is actually quite a bit simpler and more readable than the old code, at least IMHO. While I was at it, I made a couple of minor, related changes which I believe to be improvements. First, I arranged things so that, when we pause the first vacuum pass to do an index vac cycle, we release any buffer pin we're holding on the visibility map page. The fact that we haven't done that in the past is probably harmless, but I think there's something to be said for not holding onto buffer pins for long periods of time when it isn't really necessary. Second, I adjusted things so that we print a warning if the visibility map bit is set and the page-level bit is clear, since that should never happen in 9.2+. It is similar to the existing warning which catches the case where a page is marked all-visible despite containing dead tuples. Comments? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company http://archives.postgresql.org/pgsql-hackers/2012-04/msg00682.php
Вложения
В списке pgsql-hackers по дате отправления: