Discussion: Re: Fix some inconsistencies with open-coded visibilitymap_set() callers
On Thu, Jun 26, 2025 at 3:56 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> Here is a rebased version of this (it had some conflicts with recent commits).

One general complaint is that the commit message complains about the status quo but isn't really clear about what the patch is actually fixing.

I'm pretty concerned about this change:

    /*
     * If the page is all visible, need to clear that, unless we're only
     * going to add further frozen rows to it.
     *
     * If we're only adding already frozen rows to a previously empty
     * page, mark it as all-visible.
     */
    if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
    {
        all_visible_cleared = true;
        PageClearAllVisible(page);
        visibilitymap_clear(relation,
                            BufferGetBlockNumber(buffer),
                            vmbuffer, VISIBILITYMAP_VALID_BITS);
    }
-   else if (all_frozen_set)
-       PageSetAllVisible(page);

One problem is that this falsifies the comment -- the second sentence in the comment refers to code that the patch deletes. But also, the all_frozen_set flag is used a bit further down when setting xlrec->flags, and now we'll set XLH_INSERT_ALL_FROZEN_SET when in fact we didn't set the all-frozen bit. It seems like the intention here was that the existing XLOG_HEAP2_MULTI_INSERT record should cover setting the all-frozen bit, if we do that, and it seems like you're changing that somehow, but it's not clear to me why we'd want to change that. Even if we do, it doesn't appear to me that you've chased down all the loose ends; i.e., if that's the plan, then I'd expect the redo routine to be modified and XLH_INSERT_ALL_FROZEN_SET to be deleted.

I think this might actually be an underlying design problem with the patch -- heap_page_set_vm_and_log() seems to want to be in charge of both what we do with the heap page and what we do with the corresponding VM bit, but that runs contrary to the caller's need to bundle the VM bit changes with some WAL record that is doing other things to the heap page.
heap_page_set_vm_and_log() says that it shouldn't be called during recovery but doesn't assert that it isn't.

-   /*
-    * It should never be the case that the visibility map page is set
-    * while the page-level bit is clear, but the reverse is allowed (if
-    * checksums are not enabled). Regardless, set both bits so that we
-    * get back in sync.
-    *
-    * NB: If the heap page is all-visible but the VM bit is not set, we
-    * don't need to dirty the heap page. However, if checksums are
-    * enabled, we do need to make sure that the heap page is dirtied
-    * before passing it to visibilitymap_set(), because it may be logged.
-    * Given that this situation should only happen in rare cases after a
-    * crash, it is not worth optimizing.
-    */
-   PageSetAllVisible(page);
-   MarkBufferDirty(buf);
-   old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-                                  InvalidXLogRecPtr,
-                                  vmbuffer, presult.vm_conflict_horizon,
-                                  flags);
+   old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+                                         vmbuffer, presult.vm_conflict_horizon,
+                                         flags);

Why are we deleting a huge comment here explaining the intent of the code when, AFAICS, the intent of the new code is the same? Even if the intent of the new code isn't the same, why no comment?

-   /*
-    * If data checksums are enabled (or wal_log_hints=on), we
-    * need to protect the heap page from being torn.
-    *
-    * If not, then we must *not* update the heap page's LSN. In
-    * this case, the FPI for the heap page was omitted from the
-    * WAL record inserted above, so it would be incorrect to
-    * update the heap page's LSN.
-    */
-   if (XLogHintBitIsNeeded())
-   {
-       Page        heapPage = BufferGetPage(heapBuf);
-
-       PageSetLSN(heapPage, recptr);
-   }

This is another lengthy explanatory comment that goes away with this patch, but this time the code goes, too. But why? Presumably, either (1) this change is wrong, or (2) the preexisting code is wrong, or (3) the patch is trying to change the contract between visibilitymap_set() and its callers.
If it's (3), then I think there should be more explanation of that contract change.

It is unclear to me why you've rearranged the header comment for visibilitymap_set() in the way that you have: some of the material that used to be at the top is now at the bottom, but it's if anything a little less extensive than before, and certainly not enough for me to understand why or whether this change is correct.

+   /*
+    * We must never end up with the VM bit set and the page-level
+    * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+    * modification would fail to clear the VM bit. Though it is possible for
+    * the page-level bit to be set and the VM bit to be clear if checksums
+    * and wal_log_hints are not enabled.
+    */

The last sentence is not great English and maybe not that relevant, either. I would suggest deleting the whole thing or deleting the word "Though" and putting the rest of the sentence in parentheses.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jun 30, 2025 at 3:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> I'm pretty concerned about this change:
>
>     /*
>      * If the page is all visible, need to clear that, unless we're only
>      * going to add further frozen rows to it.
>      *
>      * If we're only adding already frozen rows to a previously empty
>      * page, mark it as all-visible.
>      */
>     if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
>     {
>         all_visible_cleared = true;
>         PageClearAllVisible(page);
>         visibilitymap_clear(relation,
>                             BufferGetBlockNumber(buffer),
>                             vmbuffer, VISIBILITYMAP_VALID_BITS);
>     }
> -   else if (all_frozen_set)
> -       PageSetAllVisible(page);
>
> One problem is that this falsifies the comment -- the second sentence
> in the comment refers to code that the patch deletes.

Yep, I should have updated that comment.

> But also, the all_frozen_set flag is used a bit further down when
> setting xlrec->flags, and now we'll set XLH_INSERT_ALL_FROZEN_SET when
> in fact we didn't set the all-frozen bit. It seems like the intention
> here was that the existing XLOG_HEAP2_MULTI_INSERT record should cover
> setting the all-frozen bit, if we do that, and it seems like you're
> changing that somehow, but it's not clear to me why we'd want to change
> that. Even if we do, it doesn't appear to me that you've chased down
> all the loose ends; i.e., if that's the plan, then I'd expect the redo
> routine to be modified and XLH_INSERT_ALL_FROZEN_SET to be deleted.

Right, yes, I accidentally forgot to remove setting PD_ALL_VISIBLE from heap_xlog_multi_insert(). That was not great.

As for why I changed the responsibility, I was trying to make COPY FREEZE more consistent with all of the other places we set the VM and mark PD_ALL_VISIBLE. In other places in master, we make the changes to the heap page that render the page all-visible and then separately set PD_ALL_VISIBLE and emit an xl_heap_visible record.
Ironically, my larger patch set [1] seeks to eliminate xl_heap_visible and add the changes to the VM to the WAL records that make the changes rendering the page all-visible. I think you are right that the change I made in the patch in this thread is not an improvement to COPY FREEZE.

> I think this might actually be an underlying design problem with the
> patch -- heap_page_set_vm_and_log() seems to want to be in charge of
> both what we do with the heap page and what we do with the
> corresponding VM bit, but that runs contrary to the caller's need to
> bundle the VM bit changes with some WAL record that is doing other
> things to the heap page.

The way it is currently, visibilitymap_set() sets the heap page LSN but it doesn't actually set PD_ALL_VISIBLE or mark the heap buffer dirty. There is no way to tell visibilitymap_set() whether or not you modified the heap page -- it just assumes you did. That separation of duties didn't make sense to me.

In fact, even if you make visibilitymap_set() aware that you haven't modified the heap page and then don't set the heap page LSN, we'll still register the heap buffer when emitting the xl_heap_visible record and then end up modifying it when replaying that record.

> heap_page_set_vm_and_log() says that it shouldn't be called during
> recovery but doesn't assert that it isn't.

Ah, good point.

So, I read and started addressing your other inline code review, but after thinking more about your comment about heap_page_set_vm_and_log() having a design problem, I went back to the drawing board and tried to figure out 1) which of the issues I point out are actually bugs and 2) how I can do my larger patch set without these wrappers. I solved 2, but that's a story for another thread.

As for 1, I went through all of these cases and tried to figure out what I thought was actually a bug and should be fixed.
The only one I think is a bug is the case in lazy_scan_prune() where the page was already set all-visible but not already set all-frozen. The code is as follows:

    else if (all_visible_according_to_vm && presult.all_visible &&
             presult.all_frozen &&
             !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
    {
        ...
        if (!PageIsAllVisible(page))
        {
            PageSetAllVisible(page);
            MarkBufferDirty(buf);
        }
        ...
        old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
                                       InvalidXLogRecPtr,
                                       vmbuffer, InvalidTransactionId,
                                       VISIBILITYMAP_ALL_VISIBLE |
                                       VISIBILITYMAP_ALL_FROZEN);

In that case, when checksums or wal_log_hints are on, if PD_ALL_VISIBLE is already set and we didn't otherwise dirty the page during heap_page_prune_and_freeze(), visibilitymap_set() will stamp the clean heap page with the LSN of the xl_heap_visible record.

I spent quite a bit of time trying to think of what could happen if we advance the LSN of an otherwise clean page and then don't mark it dirty to reflect having made that change. And, honestly, I couldn't come up with something. If someone evicts the page or we crash, we'll lose the LSN, but that seems like it can't hurt anything if nothing else on the page has changed. But, I may be insufficiently creative. I know we are definitely not supposed to advance the LSN of clean pages at all, let alone without marking them dirty.

One thing we could do is check if the heap buffer is dirty before setting the LSN in visibilitymap_set():

diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..a4e2e454e1f 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -301,7 +301,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
              * WAL record inserted above, so it would be incorrect to
              * update the heap page's LSN.
              */
-            if (XLogHintBitIsNeeded())
+            if (XLogHintBitIsNeeded() && BufferIsDirty(heapBuf))
             {
                 Page        heapPage = BufferGetPage(heapBuf);

Something about this doesn't feel altogether right, though I'm not sure why.

As for the not-bug cases:

For all the cases where we modify the heap and VM pages in different critical sections, we could error out after making the heap page changes but before setting the VM. But this would just lead to PD_ALL_VISIBLE being set and the VM bit not being set, which is technically fine. It seems like it would be better to make all of the changes in the same critical section, and, in some of the cases, it wouldn't be hard to do so. But, since I cannot claim a correctness issue, we can leave it the way it is.

In lazy_scan_new_or_empty(), it would be better to set PD_ALL_VISIBLE before marking the buffer dirty, but it doesn't really matter because no one else could write the buffer out since we have an exclusive lock.

And in heap_multi_insert() for the COPY FREEZE case, in recovery, we set PD_ALL_VISIBLE and mark the buffer dirty when replaying both the xl_heap_multi_insert record and when replaying the xl_heap_visible record. This involves taking and releasing the content lock on the heap buffer page twice to make redundant changes. Again, not a correctness issue, but I think it makes much more sense to include the VM changes in the xl_heap_multi_insert record.

So, in summary, do you think we should do something about the lazy_scan_prune() -> visibilitymap_set() case where we advance the LSN of a clean buffer and don't mark it dirty?

- Melanie

[1] https://www.postgresql.org/message-id/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
On Wed, Jul 2, 2025 at 7:33 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> > I think this might actually be an underlying design problem with the
> > patch -- heap_page_set_vm_and_log() seems to want to be in charge of
> > both what we do with the heap page and what we do with the
> > corresponding VM bit, but that runs contrary to the caller's need to
> > bundle the VM bit changes with some WAL record that is doing other
> > things to the heap page.
>
> The way it is currently, visibilitymap_set() sets the heap page LSN
> but it doesn't actually set PD_ALL_VISIBLE or mark the heap buffer
> dirty. There is no way to tell visibilitymap_set() whether or not you
> modified the heap page -- it just assumes you did. That separation of
> duties didn't make sense to me.

It doesn't make sense to me, either. I have a strong feeling that visibilitymap_set() isn't well-scoped, but I don't know whether it needs to do less or more or just different stuff. I don't think it can be right for code that doesn't modify the page to set the page LSN; and I suspect it would be best to try to get modifying the page, marking the page dirty, and emitting WAL for the modification if required into a tight segment of code, at least in the ideal world where somebody else does the work and I just get to sit back and nod wisely.

> In that case, when checksums or wal_log_hints are on, if
> PD_ALL_VISIBLE is already set and we didn't otherwise dirty the page
> during heap_page_prune_and_freeze(), visibilitymap_set() will stamp
> the clean heap page with the LSN of the xl_heap_visible record.
>
> I spent quite a bit of time trying to think of what could happen if we
> advance the LSN of an otherwise clean page and then don't mark it
> dirty to reflect having made that change. And, honestly, I couldn't
> come up with something.
Either advancing the LSN of the heap page is important for some purpose (like making crash recovery work properly), and therefore the buffer needs to be marked dirty; or it is not, and the LSN shouldn't be set in the first place when the page is otherwise unchanged. I agree with you that it's hard to see how anything goes wrong as a result of bumping the LSN on a clean page without dirtying it, but I think it's playing with fire. At the very least, making the code consistent and intelligible could help future readers of the code to avoid introducing bugs of their own.

> One thing we could do is check if the heap buffer is dirty before
> setting the LSN in visibilitymap_set():

I don't think this is the way. visibilitymap_set() shouldn't guess what the caller has done or will do; the caller should make it clear explicitly.

> As for the not-bug cases:
>
> For all the cases where we modify the heap and VM pages in different
> critical sections, we could error out after making the heap page
> changes but before setting the VM. But this would just lead to
> PD_ALL_VISIBLE being set and the VM bit not being set, which is
> technically fine.
>
> It seems like it would be better to make all of the changes in the
> same critical section, and, in some of the cases, it wouldn't be hard
> to do so. But, since I cannot claim a correctness issue, we can leave
> it the way it is.

If it's useful to combine things as a precursor to future work, that may be fair enough, but otherwise I don't see the point. It's not "technically fine;" it's just straight-up fine. It's not desirable, in the sense that we might end up having to do extra page reads later to correct the situation, but crashes won't intervene between those two critical sections often enough to matter. Unifying those critical sections won't change anything about what states code elsewhere in the tree must be prepared to see on disk.
> In lazy_scan_new_or_empty(), it would be better to set PD_ALL_VISIBLE
> before marking the buffer dirty, but it doesn't really matter because
> no one else could write the buffer out since we have an exclusive lock.

Again, I disagree that this "doesn't really matter"; I think it straight-up does not matter.

> And in heap_multi_insert() for the COPY FREEZE case, in recovery, we
> set PD_ALL_VISIBLE and mark the buffer dirty when replaying both the
> xl_heap_multi_insert record and when replaying the xl_heap_visible
> record. This involves taking and releasing the content lock on the
> heap buffer page twice to make redundant changes. Again, not a
> correctness issue, but I think it makes much more sense to include the
> VM changes in the xl_heap_multi_insert record.

Without looking at the code, if you're saying we could go from 2 WAL records down to 1, that would improve performance, which would be worthwhile, but not a bug fix.

> So, in summary, do you think we should do something about the
> lazy_scan_prune() -> visibilitymap_set() case where we advance the LSN
> of a clean buffer and don't mark it dirty?

Yes, I think that would be worth trying to fix.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jul 7, 2025 at 11:38 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jul 2, 2025 at 7:33 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > One thing we could do is check if the heap buffer is dirty before
> > setting the LSN in visibilitymap_set():
>
> I don't think this is the way. visibilitymap_set() shouldn't guess
> what the caller has done or will do; the caller should make it clear
> explicitly.

A direct translation of this would be to add a boolean parameter to visibilitymap_set() like "heap_page_modified" or "set_heap_lsn". Is that along the lines you were thinking?

> > For all the cases where we modify the heap and VM pages in different
> > critical sections, we could error out after making the heap page
> > changes but before setting the VM. But this would just lead to
> > PD_ALL_VISIBLE being set and the VM bit not being set, which is
> > technically fine.
> >
> > It seems like it would be better to make all of the changes in the
> > same critical section, and, in some of the cases, it wouldn't be hard
> > to do so. But, since I cannot claim a correctness issue, we can leave
> > it the way it is.
>
> If it's useful to combine things as a precursor to future work, that
> may be fair enough, but otherwise I don't see the point. It's not
> "technically fine;" it's just straight-up fine. It's not desirable, in
> the sense that we might end up having to do extra page reads later to
> correct the situation, but crashes won't intervene between those two
> critical sections often enough to matter. Unifying those critical
> sections won't change anything about what states code elsewhere in the
> tree must be prepared to see on disk.

I think one argument for having it in the same critical section is the defensive coding perspective. Our rule is that you make changes to the heap page, mark it dirty, emit WAL, and then stamp the page with that LSN, all in the same critical section.
In the case of PD_ALL_VISIBLE, it is okay to break this rule because it is okay to lose the PD_ALL_VISIBLE hint as long as the VM bit doesn't get set. But future programmers could see this and make other modifications to the heap page in the same area where we set PD_ALL_VISIBLE (the one outside of a critical section).

Anyway, I'm now convinced that the way forward for this particular issue is to represent the VM changes in the same WAL record that has the heap page changes, so I won't further belabor the issue.

- Melanie
On Thu, Jul 10, 2025 at 9:57 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
> A direct translation of this would be to add a boolean parameter to
> visibilitymap_set() like "heap_page_modified" or "set_heap_lsn". Is
> that along the lines you were thinking?

Yeah, something like that. I haven't thought through the details.

> > If it's useful to combine things as a precursor to future work, that
> > may be fair enough, but otherwise I don't see the point. It's not
> > "technically fine;" it's just straight-up fine. It's not desirable, in
> > the sense that we might end up having to do extra page reads later to
> > correct the situation, but crashes won't intervene between those two
> > critical sections often enough to matter. Unifying those critical
> > sections won't change anything about what states code elsewhere in the
> > tree must be prepared to see on disk.
>
> I think one argument for having it in the same critical section is the
> defensive coding perspective. Our rule is that you make changes to the
> heap page, mark it dirty, emit WAL, and then stamp the page with that
> LSN, all in the same critical section.

I agree that we need to follow this rule.

> In the case of PD_ALL_VISIBLE, it is okay to break this rule because
> it is okay to lose the PD_ALL_VISIBLE hint as long as the VM bit
> doesn't get set. But future programmers could see this and make other
> modifications to the heap page in the same area where we set
> PD_ALL_VISIBLE (the one outside of a critical section).

But this is saying that PD_ALL_VISIBLE isn't really covered by the WAL record in question -- instead, it's a related hint. Or really a semi-hint, since it's only conditionally OK to lose it. So the rule above doesn't necessarily fully apply to this situation.

> Anyway, I'm now convinced that the way forward for this particular
> issue is to represent the VM changes in the same WAL record that has
> the heap page changes, so I won't further belabor the issue.

Makes sense.
--
Robert Haas
EDB: http://www.enterprisedb.com