Discussion: Re: Fix some inconsistencies with open-coded visibilitymap_set() callers
On Thu, Jun 26, 2025 at 3:56 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> Here is a rebased version of this (it had some conflicts with recent commits).

One general complaint is that the commit message complains about the status quo but isn't really clear about what the patch is actually fixing.

I'm pretty concerned about this change:

    /*
     * If the page is all visible, need to clear that, unless we're only
     * going to add further frozen rows to it.
     *
     * If we're only adding already frozen rows to a previously empty
     * page, mark it as all-visible.
     */
    if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
    {
        all_visible_cleared = true;
        PageClearAllVisible(page);
        visibilitymap_clear(relation,
                            BufferGetBlockNumber(buffer),
                            vmbuffer, VISIBILITYMAP_VALID_BITS);
    }
-   else if (all_frozen_set)
-       PageSetAllVisible(page);

One problem is that this falsifies the comment -- the second sentence in the comment refers to code that the patch deletes. But also, the all_frozen_set flag is used a bit further down when setting xlrec->flags, and now we'll set XLH_INSERT_ALL_FROZEN_SET when in fact we didn't set the all-frozen bit. It seems like the intention here was that the existing XLOG_HEAP2_MULTI_INSERT record should cover setting the all-frozen bit, if we do that, and it seems like you're changing that somehow, but it's not clear to me why we'd want to change that. Even if we do, it doesn't appear to me that you've chased down all the loose ends; i.e., if that's the plan, then I'd expect the redo routine to be modified and XLH_INSERT_ALL_FROZEN_SET to be deleted.

I think this might actually be an underlying design problem with the patch -- heap_page_set_vm_and_log() seems to want to be in charge of both what we do with the heap page and what we do with the corresponding VM bit, but that runs contrary to the caller's need to bundle the VM bit changes with some WAL record that is doing other things to the heap page.
heap_page_set_vm_and_log() says that it shouldn't be called during recovery but doesn't assert that it isn't.

-   /*
-    * It should never be the case that the visibility map page is set
-    * while the page-level bit is clear, but the reverse is allowed (if
-    * checksums are not enabled). Regardless, set both bits so that we
-    * get back in sync.
-    *
-    * NB: If the heap page is all-visible but the VM bit is not set, we
-    * don't need to dirty the heap page. However, if checksums are
-    * enabled, we do need to make sure that the heap page is dirtied
-    * before passing it to visibilitymap_set(), because it may be logged.
-    * Given that this situation should only happen in rare cases after a
-    * crash, it is not worth optimizing.
-    */
-   PageSetAllVisible(page);
-   MarkBufferDirty(buf);
-   old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-                                  InvalidXLogRecPtr,
-                                  vmbuffer, presult.vm_conflict_horizon,
-                                  flags);
+   old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+                                         vmbuffer, presult.vm_conflict_horizon,
+                                         flags);

Why are we deleting a huge comment here explaining the intent of the code when, AFAICS, the intent of the new code is the same? Even if the intent of the new code isn't the same, why no comment?

-   /*
-    * If data checksums are enabled (or wal_log_hints=on), we
-    * need to protect the heap page from being torn.
-    *
-    * If not, then we must *not* update the heap page's LSN. In
-    * this case, the FPI for the heap page was omitted from the
-    * WAL record inserted above, so it would be incorrect to
-    * update the heap page's LSN.
-    */
-   if (XLogHintBitIsNeeded())
-   {
-       Page        heapPage = BufferGetPage(heapBuf);
-
-       PageSetLSN(heapPage, recptr);
-   }

This is another lengthy explanatory comment that goes away with this patch, but this time the code goes, too. But why? Presumably, either (1) this change is wrong, or (2) the preexisting code is wrong, or (3) the patch is trying to change the contract between visibilitymap_set() and its callers.
If it's (3), then I think there should be more explanation of that contract change.

It is unclear to me why you've rearranged the header comment for visibilitymap_set() in the way that you have: some of the material that used to be at the top is now at the bottom, but it's if anything a little less extensive than before, and certainly not enough for me to understand why or whether this change is correct.

+   /*
+    * We must never end up with the VM bit set and the page-level
+    * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+    * modification would fail to clear the VM bit. Though it is possible for
+    * the page-level bit to be set and the VM bit to be clear if checksums
+    * and wal_log_hints are not enabled.
+    */

The last sentence is not great English and maybe not that relevant, either. I would suggest deleting the whole thing or deleting the word "Though" and putting the rest of the sentence in parentheses.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jun 30, 2025 at 3:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> I'm pretty concerned about this change:
>
>     /*
>      * If the page is all visible, need to clear that, unless we're only
>      * going to add further frozen rows to it.
>      *
>      * If we're only adding already frozen rows to a previously empty
>      * page, mark it as all-visible.
>      */
>     if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
>     {
>         all_visible_cleared = true;
>         PageClearAllVisible(page);
>         visibilitymap_clear(relation,
>                             BufferGetBlockNumber(buffer),
>                             vmbuffer, VISIBILITYMAP_VALID_BITS);
>     }
> -   else if (all_frozen_set)
> -       PageSetAllVisible(page);
>
> One problem is that this falsifies the comment -- the second sentence
> in the comment refers to code that the patch deletes.

Yep, I should have updated that comment.

> But also, the all_frozen_set flag is used a bit further down when
> setting xlrec->flags, and now we'll set XLH_INSERT_ALL_FROZEN_SET when
> in fact we didn't set the all-frozen bit. It seems like the intention
> here was that the existing XLOG_HEAP2_MULTI_INSERT record should cover
> setting the all-frozen bit, if we do that, and it seems like you're
> changing that somehow, but it's not clear to me why we'd want to change
> that. Even if we do, it doesn't appear to me that you've chased down
> all the loose ends; i.e., if that's the plan, then I'd expect the redo
> routine to be modified and XLH_INSERT_ALL_FROZEN_SET to be deleted.

Right, yes, I accidentally forgot to remove setting PD_ALL_VISIBLE from heap_xlog_multi_insert(). That was not great.

As for why I changed the responsibility, I was trying to make COPY FREEZE more consistent with all of the other places we set the VM and mark PD_ALL_VISIBLE. In other places in master, we make the changes to the heap page that render the page all-visible and then separately set PD_ALL_VISIBLE and emit an xl_heap_visible record.
Ironically, my larger patch set [1] seeks to eliminate xl_heap_visible and add the changes to the VM to the WAL records that make the changes rendering the page all-visible. I think you are right that the change I made in the patch in this thread is not an improvement to COPY FREEZE.

> I think this might actually be an underlying design problem with the
> patch -- heap_page_set_vm_and_log() seems to want to be in charge of
> both what we do with the heap page and what we do with the
> corresponding VM bit, but that runs contrary to the caller's need to
> bundle the VM bit changes with some WAL record that is doing other
> things to the heap page.

The way it is currently, visibilitymap_set() sets the heap page LSN but it doesn't actually set PD_ALL_VISIBLE or mark the heap buffer dirty. There is no way to tell visibilitymap_set() whether or not you modified the heap page -- it just assumes you did. That separation of duties didn't make sense to me.

In fact, even if you make visibilitymap_set() aware that you haven't modified the heap page and then don't set the heap page LSN, we'll still register the heap buffer when emitting the xl_heap_visible record and then end up modifying it when replaying that record.

> heap_page_set_vm_and_log() says that it shouldn't be called during
> recovery but doesn't assert that it isn't.

Ah, good point.

So, I read and started addressing your other inline code review, but after thinking more about your comment about heap_page_set_vm_and_log() having a design problem, I went back to the drawing board and tried to figure out 1) which of the issues I point out are actually bugs and 2) how I can do my larger patch set without these wrappers. I solved 2, but that's a story for another thread.

As for 1, I went through all of these cases and tried to figure out what I thought was actually a bug and should be fixed.
The only one I think is a bug is the case in lazy_scan_prune() where the page was already set all-visible but not already set all-frozen. The code is as follows:

    else if (all_visible_according_to_vm && presult.all_visible &&
             presult.all_frozen &&
             !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
    {
        ...
        if (!PageIsAllVisible(page))
        {
            PageSetAllVisible(page);
            MarkBufferDirty(buf);
        }
        ...
        old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
                                       InvalidXLogRecPtr,
                                       vmbuffer, InvalidTransactionId,
                                       VISIBILITYMAP_ALL_VISIBLE |
                                       VISIBILITYMAP_ALL_FROZEN);

In that case, when checksums or wal_log_hints are on, if PD_ALL_VISIBLE is already set and we didn't otherwise dirty the page during heap_page_prune_and_freeze(), visibilitymap_set() will stamp the clean heap page with the LSN of the xl_heap_visible record.

I spent quite a bit of time trying to think of what could happen if we advance the LSN of an otherwise clean page and then don't mark it dirty to reflect having made that change. And, honestly, I couldn't come up with something. If someone evicts the page or we crash, we'll lose the LSN, but that seems like it can't hurt anything if nothing else on the page has changed. But, I may be insufficiently creative. I know we are definitely not supposed to advance the LSN of clean pages at all, let alone without marking them dirty.

One thing we could do is check if the heap buffer is dirty before setting the LSN in visibilitymap_set():

diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26e..a4e2e454e1f 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -301,7 +301,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
              * WAL record inserted above, so it would be incorrect to
              * update the heap page's LSN.
              */
-            if (XLogHintBitIsNeeded())
+            if (XLogHintBitIsNeeded() && BufferIsDirty(heapBuf))
             {
                 Page        heapPage = BufferGetPage(heapBuf);

Something about this doesn't feel altogether right, though I'm not sure why.

As for the not-bug cases:

For all the cases where we modify the heap and VM pages in different critical sections, we could error out after making the heap page changes but before setting the VM. But this would just lead to PD_ALL_VISIBLE being set and the VM bit not being set, which is technically fine. It seems like it would be better to make all of the changes in the same critical section, and, in some of the cases, it wouldn't be hard to do so. But, since I cannot claim a correctness issue, we can leave it the way it is.

In lazy_scan_new_or_empty(), it would be better to set PD_ALL_VISIBLE before marking the buffer dirty, but it doesn't really matter because no one else could write the buffer out since we have an exclusive lock.

And in heap_multi_insert() for the COPY FREEZE case, in recovery, we set PD_ALL_VISIBLE and mark the buffer dirty when replaying both the xl_heap_multi_insert record and when replaying the xl_heap_visible record. This involves taking and releasing the content lock on the heap buffer page twice to make redundant changes. Again, not a correctness issue, but I think it makes much more sense to include the VM changes in the xl_heap_multi_insert record.

So, in summary, do you think we should do something about the lazy_scan_prune() -> visibilitymap_set() case where we advance the LSN of a clean buffer and don't mark it dirty?

- Melanie

[1] https://www.postgresql.org/message-id/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
On Wed, Jul 2, 2025 at 7:33 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> > I think this might actually be an underlying design problem with the
> > patch -- heap_page_set_vm_and_log() seems to want to be in charge of
> > both what we do with the heap page and what we do with the
> > corresponding VM bit, but that runs contrary to the caller's need to
> > bundle the VM bit changes with some WAL record that is doing other
> > things to the heap page.
>
> The way it is currently, visibilitymap_set() sets the heap page LSN
> but it doesn't actually set PD_ALL_VISIBLE or mark the heap buffer
> dirty. There is no way to tell visibilitymap_set() whether or not you
> modified the heap page -- it just assumes you did. That separation of
> duties didn't make sense to me.

It doesn't make sense to me, either. I have a strong feeling that visibilitymap_set() isn't well-scoped, but I don't know whether it needs to do less or more or just different stuff. I don't think it can be right for code that doesn't modify the page to set the page LSN; and I suspect it would be best to try to get modifying the page, marking the page dirty, and emitting WAL for the modification if required into a tight segment of code, at least in the ideal world where somebody else does the work and I just get to sit back and nod wisely.

> In that case, when checksums or wal_log_hints are on, if
> PD_ALL_VISIBLE is already set and we didn't otherwise dirty the page
> during heap_page_prune_and_freeze(), visibilitymap_set() will stamp
> the clean heap page with the LSN of the xl_heap_visible record.
>
> I spent quite a bit of time trying to think of what could happen if we
> advance the LSN of an otherwise clean page and then don't mark it
> dirty to reflect having made that change. And, honestly, I couldn't
> come up with something.
Either advancing the LSN of the heap page is important for some purpose (like making crash recovery work properly), and therefore the buffer needs to be marked dirty; or it is not, and the LSN shouldn't be set in the first place when the page is otherwise unchanged. I agree with you that it's hard to see how anything goes wrong as a result of bumping the LSN on a clean page without dirtying it, but I think it's playing with fire. At the very least, making the code consistent and intelligible could help future readers of the code to avoid introducing bugs of their own.

> One thing we could do is check if the heap buffer is dirty before
> setting the LSN in visibilitymap_set():

I don't think this is the way. visibilitymap_set() shouldn't guess what the caller has done or will do; the caller should make it clear explicitly.

> As for the not-bug cases:
>
> For all the cases where we modify the heap and VM pages in different
> critical sections, we could error out after making the heap page
> changes but before setting the VM. But this would just lead to
> PD_ALL_VISIBLE being set and the VM bit not being set, which is
> technically fine.
>
> It seems like it would be better to make all of the changes in the
> same critical section, and, in some of the cases, it wouldn't be hard
> to do so. But, since I cannot claim a correctness issue, we can leave
> it the way it is.

If it's useful to combine things as a precursor to future work, that may be fair enough, but otherwise I don't see the point. It's not "technically fine;" it's just straight-up fine. It's not desirable, in the sense that we might end up having to do extra page reads later to correct the situation, but crashes won't intervene between those two critical sections often enough to matter. Unifying those critical sections won't change anything about what states code elsewhere in the tree must be prepared to see on disk.
> In lazy_scan_new_or_empty(), it would be better to set PD_ALL_VISIBLE
> before marking the buffer dirty, but it doesn't really matter because
> no one else could write the buffer out since we have an exclusive lock.

Again, I disagree that this "doesn't really matter"; I think it straight-up does not matter.

> And in heap_multi_insert() for the COPY FREEZE case, in recovery, we
> set PD_ALL_VISIBLE and mark the buffer dirty when replaying both the
> xl_heap_multi_insert record and when replaying the xl_heap_visible
> record. This involves taking and releasing the content lock on the
> heap buffer page twice to make redundant changes. Again, not a
> correctness issue, but I think it makes much more sense to include the
> VM changes in the xl_heap_multi_insert record.

Without looking at the code, if you're saying we could go from 2 WAL records down to 1, that would improve performance, which would be worthwhile, but not a bug fix.

> So, in summary, do you think we should do something about the
> lazy_scan_prune() -> visibilitymap_set() case where we advance the LSN
> of a clean buffer and don't mark it dirty?

Yes, I think that would be worth trying to fix.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jul 7, 2025 at 11:38 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jul 2, 2025 at 7:33 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > One thing we could do is check if the heap buffer is dirty before
> > setting the LSN in visibilitymap_set():
>
> I don't think this is the way. visibilitymap_set() shouldn't guess
> what the caller has done or will do; the caller should make it clear
> explicitly.

A direct translation of this would be to add a boolean parameter to visibilitymap_set() like "heap_page_modified" or "set_heap_lsn". Is that along the lines you were thinking?

> > For all the cases where we modify the heap and VM pages in different
> > critical sections, we could error out after making the heap page
> > changes but before setting the VM. But this would just lead to
> > PD_ALL_VISIBLE being set and the VM bit not being set, which is
> > technically fine.
> >
> > It seems like it would be better to make all of the changes in the
> > same critical section, and, in some of the cases, it wouldn't be hard
> > to do so. But, since I cannot claim a correctness issue, we can leave
> > it the way it is.
>
> If it's useful to combine things as a precursor to future work, that
> may be fair enough, but otherwise I don't see the point. It's not
> "technically fine;" it's just straight-up fine. It's not desirable, in
> the sense that we might end up having to do extra page reads later to
> correct the situation, but crashes won't intervene between those two
> critical sections often enough to matter. Unifying those critical
> sections won't change anything about what states code elsewhere in the
> tree must be prepared to see on disk.

I think one argument for having it in the same critical section is the defensive coding perspective. Our rule is that you make changes to the heap page, mark it dirty, emit WAL, and then stamp the page with that LSN, all in the same critical section.
In the case of PD_ALL_VISIBLE, it is okay to break this rule because it is okay to lose the PD_ALL_VISIBLE hint as long as the VM bit doesn't get set. But future programmers could see this and make other modifications to the heap page in the same area where we set PD_ALL_VISIBLE (the one outside of a critical section).

Anyway, I'm now convinced that the way forward for this particular issue is to represent the VM changes in the same WAL record that has the heap page changes, so I won't further belabor the issue.

- Melanie
On Thu, Jul 10, 2025 at 9:57 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
> A direct translation of this would be to add a boolean parameter to
> visibilitymap_set() like "heap_page_modified" or "set_heap_lsn". Is
> that along the lines you were thinking?

Yeah, something like that. I haven't thought through the details.

> > If it's useful to combine things as a precursor to future work, that
> > may be fair enough, but otherwise I don't see the point. It's not
> > "technically fine;" it's just straight-up fine. It's not desirable, in
> > the sense that we might end up having to do extra page reads later to
> > correct the situation, but crashes won't intervene between those two
> > critical sections often enough to matter. Unifying those critical
> > sections won't change anything about what states code elsewhere in the
> > tree must be prepared to see on disk.
>
> I think one argument for having it in the same critical section is the
> defensive coding perspective. Our rule is that you make changes to the
> heap page, mark it dirty, emit WAL, and then stamp the page with that
> LSN, all in the same critical section.

I agree that we need to follow this rule.

> In the case of PD_ALL_VISIBLE, it is okay to break this rule because
> it is okay to lose the PD_ALL_VISIBLE hint as long as the VM bit
> doesn't get set. But future programmers could see this and make other
> modifications to the heap page in the same area where we set
> PD_ALL_VISIBLE (the one outside of a critical section).

But this is saying that PD_ALL_VISIBLE isn't really covered by the WAL record in question -- instead, it's a related hint. Or really a semi-hint, since it's only conditionally OK to lose it. So the rule above doesn't necessarily fully apply to this situation.

> Anyway, I'm now convinced that the way forward for this particular
> issue is to represent the VM changes in the same WAL record that has
> the heap page changes, so I won't further belabor the issue.

Makes sense.
--
Robert Haas
EDB: http://www.enterprisedb.com