Обсуждение: Streamify more code paths

Поиск
Список
Период
Сортировка

Streamify more code paths

От
Xuneng Zhou
Дата:
Hi Hackers,

I noticed several additional paths in contrib modules, beyond [1],
that are potentially suitable for streamification:

1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()

The following patches streamify those code paths. No benchmarks have
been run yet.

[1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com

Feedbacks welcome.

--
Best,
Xuneng

Вложения

Re: Streamify more code paths

От
Xuneng Zhou
Дата:
Hi,

On Thu, Dec 25, 2025 at 1:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi Hackers,
>
> I noticed several additional paths in contrib modules, beyond [1],
> that are potentially suitable for streamification:
>
> 1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
> 2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()
>
> The following patches streamify those code paths. No benchmarks have
> been run yet.
>
> [1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com
>
> Feedbacks welcome.
>

One more in ginvacuumcleanup().

--
Best,
Xuneng

Вложения

Re: Streamify more code paths

От
Nazir Bilal Yavuz
Дата:
Hi,

Thank you for working on this!

On Thu, 25 Dec 2025 at 09:34, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Thu, Dec 25, 2025 at 1:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi Hackers,
> >
> > I noticed several additional paths in contrib modules, beyond [1],
> > that are potentially suitable for streamification:
> >
> > 1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
> > 2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()
> >
> > The following patches streamify those code paths. No benchmarks have
> > been run yet.
> >
> > [1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com
> >
> > Feedbacks welcome.
> >
>
> One more in ginvacuumcleanup().

0001, 0002 and 0004 LGTM.

0003:

+        buf = read_stream_next_buffer(stream, NULL);
+        if (buf == InvalidBuffer)
+            break;

I think we are loosening the check here. We were sure that there were
no InvalidBuffers until the nblocks. Streamified version does not have
this check, it exits from the loop the first time it sees an
InvalidBuffer, which may be wrong. You might want to add
'Assert(p.current_blocknum == nblocks);' before read_stream_end() to
have a similar check.

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: Streamify more code paths

От
Xuneng Zhou
Дата:
Hi Bilal,

Thanks for your review!

On Fri, Dec 26, 2025 at 6:59 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> Hi,
>
> Thank you for working on this!
>
> On Thu, 25 Dec 2025 at 09:34, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Dec 25, 2025 at 1:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi Hackers,
> > >
> > > I noticed several additional paths in contrib modules, beyond [1],
> > > that are potentially suitable for streamification:
> > >
> > > 1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
> > > 2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()
> > >
> > > The following patches streamify those code paths. No benchmarks have
> > > been run yet.
> > >
> > > [1]
https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com
> > >
> > > Feedbacks welcome.
> > >
> >
> > One more in ginvacuumcleanup().
>
> 0001, 0002 and 0004 LGTM.
>
> 0003:
>
> +        buf = read_stream_next_buffer(stream, NULL);
> +        if (buf == InvalidBuffer)
> +            break;
>
> I think we are loosening the check here. We were sure that there were
> no InvalidBuffers until the nblocks. Streamified version does not have
> this check, it exits from the loop the first time it sees an
> InvalidBuffer, which may be wrong. You might want to add
> 'Assert(p.current_blocknum == nblocks);' before read_stream_end() to
> have a similar check.
>

Agree. The check has been added in v2 per your suggestion.

--
Best,
Xuneng

Вложения

Re: Streamify more code paths

От
Xuneng Zhou
Дата:
Hi,

On Sat, Dec 27, 2025 at 12:41 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi Bilal,
>
> Thanks for your review!
>
> On Fri, Dec 26, 2025 at 6:59 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> >
> > Hi,
> >
> > Thank you for working on this!
> >
> > On Thu, 25 Dec 2025 at 09:34, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Dec 25, 2025 at 1:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi Hackers,
> > > >
> > > > I noticed several additional paths in contrib modules, beyond [1],
> > > > that are potentially suitable for streamification:
> > > >
> > > > 1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
> > > > 2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()
> > > >
> > > > The following patches streamify those code paths. No benchmarks have
> > > > been run yet.
> > > >
> > > > [1]
https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com
> > > >
> > > > Feedbacks welcome.
> > > >
> > >
> > > One more in ginvacuumcleanup().
> >
> > 0001, 0002 and 0004 LGTM.
> >
> > 0003:
> >
> > +        buf = read_stream_next_buffer(stream, NULL);
> > +        if (buf == InvalidBuffer)
> > +            break;
> >
> > I think we are loosening the check here. We were sure that there were
> > no InvalidBuffers until the nblocks. Streamified version does not have
> > this check, it exits from the loop the first time it sees an
> > InvalidBuffer, which may be wrong. You might want to add
> > 'Assert(p.current_blocknum == nblocks);' before read_stream_end() to
> > have a similar check.
> >
>
> Agree. The check has been added in v2 per your suggestion.
>

Two more to go:
patch 5: Streamify log_newpage_range() WAL logging path
patch 6: Streamify hash index VACUUM primary bucket page reads

Benchmarks will be conducted soon.


--
Best,
Xuneng

Вложения

Re: Streamify more code paths

От
Xuneng Zhou
Дата:
Hi,

On Sun, Dec 28, 2025 at 7:41 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Sat, Dec 27, 2025 at 12:41 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi Bilal,
> >
> > Thanks for your review!
> >
> > On Fri, Dec 26, 2025 at 6:59 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Thank you for working on this!
> > >
> > > On Thu, 25 Dec 2025 at 09:34, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Thu, Dec 25, 2025 at 1:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > >
> > > > > Hi Hackers,
> > > > >
> > > > > I noticed several additional paths in contrib modules, beyond [1],
> > > > > that are potentially suitable for streamification:
> > > > >
> > > > > 1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
> > > > > 2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()
> > > > >
> > > > > The following patches streamify those code paths. No benchmarks have
> > > > > been run yet.
> > > > >
> > > > > [1]
https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com
> > > > >
> > > > > Feedbacks welcome.
> > > > >
> > > >
> > > > One more in ginvacuumcleanup().
> > >
> > > 0001, 0002 and 0004 LGTM.
> > >
> > > 0003:
> > >
> > > +        buf = read_stream_next_buffer(stream, NULL);
> > > +        if (buf == InvalidBuffer)
> > > +            break;
> > >
> > > I think we are loosening the check here. We were sure that there were
> > > no InvalidBuffers until the nblocks. Streamified version does not have
> > > this check, it exits from the loop the first time it sees an
> > > InvalidBuffer, which may be wrong. You might want to add
> > > 'Assert(p.current_blocknum == nblocks);' before read_stream_end() to
> > > have a similar check.
> > >
> >
> > Agree. The check has been added in v2 per your suggestion.
> >
>
> Two more to go:
> patch 5: Streamify log_newpage_range() WAL logging path
> patch 6: Streamify hash index VACUUM primary bucket page reads
>
> Benchmarks will be conducted soon.
>

v6 in the last message has a problem and has not been updated. Attach
the right one again. Sorry for the noise.

--
Best,
Xuneng

Вложения

Re: Streamify more code paths

От
Nazir Bilal Yavuz
Дата:
Hi,

On Sun, 28 Dec 2025 at 14:46, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
> >
> > Two more to go:
> > patch 5: Streamify log_newpage_range() WAL logging path
> > patch 6: Streamify hash index VACUUM primary bucket page reads
> >
> > Benchmarks will be conducted soon.
> >
>
> v6 in the last message has a problem and has not been updated. Attach
> the right one again. Sorry for the noise.

0003 and 0006:

You need to add 'StatApproxReadStreamPrivate' and
'HashBulkDeleteStreamPrivate' to the typedefs.list.

0005:

@@ -1321,8 +1341,10 @@ log_newpage_range(Relation rel, ForkNumber forknum,
         nbufs = 0;
         while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
         {
-            Buffer        buf = ReadBufferExtended(rel, forknum, blkno,
-                                                 RBM_NORMAL, NULL);
+            Buffer        buf = read_stream_next_buffer(stream, NULL);
+
+            if (!BufferIsValid(buf))
+                break;

We are loosening a check here, there should not be a invalid buffer in
the stream until the endblk. I think you can remove this
BufferIsValid() check, then we can learn if something goes wrong.

0006:

You can use read_stream_reset() instead of read_stream_end(), then you
can use the same stream with different variables, I believe this is
the preferred way.

Rest LGTM!

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: Streamify more code paths

От
Xuneng Zhou
Дата:
Hi,

Thanks for looking into this.

On Mon, Dec 29, 2025 at 6:58 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> Hi,
>
> On Sun, 28 Dec 2025 at 14:46, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> > >
> > > Two more to go:
> > > patch 5: Streamify log_newpage_range() WAL logging path
> > > patch 6: Streamify hash index VACUUM primary bucket page reads
> > >
> > > Benchmarks will be conducted soon.
> > >
> >
> > v6 in the last message has a problem and has not been updated. Attach
> > the right one again. Sorry for the noise.
>
> 0003 and 0006:
>
> You need to add 'StatApproxReadStreamPrivate' and
> 'HashBulkDeleteStreamPrivate' to the typedefs.list.

Done.

> 0005:
>
> @@ -1321,8 +1341,10 @@ log_newpage_range(Relation rel, ForkNumber forknum,
>          nbufs = 0;
>          while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
>          {
> -            Buffer        buf = ReadBufferExtended(rel, forknum, blkno,
> -                                                 RBM_NORMAL, NULL);
> +            Buffer        buf = read_stream_next_buffer(stream, NULL);
> +
> +            if (!BufferIsValid(buf))
> +                break;
>
> We are loosening a check here, there should not be a invalid buffer in
> the stream until the endblk. I think you can remove this
> BufferIsValid() check, then we can learn if something goes wrong.

My concern before for not adding assert at the end of streaming is the
potential early break in here:

/* Nothing more to do if all remaining blocks were empty. */
if (nbufs == 0)
    break;

After looking more closely, it turns out to be a misunderstanding of the logic.

> 0006:
>
> You can use read_stream_reset() instead of read_stream_end(), then you
> can use the same stream with different variables, I believe this is
> the preferred way.
>
> Rest LGTM!
>

Yeah, reset seems a more proper way here.

--
Best,
Xuneng

Вложения

Re: Streamify more code paths

От
Xuneng Zhou
Дата:
Hi,

On Tue, Dec 30, 2025 at 9:51 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> Thanks for looking into this.
>
> On Mon, Dec 29, 2025 at 6:58 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> >
> > Hi,
> >
> > On Sun, 28 Dec 2025 at 14:46, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi,
> > > >
> > > > Two more to go:
> > > > patch 5: Streamify log_newpage_range() WAL logging path
> > > > patch 6: Streamify hash index VACUUM primary bucket page reads
> > > >
> > > > Benchmarks will be conducted soon.
> > > >
> > >
> > > v6 in the last message has a problem and has not been updated. Attach
> > > the right one again. Sorry for the noise.
> >
> > 0003 and 0006:
> >
> > You need to add 'StatApproxReadStreamPrivate' and
> > 'HashBulkDeleteStreamPrivate' to the typedefs.list.
>
> Done.
>
> > 0005:
> >
> > @@ -1321,8 +1341,10 @@ log_newpage_range(Relation rel, ForkNumber forknum,
> >          nbufs = 0;
> >          while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
> >          {
> > -            Buffer        buf = ReadBufferExtended(rel, forknum, blkno,
> > -                                                 RBM_NORMAL, NULL);
> > +            Buffer        buf = read_stream_next_buffer(stream, NULL);
> > +
> > +            if (!BufferIsValid(buf))
> > +                break;
> >
> > We are loosening a check here, there should not be a invalid buffer in
> > the stream until the endblk. I think you can remove this
> > BufferIsValid() check, then we can learn if something goes wrong.
>
> My concern before for not adding assert at the end of streaming is the
> potential early break in here:
>
> /* Nothing more to do if all remaining blocks were empty. */
> if (nbufs == 0)
>     break;
>
> After looking more closely, it turns out to be a misunderstanding of the logic.
>
> > 0006:
> >
> > You can use read_stream_reset() instead of read_stream_end(), then you
> > can use the same stream with different variables, I believe this is
> > the preferred way.
> >
> > Rest LGTM!
> >
>
> Yeah, reset seems a more proper way here.
>

Run pgindent using the updated typedefs.list.

--
Best,
Xuneng

Вложения