Обсуждение: Opportunistically pruning page before update

Поиск
Список
Период
Сортировка

Opportunistically pruning page before update

От
James Coleman
Дата:
Hello,

While at PGCon I was chatting with Andres (and I think Peter G. and a
few others who I can't remember at the moment, apologies) and Andres
noted that while we opportunistically prune a page when inserting a
tuple (before deciding we need a new page) we don't do the same for
updates.

Attached is a patch series to do the following:

0001: Make it possible to call heap_page_prune_opt already holding an
exclusive lock on the buffer.
0002: Opportunistically prune pages on update when the current tuple's
page has no free space. If this frees up enough space, then we
continue to put the new tuple on that page; if not, then we take the
existing code path and get a new page.

One would plausibly expect the following improvements:
- Reduced table bloat
- Increased HOT update rate
- Improved performance on updates

I started to work on benchmarking this, but haven't had time to devote
properly to that, so I'm wondering if there's anyone who might be
interested in collaborating on that part.

Other TODOs:
- Audit other callers of RelationSetTargetBlock() to ensure they don't
hold pointers into the page.

Regards,
James Coleman

Вложения

Re: Opportunistically pruning page before update

От
Melanie Plageman
Дата:
On Wed, Jun 21, 2023 at 8:51 AM James Coleman <jtc331@gmail.com> wrote:
> While at PGCon I was chatting with Andres (and I think Peter G. and a
> few others who I can't remember at the moment, apologies) and Andres
> noted that while we opportunistically prune a page when inserting a
> tuple (before deciding we need a new page) we don't do the same for
> updates.
>
> Attached is a patch series to do the following:
>
> 0001: Make it possible to call heap_page_prune_opt already holding an
> exclusive lock on the buffer.
> 0002: Opportunistically prune pages on update when the current tuple's
> page has no free space. If this frees up enough space, then we
> continue to put the new tuple on that page; if not, then we take the
> existing code path and get a new page.

I've reviewed these patches and have questions.

Under what conditions would this be exercised for UPDATE? Could you
provide an example?

With your patch applied, when I create a table, the first time I update
it heap_page_prune_opt() will return before actually doing any pruning
because the page prune_xid hadn't been set (it is set after pruning as
well as later in heap_update() after RelationGetBufferForTuple() is
called).

I actually added an additional parameter to heap_page_prune() and
heap_page_prune_opt() to identify if heap_page_prune() was called from
RelationGetBufferForTuple() and logged a message when this was true.
Running the test suite, I didn't see any UPDATEs executing
heap_page_prune() from RelationGetBufferForTuple(). I did, however, see
other statement types doing so (see RelationGetBufferForTuple()'s other
callers). Was that intended?

> I started to work on benchmarking this, but haven't had time to devote
> properly to that, so I'm wondering if there's anyone who might be
> interested in collaborating on that part.

I'm interested in this feature and in helping with it/helping with
benchmarking it, but I don't yet understand the design in its current
form.

- Melanie



Re: Opportunistically pruning page before update

От
James Coleman
Дата:
On Tue, Sep 5, 2023 at 1:40 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Wed, Jun 21, 2023 at 8:51 AM James Coleman <jtc331@gmail.com> wrote:
> > While at PGCon I was chatting with Andres (and I think Peter G. and a
> > few others who I can't remember at the moment, apologies) and Andres
> > noted that while we opportunistically prune a page when inserting a
> > tuple (before deciding we need a new page) we don't do the same for
> > updates.
> >
> > Attached is a patch series to do the following:
> >
> > 0001: Make it possible to call heap_page_prune_opt already holding an
> > exclusive lock on the buffer.
> > 0002: Opportunistically prune pages on update when the current tuple's
> > page has no free space. If this frees up enough space, then we
> > continue to put the new tuple on that page; if not, then we take the
> > existing code path and get a new page.
>
> I've reviewed these patches and have questions.
>
> Under what conditions would this be exercised for UPDATE? Could you
> provide an example?
>
> With your patch applied, when I create a table, the first time I update
> it heap_page_prune_opt() will return before actually doing any pruning
> because the page prune_xid hadn't been set (it is set after pruning as
> well as later in heap_update() after RelationGetBufferForTuple() is
> called).
>
> I actually added an additional parameter to heap_page_prune() and
> heap_page_prune_opt() to identify if heap_page_prune() was called from
> RelationGetBufferForTuple() and logged a message when this was true.
> Running the test suite, I didn't see any UPDATEs executing
> heap_page_prune() from RelationGetBufferForTuple(). I did, however, see
> other statement types doing so (see RelationGetBufferForTuple()'s other
> callers). Was that intended?
>
> > I started to work on benchmarking this, but haven't had time to devote
> > properly to that, so I'm wondering if there's anyone who might be
> > interested in collaborating on that part.
>
> I'm interested in this feature and in helping with it/helping with
> benchmarking it, but I don't yet understand the design in its current
> form.

Hi Melanie,

Thanks for taking a look at this! Apologies for the long delay in
replying: I started to take a look at your questions earlier, and it
turned into more of a rabbit hole than I'd anticipated. I've since
been distracted by other things. So -- I don't have any conclusions
here yet, but I'm hoping at or after PGConf NYC that I'll be able to
dedicate the time this deserves.

Thanks,
James Coleman



Re: Opportunistically pruning page before update

От
James Coleman
Дата:
On Tue, Sep 26, 2023 at 8:30 AM James Coleman <jtc331@gmail.com> wrote:
>
> On Tue, Sep 5, 2023 at 1:40 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > On Wed, Jun 21, 2023 at 8:51 AM James Coleman <jtc331@gmail.com> wrote:
> > > While at PGCon I was chatting with Andres (and I think Peter G. and a
> > > few others who I can't remember at the moment, apologies) and Andres
> > > noted that while we opportunistically prune a page when inserting a
> > > tuple (before deciding we need a new page) we don't do the same for
> > > updates.
> > >
> > > Attached is a patch series to do the following:
> > >
> > > 0001: Make it possible to call heap_page_prune_opt already holding an
> > > exclusive lock on the buffer.
> > > 0002: Opportunistically prune pages on update when the current tuple's
> > > page has no free space. If this frees up enough space, then we
> > > continue to put the new tuple on that page; if not, then we take the
> > > existing code path and get a new page.
> >
> > I've reviewed these patches and have questions.
> >
> > Under what conditions would this be exercised for UPDATE? Could you
> > provide an example?
> >
> > With your patch applied, when I create a table, the first time I update
> > it heap_page_prune_opt() will return before actually doing any pruning
> > because the page prune_xid hadn't been set (it is set after pruning as
> > well as later in heap_update() after RelationGetBufferForTuple() is
> > called).
> >
> > I actually added an additional parameter to heap_page_prune() and
> > heap_page_prune_opt() to identify if heap_page_prune() was called from
> > RelationGetBufferForTuple() and logged a message when this was true.
> > Running the test suite, I didn't see any UPDATEs executing
> > heap_page_prune() from RelationGetBufferForTuple(). I did, however, see
> > other statement types doing so (see RelationGetBufferForTuple()'s other
> > callers). Was that intended?
> >
> > > I started to work on benchmarking this, but haven't had time to devote
> > > properly to that, so I'm wondering if there's anyone who might be
> > > interested in collaborating on that part.
> >
> > I'm interested in this feature and in helping with it/helping with
> > benchmarking it, but I don't yet understand the design in its current
> > form.
>
> Hi Melanie,
>
> Thanks for taking a look at this! Apologies for the long delay in
> replying: I started to take a look at your questions earlier, and it
> turned into more of a rabbit hole than I'd anticipated. I've since
> been distracted by other things. So -- I don't have any conclusions
> here yet, but I'm hoping at or after PGConf NYC that I'll be able to
> dedicate the time this deserves.

Hi,

I poked at this a decent amount last night and uncovered a couple of
things (whether or not Andres and I had discussed these details at
PGCon...I don't remember):

1. We don't ever opportunistically prune on INSERT, but we do
(somewhat, see below) on UPDATE, since we call it the first time we
read the page with the to-be-updated tuple on it.
2. The reason that original testing on v1 didn't see any real changes
is because PageClearHasFreeLinePointers() wasn't the right fastpath
gate on this; I should have been using !PageIsFull().

With the change to use !PageIsFull() I can trivially show that there
is improvement functionally. Consider the following commands:

drop table if exists foo;
create table foo(pk serial primary key, t text);
insert into foo(t) select repeat('a', 250) from generate_series(1, 27);
select pg_relation_size('foo');
delete from foo where pk <= 10;
insert into foo(t) select repeat('b', 250) from generate_series(1, 10);
select pg_relation_size('foo');

On master this will result in a final relation size of 16384 while
with the patch applied the final relation size is 8192.

I talked to Andres and Peter again today, and out of that conversation
I have some observations and ideas for future improvements.

1. The most trivial case where this is useful is INSERT: we have a
target page, and it may have dead tuples, so trying to prune may
result in us being able to use the target page rather than getting a
new page.
2. The next most trivial case is where UPDATE (potentially after
failing to find space for a HOT tuple on the source tuple's page);
much like the INSERT case our backend's target page may benefit from
pruning.
3. A more complex UPDATE case occurs when we check the tuple's page
for space in order to insert a HOT tuple and fail to find enough
space. While we've already opportunistically pruned the page on
initial read of the tuple, in complex queries this might be some time
in the past, so it may be worth attempting again. Beyond that context
is key: if we already know we could otherwise do a HOT update but for
the lack of free space on the page, then spending extra cycles
rescuing that failed attempt is easier to justify. In order to do that
we ought to invent an "aggressive" flag to heap_page_prune_opt telling
it that it doesn't need to be quite so careful about exiting fast.
Perhaps we can rescue the HOT update optimization by pruning
aggressively.
4. We can prune the target page when the current backend recently
aborted a transaction. Additionally we could prune the target page
immediately on rollback (potentially we could even get into the
complexity of doing retail index tuple deletion when a transaction
aborts).

It may or may not be the case that I end up pursuing all of these in
this particular patch series, but I wanted to at least get it written
down here for history's sake.

The attached v2 patch series handles case 1 and likely case 2 (though
I haven't tested case 2 yet). The "log when pruning" patch files may
or may not be useful to you: they add a bunch of logging to make it
easier to observe what's happening while playing around in psql.

Regards,
James

Вложения

Re: Opportunistically pruning page before update

От
Dilip Kumar
Дата:
On Thu, Oct 5, 2023 at 2:35 AM James Coleman <jtc331@gmail.com> wrote:
>
> I talked to Andres and Peter again today, and out of that conversation
> I have some observations and ideas for future improvements.
>
> 1. The most trivial case where this is useful is INSERT: we have a
> target page, and it may have dead tuples, so trying to prune may
> result in us being able to use the target page rather than getting a
> new page.
> 2. The next most trivial case is where UPDATE (potentially after
> failing to find space for a HOT tuple on the source tuple's page);
> much like the INSERT case our backend's target page may benefit from
> pruning.

By looking at the patch I believe that v2-0003 is implementing these 2
ideas.  So my question is are we planning to prune the backend's
current target page only or if we can not find space in that then we
are targetting to prune the other target pages as well which we are
getting from FSM?  Because in the patch you have put a check in a loop
it will try to prune every page it gets from the FSM not just the
current target page of the backend.  Just wanted to understand if this
is intentional.

In general, all 4 ideas look promising.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Opportunistically pruning page before update

От
James Coleman
Дата:
Hi,

Thanks for taking a look!

On Fri, Oct 6, 2023 at 1:18 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 5, 2023 at 2:35 AM James Coleman <jtc331@gmail.com> wrote:
> >
> > I talked to Andres and Peter again today, and out of that conversation
> > I have some observations and ideas for future improvements.
> >
> > 1. The most trivial case where this is useful is INSERT: we have a
> > target page, and it may have dead tuples, so trying to prune may
> > result in us being able to use the target page rather than getting a
> > new page.
> > 2. The next most trivial case is where UPDATE (potentially after
> > failing to find space for a HOT tuple on the source tuple's page);
> > much like the INSERT case our backend's target page may benefit from
> > pruning.
>
> By looking at the patch I believe that v2-0003 is implementing these 2
> ideas.  So my question is are we planning to prune the backend's
> current target page only or if we can not find space in that then we
> are targetting to prune the other target pages as well which we are
> getting from FSM?  Because in the patch you have put a check in a loop
> it will try to prune every page it gets from the FSM not just the
> current target page of the backend.  Just wanted to understand if this
> is intentional.

Yes, just like with our opportunistically pruning on each read during
a select I think we should at least check when we have a new target
page. This seems particularly true since we're hoping to write to the
page anyway, and the cost of additionally making pruning changes to
that page is low. I looked at freespace.c, and it doesn't appear that
getting the block from the FSM does this already, so we're not
duplicating any existing work.

Regards,
James



Re: Opportunistically pruning page before update

От
Peter Smith
Дата:
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
like there was some CFbot test failure last time it was run [2].
Please have a look and post an updated version if necessary.

======
[1] https://commitfest.postgresql.org/46/4384//
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4384



Re: Opportunistically pruning page before update

От
James Coleman
Дата:
On Sun, Jan 21, 2024 at 9:58 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> 2024-01 Commitfest.
>
> Hi, This patch has a CF status of "Needs Review" [1], but it seems
> like there was some CFbot test failure last time it was run [2].
> Please have a look and post an updated version if necessary.
>
> ======
> [1] https://commitfest.postgresql.org/46/4384//
> [2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4384

See rebased patch attached.

Thanks,
James Coleman

Вложения

Re: Opportunistically pruning page before update

От
James Coleman
Дата:
On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
>
> See rebased patch attached.

I just realized I left a change in during the rebase that wasn't necessary.

v4 attached.

Regards,
James Coleman

Вложения

Re: Opportunistically pruning page before update

От
Dilip Kumar
Дата:
On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:
>
> On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> >
> > See rebased patch attached.
>
> I just realized I left a change in during the rebase that wasn't necessary.
>
> v4 attached.

I have noticed that you are performing the opportunistic pruning after
we decided that the updated tuple can not fit in the current page and
then we are performing the pruning on the new target page.  Why don't
we first perform the pruning on the existing page of the tuple itself?
 Or this is already being done before this patch? I could not find
such existing pruning so got this question because such pruning can
convert many non-hot updates to the HOT update right?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Opportunistically pruning page before update

От
James Coleman
Дата:
On Tue, Jan 23, 2024 at 2:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > See rebased patch attached.
> >
> > I just realized I left a change in during the rebase that wasn't necessary.
> >
> > v4 attached.
>
> I have noticed that you are performing the opportunistic pruning after
> we decided that the updated tuple can not fit in the current page and
> then we are performing the pruning on the new target page.  Why don't
> we first perform the pruning on the existing page of the tuple itself?
>  Or this is already being done before this patch? I could not find
> such existing pruning so got this question because such pruning can
> convert many non-hot updates to the HOT update right?

First off I noticed that I accidentally sent a different version of
the patch I'd originally worked on. Here's the one from the proper
branch. It's still similar, but I want to make sure the right one is
being reviewed.

I'm working on a demo case for updates (to go along with the insert
case I sent earlier) to test out your question, and I'll reply when I
have that.

Regards,
James Coleman

Вложения

Re: Opportunistically pruning page before update

От
James Coleman
Дата:
On Fri, Jan 26, 2024 at 8:33 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Tue, Jan 23, 2024 at 2:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> > > >
> > > > See rebased patch attached.
> > >
> > > I just realized I left a change in during the rebase that wasn't necessary.
> > >
> > > v4 attached.
> >
> > I have noticed that you are performing the opportunistic pruning after
> > we decided that the updated tuple can not fit in the current page and
> > then we are performing the pruning on the new target page.  Why don't
> > we first perform the pruning on the existing page of the tuple itself?
> >  Or this is already being done before this patch? I could not find
> > such existing pruning so got this question because such pruning can
> > convert many non-hot updates to the HOT update right?
>
> First off I noticed that I accidentally sent a different version of
> the patch I'd originally worked on. Here's the one from the proper
> branch. It's still similar, but I want to make sure the right one is
> being reviewed.
>
> I'm working on a demo case for updates (to go along with the insert
> case I sent earlier) to test out your question, and I'll reply when I
> have that.

All right, getting all this loaded back into my head, as you noted
earlier the patch currently implements points 1 and 2 of my list of
possible improvements:

> 1. The most trivial case where this is useful is INSERT: we have a
> target page, and it may have dead tuples, so trying to prune may
> result in us being able to use the target page rather than getting a
> new page.
> 2. The next most trivial case is where UPDATE (potentially after
> failing to find space for a HOT tuple on the source tuple's page);
> much like the INSERT case our backend's target page may benefit from
> pruning.

What you're describing above would be implementing (at least part of) point 3:

> 3. A more complex UPDATE case occurs when we check the tuple's page
> for space in order to insert a HOT tuple and fail to find enough
> space. While we've already opportunistically pruned the page on
> initial read of the tuple, in complex queries this might be some time
> in the past, so it may be worth attempting again.
> ...

If we try to design a simple test case for updates (like my insert
test case above) we might end up with something like:

drop table if exists foo;
create table foo(pk serial primary key, t text);
insert into foo(t) select repeat('a', 250) from generate_series(1, 27);
select pg_relation_size('foo');
delete from foo where pk = 1;
update foo set t = repeat('b', 250) where pk = 2;
select pg_relation_size('foo');

But that actually works as expected on master, because we call
heap_page_prune_opt from heapam_index_fetch_tuple as part of the index
scan that drives the update query.

I was theorizing that if there are concurrent writes to the page we
might being able to trigger the need to re-prune a page in the for
loop in heap_update(), and I tried to both regular pgbench and a
custom pgbench script with inserts/deletes/updates (including some
artificial delays).

What I concluded what this isn't isn't likely to be fruitful: we need
the buffer to be local to our backend (no other pins) to be able to
clean it, but since we've already pruned it on read, we need to have
had another backend modify the page (and dropped its pin!) between our
read and our write.

If someone believes there's a scenario that would demonstrate
otherwise, I would of course be interested to hear any ideas, but at
this point I think it's probably worth focusing on the first two cases
this patch already addresses.

Regards,
James Coleman



Re: Opportunistically pruning page before update

От
Melanie Plageman
Дата:
On Fri, Jan 26, 2024 at 8:33 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Tue, Jan 23, 2024 at 2:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> > > >
> > > > See rebased patch attached.
> > >
> > > I just realized I left a change in during the rebase that wasn't necessary.
> > >
> > > v4 attached.
> >
> > I have noticed that you are performing the opportunistic pruning after
> > we decided that the updated tuple can not fit in the current page and
> > then we are performing the pruning on the new target page.  Why don't
> > we first perform the pruning on the existing page of the tuple itself?
> >  Or this is already being done before this patch? I could not find
> > such existing pruning so got this question because such pruning can
> > convert many non-hot updates to the HOT update right?
>
> First off I noticed that I accidentally sent a different version of
> the patch I'd originally worked on. Here's the one from the proper
> branch. It's still similar, but I want to make sure the right one is
> being reviewed.

I finally got back around to looking at this. Sorry for the delay.

I don't feel confident enough to say at a high level whether or not it
is a good idea in the general case to try pruning every block
RelationGetBufferForTuple() considers as a target block.

But, I did have a few thoughts on the implementation:

heap_page_prune_opt() checks PageGetHeapFreeSpace() twice. You will
have already done that in RelationGetBufferForTuple(). And you don't
need even need to do it both of those times because you have a lock
(which is why heap_page_prune_opt() does it twice). This all seems a
bit wasteful. And then, you do it again after pruning.

This made me think, vacuum cares how much space heap_page_prune() (now
heap_page_prune_and_freeze()) freed up. Now if we add another caller
who cares how much space pruning freed up, perhaps it is worth
calculating this while pruning and returning it. I know
PageGetHeapFreeSpace() isn't the most expensive function in the world,
but it also seems like a useful, relevant piece of information to
inform the caller of.

You don't have to implement the above, it was just something I was
thinking about.

Looking at the code also made me wonder if it is worth changing
RelationGetBufferForTuple() to call PageGetHeapFreeSpace() before
taking a lock (which won't give a totally accurate result, but that's
probably okay) and then call heap_page_prune_opt() without a lock when
PageGetHeapFreeSpace() says there isn't enough space.

Also do we want to do GetVisibilityMapPins() before calling
heap_page_prune_opt()? I don't quite get why we do that before knowing
if we are going to actually use the target block in the current code.

Anyway, I'm not sure I like just adding a parameter to
heap_page_prune_opt() to indicate it already has an exclusive lock. It
does a bunch of stuff that was always done without a lock and now you
are doing it with an exclusive lock.

- Melanie