Discussion: Opportunistically pruning page before update
Hello,

While at PGCon I was chatting with Andres (and I think Peter G. and a few others who I can't remember at the moment, apologies) and Andres noted that while we opportunistically prune a page when inserting a tuple (before deciding we need a new page) we don't do the same for updates.

Attached is a patch series to do the following:

0001: Make it possible to call heap_page_prune_opt already holding an exclusive lock on the buffer.
0002: Opportunistically prune pages on update when the current tuple's page has no free space. If this frees up enough space, then we continue to put the new tuple on that page; if not, then we take the existing code path and get a new page.

One would plausibly expect the following improvements:

- Reduced table bloat
- Increased HOT update rate
- Improved performance on updates

I started to work on benchmarking this, but haven't had time to devote properly to that, so I'm wondering if there's anyone who might be interested in collaborating on that part.

Other TODOs:

- Audit other callers of RelationSetTargetBlock() to ensure they don't hold pointers into the page.

Regards,
James Coleman
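To make the 0002 decision concrete, here is a toy model of the proposed control flow (this is my own sketch, not the patch's code: `ToyPage`, `place_on_current_page`, and the byte counts are all invented for illustration). Pruning happens only when the current page lacks room, and only if pruning still leaves too little room do we fall through to the existing new-page path:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: `used` bytes occupied in an 8192-byte page, of which
 * `dead` bytes belong to dead tuples that pruning could reclaim. */
typedef struct
{
	int			used;
	int			dead;
} ToyPage;

#define TOY_PAGE_SIZE 8192

static int
toy_free_space(const ToyPage *p)
{
	return TOY_PAGE_SIZE - p->used;
}

/* Stand-in for heap_page_prune_opt() called while we already hold the
 * exclusive buffer lock (which is what patch 0001 makes possible). */
static void
toy_prune(ToyPage *p)
{
	p->used -= p->dead;
	p->dead = 0;
}

/* Returns true if the new tuple fits on the current page (possibly only
 * after pruning), false if we must take the existing new-page path. */
static bool
place_on_current_page(ToyPage *p, int tuple_len)
{
	if (toy_free_space(p) < tuple_len)
	{
		toy_prune(p);			/* the new opportunistic step from 0002 */
		if (toy_free_space(p) < tuple_len)
			return false;
	}
	p->used += tuple_len;
	return true;
}
```

Without the `toy_prune()` call, a page that is "full" only because of dead tuples would always force us onto a new page; with it, the update can often stay on the current page.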
Attachments
On Wed, Jun 21, 2023 at 8:51 AM James Coleman <jtc331@gmail.com> wrote:
> While at PGCon I was chatting with Andres (and I think Peter G. and a few others who I can't remember at the moment, apologies) and Andres noted that while we opportunistically prune a page when inserting a tuple (before deciding we need a new page) we don't do the same for updates.
>
> Attached is a patch series to do the following:
>
> 0001: Make it possible to call heap_page_prune_opt already holding an exclusive lock on the buffer.
> 0002: Opportunistically prune pages on update when the current tuple's page has no free space. If this frees up enough space, then we continue to put the new tuple on that page; if not, then we take the existing code path and get a new page.

I've reviewed these patches and have questions.

Under what conditions would this be exercised for UPDATE? Could you provide an example?

With your patch applied, when I create a table, the first time I update it heap_page_prune_opt() will return before actually doing any pruning because the page prune_xid hadn't been set (it is set after pruning as well as later in heap_update() after RelationGetBufferForTuple() is called).

I actually added an additional parameter to heap_page_prune() and heap_page_prune_opt() to identify if heap_page_prune() was called from RelationGetBufferForTuple() and logged a message when this was true. Running the test suite, I didn't see any UPDATEs executing heap_page_prune() from RelationGetBufferForTuple(). I did, however, see other statement types doing so (see RelationGetBufferForTuple()'s other callers). Was that intended?

> I started to work on benchmarking this, but haven't had time to devote properly to that, so I'm wondering if there's anyone who might be interested in collaborating on that part.

I'm interested in this feature and in helping with it/helping with benchmarking it, but I don't yet understand the design in its current form.

- Melanie
On Tue, Sep 5, 2023 at 1:40 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> On Wed, Jun 21, 2023 at 8:51 AM James Coleman <jtc331@gmail.com> wrote:
> > While at PGCon I was chatting with Andres (and I think Peter G. and a few others who I can't remember at the moment, apologies) and Andres noted that while we opportunistically prune a page when inserting a tuple (before deciding we need a new page) we don't do the same for updates.
> >
> > Attached is a patch series to do the following:
> >
> > 0001: Make it possible to call heap_page_prune_opt already holding an exclusive lock on the buffer.
> > 0002: Opportunistically prune pages on update when the current tuple's page has no free space. If this frees up enough space, then we continue to put the new tuple on that page; if not, then we take the existing code path and get a new page.
>
> I've reviewed these patches and have questions.
>
> Under what conditions would this be exercised for UPDATE? Could you provide an example?
>
> With your patch applied, when I create a table, the first time I update it heap_page_prune_opt() will return before actually doing any pruning because the page prune_xid hadn't been set (it is set after pruning as well as later in heap_update() after RelationGetBufferForTuple() is called).
>
> I actually added an additional parameter to heap_page_prune() and heap_page_prune_opt() to identify if heap_page_prune() was called from RelationGetBufferForTuple() and logged a message when this was true. Running the test suite, I didn't see any UPDATEs executing heap_page_prune() from RelationGetBufferForTuple(). I did, however, see other statement types doing so (see RelationGetBufferForTuple()'s other callers). Was that intended?
>
> > I started to work on benchmarking this, but haven't had time to devote properly to that, so I'm wondering if there's anyone who might be interested in collaborating on that part.
>
> I'm interested in this feature and in helping with it/helping with benchmarking it, but I don't yet understand the design in its current form.

Hi Melanie,

Thanks for taking a look at this! Apologies for the long delay in replying: I started to take a look at your questions earlier, and it turned into more of a rabbit hole than I'd anticipated. I've since been distracted by other things. So -- I don't have any conclusions here yet, but I'm hoping at or after PGConf NYC that I'll be able to dedicate the time this deserves.

Thanks,
James Coleman
On Tue, Sep 26, 2023 at 8:30 AM James Coleman <jtc331@gmail.com> wrote:
> On Tue, Sep 5, 2023 at 1:40 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> > On Wed, Jun 21, 2023 at 8:51 AM James Coleman <jtc331@gmail.com> wrote:
> > > While at PGCon I was chatting with Andres (and I think Peter G. and a few others who I can't remember at the moment, apologies) and Andres noted that while we opportunistically prune a page when inserting a tuple (before deciding we need a new page) we don't do the same for updates.
> > >
> > > Attached is a patch series to do the following:
> > >
> > > 0001: Make it possible to call heap_page_prune_opt already holding an exclusive lock on the buffer.
> > > 0002: Opportunistically prune pages on update when the current tuple's page has no free space. If this frees up enough space, then we continue to put the new tuple on that page; if not, then we take the existing code path and get a new page.
> >
> > I've reviewed these patches and have questions.
> >
> > Under what conditions would this be exercised for UPDATE? Could you provide an example?
> >
> > With your patch applied, when I create a table, the first time I update it heap_page_prune_opt() will return before actually doing any pruning because the page prune_xid hadn't been set (it is set after pruning as well as later in heap_update() after RelationGetBufferForTuple() is called).
> >
> > I actually added an additional parameter to heap_page_prune() and heap_page_prune_opt() to identify if heap_page_prune() was called from RelationGetBufferForTuple() and logged a message when this was true. Running the test suite, I didn't see any UPDATEs executing heap_page_prune() from RelationGetBufferForTuple(). I did, however, see other statement types doing so (see RelationGetBufferForTuple()'s other callers). Was that intended?
> >
> > > I started to work on benchmarking this, but haven't had time to devote properly to that, so I'm wondering if there's anyone who might be interested in collaborating on that part.
> >
> > I'm interested in this feature and in helping with it/helping with benchmarking it, but I don't yet understand the design in its current form.
>
> Hi Melanie,
>
> Thanks for taking a look at this! Apologies for the long delay in replying: I started to take a look at your questions earlier, and it turned into more of a rabbit hole than I'd anticipated. I've since been distracted by other things. So -- I don't have any conclusions here yet, but I'm hoping at or after PGConf NYC that I'll be able to dedicate the time this deserves.

Hi,

I poked at this a decent amount last night and uncovered a couple of things (whether or not Andres and I had discussed these details at PGCon... I don't remember):

1. We don't ever opportunistically prune on INSERT, but we do (somewhat, see below) on UPDATE, since we call it the first time we read the page with the to-be-updated tuple on it.
2. The reason the original testing on v1 didn't see any real changes is that PageClearHasFreeLinePointers() wasn't the right fastpath gate on this; I should have been using !PageIsFull().

With the change to use !PageIsFull() I can trivially show that there is a functional improvement. Consider the following commands:

drop table if exists foo;
create table foo(pk serial primary key, t text);
insert into foo(t) select repeat('a', 250) from generate_series(1, 27);
select pg_relation_size('foo');
delete from foo where pk <= 10;
insert into foo(t) select repeat('b', 250) from generate_series(1, 10);
select pg_relation_size('foo');

On master this results in a final relation size of 16384, while with the patch applied the final relation size is 8192.
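The page arithmetic behind those two sizes can be sketched as follows. This is a toy model I wrote (not PostgreSQL code): the ~284-byte per-row footprint and 8168 usable bytes per page are rough estimates for a row carrying a 250-character text column, and only the final page counts matter, not the exact byte values:

```c
#include <assert.h>
#include <stdbool.h>

#define USABLE_BYTES	(8192 - 24) /* page minus page header, roughly */
#define ROW_FOOTPRINT	284			/* ~250-byte payload + tuple header + line pointer */

/*
 * Insert `initial` rows onto page 1, delete `ndel` of them (leaving dead
 * tuples behind), then insert `nins` more rows; return the final relation
 * size in bytes.  With opportunistic pruning, a full page 1 gets its dead
 * tuples reclaimed before we extend the relation.
 */
static int
final_relation_size(int initial, int ndel, int nins, bool opportunistic_prune)
{
	int			live = initial - ndel;
	int			dead = ndel;	/* dead rows still occupying page 1 */
	int			extra_pages = 0;
	int			extra_used = 0; /* bytes used on the newest overflow page */

	for (int i = 0; i < nins; i++)
	{
		bool		fits = (live + dead + 1) * ROW_FOOTPRINT <= USABLE_BYTES;

		if (!fits && opportunistic_prune && dead > 0)
		{
			dead = 0;			/* heap_page_prune_opt() analogue */
			fits = (live + 1) * ROW_FOOTPRINT <= USABLE_BYTES;
		}
		if (fits)
		{
			live++;
			continue;
		}
		/* page 1 is full: put the row on the newest overflow page */
		if (extra_pages == 0 || extra_used + ROW_FOOTPRINT > USABLE_BYTES)
		{
			extra_pages++;
			extra_used = 0;
		}
		extra_used += ROW_FOOTPRINT;
	}
	return (1 + extra_pages) * 8192;
}
```

Under this model, the demo's 27 inserts / 10 deletes / 10 inserts come out to 16384 bytes without pruning and 8192 with it, matching the sizes observed in psql.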
I talked to Andres and Peter again today, and out of that conversation I have some observations and ideas for future improvements.

1. The most trivial case where this is useful is INSERT: we have a target page, and it may have dead tuples, so trying to prune may result in us being able to use the target page rather than getting a new page.

2. The next most trivial case is UPDATE (potentially after failing to find space for a HOT tuple on the source tuple's page); much like the INSERT case our backend's target page may benefit from pruning.

3. A more complex UPDATE case occurs when we check the tuple's page for space in order to insert a HOT tuple and fail to find enough space. While we've already opportunistically pruned the page on initial read of the tuple, in complex queries that might have been some time in the past, so it may be worth attempting again. Beyond that, context is key: if we already know we could otherwise do a HOT update but for the lack of free space on the page, then spending extra cycles rescuing that failed attempt is easier to justify. In order to do that we ought to invent an "aggressive" flag to heap_page_prune_opt telling it that it doesn't need to be quite so careful about exiting fast. Perhaps we can rescue the HOT update optimization by pruning aggressively.

4. We can prune the target page when the current backend recently aborted a transaction. Additionally we could prune the target page immediately on rollback (potentially we could even get into the complexity of doing retail index tuple deletion when a transaction aborts).

It may or may not be the case that I end up pursuing all of these in this particular patch series, but I wanted to at least get it written down here for history's sake.

The attached v2 patch series handles case 1 and likely case 2 (though I haven't tested case 2 yet).
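Case 3's "rescue the HOT update" idea can be sketched in miniature. This is entirely my own toy model: the `aggressive` flag stands in for the proposed new parameter, and `hot_update_possible` and its numbers are hypothetical, not the patch's API:

```c
#include <assert.h>
#include <stdbool.h>

/* A HOT update must fit on the *same* page as the old tuple.  `freeb` is
 * the page's current free space; `reclaimable` is dead space that an
 * aggressive prune (one skipping heap_page_prune_opt()'s usual fast-exit
 * checks) could recover. */
typedef struct
{
	int			freeb;
	int			reclaimable;
} ToyPage;

static bool
hot_update_possible(ToyPage *p, int new_tuple_len, bool aggressive)
{
	if (p->freeb >= new_tuple_len)
		return true;
	if (aggressive)
	{
		/* We already know a HOT update is otherwise possible, so the
		 * extra cycles spent pruning harder are easier to justify. */
		p->freeb += p->reclaimable;
		p->reclaimable = 0;
		return p->freeb >= new_tuple_len;
	}
	return false;				/* today: fall back to a non-HOT update */
}
```

The interesting design point is the second branch: today the failed space check immediately demotes the update to non-HOT, while the aggressive prune gives the same-page placement one more chance.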
The "log when pruning" patch files may or may not be useful to you: they add a bunch of logging to make it easier to observe what's happening while playing around in psql.

Regards,
James
Attachments
On Thu, Oct 5, 2023 at 2:35 AM James Coleman <jtc331@gmail.com> wrote:
> I talked to Andres and Peter again today, and out of that conversation I have some observations and ideas for future improvements.
>
> 1. The most trivial case where this is useful is INSERT: we have a target page, and it may have dead tuples, so trying to prune may result in us being able to use the target page rather than getting a new page.
> 2. The next most trivial case is UPDATE (potentially after failing to find space for a HOT tuple on the source tuple's page); much like the INSERT case our backend's target page may benefit from pruning.

By looking at the patch I believe that v2-0003 is implementing these 2 ideas. So my question is: are we planning to prune only the backend's current target page, or, if we cannot find space there, are we also targeting the other target pages we get from the FSM? Because you have put the check in a loop in the patch, it will try to prune every page it gets from the FSM, not just the current target page of the backend. Just wanted to understand if this is intentional.

In general, all 4 ideas look promising.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi,

Thanks for taking a look!

On Fri, Oct 6, 2023 at 1:18 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Thu, Oct 5, 2023 at 2:35 AM James Coleman <jtc331@gmail.com> wrote:
> > I talked to Andres and Peter again today, and out of that conversation I have some observations and ideas for future improvements.
> >
> > 1. The most trivial case where this is useful is INSERT: we have a target page, and it may have dead tuples, so trying to prune may result in us being able to use the target page rather than getting a new page.
> > 2. The next most trivial case is UPDATE (potentially after failing to find space for a HOT tuple on the source tuple's page); much like the INSERT case our backend's target page may benefit from pruning.
>
> By looking at the patch I believe that v2-0003 is implementing these 2 ideas. So my question is: are we planning to prune only the backend's current target page, or, if we cannot find space there, are we also targeting the other target pages we get from the FSM? Because you have put the check in a loop in the patch, it will try to prune every page it gets from the FSM, not just the current target page of the backend. Just wanted to understand if this is intentional.

Yes, just like with our opportunistic pruning on each read during a SELECT, I think we should at least check when we have a new target page. This seems particularly true since we're hoping to write to the page anyway, and the cost of additionally making pruning changes to that page is low.

I looked at freespace.c, and it doesn't appear that getting the block from the FSM does this already, so we're not duplicating any existing work.

Regards,
James
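The loop in question can be sketched like this (a toy model of mine with invented names, not the patch's code): each candidate block, whether the backend's current target block or one handed back by the FSM, gets one opportunistic prune before being rejected:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
	int			freeb;			/* free space on the candidate page */
	int			dead;			/* space a prune could reclaim */
} ToyPage;

/* One candidate block: use it if it has room; otherwise prune (we hold
 * the exclusive lock and intend to write here anyway) and re-check. */
static bool
try_candidate_block(ToyPage *p, int tuple_len)
{
	if (p->freeb >= tuple_len)
		return true;
	p->freeb += p->dead;		/* opportunistic prune */
	p->dead = 0;
	return p->freeb >= tuple_len;
}

/* Walk the target block first, then the FSM's suggestions; return the
 * index of the first usable block, or nblocks meaning "extend". */
static int
choose_block(ToyPage *candidates, int nblocks, int tuple_len)
{
	for (int i = 0; i < nblocks; i++)
		if (try_candidate_block(&candidates[i], tuple_len))
			return i;
	return nblocks;				/* no luck: extend the relation */
}
```

The cost argument above is visible in the structure: the prune attempt is confined to pages we were already about to lock and write, so the marginal cost per candidate is small.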
2024-01 Commitfest.

Hi, this patch has a CF status of "Needs Review" [1], but it seems like there was some CFbot test failure the last time it was run [2]. Please have a look and post an updated version if necessary.

======
[1] https://commitfest.postgresql.org/46/4384//
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4384
On Sun, Jan 21, 2024 at 9:58 PM Peter Smith <smithpb2250@gmail.com> wrote:
> 2024-01 Commitfest.
>
> Hi, this patch has a CF status of "Needs Review" [1], but it seems like there was some CFbot test failure the last time it was run [2]. Please have a look and post an updated version if necessary.
>
> ======
> [1] https://commitfest.postgresql.org/46/4384//
> [2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4384

See rebased patch attached.

Thanks,
James Coleman
Attachments
On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> See rebased patch attached.

I just realized I left a change in during the rebase that wasn't necessary.

v4 attached.

Regards,
James Coleman
Attachments
On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:
> On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> > See rebased patch attached.
>
> I just realized I left a change in during the rebase that wasn't necessary.
>
> v4 attached.

I have noticed that you are performing the opportunistic pruning after we have decided that the updated tuple cannot fit in the current page, and then we are performing the pruning on the new target page. Why don't we first perform the pruning on the existing page of the tuple itself? Or is this already being done before this patch? I could not find such existing pruning, so I had this question, because such pruning could convert many non-HOT updates into HOT updates, right?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 23, 2024 at 2:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:
> > On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> > > See rebased patch attached.
> >
> > I just realized I left a change in during the rebase that wasn't necessary.
> >
> > v4 attached.
>
> I have noticed that you are performing the opportunistic pruning after we have decided that the updated tuple cannot fit in the current page, and then we are performing the pruning on the new target page. Why don't we first perform the pruning on the existing page of the tuple itself? Or is this already being done before this patch? I could not find such existing pruning, so I had this question, because such pruning could convert many non-HOT updates into HOT updates, right?

First off I noticed that I accidentally sent a different version of the patch I'd originally worked on. Here's the one from the proper branch. It's still similar, but I want to make sure the right one is being reviewed.

I'm working on a demo case for updates (to go along with the insert case I sent earlier) to test out your question, and I'll reply when I have that.

Regards,
James Coleman
Attachments
On Fri, Jan 26, 2024 at 8:33 PM James Coleman <jtc331@gmail.com> wrote:
> On Tue, Jan 23, 2024 at 2:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:
> > > On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> > > > See rebased patch attached.
> > >
> > > I just realized I left a change in during the rebase that wasn't necessary.
> > >
> > > v4 attached.
> >
> > I have noticed that you are performing the opportunistic pruning after we have decided that the updated tuple cannot fit in the current page, and then we are performing the pruning on the new target page. Why don't we first perform the pruning on the existing page of the tuple itself? Or is this already being done before this patch? I could not find such existing pruning, so I had this question, because such pruning could convert many non-HOT updates into HOT updates, right?
>
> First off I noticed that I accidentally sent a different version of the patch I'd originally worked on. Here's the one from the proper branch. It's still similar, but I want to make sure the right one is being reviewed.
>
> I'm working on a demo case for updates (to go along with the insert case I sent earlier) to test out your question, and I'll reply when I have that.

All right, getting all this loaded back into my head: as you noted earlier, the patch currently implements points 1 and 2 of my list of possible improvements:

> 1. The most trivial case where this is useful is INSERT: we have a target page, and it may have dead tuples, so trying to prune may result in us being able to use the target page rather than getting a new page.
> 2. The next most trivial case is UPDATE (potentially after failing to find space for a HOT tuple on the source tuple's page); much like the INSERT case our backend's target page may benefit from pruning.
What you're describing above would be implementing (at least part of) point 3:

> 3. A more complex UPDATE case occurs when we check the tuple's page for space in order to insert a HOT tuple and fail to find enough space. While we've already opportunistically pruned the page on initial read of the tuple, in complex queries this might be some time in the past, so it may be worth attempting again.
> ...

If we try to design a simple test case for updates (like my insert test case above) we might end up with something like:

drop table if exists foo;
create table foo(pk serial primary key, t text);
insert into foo(t) select repeat('a', 250) from generate_series(1, 27);
select pg_relation_size('foo');
delete from foo where pk = 1;
update foo set t = repeat('b', 250) where pk = 2;
select pg_relation_size('foo');

But that actually works as expected on master, because we call heap_page_prune_opt from heapam_index_fetch_tuple as part of the index scan that drives the update query.

I was theorizing that if there are concurrent writes to the page we might be able to trigger the need to re-prune a page in the for loop in heap_update(), and I tried both regular pgbench and a custom pgbench script with inserts/deletes/updates (including some artificial delays). What I concluded is that this isn't likely to be fruitful: we need the buffer to be local to our backend (no other pins) to be able to clean it, but since we've already pruned it on read, we need another backend to have modified the page (and dropped its pin!) between our read and our write.

If someone believes there's a scenario that would demonstrate otherwise, I would of course be interested to hear any ideas, but at this point I think it's probably worth focusing on the first two cases this patch already addresses.

Regards,
James Coleman
On Fri, Jan 26, 2024 at 8:33 PM James Coleman <jtc331@gmail.com> wrote:
> On Tue, Jan 23, 2024 at 2:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > On Tue, Jan 23, 2024 at 7:18 AM James Coleman <jtc331@gmail.com> wrote:
> > > On Mon, Jan 22, 2024 at 8:21 PM James Coleman <jtc331@gmail.com> wrote:
> > > > See rebased patch attached.
> > >
> > > I just realized I left a change in during the rebase that wasn't necessary.
> > >
> > > v4 attached.
> >
> > I have noticed that you are performing the opportunistic pruning after we have decided that the updated tuple cannot fit in the current page, and then we are performing the pruning on the new target page. Why don't we first perform the pruning on the existing page of the tuple itself? Or is this already being done before this patch? I could not find such existing pruning, so I had this question, because such pruning could convert many non-HOT updates into HOT updates, right?
>
> First off I noticed that I accidentally sent a different version of the patch I'd originally worked on. Here's the one from the proper branch. It's still similar, but I want to make sure the right one is being reviewed.

I finally got back around to looking at this. Sorry for the delay.

I don't feel confident enough to say at a high level whether or not it is a good idea in the general case to try pruning every block RelationGetBufferForTuple() considers as a target block. But I did have a few thoughts on the implementation:

heap_page_prune_opt() checks PageGetHeapFreeSpace() twice. You will have already done that in RelationGetBufferForTuple(), and you don't even need to do it both of those times, because you have a lock (which is why heap_page_prune_opt() does it twice). This all seems a bit wasteful. And then you do it again after pruning.

This made me think: vacuum cares how much space heap_page_prune() (now heap_page_prune_and_freeze()) freed up.
Now if we add another caller who cares how much space pruning freed up, perhaps it is worth calculating this while pruning and returning it. I know PageGetHeapFreeSpace() isn't the most expensive function in the world, but it also seems like a useful, relevant piece of information to inform the caller of. You don't have to implement the above, it was just something I was thinking about.

Looking at the code also made me wonder if it is worth changing RelationGetBufferForTuple() to call PageGetHeapFreeSpace() before taking a lock (which won't give a totally accurate result, but that's probably okay) and then call heap_page_prune_opt() without a lock when PageGetHeapFreeSpace() says there isn't enough space.

Also, do we want to do GetVisibilityMapPins() before calling heap_page_prune_opt()? I don't quite get why we do that before knowing if we are going to actually use the target block in the current code.

Anyway, I'm not sure I like just adding a parameter to heap_page_prune_opt() to indicate it already has an exclusive lock. It does a bunch of stuff that was always done without a lock, and now you are doing it with an exclusive lock.

- Melanie
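The "return how much pruning freed" idea might look like this in miniature (my own sketch with invented names; the real functions are heap_page_prune() and PageGetHeapFreeSpace(), whose actual signatures differ):

```c
#include <assert.h>

typedef struct
{
	int			freeb;			/* what PageGetHeapFreeSpace() would report */
	int			dead;			/* space pruning can reclaim */
} ToyPage;

/* Analogue of heap_page_prune() extended to report the bytes it freed. */
static int
prune_reporting_freed(ToyPage *p)
{
	int			freed = p->dead;

	p->freeb += freed;
	p->dead = 0;
	return freed;
}

/* Caller reads free space once, prunes, and derives the post-prune free
 * space from the return value instead of re-examining the page. */
static int
free_space_after_prune(ToyPage *p)
{
	int			before = p->freeb;	/* the single free-space check */

	return before + prune_reporting_freed(p);
}
```

The saving is small per call, as the review notes, but the freed-space figure is already known inside pruning, so exposing it costs the prune path essentially nothing.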