Thread: Changing types of block and chunk sizes in memory contexts
Hi hackers,

In memory contexts, block and chunk sizes are likely to be limited by some upper bounds. Some examples of those bounds are MEMORYCHUNK_MAX_BLOCKOFFSET and MEMORYCHUNK_MAX_VALUE. Both values are only 1 less than 1GB. This means memory contexts have blocks/chunks with sizes less than 1GB, and such sizes can be stored in 32 bits. Currently the "Size" type, which is 64-bit, is used, but 32-bit integers should be enough to store any value less than 1GB.

The attached patch is an attempt to change the types of some fields from Size to uint32 in the aset, slab and generation memory contexts. I tried to find most of the places that needed to be changed to uint32, but I probably missed some. I can add more places if needed.

I would appreciate any feedback.

Thanks,
--
Melih Mutlu
Microsoft
Attachments
> In memory contexts, block and chunk sizes are likely to be limited by
> some upper bounds. Some examples of those bounds are
> MEMORYCHUNK_MAX_BLOCKOFFSET and MEMORYCHUNK_MAX_VALUE. Both values are
> only 1 less than 1GB. This means memory contexts have blocks/chunks
> with sizes less than 1GB, and such sizes can be stored in 32 bits.
> Currently the "Size" type, which is 64-bit, is used, but 32-bit
> integers should be enough to store any value less than 1GB.

size_t (= Size) is the correct type in C to store the size of an object in memory. This is partially a self-documentation issue: If I see size_t in a function signature, I know what is intended; if I see uint32, I have to wonder what the intent was.

You could make an argument that using shorter types would save space for some internal structs, but then you'd have to show where and why that would be beneficial. (But again, self-documentation: If one were to do that, I would argue for introducing a custom type like pg_short_size_t.)

Absent any strong performance argument, I don't see the benefit of this change. People might well want to experiment with MEMORYCHUNK_... settings larger than 1GB.
On Wed, 28 Jun 2023 at 20:13, Peter Eisentraut <peter@eisentraut.org> wrote:
> size_t (= Size) is the correct type in C to store the size of an object
> in memory. This is partially a self-documentation issue: If I see
> size_t in a function signature, I know what is intended; if I see
> uint32, I have to wonder what the intent was.

Perhaps it's ok to leave the context creation functions with Size typed parameters and then just Assert the passed-in sizes are not larger than 1GB within the context creation function. That way we could keep this change self-contained in the .c file for the given memory context, and readability would not suffer. If we ever wanted to lift the 1GB limit on block sizes then we'd not need to switch the function signature again. There's documentation where the struct's field is declared, so having a smaller type in the struct itself does not seem like a reduction of documentation quality.

> You could make an argument that using shorter types would save space for
> some internal structs, but then you'd have to show where and why that
> would be beneficial.

I think there's not much need to go proving this speeds something up. There's simply no point in the struct fields changed in Melih's patch being bigger than 32 bits, as we never need to store more than 1GB in them. Reducing them means we may have to touch fewer cache lines, and we'll also have more space on the keeper blocks to store allocations. Memory allocation performance is fairly fundamental to Postgres's performance. In my view, we shouldn't have fields that are twice as large as they need to be in code as hot as this.

> Absent any strong performance argument, I don't see the benefit of this
> change. People might well want to experiment with MEMORYCHUNK_...
> settings larger than 1GB.

Anyone doing so will be editing C code anyway. They can adjust these fields then.

David
David Rowley <dgrowleyml@gmail.com> writes:
> Perhaps it's ok to leave the context creation functions with Size
> typed parameters and then just Assert the passed-in sizes are not
> larger than 1GB within the context creation function.

Yes, I'm strongly opposed to not using Size/size_t in the mmgr APIs. If we go that road, we're going to have a problem when someone inevitably wants to pass a larger-than-1GB value for some context type.

What happens in semi-private structs is a different matter, although I'm a little dubious that shaving a couple of bytes from context headers is a useful activity. The self-documentation argument still has some force there, so I agree with Peter that some positive benefit has to be shown.

regards, tom lane
On 6/28/23 12:59, Tom Lane wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> Perhaps it's ok to leave the context creation functions with Size
>> typed parameters and then just Assert the passed-in sizes are not
>> larger than 1GB within the context creation function.
>
> Yes, I'm strongly opposed to not using Size/size_t in the mmgr APIs.
> If we go that road, we're going to have a problem when someone
> inevitably wants to pass a larger-than-1GB value for some context
> type.

+1

> What happens in semi-private structs is a different matter, although
> I'm a little dubious that shaving a couple of bytes from context
> headers is a useful activity. The self-documentation argument
> still has some force there, so I agree with Peter that some positive
> benefit has to be shown.

Yeah. FWIW I was interested in what the patch does in practice, so I checked what pahole says about the impact on struct sizes:

    AllocSetContext    224B -> 208B (4 cachelines)
    GenerationContext  152B -> 136B (3 cachelines)
    SlabContext        200B -> 200B (no change, adds 4B hole)

Nothing else changes, AFAICS. I find it hard to believe this could have any sort of positive benefit - I doubt we ever have enough contexts for this to matter.

When I first saw the patch I was thinking it's probably changing how we store the per-chunk requested_size. Maybe that'd make a difference, although 4B is tiny compared to what we waste due to the doubling.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> ... 4B is tiny compared to what we waste due to the doubling.

Yeah. I've occasionally wondered if we should rethink aset.c's "only power-of-2 chunk sizes" rule. Haven't had the bandwidth to pursue the idea though.

regards, tom lane
Hi,

On 2023-06-28 23:26:00 +0200, Tomas Vondra wrote:
> Yeah. FWIW I was interested in what the patch does in practice, so I
> checked what pahole says about the impact on struct sizes:
>
> AllocSetContext 224B -> 208B (4 cachelines)
> GenerationContext 152B -> 136B (3 cachelines)
> SlabContext 200B -> 200B (no change, adds 4B hole)
>
> Nothing else changes, AFAICS. I find it hard to believe this could have
> any sort of positive benefit - I doubt we ever have enough contexts for
> this to matter.

I don't think it's that hard to believe. We create a lot of memory contexts that we never or just barely use. Just reducing the number of cachelines touched for that can't hurt. This doesn't quite get us to a lower number of cachelines, but it's a good step.

There are a few other fields that we can get rid of:

- Afaics AllocSet->keeper is unnecessary these days, as it is always allocated together with the context itself. Saves 8 bytes.

- The set of memory context types isn't runtime extensible. We could replace MemoryContextData->methods with a small integer index into mcxt_methods. I think that might actually end up being as cheap as or even cheaper than the current approach. Saves 8 bytes.

That's sufficient for going to 3 cachelines.

- We could store the power of 2 for initBlockSize, nextBlockSize, maxBlockSize, instead of the "raw" value. That'd make them one byte each. Which also would get rid of the concerns around needing a "mini_size_t" type.

- freeListIndex could be a single byte as well (saving 7 bytes, as right now we lose 4 trailing bytes due to padding).

That would save another 12 bytes, if I calculate correctly. 25% shrinkage together ain't bad.

Greetings,

Andres Freund
Hi,

On 2023-06-28 17:56:55 -0400, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> > ... 4B is tiny compared to what we waste due to the doubling.
>
> Yeah. I've occasionally wondered if we should rethink aset.c's
> "only power-of-2 chunk sizes" rule. Haven't had the bandwidth
> to pursue the idea though.

Me too. It'd not be trivial to do without also incurring performance overhead.

A somewhat easier thing we could try is to carve the "rounding up" space into smaller chunks, similar to what we do for full blocks. It wouldn't make sense to do that for the smaller size classes, but above 64-256 bytes or so, I think the wins might be big enough to outweigh the costs? Of course that doesn't guarantee that the memory in those smaller size classes is going to be used...

Greetings,

Andres Freund
On Thu, 29 Jun 2023 at 09:26, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> AllocSetContext 224B -> 208B (4 cachelines)
> GenerationContext 152B -> 136B (3 cachelines)
> SlabContext 200B -> 200B (no change, adds 4B hole)
>
> Nothing else changes, AFAICS.

I don't think a lack of a reduction in the number of cache lines is the important part. Allowing more space on the keeper block, which is at the end of the context struct, seems more useful. I understand that the proposal is just to shave off 12 bytes and that's not exactly huge when it's just once per context, but we do create quite a large number of contexts with ALLOCSET_SMALL_SIZES, which have a 1KB initial block size. 12 bytes in 1024 is not terrible.

It's not exactly an invasive change. It does not add any complexity to the code and, as far as I can see, there's about zero risk of it slowing anything down.

David
On 6/29/23 01:34, Andres Freund wrote:
> On 2023-06-28 23:26:00 +0200, Tomas Vondra wrote:
>> Yeah. FWIW I was interested in what the patch does in practice, so I
>> checked what pahole says about the impact on struct sizes:
>>
>> AllocSetContext 224B -> 208B (4 cachelines)
>> GenerationContext 152B -> 136B (3 cachelines)
>> SlabContext 200B -> 200B (no change, adds 4B hole)
>>
>> Nothing else changes, AFAICS. I find it hard to believe this could have
>> any sort of positive benefit - I doubt we ever have enough contexts for
>> this to matter.
>
> I don't think it's that hard to believe. We create a lot of memory contexts
> that we never or just barely use. Just reducing the number of cachelines
> touched for that can't hurt. This doesn't quite get us to a lower
> number of cachelines, but it's a good step.
>
> There are a few other fields that we can get rid of:
>
> - Afaics AllocSet->keeper is unnecessary these days, as it is always allocated
>   together with the context itself. Saves 8 bytes.
>
> - The set of memory context types isn't runtime extensible. We could replace
>   MemoryContextData->methods with a small integer index into mcxt_methods. I
>   think that might actually end up being as cheap as or even cheaper than the
>   current approach. Saves 8 bytes.
>
> That's sufficient for going to 3 cachelines.
>
> - We could store the power of 2 for initBlockSize, nextBlockSize,
>   maxBlockSize, instead of the "raw" value. That'd make them one byte
>   each. Which also would get rid of the concerns around needing a
>   "mini_size_t" type.
>
> - freeListIndex could be a single byte as well (saving 7 bytes, as right now
>   we lose 4 trailing bytes due to padding).
>
> That would save another 12 bytes, if I calculate correctly. 25% shrinkage
> together ain't bad.

I don't oppose these changes, but I still don't quite believe it'll make a measurable difference (even if we manage to save a cacheline or two). I'd definitely like to see some measurements demonstrating it's worth the extra complexity.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2023-06-29 11:58:27 +0200, Tomas Vondra wrote:
> On 6/29/23 01:34, Andres Freund wrote:
> > That would save another 12 bytes, if I calculate correctly. 25% shrinkage
> > together ain't bad.
>
> I don't oppose these changes, but I still don't quite believe it'll make
> a measurable difference (even if we manage to save a cacheline or two).
> I'd definitely like to see some measurements demonstrating it's worth
> the extra complexity.

I hacked (emphasis on that) a version together that shrinks AllocSetContext down to 176 bytes. There seem to be some minor performance gains, and some not too shabby memory savings. E.g. a backend after running read-only pgbench goes from (results repeat precisely across runs):

    pgbench: Grand total: 1361528 bytes in 289 blocks; 367480 free (206 chunks); 994048 used

to:

    pgbench: Grand total: 1339000 bytes in 278 blocks; 352352 free (188 chunks); 986648 used

Running a total over all connections in the main regression tests gives less of a win (best of three):

    backends      grand  blocks       free  chunks       used
         690 1046956664  111373  370680728  291436  676275936

to:

    backends      grand  blocks       free  chunks       used
         690 1045226056  111099  372972120  297969  672253936

the latter is produced with this beauty:

    ninja && m test --suite setup --no-rebuild && \
    m test --no-rebuild --print-errorlogs regress/regress -v && \
    grep "Grand total" testrun/regress/regress/log/postmaster.log | \
    sed -E -e 's/.*Grand total: (.*) bytes in (.*) blocks; (.*) free \((.*) chunks\); (.*) used/\1\t\2\t\3\t\4\t\5/' | \
    awk '{backends += 1; grand += $1; blocks += $2; free += $3; chunks += $4; used += $5}
         END {print backends, grand, blocks, free, chunks, used}'

There's more to get. The overhead of AllocSetBlock also plays into this, both due to the keeper block and to separate blocks getting allocated subsequently. We e.g. don't need AllocBlockData->next,prev as 8 byte pointers (some trickiness would be required for external blocks, but they could combine both).

Greetings,

Andres Freund
Hi,

Thanks for your comments.

Tom Lane <tgl@sss.pgh.pa.us> wrote on Wed, 28 Jun 2023 at 13:59:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Perhaps it's ok to leave the context creation functions with Size
> > typed parameters and then just Assert the passed-in sizes are not
> > larger than 1GB within the context creation function.
>
> Yes, I'm strongly opposed to not using Size/size_t in the mmgr APIs.
> If we go that road, we're going to have a problem when someone
> inevitably wants to pass a larger-than-1GB value for some context
> type.

I reverted the changes in the context creation functions and only changed the types in the structs. I believe there are already lines to assert that the sizes are less than 1GB, so we should be safe there.

Andres Freund <andres@anarazel.de> wrote on Thu, 29 Jun 2023 at 02:34:
> There are a few other fields that we can get rid of.
>
> - Afaics AllocSet->keeper is unnecessary these days, as it is always allocated
>   together with the context itself. Saves 8 bytes.

This seemed like a safe change, so I removed the keeper field in the AllocSet and Generation contexts. It saves an additional 8 bytes.

Best,
--
Melih Mutlu
Microsoft
Attachments
On Tue, 11 Jul 2023 at 02:41, Melih Mutlu <m.melihmutlu@gmail.com> wrote:
> > - Afaics AllocSet->keeper is unnecessary these days, as it is always allocated
> >   together with the context itself. Saves 8 bytes.
>
> This seemed like a safe change, so I removed the keeper field in the
> AllocSet and Generation contexts. It saves an additional 8 bytes.

Seems like a good idea for an additional 8 bytes.

I looked at your v2 patch. The only thing that really looked wrong were the (Size) casts in the context creation functions. These should have been casts to uint32 rather than Size. Basically, all the casts do is say to the compiler "Yes, I know this could cause truncation due to assigning to a size smaller than the source type's size". Some compilers will likely warn without that, and the cast will stop them. We know there can't be any truncation due to the Asserts. There's also the fundamental limitation that MemoryChunk can't store block offsets larger than 1GB anyway, so things will go bad if we tried to have blocks bigger than 1GB.

Aside from that, I thought that a couple of other slab.c fields could be shrunk to uint32, as the v2 patch only reduces the size of one field, which just creates a 4-byte hole in SlabContext. The fullChunkSize field is just MAXALIGN(chunkSize) + sizeof(MemoryChunk). We should never be using slab contexts for any chunks anywhere near that size; aset.c would be a better context for that, so it seems fine to me to further restrict the maximum supported chunk size by another 8 bytes.

I've attached your patch again along with a small delta of what I adjusted. My thoughts on these changes are that it's senseless to have Size typed fields for storing a value that's never larger than 2^30. Getting rid of the keeper pointer seems like a cleanup, as it's pretty much a redundant field. For small contexts like the ones used for storing index relcache entries, I think it makes sense to save 20 more bytes. Each backend can have many thousands of those, and there could be many hundreds of backends. If we can fit more allocations on that initial 1 kilobyte keeper block without having to allocate any additional blocks, then that's great. I feel that Andres's results showing several hundred fewer block allocations show this working. Albeit, his patch reduced the size of the structs even further than what v3 does. I think v3 is enough for now, as the additional changes Andres mentioned require some more invasive code changes to make work.

If nobody objects or has other ideas about this, modulo commit message, I plan to push the attached on Monday.

David
Attachments
Hi David,
David Rowley <dgrowleyml@gmail.com> wrote on Thu, 13 Jul 2023 at 08:04:
I looked at your v2 patch. The only thing that really looked wrong
were the (Size) casts in the context creation functions. These should
have been casts to uint32 rather than Size. Basically, all the casts
do is say to the compiler "Yes, I know this could cause truncation due
to assigning to a size smaller than the source type's size". Some
compilers will likely warn without that and the cast will stop them.
We know there can't be any truncation due to the Asserts. There's also
the fundamental limitation that MemoryChunk can't store block offsets
larger than 1GBs anyway, so things will go bad if we tried to have
blocks bigger than 1GB.
Right! I don't know why I cast them to Size. Thanks for the fix.
Best,
Melih Mutlu
Microsoft
On Fri, 14 Jul 2023 at 18:53, Melih Mutlu <m.melihmutlu@gmail.com> wrote:
> David Rowley <dgrowleyml@gmail.com> wrote on Thu, 13 Jul 2023 at 08:04:
>>
>> I looked at your v2 patch. The only thing that really looked wrong
>> were the (Size) casts in the context creation functions. These should
>> have been casts to uint32 rather than Size. Basically, all the casts
>> do is say to the compiler "Yes, I know this could cause truncation due
>> to assigning to a size smaller than the source type's size". Some
>> compilers will likely warn without that, and the cast will stop them.
>> We know there can't be any truncation due to the Asserts. There's also
>> the fundamental limitation that MemoryChunk can't store block offsets
>> larger than 1GB anyway, so things will go bad if we tried to have
>> blocks bigger than 1GB.
>
> Right! I don't know why I cast them to Size. Thanks for the fix.

Pushed.

David