Re: Changing shared_buffers without restart
From | Ashutosh Bapat |
---|---|
Subject | Re: Changing shared_buffers without restart |
Date | |
Msg-id | CAExHW5vB8sAmDtkEN5dcYYeBok3D8eAzMFCOH1k+krxht1yFjA@mail.gmail.com |
In response to | Re: Changing shared_buffers without restart (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: Changing shared_buffers without restart
Re: Changing shared_buffers without restart |
List | pgsql-hackers |
On Mon, Jun 16, 2025 at 6:09 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> >
> > Buffer lookup table resizing
> > ------------------------------------

I looked at the interaction of the shared buffer lookup table with buffer resizing as per the patches in [0]. Here is a list of my findings, issues and fixes.

1. The basic structure of the buffer lookup table (directory, control area etc.) is allocated in a shared memory segment dedicated to the buffer lookup table. However, the entries are allocated using ShmemAllocNoError(), which allocates them in the main memory segment. For ShmemAllocNoError() to allocate entries in the dedicated shared memory segment, it needs to know which segment to use. We could do that by setting the segment number in element_alloc() before calling hashp->alloc(). This is similar to how ShmemAllocNoError() knows the memory context in which to allocate the entries on heap. But read on ...

2. When the buffer pool is expanded, an "out of shared memory" error is thrown when more entries are added to the buffer lookup table. We could temporarily adjust that flag and allocate more entries. But the directory also needs to be expanded proportionately, otherwise it may lead to more contention. Expanding the directory is non-trivial since it is a contiguous chunk of memory, followed by other data structures. Further, expanding the directory would require rehashing all the existing entries, which may impact the time taken by the resizing operation and how long other backends remain blocked.

3. When the buffer pool is shrunk, there is no way to free the extra entries in such a way that a contiguous chunk of shared memory can be given back to the OS. If we implement it, we will need some way to compact the entries that remain after shrinking into a contiguous chunk of memory and unmap the remaining chunk. That's a significant amount of code.
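The rehashing cost mentioned in finding 2 can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it uses a plain power-of-two bucket mask rather than PostgreSQL's actual dynahash layout, and the names `bucket_of` and `moved_entries` are hypothetical.

```python
# Toy model only: a plain power-of-two hash table, not PostgreSQL's dynahash.

def bucket_of(hash_value: int, nbuckets: int) -> int:
    """Map a hash value to a bucket; nbuckets must be a power of two."""
    return hash_value & (nbuckets - 1)


def moved_entries(hash_values, old_nbuckets: int, new_nbuckets: int):
    """Return the hash values whose bucket changes when the table grows."""
    return [
        h for h in hash_values
        if bucket_of(h, old_nbuckets) != bucket_of(h, new_nbuckets)
    ]


hashes = list(range(1000))  # stand-in hash values, not real buffer tags
moved = moved_entries(hashes, old_nbuckets=16, new_nbuckets=32)
print(f"{len(moved)} of {len(hashes)} entries change bucket")
```

Even in this simplified model every existing entry has to be examined to decide whether it moves, which is the kind of work that would keep other backends blocked during a resize.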
Given these issues with resizing the buffer lookup table, I think we should set up the table right at the beginning to hold the maximum number of entries required to expand the buffer pool to its maximum. The maximum size to which the buffer pool can grow is given by the GUC max_available_memory (which is a misnomer and should be renamed to max_shared_buffers or something), introduced by the previous set of patches [0]. We don't shrink or expand the buffer lookup table as we shrink and expand the buffer pool. With that, the buffer lookup table can be located in the main memory segment itself and we don't have to fix ShmemAllocNoError().

This has two side effects:

1. A larger hash table makes hash table operations slower [2]. Its impact on actual queries needs to be studied.

2. The total shared memory allocated upfront increases. Currently we allocate 150MB of memory with all default GUC values. With this change we will allocate 250MB, since max_available_memory (or rather max_shared_buffers) defaults to allow 524288 shared buffers. If we make max_shared_buffers default to shared_buffers, it won't be a problem. However, when users set max_shared_buffers themselves, they have to be conscious of the fact that it will allocate more memory than necessary for the given shared_buffers value.

This fix is part of patch 0015.

The patchset contains more fixes and improvements as described below.

Per the TODO in the prologue of CalculateShmemSize(), more shared memory than necessary was mapped and allocated in the buffer manager related memory segments because of an error in that function: the amount of memory to be allocated in the main shared memory segment was added to every other shared memory segment. Thus shrinking those memory segments didn't actually affect the objects allocated in them. Because of that, we were not seeing SIGBUS even when the supposedly shrunk objects were accessed, masking bugs in the patches. In this patchset I have a working fix for CalculateShmemSize().
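To get a feel for how sparsely a lookup table sized for the maximum would be used at default settings, here is a back-of-the-envelope sketch. The block size, the 128MB default for shared_buffers, and the "buffers plus partitions" sizing rule are assumptions taken from stock PostgreSQL, not from the patchset; the helper names are hypothetical.

```python
BLCKSZ = 8192                # assumption: default PostgreSQL block size in bytes
NUM_BUFFER_PARTITIONS = 128  # assumption: value in stock PostgreSQL sources


def lookup_table_entries(nbuffers: int) -> int:
    """Entries the lookup table is sized for (assumed rule: buffers + partitions)."""
    return nbuffers + NUM_BUFFER_PARTITIONS


def fill_fraction(shared_buffers: int, max_shared_buffers: int) -> float:
    """Share of a max-sized table that the current buffer pool can occupy."""
    return lookup_table_entries(shared_buffers) / lookup_table_entries(max_shared_buffers)


default_buffers = 128 * 1024 * 1024 // BLCKSZ  # assumed 128MB default shared_buffers
max_buffers = 524288                           # default quoted in this mail

print("default shared buffers:", default_buffers)
print("max pool size in GiB:", max_buffers * BLCKSZ / 1024**3)
print("fill fraction at defaults:", round(fill_fraction(default_buffers, max_buffers), 4))
```

Under these assumptions the table would be only a few percent full at default settings, which is why the effect of a sparse hash table on lookups [2] is worth measuring.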
With the CalculateShmemSize() fix in place, we see the server crashing with SIGBUS in some resizing operations. Those cases need to be investigated.

The fix changes its minions (the functions it calls) to:
a. return the size of the shared memory objects to be allocated in the main memory segment, and
b. add the sizes of the shared memory objects to be allocated in other memory segments to the respective AnonymousMapping structures.

This asymmetry between the main segment and the other segments exists so that the minions of CalculateShmemSize() do not have to change much. But I think we should eliminate the asymmetry and change every minion to add sizes to the respective segment's AnonymousMapping structure. The patch proposed at [3] would simplify CalculateShmemSize(), which should help eliminate the asymmetry.

Along with refactoring CalculateShmemSize(), I have added small fixes to update the total size and end address of the shared memory mappings after resizing them, and also to update the new allocated_sizes of resized structures in their ShmemIndex entries. Patch 0009 includes these changes.

I found that the shared memory resizing synchronization is triggered even before the shared buffers are set up for the first time after starting the server. That's not required and can also lead to issues because of trying to resize shared buffers which do not exist. A WIP fix is included as patch 0012. A TODO in the patch needs to be addressed. It should be squashed into an earlier patch 0011 when appropriate.

While debugging the above mentioned issues, I found it useful to have an insight into the contents of the buffer lookup table. Hence I added a system view exposing the contents of the buffer lookup table. This is added as patch 0001 in the attached patchset. I think it's useful to have this independent of this patchset, to investigate inconsistencies between the contents of the shared buffer pool and the buffer lookup table.
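The effect of the CalculateShmemSize() error described above can be modelled with a few lines of arithmetic. This is a simplified sketch, not the patch's code: segment names, sizes and function names are made up for illustration.

```python
MAIN = "main"  # hypothetical name for the main shared memory segment


def buggy_mapping_sizes(main_size, other_sizes):
    """Bug as described: the main segment's size is added to every other segment."""
    sizes = {MAIN: main_size}
    for name, size in other_sizes.items():
        sizes[name] = main_size + size
    return sizes


def fixed_mapping_sizes(main_size, other_sizes):
    """Fix as described: each segment is sized only for its own objects."""
    sizes = {MAIN: main_size}
    sizes.update(other_sizes)
    return sizes


def slack(mapping_sizes, object_sizes):
    """Mapped space not backing any object, per non-main segment."""
    return {name: mapping_sizes[name] - size for name, size in object_sizes.items()}


objects = {"buffers": 128, "descriptors": 8}  # made-up sizes in MB
main = 20                                     # made-up main segment size in MB

print("buggy slack:", slack(buggy_mapping_sizes(main, objects), objects))
print("fixed slack:", slack(fixed_mapping_sizes(main, objects), objects))
```

In the buggy model every segment carries slack equal to the main segment's size, so an access slightly past a shrunk object still lands in mapped memory. With the fix the slack is zero, which is consistent with SIGBUS appearing only after the fix.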
Again for debugging purposes, I have added a new column "segment" to pg_shmem_allocations, reporting the shared memory segment in which the given allocation has happened. I have also added another view, pg_shmem_segments, to provide information about the shared memory segments. This view definition will change as we design shared memory mappings and shared memory segments better. So it's WIP and needs doc changes as well. I have included it in the patchset as patch 0011 since it will be helpful to debug issues found in the patch when testing. The patch should be merged into patch 0007.

Last but not least, patch 0016 contains two tests:
a. a stress test to run buffer resizing while pgbench is running, and
b. a SQL test to check the sizes of segments and shared memory allocations after resizing.

The stress test polls the "show shared_buffers" output to know when the resizing is finished. I think we need a better interface to know when resizing has finished. Thanks a lot to my colleague Palak Chaturvedi for providing the initial draft of the test case.

The patches are rebased on top of the latest master, which includes changes to remove the free buffer list. That led to removing all the code in these patches dealing with the free buffer list.

I am intentionally keeping my changes (patches 0001, 0008 to 0012, 0012 to 0016) separate from Dmitry's changes so that Dmitry can review them easily. The patches are arranged so that my patches are nearer to Dmitry's patches, into which they should be squashed.

Dmitry, I found that max_available_memory is PGC_SIGHUP. Is that intentional? I thought it would be PGC_POSTMASTER, since we cannot reserve more address space without restarting the postmaster. Left a TODO for this. I think we also need to change the name and description to better reflect its actual functionality.
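The polling approach used by the stress test can be sketched roughly as follows. This is not the test's actual code: `wait_for_value` is a hypothetical helper, and `fetch` stands in for whatever runs "show shared_buffers" against the server (here replaced by a fake so the sketch runs on its own).

```python
import time


def wait_for_value(fetch, expected, timeout=30.0, interval=0.5,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until it returns expected; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if fetch() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Fake server responses: the setting changes only on the third poll.
responses = iter(["128MB", "128MB", "256MB"])
done = wait_for_value(lambda: next(responses), "256MB", timeout=5.0, interval=0.01)
print("resize observed:", done)
```

Polling like this only observes the reported setting, so it cannot tell whether every backend has finished adapting, which is one reason a dedicated interface for "resizing has finished" would be preferable.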
[0] https://www.postgresql.org/message-id/my4hukmejato53ef465ev7lk3sqiqvneh7436rz64wmtc7rbfj@hmuxsf2ngov2
[1] https://www.postgresql.org/message-id/CAExHW5v0jh3F_wj86yC%3DqBfWk0uiT94qy%3DZ41uzAHLHh0SerRA%40mail.gmail.com
[2] https://ashutoshpg.blogspot.com/2025/07/efficiency-of-sparse-hash-table.html
[3] https://commitfest.postgresql.org/patch/5997/

--
Best Wishes,
Ashutosh Bapat
Attachments
- 0004-Introduce-pss_barrierReceivedGeneration-20250918.patch
- 0002-Process-config-reload-in-AIO-workers-20250918.patch
- 0003-Introduce-pending-flag-for-GUC-assign-hooks-20250918.patch
- 0005-Allow-to-use-multiple-shared-memory-mapping-20250918.patch
- 0001-Add-system-view-for-shared-buffer-lookup-ta-20250918.patch
- 0006-Address-space-reservation-for-shared-memory-20250918.patch
- 0008-Fix-compilation-failures-from-previous-comm-20250918.patch
- 0007-Introduce-multiple-shmem-segments-for-share-20250918.patch
- 0009-Refactor-CalculateShmemSize-20250918.patch
- 0010-WIP-Monitoring-views-20250918.patch
- 0013-Update-sizes-and-addresses-of-shared-memory-20250918.patch
- 0011-Allow-to-resize-shared-memory-without-resta-20250918.patch
- 0012-Initial-value-of-shared_buffers-or-NBuffers-20250918.patch
- 0014-Support-shrinking-shared-buffers-20250918.patch
- 0015-Reinitialize-StrategyControl-after-resizing-20250918.patch
- 0016-Tests-for-dynamic-shared_buffers-resizing-20250918.patch