Discussion: Refactor PROCLOCK hash table into partitioned list allocator
Hi hackers,

Following up on the Discord discussion about the PROCLOCK hash table being a "weird allocator" that we never actually use for lookups, I took a stab at replacing it with a simpler partitioned free-list approach, as was suggested. I was doing this mostly to educate myself on Lock Manager internals.

The current implementation uses LockMethodProcLockHash purely as an allocator. We never do hash lookups by key; we only allocate entries, link them to the lock's procLocks list, and free them later. Using a full hash table for this adds unnecessary complexity and maybe even overhead (I did not measure this).

The attached patch replaces this with:

- ProcLockArray: a fixed-size array of all PROCLOCK structs, allocated at startup
- ProcLockFreeList: partitioned free lists, one per lock partition, to reduce contention
- ProcLockAlloc/Free: simple push/pop operations on the free lists
- PROCLOCK lookup: linear traversal of lock->procLocks (see LockRefindAndRelease() and FastPathGetRelationLockEntry())

The last point bothers me most. It seems like these traversals are expected to be short, but I'm not 100% sure.

This patch removes the proclock_hash() function and ProcLockHashCode() entirely, and eliminates all hash_search() calls for PROCLOCKs. Allocation and deallocation are now just dlist operations instead of hash table management.

Would appreciate your thoughts on this approach. Is this a direction worth working on?

Best regards, Andrey Borodin.
Attachments
On 30/12/2025 14:37, Andrey Borodin wrote:
> Hi hackers,
>
> Following up on the Discord discussion about the PROCLOCK hash table being
> a "weird allocator" that we never actually use for lookups - I took a stab at
> replacing it with a simpler partitioned free list approach as was suggested.
> I was doing this mostly to educate myself on Lock Manager internals.
>
> The current implementation uses LockMethodProcLockHash purely as an allocator.
> We never do hash lookups by key; we only allocate entries, link them to the lock's
> procLocks list, and free them later. Using a full hash table for this adds
> unnecessary complexity and maybe even overhead (I did not measure this).
>
> The attached patch replaces this with:
> - ProcLockArray: A fixed-size array of all PROCLOCK structs (allocated at startup)
> - ProcLockFreeList: Partitioned free lists, one per lock partition to reduce contention
> - ProcLockAlloc/Free: Simple push/pop operations on the free lists
> - PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
>   and FastPathGetRelationLockEntry())
>
> The last point bothers me most. It seems like these traversals are expected to be short,
> but I'm not 100% sure.

Hmm, yeah, the last point contradicts the premise that the hash table is used purely as an allocator. It *is* used for lookups, and you're replacing them with linear scans. That doesn't seem like an improvement.

- Heikki
On Tue, 6 Jan 2026 at 15:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 30/12/2025 14:37, Andrey Borodin wrote:
> > Hi hackers,
> >
> > Following up on the Discord discussion about the PROCLOCK hash table being
> > a "weird allocator" that we never actually use for lookups - I took a stab at
> > replacing it with a simpler partitioned free list approach as was suggested.
> > I was doing this mostly to educate myself on Lock Manager internals.
> >
> > The current implementation uses LockMethodProcLockHash purely as an allocator.
> > We never do hash lookups by key; we only allocate entries, link them to the lock's
> > procLocks list, and free them later. Using a full hash table for this adds
> > unnecessary complexity and maybe even overhead (I did not measure this).
> >
> > The attached patch replaces this with:
> > - ProcLockArray: A fixed-size array of all PROCLOCK structs (allocated at startup)
> > - ProcLockFreeList: Partitioned free lists, one per lock partition to reduce contention
> > - ProcLockAlloc/Free: Simple push/pop operations on the free lists
> > - PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
> >   and FastPathGetRelationLockEntry())
> >
> > The last point bothers me most. It seems like these traversals are expected to be short,
> > but I'm not 100% sure.
>
> Hmm, yeah the last point contradicts the premise that the hash table is
> used purely as an allocator. It *is* used for lookups, and you're
> replacing them with linear scans. That doesn't seem like an improvement.

There are 2 types of PROCLOCK lookup used in LockRefindAndRelease and FastPathGetRelationLockEntry:
- An active backend's PROCLOCK entries (in both LRAR and FPGRLE).
- A prepared transaction's PROCLOCK entries (only in LRAR, called from lock_twophase_postcommit).

For the backend's PROCLOCK entry lookup, we can use a backend-local hash table, which only keeps track of where this backend's entries are.
For prepared transactions, I don't see any code that would release more than one lock through this path (lock_twophase_postcommit only releases one lock), which to me indicates there is no risk of O(N^2)-related performance sinks. In case there are more locks in the 2PC's PROCLOCK list, we could "just" make sure to put the lock we're releasing during transaction wind-down at the head of the list, as that would also keep the lookup O(1).

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)
On Tue, 06 Jan 2026 at 16:23, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 30/12/2025 14:37, Andrey Borodin wrote:
>> Hi hackers,
>>
>> Following up on the Discord discussion about the PROCLOCK hash table being
>> a "weird allocator" that we never actually use for lookups - I took a stab at
>> replacing it with a simpler partitioned free list approach as was suggested.
>> I was doing this mostly to educate myself on Lock Manager internals.
>>
>> The current implementation uses LockMethodProcLockHash purely as an allocator.
>> We never do hash lookups by key; we only allocate entries, link them to the lock's
>> procLocks list, and free them later. Using a full hash table for this adds
>> unnecessary complexity and maybe even overhead (I did not measure this).
>>
>> The attached patch replaces this with:
>> - ProcLockArray: A fixed-size array of all PROCLOCK structs (allocated at startup)
>> - ProcLockFreeList: Partitioned free lists, one per lock partition to reduce contention
>> - ProcLockAlloc/Free: Simple push/pop operations on the free lists
>> - PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
>>   and FastPathGetRelationLockEntry())
>>
>> The last point bothers me most. It seems like these traversals are expected to be short,
>> but I'm not 100% sure.
>
> Hmm, yeah the last point contradicts the premise that the hash table
> is used purely as an allocator. It *is* used for lookups, and you're
> replacing them with linear scans. That doesn't seem like an
> improvement.
>
> - Heikki

I tested the patch on a Loongson 3C6000/D system with 128 vCPUs using BenchmarkSQL 5.0 (100 warehouses, 100 clients). Here are the results:

| branch  | tpmC      | tpmTotal  |
|---------|-----------|-----------|
| master  | 248199.09 | 551387.46 |
|         | 243660.35 | 541902.31 |
|         | 244418.30 | 542867.57 |
| patched | 247330.65 | 549949.25 |
|         | 242953.79 | 539620.65 |
|         | 237883.19 | 528491.66 |

Not sure if this is useful, but throwing it out there.
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
On 06/01/2026 16:58, Matthias van de Meent wrote:
> On Tue, 6 Jan 2026 at 15:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> On 30/12/2025 14:37, Andrey Borodin wrote:
>>> Hi hackers,
>>>
>>> Following up on the Discord discussion about the PROCLOCK hash table being
>>> a "weird allocator" that we never actually use for lookups - I took a stab at
>>> replacing it with a simpler partitioned free list approach as was suggested.
>>> I was doing this mostly to educate myself on Lock Manager internals.
>>>
>>> The current implementation uses LockMethodProcLockHash purely as an allocator.
>>> We never do hash lookups by key; we only allocate entries, link them to the lock's
>>> procLocks list, and free them later. Using a full hash table for this adds
>>> unnecessary complexity and maybe even overhead (I did not measure this).
>>>
>>> The attached patch replaces this with:
>>> - ProcLockArray: A fixed-size array of all PROCLOCK structs (allocated at startup)
>>> - ProcLockFreeList: Partitioned free lists, one per lock partition to reduce contention
>>> - ProcLockAlloc/Free: Simple push/pop operations on the free lists
>>> - PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
>>>   and FastPathGetRelationLockEntry())
>>>
>>> The last point bothers me most. It seems like these traversals are expected to be short,
>>> but I'm not 100% sure.
>>
>> Hmm, yeah the last point contradicts the premise that the hash table is
>> used purely as an allocator. It *is* used for lookups, and you're
>> replacing them with linear scans. That doesn't seem like an improvement.
>
> There are 2 types of PROCLOCK lookup used in LockRefindAndRelease and
> FastPathGetRelationLockEntry:
> - An active backend's PROCLOCK entries (in both LRAR and FPGRLE).
> - Prepared transaction's PROCLOCK entries (only in LRAR, called from
>   lock_twophase_postcommit).

There are also lookups in SetupLockInTable and in LockRelease.
> For the backend's PROCLOCK entry lookup, we can use a backend-local
> hash table, which only keeps track of where this backend's entries
> are.

Hmm, good point. In fact we already have that: there's a pointer to the current process's PROCLOCK entry in LOCALLOCK already. Can we use that to avoid the linear scans? There's this in LockAcquireExtended:

> /*
>  * Find or create lock and proclock entries with this tag
>  *
>  * Note: if the locallock object already existed, it might have a pointer
>  * to the lock already ... but we should not assume that that pointer is
>  * valid, since a lock object with zero hold and request counts can go
>  * away anytime. So we have to use SetupLockInTable() to recompute the
>  * lock and proclock pointers, even if they're already set.
>  */
> proclock = SetupLockInTable(lockMethodTable, MyProc, locktag,
>                             hashcode, lockmode);

So that comment suggests that the 'proclock' pointer cannot be trusted here. I don't remember how all this works, so I'm not sure if that is a show-stopper or whether we could somehow leverage the 'proclock' pointer in LOCALLOCK anyway.

- Heikki