Discussion: Refactor PROCLOCK hash table into partitioned list allocator
Hi hackers,

Following up on the Discord discussion about the PROCLOCK hash table being a "weird allocator" that we never actually use for lookups, I took a stab at replacing it with a simpler partitioned free-list approach, as was suggested. I was doing this mostly to educate myself on Lock Manager internals.

The current implementation uses LockMethodProcLockHash purely as an allocator. We never do hash lookups by key; we only allocate entries, link them to the lock's procLocks list, and free them later. Using a full hash table for this adds unnecessary complexity and maybe even overhead (I did not measure this).

The attached patch replaces this with:

- ProcLockArray: a fixed-size array of all PROCLOCK structs, allocated at startup
- ProcLockFreeList: partitioned free lists, one per lock partition, to reduce contention
- ProcLockAlloc/Free: simple push/pop operations on the free lists
- PROCLOCK lookup: linear traversal of lock->procLocks (see LockRefindAndRelease() and FastPathGetRelationLockEntry())

The last point bothers me most. It seems like these traversals are expected to be short, but I'm not 100% sure.

This patch removes the proclock_hash() function and ProcLockHashCode() entirely, and eliminates all hash_search() calls for PROCLOCKs. Allocation and deallocation are now just dlist operations instead of hash table management.

Would appreciate your thoughts on this approach. Is this a direction worth working on?

Best regards, Andrey Borodin.
Attachments
On 30/12/2025 14:37, Andrey Borodin wrote:
> Hi hackers,
>
> Following up on the Discord discussion about the PROCLOCK hash table being
> a "weird allocator" that we never actually use for lookups - I took a stab at
> replacing it with a simpler partitioned free list approach as was suggested.
> I was doing this mostly to educate myself on Lock Manager internals.
>
> The current implementation uses LockMethodProcLockHash purely as an allocator.
> We never do hash lookups by key; we only allocate entries, link them to the lock's
> procLocks list, and free them later. Using a full hash table for this adds
> unnecessary complexity and maybe even overhead (I did not measure this).
>
> The attached patch replaces this with:
> - ProcLockArray: A fixed-size array of all PROCLOCK structs (allocated at startup)
> - ProcLockFreeList: Partitioned free lists, one per lock partition to reduce contention
> - ProcLockAlloc/Free: Simple push/pop operations on the free lists
> - PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
>   and FastPathGetRelationLockEntry())
>
> The last point bothers me most. It seems like these traversals are expected to be short,
> but I'm not 100% sure.

Hmm, yeah, the last point contradicts the premise that the hash table is used purely as an allocator. It *is* used for lookups, and you're replacing them with linear scans. That doesn't seem like an improvement.

- Heikki
On Tue, 6 Jan 2026 at 15:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 30/12/2025 14:37, Andrey Borodin wrote:
> > Hi hackers,
> >
> > Following up on the Discord discussion about the PROCLOCK hash table being
> > a "weird allocator" that we never actually use for lookups - I took a stab at
> > replacing it with a simpler partitioned free list approach as was suggested.
> > I was doing this mostly to educate myself on Lock Manager internals.
> >
> > The current implementation uses LockMethodProcLockHash purely as an allocator.
> > We never do hash lookups by key; we only allocate entries, link them to the lock's
> > procLocks list, and free them later. Using a full hash table for this adds
> > unnecessary complexity and maybe even overhead (I did not measure this).
> >
> > The attached patch replaces this with:
> > - ProcLockArray: A fixed-size array of all PROCLOCK structs (allocated at startup)
> > - ProcLockFreeList: Partitioned free lists, one per lock partition to reduce contention
> > - ProcLockAlloc/Free: Simple push/pop operations on the free lists
> > - PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
> >   and FastPathGetRelationLockEntry())
> >
> > The last point bothers me most. It seems like these traversals are expected to be short,
> > but I'm not 100% sure.
>
> Hmm, yeah the last point contradicts the premise that the hash table is
> used purely as an allocator. It *is* used for lookups, and you're
> replacing them with linear scans. That doesn't seem like an improvement.

There are 2 types of PROCLOCK lookup used in LockRefindAndRelease and FastPathGetRelationLockEntry:
- An active backend's PROCLOCK entries (in both LRAR and FPGRLE).
- A prepared transaction's PROCLOCK entries (only in LRAR, called from lock_twophase_postcommit).

For the backend's PROCLOCK entry lookup, we can use a backend-local hash table, which only keeps track of where this backend's entries are.
For prepared transactions, I don't see any code that would release more than one lock through this path (lock_twophase_postcommit only releases one lock), which to me indicates there is no risk of O(N^2)-related performance sinks. In case there are more locks in the 2PC's PROCLOCK list, we could "just" make sure to put the lock we're releasing during transaction wind-down at the head of the list, as that would also keep the lookup O(1).

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)
On Tue, 06 Jan 2026 at 16:23, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 30/12/2025 14:37, Andrey Borodin wrote:
>> Hi hackers,
>>
>> Following up on the Discord discussion about the PROCLOCK hash table being
>> a "weird allocator" that we never actually use for lookups - I took a stab at
>> replacing it with a simpler partitioned free list approach as was suggested.
>> I was doing this mostly to educate myself on Lock Manager internals.
>>
>> The current implementation uses LockMethodProcLockHash purely as an allocator.
>> We never do hash lookups by key; we only allocate entries, link them to the lock's
>> procLocks list, and free them later. Using a full hash table for this adds
>> unnecessary complexity and maybe even overhead (I did not measure this).
>>
>> The attached patch replaces this with:
>> - ProcLockArray: A fixed-size array of all PROCLOCK structs (allocated at startup)
>> - ProcLockFreeList: Partitioned free lists, one per lock partition to reduce contention
>> - ProcLockAlloc/Free: Simple push/pop operations on the free lists
>> - PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
>>   and FastPathGetRelationLockEntry())
>>
>> The last point bothers me most. It seems like these traversals are expected to be short,
>> but I'm not 100% sure.
>
> Hmm, yeah the last point contradicts the premise that the hash table
> is used purely as an allocator. It *is* used for lookups, and you're
> replacing them with linear scans. That doesn't seem like an
> improvement.
>
> - Heikki

I tested the patch on a Loongson 3C6000/D system with 128 vCPUs using BenchmarkSQL 5.0 (100 warehouses, 100 clients). Here are the results:

| branch  | tpmC      | tpmTotal  |
|---------|-----------|-----------|
| master  | 248199.09 | 551387.46 |
|         | 243660.35 | 541902.31 |
|         | 244418.30 | 542867.57 |
| patched | 247330.65 | 549949.25 |
|         | 242953.79 | 539620.65 |
|         | 237883.19 | 528491.66 |

Not sure if this is useful, but throwing it out there.
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
On 06/01/2026 16:58, Matthias van de Meent wrote:
> On Tue, 6 Jan 2026 at 15:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> On 30/12/2025 14:37, Andrey Borodin wrote:
>>> Hi hackers,
>>>
>>> Following up on the Discord discussion about the PROCLOCK hash table being
>>> a "weird allocator" that we never actually use for lookups - I took a stab at
>>> replacing it with a simpler partitioned free list approach as was suggested.
>>> I was doing this mostly to educate myself on Lock Manager internals.
>>>
>>> The current implementation uses LockMethodProcLockHash purely as an allocator.
>>> We never do hash lookups by key; we only allocate entries, link them to the lock's
>>> procLocks list, and free them later. Using a full hash table for this adds
>>> unnecessary complexity and maybe even overhead (I did not measure this).
>>>
>>> The attached patch replaces this with:
>>> - ProcLockArray: A fixed-size array of all PROCLOCK structs (allocated at startup)
>>> - ProcLockFreeList: Partitioned free lists, one per lock partition to reduce contention
>>> - ProcLockAlloc/Free: Simple push/pop operations on the free lists
>>> - PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
>>>   and FastPathGetRelationLockEntry())
>>>
>>> The last point bothers me most. It seems like these traversals are expected to be short,
>>> but I'm not 100% sure.
>>
>> Hmm, yeah the last point contradicts the premise that the hash table is
>> used purely as an allocator. It *is* used for lookups, and you're
>> replacing them with linear scans. That doesn't seem like an improvement.
>
> There are 2 types of PROCLOCK lookup used in LockRefindAndRelease and
> FastPathGetRelationLockEntry:
> - An active backend's PROCLOCK entries (in both LRAR and FPGRLE).
> - Prepared transaction's PROCLOCK entries (only in LRAR, called from
>   lock_twophase_postcommit).

There are also lookups in SetupLockInTable and in LockRelease.
> For the backend's PROCLOCK entry lookup, we can use a backend-local
> hash table, which only keeps track of where this backend's entries
> are.

Hmm, good point. In fact we already have that: there's a pointer to the current process's PROCLOCK entry in LOCALLOCK already. Can we use that to avoid the linear scans? There's this in LockAcquireExtended:

> /*
>  * Find or create lock and proclock entries with this tag
>  *
>  * Note: if the locallock object already existed, it might have a pointer
>  * to the lock already ... but we should not assume that that pointer is
>  * valid, since a lock object with zero hold and request counts can go
>  * away anytime. So we have to use SetupLockInTable() to recompute the
>  * lock and proclock pointers, even if they're already set.
>  */
> proclock = SetupLockInTable(lockMethodTable, MyProc, locktag,
>                             hashcode, lockmode);

So that comment suggests that the 'proclock' pointer cannot be trusted here. I don't remember how all this works, so I'm not sure if that is a show-stopper or whether we could somehow leverage the 'proclock' pointer in LOCALLOCK anyway.

- Heikki