Обсуждение: Re: [RFC] Lock-free XLog Reservation from WAL

Поиск
Список
Период
Сортировка

Re: [RFC] Lock-free XLog Reservation from WAL

От
Japin Li
Дата:
On Sun, 19 Jan 2025 at 17:56, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> 17.01.2025 17:00, Zhou, Zhiguo пишет:
>> On 1/16/2025 10:00 PM, Yura Sokolov wrote:
>>>
>>> Good day, Zhiguo.
>>>
>>> Excuse me, I feel sneaky a bit, but I've started another thread
>>> just about increase of NUM_XLOGINSERT_LOCK, because I can measure
>>> its effect even on my working notebook (it is another one: Ryzen
>>> 5825U limited to @2GHz).
>>>
>>> http://postgr.es/m/flat/3b11fdc2-9793-403d-
>>> b3d4-67ff9a00d447%40postgrespro.ru
>>>
>>> -----
>>>
>>> regards
>>> Yura Sokolov aka funny-falcon
>>>
>>>
>> Good day, Yura!
>> Thank you for keeping me informed. I appreciate your proactive
>> approach and understand the importance of exploring different angles
>> for optimization. Your patch is indeed fundamental to our ongoing
>> work on the lock-free xlog reservation, and I'm eager to see how it
>> can further enhance our efforts.
>> I will proceed to test the performance impact of your latest patch
>> when combined with the lock-free xlog reservation patch. This will
>> help us determine if there's potential for additional
>> optimization. Concurrently, with your permission, I'll try to refine
>> the hash-table- based implementation for your further review. WDYT?
>>
>
> Good day, Zhiguo
>
> Here's version of "hash-table reservation" with both 32bit and 64bit
> operations (depending on PG_HAVE_ATOMIC_U64_SIMULATION, or may be
> switched by hand).
>
> 64bit version uses other protocol with a bit lesser atomic
> operations. I suppose it could be a bit faster. But I can't prove it
> now.
>
> btw, you wrote:
>
>>> Major issue:
>>>     - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read
>          with on
>>>     platforms where MAXALIGN != 8 or without native 64
>        load/store. Branch
>>>     with 'memcpy` is rather obvious, but even pointer de-referencing on
>>>     "lucky case" is not safe either.
>>>
>>>     I have no idea how to fix it at the moment.
>>>
>>
>> Indeed, non-atomic write/read operations can lead to safety issues in
>> some situations. My initial thought is to define a bit near the
>> prev-link to flag the completion of the update. In this way, we could
>> allow non-atomic or even discontinuous write/read operations on the
>> prev-link, while simultaneously guaranteeing its atomicity through
>> atomic operations (as well as memory barriers) on the flag bit. What
>> do you think of this as a viable solution?
>
> There is a way to order operations:
> - since SetPrevRecPtr stores start of record as LSN, its lower 32bits
>   are certainly non-zero (record could not start at the beginning of a
>  page).
> - so SetPrevRecPtr should write high 32bits, issue write barrier, and
>   then write lower 32bits,
> - and then GetPrevRecPtr should first read lower 32bits, and if it is
>   not zero, then issue read barrier and read upper 32bits.
>
> This way you will always read correct prev-rec-ptr on platform without
> 64bit atomics. (because MAXALING >= 4 and PostgreSQL requires 4 byte
> atomicity for several years).
>

Hi, Yura Sokolov

Thanks for updating the patch.
I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC
result:

 case               | min        | avg        | max
--------------------+------------+------------+--------------
master (patched)    | 988,461.89 | 994,916.50 | 1,000,362.40
master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92

The patch provides a significant improvement.

I just looked through the patch, here are some comments.

1.
The v2 patch can't be applied cleanly.

Applying: Lock-free XLog Reservation using lock-free hash-table
.git/rebase-apply/patch:33: trailing whitespace.

.git/rebase-apply/patch:37: space before tab in indent.
        {
.git/rebase-apply/patch:38: space before tab in indent.
                int                     i;
.git/rebase-apply/patch:39: trailing whitespace.

.git/rebase-apply/patch:46: space before tab in indent.
                        LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
warning: squelched 4 whitespace errors
warning: 9 lines add whitespace errors.

2.
And there is a typo:

+     * PrevLinksHash is a lock-free hash table based on Cuckoo algorith. It is
+     * mostly 4 way: for every element computed two positions h1, h2, and

s/algorith/algorithm/g

--
Regrads,
Japin Li



Re: [RFC] Lock-free XLog Reservation from WAL

От
Yura Sokolov
Дата:
22.01.2025 09:09, Japin Li пишет:
> On Sun, 19 Jan 2025 at 17:56, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>> 17.01.2025 17:00, Zhou, Zhiguo пишет:
>>> On 1/16/2025 10:00 PM, Yura Sokolov wrote:
>>>>
>>>> Good day, Zhiguo.
>>>>
>>>> Excuse me, I feel sneaky a bit, but I've started another thread
>>>> just about increase of NUM_XLOGINSERT_LOCK, because I can measure
>>>> its effect even on my working notebook (it is another one: Ryzen
>>>> 5825U limited to @2GHz).
>>>>
>>>> http://postgr.es/m/flat/3b11fdc2-9793-403d-
>>>> b3d4-67ff9a00d447%40postgrespro.ru
>>>>
>>>> -----
>>>>
>>>> regards
>>>> Yura Sokolov aka funny-falcon
>>>>
>>>>
>>> Good day, Yura!
>>> Thank you for keeping me informed. I appreciate your proactive
>>> approach and understand the importance of exploring different angles
>>> for optimization. Your patch is indeed fundamental to our ongoing
>>> work on the lock-free xlog reservation, and I'm eager to see how it
>>> can further enhance our efforts.
>>> I will proceed to test the performance impact of your latest patch
>>> when combined with the lock-free xlog reservation patch. This will
>>> help us determine if there's potential for additional
>>> optimization. Concurrently, with your permission, I'll try to refine
>>> the hash-table- based implementation for your further review. WDYT?
>>>
>>
>> Good day, Zhiguo
>>
>> Here's version of "hash-table reservation" with both 32bit and 64bit
>> operations (depending on PG_HAVE_ATOMIC_U64_SIMULATION, or may be
>> switched by hand).
>>
>> 64bit version uses other protocol with a bit lesser atomic
>> operations. I suppose it could be a bit faster. But I can't prove it
>> now.
>>
>> btw, you wrote:
>>
>>>> Major issue:
>>>>      - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read
>>           with on
>>>>      platforms where MAXALIGN != 8 or without native 64
>>         load/store. Branch
>>>>      with 'memcpy` is rather obvious, but even pointer de-referencing on
>>>>      "lucky case" is not safe either.
>>>>
>>>>      I have no idea how to fix it at the moment.
>>>>
>>>
>>> Indeed, non-atomic write/read operations can lead to safety issues in
>>> some situations. My initial thought is to define a bit near the
>>> prev-link to flag the completion of the update. In this way, we could
>>> allow non-atomic or even discontinuous write/read operations on the
>>> prev-link, while simultaneously guaranteeing its atomicity through
>>> atomic operations (as well as memory barriers) on the flag bit. What
>>> do you think of this as a viable solution?
>>
>> There is a way to order operations:
>> - since SetPrevRecPtr stores start of record as LSN, its lower 32bits
>>    are certainly non-zero (record could not start at the beginning of a
>>   page).
>> - so SetPrevRecPtr should write high 32bits, issue write barrier, and
>>    then write lower 32bits,
>> - and then GetPrevRecPtr should first read lower 32bits, and if it is
>>    not zero, then issue read barrier and read upper 32bits.
>>
>> This way you will always read correct prev-rec-ptr on platform without
>> 64bit atomics. (because MAXALING >= 4 and PostgreSQL requires 4 byte
>> atomicity for several years).
>>
> 
> Hi, Yura Sokolov
> 
> Thanks for updating the patch.
> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC
> result:
> 
>   case               | min        | avg        | max
> --------------------+------------+------------+--------------
> master (patched)    | 988,461.89 | 994,916.50 | 1,000,362.40
> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92
> 
> The patch provides a significant improvement.
> 
> I just looked through the patch, here are some comments.
> 
> 1.
> The v2 patch can't be applied cleanly.
> 
> Applying: Lock-free XLog Reservation using lock-free hash-table
> .git/rebase-apply/patch:33: trailing whitespace.
> 
> .git/rebase-apply/patch:37: space before tab in indent.
>          {
> .git/rebase-apply/patch:38: space before tab in indent.
>                  int                     i;
> .git/rebase-apply/patch:39: trailing whitespace.
> 
> .git/rebase-apply/patch:46: space before tab in indent.
>                          LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
> warning: squelched 4 whitespace errors
> warning: 9 lines add whitespace errors.
> 
> 2.
> And there is a typo:
> 
> +     * PrevLinksHash is a lock-free hash table based on Cuckoo algorith. It is
> +     * mostly 4 way: for every element computed two positions h1, h2, and
> 
> s/algorith/algorithm/g

Hi, Japin

Thank you a lot for measuring and comments.

May I ask you to compare not only against master, but against straight 
increase of NUM_XLOGINSERT_LOCKS to 128 as well?
This way the profit from added complexity will be more obvious: does it 
pay for self or not.

-------

regards
Yura



Re: [RFC] Lock-free XLog Reservation from WAL

От
Japin Li
Дата:
On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> 22.01.2025 09:09, Japin Li пишет:
>> Hi, Yura Sokolov
>> Thanks for updating the patch.
>> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC
>> result:
>>   case               | min        | avg        | max
>> --------------------+------------+------------+--------------
>> master (patched)    | 988,461.89 | 994,916.50 | 1,000,362.40
>> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92
>> The patch provides a significant improvement.
>> I just looked through the patch, here are some comments.
>> 1.
>> The v2 patch can't be applied cleanly.
>> Applying: Lock-free XLog Reservation using lock-free hash-table
>> .git/rebase-apply/patch:33: trailing whitespace.
>> .git/rebase-apply/patch:37: space before tab in indent.
>>          {
>> .git/rebase-apply/patch:38: space before tab in indent.
>>                  int                     i;
>> .git/rebase-apply/patch:39: trailing whitespace.
>> .git/rebase-apply/patch:46: space before tab in indent.
>>                          LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
>> warning: squelched 4 whitespace errors
>> warning: 9 lines add whitespace errors.
>> 2.
>> And there is a typo:
>> +     * PrevLinksHash is a lock-free hash table based on Cuckoo
>> algorith. It is
>> +     * mostly 4 way: for every element computed two positions h1, h2, and
>> s/algorith/algorithm/g
>
> Hi, Japin
>
> Thank you a lot for measuring and comments.
>
> May I ask you to compare not only against master, but against straight
> increase of NUM_XLOGINSERT_LOCKS to 128 as well?
> This way the profit from added complexity will be more obvious: does
> it pay for self or not.

The above test already increases NUM_XLOGINSERT_LOCKS to 64; I will try 128
and update the result later.

--
Regrads,
Japin Li



Re: [RFC] Lock-free XLog Reservation from WAL

От
Yura Sokolov
Дата:
22.01.2025 10:54, Japin Li пишет:
> On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>> 22.01.2025 09:09, Japin Li пишет:
>>> Hi, Yura Sokolov
>>> Thanks for updating the patch.
>>> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC
>>> result:
>>>    case               | min        | avg        | max
>>> --------------------+------------+------------+--------------
>>> master (patched)    | 988,461.89 | 994,916.50 | 1,000,362.40
>>> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92
>>> The patch provides a significant improvement.
>>> I just looked through the patch, here are some comments.
>>> 1.
>>> The v2 patch can't be applied cleanly.
>>> Applying: Lock-free XLog Reservation using lock-free hash-table
>>> .git/rebase-apply/patch:33: trailing whitespace.
>>> .git/rebase-apply/patch:37: space before tab in indent.
>>>           {
>>> .git/rebase-apply/patch:38: space before tab in indent.
>>>                   int                     i;
>>> .git/rebase-apply/patch:39: trailing whitespace.
>>> .git/rebase-apply/patch:46: space before tab in indent.
>>>                           LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
>>> warning: squelched 4 whitespace errors
>>> warning: 9 lines add whitespace errors.
>>> 2.
>>> And there is a typo:
>>> +     * PrevLinksHash is a lock-free hash table based on Cuckoo
>>> algorith. It is
>>> +     * mostly 4 way: for every element computed two positions h1, h2, and
>>> s/algorith/algorithm/g
>>
>> Hi, Japin
>>
>> Thank you a lot for measuring and comments.
>>
>> May I ask you to compare not only against master, but against straight
>> increase of NUM_XLOGINSERT_LOCKS to 128 as well?
>> This way the profit from added complexity will be more obvious: does
>> it pay for self or not.
> 
> The above test already increases NUM_XLOGINSERT_LOCKS to 64;

Ok, that is good.
Did you just increased number of locks, or applied 
"several-attempts-to-lock"
from [1] as well? It will be interesting how it affects performance in this
case. And it is orthogonal to "lock-free reservation", so they could 
applied simultaneously.

> I will try 128 and update the result later.

Thank you.

[1] 
https://www.postgresql.org/message-id/3b11fdc2-9793-403d-b3d4-67ff9a00d447%40postgrespro.ru

------
regards
Yura



Re: [RFC] Lock-free XLog Reservation from WAL

От
Japin Li
Дата:
On Wed, 22 Jan 2025 at 11:22, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> 22.01.2025 10:54, Japin Li пишет:
>> On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>>> 22.01.2025 09:09, Japin Li пишет:
>>>> Hi, Yura Sokolov
>>>> Thanks for updating the patch.
>>>> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC
>>>> result:
>>>>    case               | min        | avg        | max
>>>> --------------------+------------+------------+--------------
>>>> master (patched)    | 988,461.89 | 994,916.50 | 1,000,362.40
>>>> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92
>>>> The patch provides a significant improvement.
>>>> I just looked through the patch, here are some comments.
>>>> 1.
>>>> The v2 patch can't be applied cleanly.
>>>> Applying: Lock-free XLog Reservation using lock-free hash-table
>>>> .git/rebase-apply/patch:33: trailing whitespace.
>>>> .git/rebase-apply/patch:37: space before tab in indent.
>>>>           {
>>>> .git/rebase-apply/patch:38: space before tab in indent.
>>>>                   int                     i;
>>>> .git/rebase-apply/patch:39: trailing whitespace.
>>>> .git/rebase-apply/patch:46: space before tab in indent.
>>>>                           LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
>>>> warning: squelched 4 whitespace errors
>>>> warning: 9 lines add whitespace errors.
>>>> 2.
>>>> And there is a typo:
>>>> +     * PrevLinksHash is a lock-free hash table based on Cuckoo
>>>> algorith. It is
>>>> +     * mostly 4 way: for every element computed two positions h1, h2, and
>>>> s/algorith/algorithm/g
>>>
>>> Hi, Japin
>>>
>>> Thank you a lot for measuring and comments.
>>>
>>> May I ask you to compare not only against master, but against straight
>>> increase of NUM_XLOGINSERT_LOCKS to 128 as well?
>>> This way the profit from added complexity will be more obvious: does
>>> it pay for self or not.
>> The above test already increases NUM_XLOGINSERT_LOCKS to 64;
>
> Ok, that is good.
> Did you just increased number of locks, or applied
> "several-attempts-to-lock"
> from [1] as well? It will be interesting how it affects performance in this
> case. And it is orthogonal to "lock-free reservation", so they could
> applied simultaneously.

I apply the following two patches:

1. Lock-free XLog Reservation using lock-free hash-table
2. Increase NUM_XLOGINSERT_LOCKS to 64

I noticed the patch from the [1]. However, I haven't tested it independently.

--
Regrads,
Japin Li



Re: [RFC] Lock-free XLog Reservation from WAL

От
Yura Sokolov
Дата:
22.01.2025 10:54, Japin Li wrote:
> On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>> 22.01.2025 09:09, Japin Li пишет:
>>> Hi, Yura Sokolov
>>> Thanks for updating the patch.
>>> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC
>>> result:
>>>    case               | min        | avg        | max
>>> --------------------+------------+------------+--------------
>>> master (patched)    | 988,461.89 | 994,916.50 | 1,000,362.40
>>> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92
>>> The patch provides a significant improvement.
>>> I just looked through the patch, here are some comments.
>>> 1.
>>> The v2 patch can't be applied cleanly.
>>> Applying: Lock-free XLog Reservation using lock-free hash-table
>>> .git/rebase-apply/patch:33: trailing whitespace.
>>> .git/rebase-apply/patch:37: space before tab in indent.
>>>           {
>>> .git/rebase-apply/patch:38: space before tab in indent.
>>>                   int                     i;
>>> .git/rebase-apply/patch:39: trailing whitespace.
>>> .git/rebase-apply/patch:46: space before tab in indent.
>>>                           LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
>>> warning: squelched 4 whitespace errors
>>> warning: 9 lines add whitespace errors.
>>> 2.
>>> And there is a typo:
>>> +     * PrevLinksHash is a lock-free hash table based on Cuckoo
>>> algorith. It is
>>> +     * mostly 4 way: for every element computed two positions h1, h2, and
>>> s/algorith/algorithm/g
>>
>> Hi, Japin
>>
>> Thank you a lot for measuring and comments.
>>
>> May I ask you to compare not only against master, but against straight
>> increase of NUM_XLOGINSERT_LOCKS to 128 as well?
>> This way the profit from added complexity will be more obvious: does
>> it pay for self or not.
> 
> The above test already increases NUM_XLOGINSERT_LOCKS to 64; I will try 128
> and update the result later.

Oh, I see: I forgot that I removed increase of NUM_XLOGINSERT_LOCKS from 
v2 patch.