Обсуждение: Re: [RFC] Lock-free XLog Reservation from WAL
On Sun, 19 Jan 2025 at 17:56, Yura Sokolov <y.sokolov@postgrespro.ru> wrote: > 17.01.2025 17:00, Zhou, Zhiguo пишет: >> On 1/16/2025 10:00 PM, Yura Sokolov wrote: >>> >>> Good day, Zhiguo. >>> >>> Excuse me, I feel sneaky a bit, but I've started another thread >>> just about increase of NUM_XLOGINSERT_LOCK, because I can measure >>> its effect even on my working notebook (it is another one: Ryzen >>> 5825U limited to @2GHz). >>> >>> http://postgr.es/m/flat/3b11fdc2-9793-403d- >>> b3d4-67ff9a00d447%40postgrespro.ru >>> >>> ----- >>> >>> regards >>> Yura Sokolov aka funny-falcon >>> >>> >> Good day, Yura! >> Thank you for keeping me informed. I appreciate your proactive >> approach and understand the importance of exploring different angles >> for optimization. Your patch is indeed fundamental to our ongoing >> work on the lock-free xlog reservation, and I'm eager to see how it >> can further enhance our efforts. >> I will proceed to test the performance impact of your latest patch >> when combined with the lock-free xlog reservation patch. This will >> help us determine if there's potential for additional >> optimization. Concurrently, with your permission, I'll try to refine >> the hash-table- based implementation for your further review. WDYT? >> > > Good day, Zhiguo > > Here's version of "hash-table reservation" with both 32bit and 64bit > operations (depending on PG_HAVE_ATOMIC_U64_SIMULATION, or may be > switched by hand). > > 64bit version uses other protocol with a bit lesser atomic > operations. I suppose it could be a bit faster. But I can't prove it > now. > > btw, you wrote: > >>> Major issue: >>> - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read > with on >>> platforms where MAXALIGN != 8 or without native 64 > load/store. Branch >>> with 'memcpy` is rather obvious, but even pointer de-referencing on >>> "lucky case" is not safe either. >>> >>> I have no idea how to fix it at the moment. >>> >> >> Indeed, non-atomic write/read operations can lead to safety issues in >> some situations. My initial thought is to define a bit near the >> prev-link to flag the completion of the update. In this way, we could >> allow non-atomic or even discontinuous write/read operations on the >> prev-link, while simultaneously guaranteeing its atomicity through >> atomic operations (as well as memory barriers) on the flag bit. What >> do you think of this as a viable solution? > > There is a way to order operations: > - since SetPrevRecPtr stores start of record as LSN, its lower 32bits > are certainly non-zero (record could not start at the beginning of a > page). > - so SetPrevRecPtr should write high 32bits, issue write barrier, and > then write lower 32bits, > - and then GetPrevRecPtr should first read lower 32bits, and if it is > not zero, then issue read barrier and read upper 32bits. > > This way you will always read correct prev-rec-ptr on platform without > 64bit atomics. (because MAXALING >= 4 and PostgreSQL requires 4 byte > atomicity for several years). > Hi, Yura Sokolov Thanks for updating the patch. I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC result: case | min | avg | max --------------------+------------+------------+-------------- master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40 master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92 The patch provides a significant improvement. I just looked through the patch, here are some comments. 1. The v2 patch can't be applied cleanly. Applying: Lock-free XLog Reservation using lock-free hash-table .git/rebase-apply/patch:33: trailing whitespace. .git/rebase-apply/patch:37: space before tab in indent. { .git/rebase-apply/patch:38: space before tab in indent. int i; .git/rebase-apply/patch:39: trailing whitespace. .git/rebase-apply/patch:46: space before tab in indent. LWLockReleaseClearVar(&WALInsertLocks[i].l.lock, warning: squelched 4 whitespace errors warning: 9 lines add whitespace errors. 2. And there is a typo: + * PrevLinksHash is a lock-free hash table based on Cuckoo algorith. It is + * mostly 4 way: for every element computed two positions h1, h2, and s/algorith/algorithm/g -- Regrads, Japin Li
22.01.2025 09:09, Japin Li пишет: > On Sun, 19 Jan 2025 at 17:56, Yura Sokolov <y.sokolov@postgrespro.ru> wrote: >> 17.01.2025 17:00, Zhou, Zhiguo пишет: >>> On 1/16/2025 10:00 PM, Yura Sokolov wrote: >>>> >>>> Good day, Zhiguo. >>>> >>>> Excuse me, I feel sneaky a bit, but I've started another thread >>>> just about increase of NUM_XLOGINSERT_LOCK, because I can measure >>>> its effect even on my working notebook (it is another one: Ryzen >>>> 5825U limited to @2GHz). >>>> >>>> http://postgr.es/m/flat/3b11fdc2-9793-403d- >>>> b3d4-67ff9a00d447%40postgrespro.ru >>>> >>>> ----- >>>> >>>> regards >>>> Yura Sokolov aka funny-falcon >>>> >>>> >>> Good day, Yura! >>> Thank you for keeping me informed. I appreciate your proactive >>> approach and understand the importance of exploring different angles >>> for optimization. Your patch is indeed fundamental to our ongoing >>> work on the lock-free xlog reservation, and I'm eager to see how it >>> can further enhance our efforts. >>> I will proceed to test the performance impact of your latest patch >>> when combined with the lock-free xlog reservation patch. This will >>> help us determine if there's potential for additional >>> optimization. Concurrently, with your permission, I'll try to refine >>> the hash-table- based implementation for your further review. WDYT? >>> >> >> Good day, Zhiguo >> >> Here's version of "hash-table reservation" with both 32bit and 64bit >> operations (depending on PG_HAVE_ATOMIC_U64_SIMULATION, or may be >> switched by hand). >> >> 64bit version uses other protocol with a bit lesser atomic >> operations. I suppose it could be a bit faster. But I can't prove it >> now. >> >> btw, you wrote: >> >>>> Major issue: >>>> - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read >> with on >>>> platforms where MAXALIGN != 8 or without native 64 >> load/store. Branch >>>> with 'memcpy` is rather obvious, but even pointer de-referencing on >>>> "lucky case" is not safe either. >>>> >>>> I have no idea how to fix it at the moment. >>>> >>> >>> Indeed, non-atomic write/read operations can lead to safety issues in >>> some situations. My initial thought is to define a bit near the >>> prev-link to flag the completion of the update. In this way, we could >>> allow non-atomic or even discontinuous write/read operations on the >>> prev-link, while simultaneously guaranteeing its atomicity through >>> atomic operations (as well as memory barriers) on the flag bit. What >>> do you think of this as a viable solution? >> >> There is a way to order operations: >> - since SetPrevRecPtr stores start of record as LSN, its lower 32bits >> are certainly non-zero (record could not start at the beginning of a >> page). >> - so SetPrevRecPtr should write high 32bits, issue write barrier, and >> then write lower 32bits, >> - and then GetPrevRecPtr should first read lower 32bits, and if it is >> not zero, then issue read barrier and read upper 32bits. >> >> This way you will always read correct prev-rec-ptr on platform without >> 64bit atomics. (because MAXALING >= 4 and PostgreSQL requires 4 byte >> atomicity for several years). >> > > Hi, Yura Sokolov > > Thanks for updating the patch. > I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC > result: > > case | min | avg | max > --------------------+------------+------------+-------------- > master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40 > master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92 > > The patch provides a significant improvement. > > I just looked through the patch, here are some comments. > > 1. > The v2 patch can't be applied cleanly. > > Applying: Lock-free XLog Reservation using lock-free hash-table > .git/rebase-apply/patch:33: trailing whitespace. > > .git/rebase-apply/patch:37: space before tab in indent. > { > .git/rebase-apply/patch:38: space before tab in indent. > int i; > .git/rebase-apply/patch:39: trailing whitespace. > > .git/rebase-apply/patch:46: space before tab in indent. > LWLockReleaseClearVar(&WALInsertLocks[i].l.lock, > warning: squelched 4 whitespace errors > warning: 9 lines add whitespace errors. > > 2. > And there is a typo: > > + * PrevLinksHash is a lock-free hash table based on Cuckoo algorith. It is > + * mostly 4 way: for every element computed two positions h1, h2, and > > s/algorith/algorithm/g Hi, Japin Thank you a lot for measuring and comments. May I ask you to compare not only against master, but against straight increase of NUM_XLOGINSERT_LOCKS to 128 as well? This way the profit from added complexity will be more obvious: does it pay for self or not. ------- regards Yura
On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote: > 22.01.2025 09:09, Japin Li пишет: >> Hi, Yura Sokolov >> Thanks for updating the patch. >> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC >> result: >> case | min | avg | max >> --------------------+------------+------------+-------------- >> master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40 >> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92 >> The patch provides a significant improvement. >> I just looked through the patch, here are some comments. >> 1. >> The v2 patch can't be applied cleanly. >> Applying: Lock-free XLog Reservation using lock-free hash-table >> .git/rebase-apply/patch:33: trailing whitespace. >> .git/rebase-apply/patch:37: space before tab in indent. >> { >> .git/rebase-apply/patch:38: space before tab in indent. >> int i; >> .git/rebase-apply/patch:39: trailing whitespace. >> .git/rebase-apply/patch:46: space before tab in indent. >> LWLockReleaseClearVar(&WALInsertLocks[i].l.lock, >> warning: squelched 4 whitespace errors >> warning: 9 lines add whitespace errors. >> 2. >> And there is a typo: >> + * PrevLinksHash is a lock-free hash table based on Cuckoo >> algorith. It is >> + * mostly 4 way: for every element computed two positions h1, h2, and >> s/algorith/algorithm/g > > Hi, Japin > > Thank you a lot for measuring and comments. > > May I ask you to compare not only against master, but against straight > increase of NUM_XLOGINSERT_LOCKS to 128 as well? > This way the profit from added complexity will be more obvious: does > it pay for self or not. The above test already increases NUM_XLOGINSERT_LOCKS to 64; I will try 128 and update the result later. -- Regrads, Japin Li
22.01.2025 10:54, Japin Li пишет: > On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote: >> 22.01.2025 09:09, Japin Li пишет: >>> Hi, Yura Sokolov >>> Thanks for updating the patch. >>> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC >>> result: >>> case | min | avg | max >>> --------------------+------------+------------+-------------- >>> master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40 >>> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92 >>> The patch provides a significant improvement. >>> I just looked through the patch, here are some comments. >>> 1. >>> The v2 patch can't be applied cleanly. >>> Applying: Lock-free XLog Reservation using lock-free hash-table >>> .git/rebase-apply/patch:33: trailing whitespace. >>> .git/rebase-apply/patch:37: space before tab in indent. >>> { >>> .git/rebase-apply/patch:38: space before tab in indent. >>> int i; >>> .git/rebase-apply/patch:39: trailing whitespace. >>> .git/rebase-apply/patch:46: space before tab in indent. >>> LWLockReleaseClearVar(&WALInsertLocks[i].l.lock, >>> warning: squelched 4 whitespace errors >>> warning: 9 lines add whitespace errors. >>> 2. >>> And there is a typo: >>> + * PrevLinksHash is a lock-free hash table based on Cuckoo >>> algorith. It is >>> + * mostly 4 way: for every element computed two positions h1, h2, and >>> s/algorith/algorithm/g >> >> Hi, Japin >> >> Thank you a lot for measuring and comments. >> >> May I ask you to compare not only against master, but against straight >> increase of NUM_XLOGINSERT_LOCKS to 128 as well? >> This way the profit from added complexity will be more obvious: does >> it pay for self or not. > > The above test already increases NUM_XLOGINSERT_LOCKS to 64; Ok, that is good. Did you just increased number of locks, or applied "several-attempts-to-lock" from [1] as well? It will be interesting how it affects performance in this case. And it is orthogonal to "lock-free reservation", so they could applied simultaneously. > I will try 128 and update the result later. Thank you. [1] https://www.postgresql.org/message-id/3b11fdc2-9793-403d-b3d4-67ff9a00d447%40postgrespro.ru ------ regards Yura
On Wed, 22 Jan 2025 at 11:22, Yura Sokolov <y.sokolov@postgrespro.ru> wrote: > 22.01.2025 10:54, Japin Li пишет: >> On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote: >>> 22.01.2025 09:09, Japin Li пишет: >>>> Hi, Yura Sokolov >>>> Thanks for updating the patch. >>>> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC >>>> result: >>>> case | min | avg | max >>>> --------------------+------------+------------+-------------- >>>> master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40 >>>> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92 >>>> The patch provides a significant improvement. >>>> I just looked through the patch, here are some comments. >>>> 1. >>>> The v2 patch can't be applied cleanly. >>>> Applying: Lock-free XLog Reservation using lock-free hash-table >>>> .git/rebase-apply/patch:33: trailing whitespace. >>>> .git/rebase-apply/patch:37: space before tab in indent. >>>> { >>>> .git/rebase-apply/patch:38: space before tab in indent. >>>> int i; >>>> .git/rebase-apply/patch:39: trailing whitespace. >>>> .git/rebase-apply/patch:46: space before tab in indent. >>>> LWLockReleaseClearVar(&WALInsertLocks[i].l.lock, >>>> warning: squelched 4 whitespace errors >>>> warning: 9 lines add whitespace errors. >>>> 2. >>>> And there is a typo: >>>> + * PrevLinksHash is a lock-free hash table based on Cuckoo >>>> algorith. It is >>>> + * mostly 4 way: for every element computed two positions h1, h2, and >>>> s/algorith/algorithm/g >>> >>> Hi, Japin >>> >>> Thank you a lot for measuring and comments. >>> >>> May I ask you to compare not only against master, but against straight >>> increase of NUM_XLOGINSERT_LOCKS to 128 as well? >>> This way the profit from added complexity will be more obvious: does >>> it pay for self or not. >> The above test already increases NUM_XLOGINSERT_LOCKS to 64; > > Ok, that is good. > Did you just increased number of locks, or applied > "several-attempts-to-lock" > from [1] as well? It will be interesting how it affects performance in this > case. And it is orthogonal to "lock-free reservation", so they could > applied simultaneously. I apply the following two patches: 1. Lock-free XLog Reservation using lock-free hash-table 2. Increase NUM_XLOGINSERT_LOCKS to 64 I noticed the patch from the [1]. However, I haven't tested it independently. -- Regrads, Japin Li
22.01.2025 10:54, Japin Li wrote: > On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote: >> 22.01.2025 09:09, Japin Li пишет: >>> Hi, Yura Sokolov >>> Thanks for updating the patch. >>> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC >>> result: >>> case | min | avg | max >>> --------------------+------------+------------+-------------- >>> master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40 >>> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92 >>> The patch provides a significant improvement. >>> I just looked through the patch, here are some comments. >>> 1. >>> The v2 patch can't be applied cleanly. >>> Applying: Lock-free XLog Reservation using lock-free hash-table >>> .git/rebase-apply/patch:33: trailing whitespace. >>> .git/rebase-apply/patch:37: space before tab in indent. >>> { >>> .git/rebase-apply/patch:38: space before tab in indent. >>> int i; >>> .git/rebase-apply/patch:39: trailing whitespace. >>> .git/rebase-apply/patch:46: space before tab in indent. >>> LWLockReleaseClearVar(&WALInsertLocks[i].l.lock, >>> warning: squelched 4 whitespace errors >>> warning: 9 lines add whitespace errors. >>> 2. >>> And there is a typo: >>> + * PrevLinksHash is a lock-free hash table based on Cuckoo >>> algorith. It is >>> + * mostly 4 way: for every element computed two positions h1, h2, and >>> s/algorith/algorithm/g >> >> Hi, Japin >> >> Thank you a lot for measuring and comments. >> >> May I ask you to compare not only against master, but against straight >> increase of NUM_XLOGINSERT_LOCKS to 128 as well? >> This way the profit from added complexity will be more obvious: does >> it pay for self or not. > > The above test already increases NUM_XLOGINSERT_LOCKS to 64; I will try 128 > and update the result later. Oh, I see: I forgot that I removed increase of NUM_XLOGINSERT_LOCKS from v2 patch.