Thread: Re: Increase NUM_XLOGINSERT_LOCKS


Re: Increase NUM_XLOGINSERT_LOCKS

From: Andres Freund
Date:
Hi,

On 2025-01-16 16:52:46 +0300, Yura Sokolov wrote:
> Good day, hackers.
> 
> Zhiguo Zhow proposed to transform xlog reservation to a lock-free algorithm to
> increase NUM_XLOGINSERT_LOCKS on very huge (480vCPU) servers. [1]
> 
> While I believe lock-free reservation makes sense on huge servers, it is hard
> to measure on small servers and personal computers/notebooks.
> 
> But increasing NUM_XLOGINSERT_LOCKS has a measurable performance gain (using a
> synthetic test) even on my working notebook:
> 
>   Ryzen-5825U (8 cores, 16 threads) limited to 2GHz, Ubuntu 24.04

I've experimented with this in the past.


Unfortunately increasing it substantially can make the contention on the
spinlock *substantially* worse.

c=80 && psql -c checkpoint -c 'select pg_switch_wal()' && \
  pgbench -n -M prepared -c$c -j$c \
    -f <(echo "SELECT pg_logical_emit_message(true, 'test', repeat('0', 1024*1024));") \
    -P1 -T15

On a 2x Xeon Gold 5215, with max_wal_size = 150GB and the workload ran a few
times to ensure WAL is already allocated.

With
NUM_XLOGINSERT_LOCKS = 8:       1459 tps
NUM_XLOGINSERT_LOCKS = 80:      2163 tps

The main reason is that the increase in insert locks puts a lot more pressure
on the spinlock. Secondarily it's also that we spend more time iterating
through the insert locks when waiting, and that that causes a lot of cacheline
pingpong.


On much larger machines this gets considerably worse. IIRC I saw something
like an 8x regression on a large machine in the past, but I couldn't find the
actual numbers anymore, so I wouldn't want to bet on it.

Greetings,

Andres Freund



Re: Increase NUM_XLOGINSERT_LOCKS

From: Yura Sokolov
Date:
Excuse me, Andres, I found I pressed the wrong button when I sent this 
letter the first time, and it went only to you. So I'm sending the copy now.

Please reply to this message with a copy of your answer. Your answer is 
really valuable and should be published on the list.

16.01.2025 18:36, Andres Freund wrote:
> Hi,
>
> On 2025-01-16 16:52:46 +0300, Yura Sokolov wrote:
>> Good day, hackers.
>>
>> Zhiguo Zhow proposed to transform xlog reservation to a lock-free algorithm to
>> increase NUM_XLOGINSERT_LOCKS on very huge (480vCPU) servers. [1]
>>
>> While I believe lock-free reservation makes sense on huge servers, it is hard
>> to measure on small servers and personal computers/notebooks.
>>
>> But increasing NUM_XLOGINSERT_LOCKS has a measurable performance gain (using a
>> synthetic test) even on my working notebook:
>>
>>   Ryzen-5825U (8 cores, 16 threads) limited to 2GHz, Ubuntu 24.04
>
> I've experimented with this in the past.
>
> Unfortunately increasing it substantially can make the contention on the
> spinlock *substantially* worse.
>
> c=80 && psql -c checkpoint -c 'select pg_switch_wal()' && pgbench -n -M prepared -c$c -j$c -f <(echo "SELECT pg_logical_emit_message(true, 'test', repeat('0', 1024*1024));") -P1 -T15
>
> On a 2x Xeon Gold 5215, with max_wal_size = 150GB and the workload ran a few
> times to ensure WAL is already allocated.
>
> With
> NUM_XLOGINSERT_LOCKS = 8:       1459 tps
> NUM_XLOGINSERT_LOCKS = 80:      2163 tps

So even in your test you have a +50% gain from increasing 
NUM_XLOGINSERT_LOCKS.

(And that is why I'm keen on a smaller increase, like up to 64, not 128).

>
> The main reason is that the increase in insert locks puts a lot more pressure
> on the spinlock.

That is addressed by Zhiguo Zhow and me in another thread [1]. But 
increasing NUM_XLOGINSERT_LOCKS gives benefits right now (at least on 
smaller installations), and "lock-free reservation" should be measured 
against it.

> Secondarily it's also that we spend more time iterating through the insert
> locks when waiting, and that that causes a lot of cacheline pingpong.

Waiting is done with LWLockWaitForVar, and there is no wait if 
`insertingAt` is in the future. It looks very efficient in the master branch code.

> On much larger machines this gets considerably worse. IIRC I saw something
> like an 8x regression on a large machine in the past, but I couldn't find the
> actual numbers anymore, so I wouldn't want to bet on it.

I believe it should be remeasured.

[1] 
https://postgr.es/m/flat/PH7PR11MB5796659F654F9BE983F3AD97EF142%40PH7PR11MB5796.namprd11.prod.outlook.com

------
regards
Yura



Re: Increase NUM_XLOGINSERT_LOCKS

From: Yura Sokolov
Date:
Since it seems Andres missed my request to send a copy of his answer,
here it is:

On 2025-01-16 18:55:47 +0300, Yura Sokolov wrote:
> 16.01.2025 18:36, Andres Freund wrote:
>> Hi,
>>
>> On 2025-01-16 16:52:46 +0300, Yura Sokolov wrote:
>>> Good day, hackers.
>>>
>>> Zhiguo Zhow proposed to transform xlog reservation to a lock-free algorithm to
>>> increase NUM_XLOGINSERT_LOCKS on very huge (480vCPU) servers. [1]
>>>
>>> While I believe lock-free reservation makes sense on huge servers, it is hard
>>> to measure on small servers and personal computers/notebooks.
>>>
>>> But increasing NUM_XLOGINSERT_LOCKS has a measurable performance gain (using a
>>> synthetic test) even on my working notebook:
>>>
>>>   Ryzen-5825U (8 cores, 16 threads) limited to 2GHz, Ubuntu 24.04
>>
>> I've experimented with this in the past.
>>
>> Unfortunately increasing it substantially can make the contention on the
>> spinlock *substantially* worse.
>>
>> c=80 && psql -c checkpoint -c 'select pg_switch_wal()' && pgbench -n -M prepared -c$c -j$c -f <(echo "SELECT pg_logical_emit_message(true, 'test', repeat('0', 1024*1024));") -P1 -T15
>>
>> On a 2x Xeon Gold 5215, with max_wal_size = 150GB and the workload ran a few
>> times to ensure WAL is already allocated.
>>
>> With
>> NUM_XLOGINSERT_LOCKS = 8:       1459 tps
>> NUM_XLOGINSERT_LOCKS = 80:      2163 tps
>
> So even in your test you have a +50% gain from increasing
> NUM_XLOGINSERT_LOCKS.
>
> (And that is why I'm keen on a smaller increase, like up to 64, not 128).

Oops, I swapped the results around when reformatting them, sorry! It's
the opposite way.  I.e. increasing the locks hurts.

Here's that issue fixed and a few more NUM_XLOGINSERT_LOCKS values.  This is a
slightly different disk (the other seems to have had to go the way of the dodo),
so the results aren't expected to be exactly the same.

NUM_XLOGINSERT_LOCKS    TPS
1                       2583
2                       2524
4                       2711
8                       2788
16                      1938
32                      1834
64                      1865
128                     1543


>>
>> The main reason is that the increase in insert locks puts a lot more pressure
>> on the spinlock.
>
> That is addressed by Zhiguo Zhow and me in another thread [1]. But increasing
> NUM_XLOGINSERT_LOCKS gives benefits right now (at least on smaller
> installations), and "lock-free reservation" should be measured against it.

I know that there's that thread, I just don't see how we can increase
NUM_XLOGINSERT_LOCKS due to the regressions it can cause.


>> Secondarily it's also that we spend more time iterating through the insert
>> locks when waiting, and that that causes a lot of cacheline pingpong.
>
> Waiting is done with LWLockWaitForVar, and there is no wait if `insertingAt`
> is in the future. It looks very efficient in the master branch code.

But LWLockWaitForVar is called from WaitXLogInsertionsToFinish, which just
iterates over all locks.



Greetings,

Andres Freund



Re: Increase NUM_XLOGINSERT_LOCKS

From: Japin Li
Date:
On Sat, 18 Jan 2025 at 14:53, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> [...]

Hi, Yura Sokolov

I tested the patch on a Hygon C86 7490 64-core machine using BenchmarkSQL 5.0 with
500 warehouses and 256 terminals, with a run time of 10 minutes:

| case               | min          | avg          | max          |
|--------------------+--------------+--------------+--------------|
| master (4108440)   | 891,225.77   | 904,868.75   | 913,708.17   |
| lock 64            | 1,007,716.95 | 1,012,013.22 | 1,018,674.00 |
| lock 64 attempt 1  | 1,016,716.07 | 1,017,735.55 | 1,019,328.36 |
| lock 64 attempt 2  | 1,015,328.31 | 1,018,147.74 | 1,021,513.14 |
| lock 128           | 1,010,147.38 | 1,014,128.11 | 1,018,672.01 |
| lock 128 attempt 1 | 1,018,154.79 | 1,023,348.35 | 1,031,365.42 |
| lock 128 attempt 2 | 1,013,245.56 | 1,018,984.78 | 1,023,696.00 |

I didn't test NUM_XLOGINSERT_LOCKS with 16 and 32; however, I tested it with 256
and got the following error:

2025-01-23 02:23:23.828 CST [333524] PANIC:  too many LWLocks taken

I hope this test will be helpful.

--
Regards,
Japin Li



Re: Increase NUM_XLOGINSERT_LOCKS

From: wenhui qiu
Date:
Hi Japin,

Thank you for your test. It seems NUM_XLOGINSERT_LOCKS = 64 is great; I think it doesn't need to grow much. What do you think?

Regards 


On Thu, Jan 23, 2025 at 10:30 AM Japin Li <japinli@hotmail.com> wrote:
[...]

Re: Increase NUM_XLOGINSERT_LOCKS

From: Yura Sokolov
Date:
23.01.2025 08:41, wenhui qiu wrote:
> Hi Japin,
>
> Thank you for your test. It seems NUM_XLOGINSERT_LOCKS = 64 is great;
> I think it doesn't need to grow much. What do you think?

I agree: while 128 shows a small benefit, it is not that big at the moment.
Given that other waiting issues may arise from increasing it, 64 seems to be
the sweet spot.

Probably in the future it could be increased further, after other places are
optimized.

> On Thu, Jan 23, 2025 at 10:30 AM Japin Li <japinli@hotmail.com> wrote:
> [...]




Re: Increase NUM_XLOGINSERT_LOCKS

From: Japin Li
Date:
On Thu, 23 Jan 2025 at 15:50, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> 23.01.2025 08:41, wenhui qiu wrote:
>> Hi Japin,
>>
>> Thank you for your test. It seems NUM_XLOGINSERT_LOCKS = 64 is great;
>> I think it doesn't need to grow much. What do you think?
>
> I agree: while 128 shows a small benefit, it is not that big at the moment.
> Given that other waiting issues may arise from increasing it, 64 seems to be
> the sweet spot.
>
> Probably in the future it could be increased further, after other places are
> optimized.

+1.
--
Regards,
Japin Li