Hi Tender Wang,
13.03.2024 09:50, Tender Wang wrote:
> Hi Alexander,
> I haven't been able to reproduce this issue on my machine(2 vCPU, 2GB memory).
>
> Can you reproduce this issue reliably on your machine?
Thanks for your attention to this issue!
On my 12-core workstation, where pgbench shows approximately 2500 tps,
`pgbench -t 10000 -c 40` failed on iterations 1, 1, 3:
...
number of transactions actually processed: 398639/400000
number of failed transactions: 0 (0.000%)
latency average = 15.931 ms
initial connection time = 56.806 ms
tps = 2510.845701 (without initial connection time)
pgbench: error: Run was aborted; the above results are incomplete.
...
The server built with gcc,
CPPFLAGS="-O0" ./configure --enable-debug --enable-cassert ...
I have also intensified bgwriter as follows:
rc = WaitLatch(MyLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- BgWriterDelay /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
+ 1 /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
That is, I have the attached modification applied and the following in my
extra.config:
fsync = off
autovacuum = off
shared_buffers = '1MB'
Using your provided patch and configuration, I can reproduce this issue easily on my mashine(2 vCPU, 2GB).
I went throug the StrategySyncStart() and ClockSweepTick() code. After d72731a70 commit, operation on StrategyControl->nextVictimBuffer
doesn't need to get the StrategyControl->buffer_strategy_lock, but StrategyControl->completePasses still needs lock.
Before d72731a70, if bgwriter gets the StrategyControl->buffer_strategy_lock spinlock in StrategySyncStart(), backends can't add StrategyControl->nextVictimBuffer,
then bgwriter can get a consistent value consisting of nextVictimBuffer and completePasses.
But now, even though bgwriter gets the spinlock, backends also could add the StrategyControl->nextVictimBuffer. In corner cases, bgwriter will see a litter StrategyControl->nextVictimBuffer
value due to StrategyControl->nextVictimBuffer wraparound. But StrategyControl->completePasses didn't update. So the Assert(strategy_delta >= 0) will trigger.
I have two solutions in my head:
1. remove the Assert, but I'm not srue bgwriter write strategy should change if strategy_delta < 0.
2. add more check in BgBufferSync() just like below:
/*
* Since nextVictimBuffer in StrategyControl has been atomic.
* So its operation would not need to get buffer_strategy_lock.
* In extreme circumstances, StrategySyncStart would see not consistent
* value consisting of nextVictimBuffer and completePasses.
* So we add one to passes_delta to make strategy_delta >= 0.
*/
if (passes_delta == 0 && strategy_delta < 0)
passes_delta++;