Re: BUG #18385: Assert("strategy_delta >= 0") in BgBufferSync() fails due to race condition
От | Tender Wang |
---|---|
Тема | Re: BUG #18385: Assert("strategy_delta >= 0") in BgBufferSync() fails due to race condition |
Дата | |
Msg-id | CAHewXNm0YG1ew5ZW=yEi2B6gAu5kEEDVh2CRoVQX+MObUgC-hA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #18385: Assert("strategy_delta >= 0") in BgBufferSync() fails due to race condition (Tender Wang <tndrwang@gmail.com>) |
Список | pgsql-bugs |
Sorry, the second solution is still not fix the problem. I met Assert failed:
strategy_passes is 100964, but prev_strategy_passes is 100965. The added check was not satisfied.
Would it happen that current strategy_passes is less than prev_strategy_passes.
Tender Wang <tndrwang@gmail.com> 于2024年3月15日周五 16:05写道:
Alexander Lakhin <exclusion@gmail.com> 于2024年3月13日周三 16:00写道:Hi Tender Wang,
13.03.2024 09:50, Tender Wang wrote:
> Hi Alexander,
> I haven't been able to reproduce this issue on my machine(2 vCPU, 2GB memory).
>
> Can you reproduce this issue reliably on your machine?
Thanks for your attention to this issue!
On my 12-core workstation, where pgbench shows approximately 2500 tps,
`pgbench -t 10000 -c 40` failed on iterations 1, 1, 3:
...
number of transactions actually processed: 398639/400000
number of failed transactions: 0 (0.000%)
latency average = 15.931 ms
initial connection time = 56.806 ms
tps = 2510.845701 (without initial connection time)
pgbench: error: Run was aborted; the above results are incomplete.
...
The server built with gcc,
CPPFLAGS="-O0" ./configure --enable-debug --enable-cassert ...
I have also intensified bgwriter as follows:
rc = WaitLatch(MyLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- BgWriterDelay /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
+ 1 /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
That is, I have the attached modification applied and the following in my
extra.config:
fsync = off
autovacuum = off
shared_buffers = '1MB'Using your provided patch and configuration, I can reproduce this issue easily on my mashine(2 vCPU, 2GB).I went throug the StrategySyncStart() and ClockSweepTick() code. After d72731a70 commit, operation on StrategyControl->nextVictimBufferdoesn't need to get the StrategyControl->buffer_strategy_lock, but StrategyControl->completePasses still needs lock.Before d72731a70, if bgwriter gets the StrategyControl->buffer_strategy_lock spinlock in StrategySyncStart(), backends can't add StrategyControl->nextVictimBuffer,then bgwriter can get a consistent value consisting of nextVictimBuffer and completePasses.But now, even though bgwriter gets the spinlock, backends also could add the StrategyControl->nextVictimBuffer. In corner cases, bgwriter will see a litter StrategyControl->nextVictimBuffervalue due to StrategyControl->nextVictimBuffer wraparound. But StrategyControl->completePasses didn't update. So the Assert(strategy_delta >= 0) will trigger.I have two solutions in my head:1. remove the Assert, but I'm not srue bgwriter write strategy should change if strategy_delta < 0.2. add more check in BgBufferSync() just like below:/** Since nextVictimBuffer in StrategyControl has been atomic.
* So its operation would not need to get buffer_strategy_lock.
* In extreme circumstances, StrategySyncStart would see not consistent
* value consisting of nextVictimBuffer and completePasses.
* So we add one to passes_delta to make strategy_delta >= 0.
*/
if (passes_delta == 0 && strategy_delta < 0)
passes_delta++;I test two times using the second solution, not trigger Assert again. Need more times to test.--Tender WangOpenPie: https://en.openpie.com/
Tender Wang
OpenPie: https://en.openpie.com/В списке pgsql-bugs по дате отправления: