Re: LogwrtResult contended spinlock

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: LogwrtResult contended spinlock
Дата
Msg-id CALj2ACUZAQfTfHwB8BN0Jm2uAw+ij238qn_TRM3-4OCAGs8OYw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: LogwrtResult contended spinlock  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы Re: LogwrtResult contended spinlock  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Список pgsql-hackers
On Mon, Mar 4, 2024 at 9:15 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > 0003:
> >
> > * We need to maintain the invariant that Copy >= Write >= Flush. I
> > believe that's always satisfied, because the
> > XLogWaitInsertionsToFinish() is always called before XLogWrite(). But
> > we should add an assert or runtime check of this invariant somewhere.
>
> Yes, that invariant is already maintained by the server. Although, I'm
> not fully agree, I added an assertion to WaitXLogInsertionsToFinish
> after updating XLogCtl->LogwrtResult.Copy. CF bot is happy with it -
> https://github.com/BRupireddy2/postgres/tree/atomic_LogwrtResult_v13.

I've now separated these invariants out into the 0004 patch.

With the assertions placed in WaitXLogInsertionsToFinish after
updating Copy ptr, I observed the assertion failing in one of the CF
bot machines - https://cirrus-ci.com/build/6202112288227328. I could
reproduce it locally with [1]. I guess the reason is that the Write
and Flush ptrs are now updated independently and atomically without
lock, they might drift and become out-of-order for a while if
concurrently they are accessed in WaitXLogInsertionsToFinish. So, I
guess the right place to verify the invariant Copy >= Write >= Flush
is in XLogWrite once Write and Flush ptrs in shared memory are updated
(note that only one process at a time can do this). Accordingly, I've
moved the assertions to XLogWrite in the attached v14-0004 patch.

> Please see the attached v13 patch set for further review.

Earlier versions of the patches removed a piece of code ensuring
shared WAL 'request' values did not fall beading the 'result' values.
There's a good reason for us to have it. So, I restored it.

-       /*
-        * Update shared-memory status
-        *
-        * We make sure that the shared 'request' values do not fall behind the
-        * 'result' values.  This is not absolutely essential, but it saves some
-        * code in a couple of places.
-        */

Please see the attached v14 patch set.

[1] for i in {1..100}; do make check
PROVE_TESTS="t/027_stream_regress.pl"; if [ $? -ne 0 ]; then echo "The
command failed on iteration $i"; break; fi; done

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Anton A. Melnikov"
Дата:
Сообщение: Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: type cache cleanup improvements