Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

From: Alvaro Herrera
Subject: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Msg-id: 202312121328.hntemjpqg4hd@alvherre.pgsql
In reply to: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock  (Dilip Kumar <dilipbalaut@gmail.com>)
	Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
	Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
[Added Andrey again in CC, because as I understand they are using this
code or something like it in production.  Please don't randomly remove
people from CC lists.]

I've been looking at this some more, and I'm not confident that the
group clog update stuff is correct.  I think the injection points test
case was good enough to discover a problem, but it's hard to get peace
of mind that there aren't other, more subtle problems.

The problem I see is that the group update mechanism is designed around
contention of the global xact-SLRU control lock; it uses atomics to
coordinate a single queue when the lock is contended.  So if we split up
the global SLRU control lock using banks, then multiple processes using
different bank locks might not contend.  OK, this is fine, but what
happens if two separate groups of processes encounter contention on two
different bank locks?  I think they will both try to update the same
queue, and coordinate access to it *using different bank locks*.  I
don't see how this can work correctly.
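
To make the hazard concrete, here is a minimal sketch of the pattern as
I understand it.  C11 atomics and pthread mutexes stand in for
pg_atomic_* and LWLocks, and every name is invented for illustration;
this is not the actual clog.c code.

#include <stdatomic.h>
#include <pthread.h>

#define NBANKS    8
#define MAX_PROCS 128

/* per-bank SLRU locks; assume pthread_mutex_init() at startup */
static pthread_mutex_t bank_lock[NBANKS];
static _Atomic int group_head = -1;        /* single shared queue head */
static int group_next[MAX_PROCS];          /* per-process queue links */

/* The leader drains the shared queue under its *own* bank's lock. */
void
leader_update(int my_bank)
{
    pthread_mutex_lock(&bank_lock[my_bank]);

    int proc = atomic_exchange(&group_head, -1);  /* detach the queue */
    while (proc != -1)
    {
        /* ... set the commit bits for proc's transaction here ... */
        proc = group_next[proc];                  /* walk the list */
    }

    pthread_mutex_unlock(&bank_lock[my_bank]);
}

Two leaders contending on different banks can both be inside
leader_update() at the same time, each holding a different lock, both
touching the same shared structure.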

I suspect the first part of that algorithm, where atomics are used to
create the list without a lock, might work fine.  But will each "leader"
process, each of which is potentially using a different bank lock,
coordinate correctly?  Maybe this works out because only one process
per cycle will find the queue head empty and thus become the leader?
If that is what happens, then there need to be comments explaining it.
Without any explanation,
this seems broken and potentially dangerous, as some transaction commit
bits might become lost given high enough concurrency and bad luck.
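
For illustration, the lock-free enqueue step would look roughly like
this (same invented names as above; 'next' is the per-process link
array, and the queue head holds -1 when empty):

#include <stdatomic.h>
#include <stdbool.h>

/* Push ourselves onto the queue; return true if we became leader. */
bool
join_group(int my_proc, _Atomic int *queue_head, int *next)
{
    int head = atomic_load(queue_head);

    for (;;)
    {
        next[my_proc] = head;       /* link behind the current head */
        if (atomic_compare_exchange_weak(queue_head, &head, my_proc))
            break;                  /* we are now the queue head */
        /* CAS failed: 'head' was refreshed; retry with the new value */
    }

    return (head == -1);            /* saw an empty queue => leader */
}

Exactly one joiner per drain cycle can see the empty queue, so the
enqueue and leader election themselves don't depend on any lock.  The
question is only what happens after that.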

Maybe this can be made to work by having one more lwlock that we use
solely to coordinate this task.  Though we would have to demonstrate
that coordinating this task with a different lock works correctly in
conjunction with the per-bank lwlock usage in the regular slru.c paths.
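
One possible shape of that, again as an invented sketch reusing the
declarations from the first snippet: a dedicated lock serializes the
drain, while the per-bank locks are still taken around the actual page
updates, as the regular slru.c paths expect.

static pthread_mutex_t group_update_lock;   /* hypothetical new lock */

/* Stand-in for mapping a queued process to the bank of its page. */
static int
bank_of(int proc)
{
    return proc % NBANKS;
}

void
leader_update_fixed(void)
{
    pthread_mutex_lock(&group_update_lock);  /* serialize all leaders */

    int proc = atomic_exchange(&group_head, -1);
    while (proc != -1)
    {
        int bank = bank_of(proc);

        pthread_mutex_lock(&bank_lock[bank]);
        /* ... set the commit bits for proc's transaction here ... */
        pthread_mutex_unlock(&bank_lock[bank]);

        proc = group_next[proc];
    }

    pthread_mutex_unlock(&group_update_lock);
}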


Andrey, do you have any stress tests or anything else that you used to
gain confidence in this code?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).


