Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

From: Alvaro Herrera
Subject: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Msg-id: 202312121328.hntemjpqg4hd@alvherre.pgsql
In reply to: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock  (Dilip Kumar <dilipbalaut@gmail.com>)
	Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
	Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
[Added Andrey again in CC, because as I understand they are using this
code or something like it in production.  Please don't randomly remove
people from CC lists.]

I've been looking at this some more, and I'm not confident that the
group clog update stuff is correct.  I think the injection points test
case was good enough to discover a problem, but it's hard to get peace
of mind that there aren't other, more subtle problems.

The problem I see is that the group update mechanism is designed around
contention of the global xact-SLRU control lock; it uses atomics to
coordinate a single queue when the lock is contended.  So if we split up
the global SLRU control lock using banks, then multiple processes using
different bank locks might not contend.  OK, this is fine, but what
happens if two separate groups of processes encounter contention on two
different bank locks?  I think they will both try to update the same
queue, and coordinate access to it *using different bank locks*.  I
don't see how this can work correctly.
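
To make the hazard concrete, here is a minimal sketch of the pattern as
I understand it.  C11 atomics and pthread mutexes stand in for
pg_atomic_* and LWLocks, and every name is invented for illustration;
this is not the actual clog.c code.

#include <stdatomic.h>
#include <pthread.h>

#define NBANKS    8
#define MAX_PROCS 128

/* per-bank SLRU locks; assume pthread_mutex_init() at startup */
static pthread_mutex_t bank_lock[NBANKS];
static _Atomic int group_head = -1;        /* single shared queue head */
static int group_next[MAX_PROCS];          /* per-process queue links */

/* The leader drains the shared queue under its *own* bank's lock. */
void
leader_update(int my_bank)
{
    pthread_mutex_lock(&bank_lock[my_bank]);

    int proc = atomic_exchange(&group_head, -1);  /* detach the queue */
    while (proc != -1)
    {
        /* ... set the commit bits for proc's transaction here ... */
        proc = group_next[proc];                  /* walk the list */
    }

    pthread_mutex_unlock(&bank_lock[my_bank]);
}

Two leaders contending on different banks can both be inside
leader_update() at the same time, each holding a different lock, both
touching the same shared structure.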

I suspect the first part of that algorithm, where atomics are used to
create the list without a lock, might work fine.  But will each "leader"
process, each of which is potentially using a different bank lock,
coordinate correctly?  Maybe this works out because only one process
per cycle will find the queue head empty and thus become the leader?
If that is what happens, then there need to be comments explaining it.
Without any explanation,
this seems broken and potentially dangerous, as some transaction commit
bits might become lost given high enough concurrency and bad luck.
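
For illustration, the lock-free enqueue step would look roughly like
this (same invented names as above; 'next' is the per-process link
array, and the queue head holds -1 when empty):

#include <stdatomic.h>
#include <stdbool.h>

/* Push ourselves onto the queue; return true if we became leader. */
bool
join_group(int my_proc, _Atomic int *queue_head, int *next)
{
    int head = atomic_load(queue_head);

    for (;;)
    {
        next[my_proc] = head;       /* link behind the current head */
        if (atomic_compare_exchange_weak(queue_head, &head, my_proc))
            break;                  /* we are now the queue head */
        /* CAS failed: 'head' was refreshed; retry with the new value */
    }

    return (head == -1);            /* saw an empty queue => leader */
}

Exactly one joiner per drain cycle can see the empty queue, so the
enqueue and leader election themselves don't depend on any lock.  The
question is only what happens after that.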

Maybe this can be made to work by having one more lwlock that we use
solely to coordinate this task.  Though we would have to demonstrate
that coordinating this task with a different lock works correctly in
conjunction with the per-bank lwlock usage in the regular slru.c paths.
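
One possible shape of that, again as an invented sketch reusing the
declarations from the first snippet: a dedicated lock serializes the
drain, while the per-bank locks are still taken around the actual page
updates, as the regular slru.c paths expect.

static pthread_mutex_t group_update_lock;   /* hypothetical new lock */

/* Stand-in for mapping a queued process to the bank of its page. */
static int
bank_of(int proc)
{
    return proc % NBANKS;
}

void
leader_update_fixed(void)
{
    pthread_mutex_lock(&group_update_lock);  /* serialize all leaders */

    int proc = atomic_exchange(&group_head, -1);
    while (proc != -1)
    {
        int bank = bank_of(proc);

        pthread_mutex_lock(&bank_lock[bank]);
        /* ... set the commit bits for proc's transaction here ... */
        pthread_mutex_unlock(&bank_lock[bank]);

        proc = group_next[proc];
    }

    pthread_mutex_unlock(&group_update_lock);
}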


Andrey, do you have any stress tests or anything else that you used to
gain confidence in this code?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).


