Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
From: Robert Haas
Subject: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Msg-id: CA+TgmoYTakYfxErwgr4LihSa7B-fZzQk0cDFhGNEBA45A5grQQ@mail.gmail.com
In reply to: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile  (Ants Aasma <ants@cybertec.at>)
Responses: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
List: pgsql-hackers
On Fri, Jun 1, 2012 at 9:55 PM, Ants Aasma <ants@cybertec.at> wrote:
> On Sat, Jun 2, 2012 at 1:48 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Buffer pins aren't a cache: with a cache you are trying to mask a slow
>> operation (like a disk I/O) with a faster one so that the number of slow
>> operations is minimized. Buffer pins, however, are very different in
>> that we only care about contention on the reference count (the buffer
>> itself is not locked!), which makes me suspicious that caching-type
>> algorithms are the wrong place to be looking. I think it comes down to
>> picking between your relatively complex but general lock-displacement
>> approach or a specific strategy based on known bottlenecks.
>
> I agree that pins aren't like a cache. I mentioned the caching
> algorithms because they work based on access frequency, and highly
> contended locks are likely to be accessed frequently even from a
> single backend. However, this only makes sense for the delayed
> unpinning method, and I have also come to the conclusion that it's not
> likely to work well. Besides delaying cleanup, the overhead for the
> common case of uncontended access is just too much.
>
> It seems to me that even the nailing approach will need a replacement
> algorithm. The local pins still need to be published globally, and
> because shared memory size is fixed, the maximum number of locally
> pinned nailed buffers needs to be limited as well.
>
> But anyway, I managed to completely misread the profile that Sergey
> gave. Somehow I missed that the time went into the retry TAS in slock
> instead of the inlined TAS. This shows that the issue isn't just
> cacheline ping-pong but cacheline stealing. This could be somewhat
> mitigated by making pinning lock-free. The Nb-GCLOCK paper that Robert
> posted earlier in another thread describes an approach for this. I
> have a WIP patch (attached) that makes the clock sweep lock-free in
> the common case. This patch gave a 40% performance increase for an
> extremely allocation-heavy load running with 64 clients on a 4-core,
> 1-socket system, with lesser gains across the board. Pinning has a
> shorter lock duration (and a different lock type), so the gain might be
> smaller, or it might be a larger problem and show a higher gain. Either
> way, I think the nailing approach should be explored further: cacheline
> ping-pong could still be a problem with a higher number of processors,
> and losing the spinlock also loses the ability to detect contention.

Not sure about the rest of this patch, but this part is definitely bogus:

+#if !defined(pg_atomic_fetch_and_set)
+#define pg_atomic_fetch_and_set(dst, src, value) \
+	do { S_LOCK(&dummy_spinlock); \
+	dst = src; \
+	src = value; \
+	S_UNLOCK(&dummy_spinlock); } while (0)
+#endif

Locking a dummy backend-local spinlock doesn't provide atomicity
across multiple processes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
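
To see why the hunk above is broken, and what a safe generic fallback could look like: dummy_spinlock lives in each backend's private memory, so S_LOCK on it only ever excludes code within the same process, while the variable being exchanged sits in shared memory and is touched by every backend. A correct fallback has to get its atomicity from the shared location itself, either via a hardware atomic or via a spinlock that is itself allocated in shared memory. The macro below is only an illustrative sketch under the assumption of a GCC-style compiler (__typeof__ and the __sync builtins); it keeps the name and argument convention of the quoted hunk but is not the fix that was actually adopted.

/*
 * Sketch only, not part of the patch: perform the exchange with a
 * compare-and-swap retry loop on the shared variable, so atomicity
 * comes from the hardware rather than from a process-local lock.
 * Assumes GCC-style __sync builtins and __typeof__.
 */
#if !defined(pg_atomic_fetch_and_set)
#define pg_atomic_fetch_and_set(dst, src, value) \
	do { \
		__typeof__(src) old_val_; \
		do { \
			old_val_ = (src); \
		} while (__sync_val_compare_and_swap(&(src), old_val_, (value)) != old_val_); \
		(dst) = old_val_; \
	} while (0)
#endif

An alternative that keeps the spinlock idea would be to place the fallback lock in shared memory and have every backend use that one lock; that is correct, but it serializes all exchanges on a single cache line, which is exactly the contention the patch is trying to avoid.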
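
The lock-free clock sweep Ants describes follows the same pattern: replace "take the buffer header spinlock, adjust the count, release" with a single compare-and-swap on the counter, retrying on conflict. The fragment below is a purely illustrative sketch of that pattern; the struct and function names are invented here and do not correspond to PostgreSQL's actual buffer headers or to the attached WIP patch.

/*
 * Illustrative sketch of CAS-based pinning and clock-sweep steps.
 * The types and names are hypothetical; real buffer headers keep more
 * state and (in 9.2) protect it with a per-buffer spinlock.
 */
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
	volatile uint32_t refcount;		/* shared pin count */
	volatile uint32_t usage_count;	/* clock-sweep usage counter */
} SketchBufferDesc;

/* Pin: an unconditional atomic increment needs no retry loop. */
static void
sketch_pin_buffer(SketchBufferDesc *buf)
{
	(void) __sync_fetch_and_add(&buf->refcount, 1);
}

/*
 * Clock-sweep step: decrement usage_count toward zero without any lock.
 * Returns true when the buffer has become an eviction candidate.
 */
static bool
sketch_sweep_buffer(SketchBufferDesc *buf)
{
	uint32_t	old;

	do
	{
		old = buf->usage_count;
		if (old == 0)
			return true;
	} while (!__sync_bool_compare_and_swap(&buf->usage_count, old, old - 1));

	return false;
}

The CAS loop is where "lock-free in the common case" comes from: an uncontended update succeeds on the first try, and contention costs retries rather than spins on a separate lock word, though, as Ants notes, giving up the spinlock also gives up its built-in way of detecting that contention.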