Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
From: Robert Haas
Subject: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Msg-id: CA+TgmoYTakYfxErwgr4LihSa7B-fZzQk0cDFhGNEBA45A5grQQ@mail.gmail.com
In reply to: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile  (Ants Aasma <ants@cybertec.at>)
Responses: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
List: pgsql-hackers
On Fri, Jun 1, 2012 at 9:55 PM, Ants Aasma <ants@cybertec.at> wrote:
> On Sat, Jun 2, 2012 at 1:48 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Buffer pins aren't a cache: with a cache you are trying to mask a slow
>> operation (like a disk I/O) with a faster one so that the number of slow
>> operations is minimized. Buffer pins, however, are very different in
>> that we only care about contention on the reference count (the buffer
>> itself is not locked!), which makes me suspicious that caching-type
>> algorithms are the wrong place to be looking. I think it comes down to
>> picking between your relatively complex but general lock-displacement
>> approach or a specific strategy based on known bottlenecks.
>
> I agree that pins aren't like a cache. I mentioned the caching
> algorithms because they work based on access frequency, and highly
> contended locks are likely to be accessed frequently even from a
> single backend. However, this only makes sense for the delayed
> unpinning method, and I have also come to the conclusion that it's not
> likely to work well. Besides delaying cleanup, the overhead for the
> common case of uncontended access is just too much.
>
> It seems to me that even the nailing approach will need a replacement
> algorithm. The local pins still need to be published globally, and
> because shared memory size is fixed, the maximum number of locally
> pinned nailed buffers needs to be limited as well.
>
> But anyway, I managed to completely misread the profile that Sergey
> gave. Somehow I missed that the time went into the retry TAS in slock
> instead of the inlined TAS. This shows that the issue isn't just
> cacheline ping-pong but cacheline stealing. This could be somewhat
> mitigated by making pinning lock-free. The Nb-GCLOCK paper that Robert
> posted earlier in another thread describes an approach for this. I
> have a WIP patch (attached) that makes the clock sweep lock-free in
> the common case. This patch gave a 40% performance increase for an
> extremely allocation-heavy load running with 64 clients on a 4-core,
> 1-socket system, with lesser gains across the board. Pinning has a
> shorter lock duration (and a different lock type), so the gain might be
> smaller, or it might be a larger problem and show a higher gain. Either
> way, I think the nailing approach should be explored further: cacheline
> ping-pong could still be a problem with a higher number of processors,
> and losing the spinlock also loses the ability to detect contention.

Not sure about the rest of this patch, but this part is definitely bogus:

+#if !defined(pg_atomic_fetch_and_set)
+#define pg_atomic_fetch_and_set(dst, src, value) \
+	do { S_LOCK(&dummy_spinlock); \
+	dst = src; \
+	src = value; \
+	S_UNLOCK(&dummy_spinlock); } while (0)
+#endif

Locking a dummy backend-local spinlock doesn't provide atomicity
across multiple processes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
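
To see why the hunk above is broken, and what a safe generic fallback could look like: dummy_spinlock lives in each backend's private memory, so S_LOCK on it only ever excludes code within the same process, while the variable being exchanged sits in shared memory and is touched by every backend. A correct fallback has to get its atomicity from the shared location itself, either via a hardware atomic or via a spinlock that is itself allocated in shared memory. The macro below is only an illustrative sketch under the assumption of a GCC-style compiler (__typeof__ and the __sync builtins); it keeps the name and argument convention of the quoted hunk but is not the fix that was actually adopted.

/*
 * Sketch only, not part of the patch: perform the exchange with a
 * compare-and-swap retry loop on the shared variable, so atomicity
 * comes from the hardware rather than from a process-local lock.
 * Assumes GCC-style __sync builtins and __typeof__.
 */
#if !defined(pg_atomic_fetch_and_set)
#define pg_atomic_fetch_and_set(dst, src, value) \
	do { \
		__typeof__(src) old_val_; \
		do { \
			old_val_ = (src); \
		} while (__sync_val_compare_and_swap(&(src), old_val_, (value)) != old_val_); \
		(dst) = old_val_; \
	} while (0)
#endif

An alternative that keeps the spinlock idea would be to place the fallback lock in shared memory and have every backend use that one lock; that is correct, but it serializes all exchanges on a single cache line, which is exactly the contention the patch is trying to avoid.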
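
The lock-free clock sweep Ants describes follows the same pattern: replace "take the buffer header spinlock, adjust the count, release" with a single compare-and-swap on the counter, retrying on conflict. The fragment below is a purely illustrative sketch of that pattern; the struct and function names are invented here and do not correspond to PostgreSQL's actual buffer headers or to the attached WIP patch.

/*
 * Illustrative sketch of CAS-based pinning and clock-sweep steps.
 * The types and names are hypothetical; real buffer headers keep more
 * state and (in 9.2) protect it with a per-buffer spinlock.
 */
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
	volatile uint32_t refcount;		/* shared pin count */
	volatile uint32_t usage_count;	/* clock-sweep usage counter */
} SketchBufferDesc;

/* Pin: an unconditional atomic increment needs no retry loop. */
static void
sketch_pin_buffer(SketchBufferDesc *buf)
{
	(void) __sync_fetch_and_add(&buf->refcount, 1);
}

/*
 * Clock-sweep step: decrement usage_count toward zero without any lock.
 * Returns true when the buffer has become an eviction candidate.
 */
static bool
sketch_sweep_buffer(SketchBufferDesc *buf)
{
	uint32_t	old;

	do
	{
		old = buf->usage_count;
		if (old == 0)
			return true;
	} while (!__sync_bool_compare_and_swap(&buf->usage_count, old, old - 1));

	return false;
}

The CAS loop is where "lock-free in the common case" comes from: an uncontended update succeeds on the first try, and contention costs retries rather than spins on a separate lock word, though, as Ants notes, giving up the spinlock also gives up its built-in way of detecting that contention.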