Re: Proposal of tunable fix for scalability of 8.4


From: Jignesh K. Shah
Subject: Re: Proposal of tunable fix for scalability of 8.4
Date: 12 March 2009 15:41:47
Msg-id: 49B9566C.3010708@sun.com
In reply to: Re: Proposal of tunable fix for scalability of 8.4 (Scott Carey <scott@richrelevance.com>)
Replies: Re: Proposal of tunable fix for scalability of 8.4 (three replies)
List: pgsql-performance


On 03/12/09 13:48, Scott Carey wrote:

On 3/11/09 7:47 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

All I’m adding is that it makes some sense to me, based on my experience in CPU/RAM-bound scalability tuning. It was expressed that the test itself didn’t even make sense. I was wrong in my understanding of what the change did. If it wakes ALL waiters up, a lock waiter may wait for an indeterminate amount of time. However, if instead of waking all of them it wakes only the shared readers and leaves all the exclusive ones at the front of the queue, there is no possibility of starvation, since those exclusives will be at the front of the line after the wake-up batch.

As for whether this is an important use case:

* SSDs will significantly increase the percentage of use cases that are not I/O bound over the next couple of years. All Postgres installations with less than about 100 GB of data could avoid being I/O bound TODAY with current SSD technology, and those with less than 2 TB can do so as well, but at high expense or with less proven technology like the ZFS L2ARC flash cache.
* Intel will have a mainstream CPU that handles 12 threads (6 cores, 2 threads each) at the end of this year. Mainstream two-CPU systems will have access to 24 threads and be common in 2010. Higher-end four-CPU boxes will have access to 48 CPU threads. Hardware thread count is only going up. This is the future.
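A toy model may make the difference between these wake-up policies concrete. The sketch below assumes a wait queue of (pid, mode) pairs and three hypothetical functions, none of which exist in PostgreSQL: `wake_front_run` is a simplified stand-in for waking either the single exclusive waiter at the head or the leading run of shared waiters (an assumed reading of the stock behaviour, not a copy of lwlock.c), `wake_all` wakes everyone as Scott describes the change, and `wake_shared_only` is one interpretation of the variant proposed above. Woken processes are only returned; in this model they still have to retry the lock.

```python
from typing import List, Tuple

# Hypothetical model of an LWLock wait queue (not PostgreSQL's lwlock.c).
# Each waiter is (pid, mode), where mode is "S" (shared) or "X" (exclusive).
Waiter = Tuple[int, str]
Split = Tuple[List[Waiter], List[Waiter]]  # (woken, still_waiting)


def wake_front_run(queue: List[Waiter]) -> Split:
    """Wake a single exclusive waiter at the head, or the leading run of
    shared waiters up to the first exclusive one."""
    if not queue:
        return [], []
    if queue[0][1] == "X":
        return queue[:1], queue[1:]
    n = 0
    while n < len(queue) and queue[n][1] == "S":
        n += 1
    return queue[:n], queue[n:]


def wake_all(queue: List[Waiter]) -> Split:
    """Wake every waiter; they all race to re-acquire, so queue order is
    lost and an exclusive waiter can keep losing the race."""
    return list(queue), []


def wake_shared_only(queue: List[Waiter]) -> Split:
    """One reading of the proposal: if the head wants shared mode, wake every
    shared waiter in the queue and keep the exclusive waiters, in order, at
    the front; if the head is exclusive, wake just that one."""
    if not queue:
        return [], []
    if queue[0][1] == "X":
        return queue[:1], queue[1:]
    woken = [w for w in queue if w[1] == "S"]
    still_waiting = [w for w in queue if w[1] == "X"]
    return woken, still_waiting


if __name__ == "__main__":
    q = [(1, "S"), (2, "X"), (3, "S"), (4, "X"), (5, "S")]
    for policy in (wake_front_run, wake_all, wake_shared_only):
        woken, waiting = policy(q)
        print(f"{policy.__name__:17} woken={woken} waiting={waiting}")
```

With the example queue, the shared-only policy wakes processes 1, 3 and 5 and leaves 2 and 4 at the head, so the next release serves process 2; wake-all empties the queue and leaves the outcome to the scheduler, which is where the starvation concern comes from.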

SSDs are precisely my motivation for doing RAM-based tests with PostgreSQL. While I wait for my SSDs to arrive, I am emulating them by putting the whole database in RAM, which in a sense is better than SSDs, so if we can tune with RAM disks then SSDs will be covered.

What we have is a pool of 2000 users, and we make each user perform a series of transactions on different rows to see how far the database can scale linearly before some bottleneck (system or database) kicks in and throughput no longer grows linearly with active users. Often there is even a drop after reaching some number of active users. If all 2000 users scale linearly, another test with, say, 2500 can be executed. The whole point is to find the limit we can reach, typically the point where no system resources remain to be exploited.

The test kit I am using is a lightweight OLTP-type workload which a user runs against a known schema, and between the transactions it performs it emulates a wait time of 200 ms. In some sense it emulates a real user who clicks, waits to see what he got, and clicks again, which results in another transaction. (Not exactly, but you get the point.) Like all such workloads, it is generally used to find bottlenecks in a system before putting production work on it.

In my current environment I am running similar workloads and seeing how many users I can reach before the system has no more CPU available for linear growth in tpm. Generally, as many of you mentioned, you will see disk latency, network latency, CPU resource problems, etc. That's the work I am doing right now: I am working around network latency by using a private network and by improving operating system tunables for efficiency, and I am reducing disk latency by putting the data in RAM (and soon on SSDs). However, if I still cannot consume all the CPU, then I am probably being hit by lock contention.
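The 200 ms think time is what lets a closed workload like this carry far more users than there are CPUs. A standard back-of-the-envelope tool is the interactive response time law, X = N / (R + Z), with N users, response time R, think time Z and throughput X. The sketch below is not part of the test kit; the function names are hypothetical, and the 64 hardware threads and 2 ms of CPU per transaction in the example are made-up values chosen only for illustration.

```python
THINK_TIME_S = 0.2  # the 200 ms wait the test kit inserts between transactions


def throughput_tps(users: int, response_s: float, think_s: float = THINK_TIME_S) -> float:
    """Interactive response time law: X = N / (R + Z)."""
    return users / (response_s + think_s)


def response_time_s(users: int, tps: float, think_s: float = THINK_TIME_S) -> float:
    """Rearranged law: R = N / X - Z. Once X stops growing, R grows with N."""
    return users / tps - think_s


def knee_users(cpus: int, cpu_s_per_txn: float, think_s: float = THINK_TIME_S) -> float:
    """Asymptotic bound on users before CPU saturates: N* = C * (D + Z) / D.

    Assumes CPU is the only bottleneck and R is roughly D (no queueing)
    below the knee; serialization on a lock can bring the real knee lower.
    """
    return cpus * (cpu_s_per_txn + think_s) / cpu_s_per_txn


if __name__ == "__main__":
    # Upper bound for 2000 users if transactions took no time at all.
    print(f"{throughput_tps(2000, 0.0):.0f} tps")  # 10000 tps
    # Hypothetical box: 64 hardware threads, 2 ms CPU per transaction.
    print(f"{knee_users(64, 0.002):.0f} users")  # 6464 users
```

Past the knee, the rearranged law says every extra user only adds response time, which is why tpm flattens (or drops) instead of growing linearly.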
Using the PostgreSQL DTrace probes I can see what's happening. At a low user count (100 users), my lock profile from a user's point of view is as follows:

    # dtrace -q -s 84_lwlock.d 1764

    Lock Id              Mode       State     Count
    ProcArrayLock        Shared     Waiting       1
    CLogControlLock      Shared     Acquired      2
    ProcArrayLock        Exclusive  Waiting       3
    ProcArrayLock        Exclusive  Acquired     24
    XidGenLock           Exclusive  Acquired     24
    FirstLockMgrLock     Shared     Acquired     25
    CLogControlLock      Exclusive  Acquired     26
    FirstBufMappingLock  Shared     Acquired     55
    WALInsertLock        Exclusive  Acquired     75
    ProcArrayLock        Shared     Acquired    178
    SInvalReadLock       Shared     Acquired    378

    Lock Id              Mode       State     Combined Time (ns)
    SInvalReadLock                  Acquired               29849
    ProcArrayLock        Shared     Waiting                92261
    ProcArrayLock                   Acquired              951470
    FirstLockMgrLock     Exclusive  Acquired             1069064
    CLogControlLock      Exclusive  Acquired             1295551
    ProcArrayLock        Exclusive  Waiting              1758033
    FirstBufMappingLock  Exclusive  Acquired             2078507
    XidGenLock           Exclusive  Acquired             3460800
    WALInsertLock        Exclusive  Acquired            12205466
    SInvalReadLock       Exclusive  Acquired            42684236
    ProcArrayLock        Exclusive  Acquired            57397139

As users grow beyond 1000, it changes to the following for the same sample user:

    # dtrace -q -s 84_lwlock.d 1764

    Lock Id              Mode       State     Count
    CLogControlLock      Exclusive  Waiting       1
    WALInsertLock        Exclusive  Waiting       1
    ProcArrayLock        Exclusive  Acquired      7
    XidGenLock           Exclusive  Acquired      7
    ProcArrayLock        Exclusive  Waiting      10
    CLogControlLock      Shared     Acquired     13
    WALInsertLock        Exclusive  Acquired     23
    CLogControlLock      Exclusive  Acquired     30
    ProcArrayLock        Shared     Acquired     50
    FirstLockMgrLock     Shared     Acquired    104
    SInvalReadLock       Shared     Acquired    105
    FirstBufMappingLock  Shared     Acquired    106

    Lock Id              Mode       State     Combined Time (ns)
    WALInsertLock        Exclusive  Waiting                73990
    CLogControlLock      Exclusive  Waiting               383066
    XidGenLock           Exclusive  Acquired              408301
    CLogControlLock      Exclusive  Acquired             1871642
    ProcArrayLock                   Acquired             2825372
    WALInsertLock        Exclusive  Acquired             3144580
    FirstLockMgrLock     Exclusive  Acquired             3799818
    FirstBufMappingLock  Exclusive  Acquired             4083473
    SInvalReadLock       Exclusive  Acquired            20611120
    ProcArrayLock        Exclusive  Acquired            37920098
    ProcArrayLock        Exclusive  Waiting           3783942020

That's similar to what I saw last year, and it's the reason I am experimenting with lwlock.c: to see how modifying LWLockRelease() to do different types of wake-ups affects this top waiting time (ProcArrayLock Exclusive Waiting, about 3.78 s combined), which is basically wasted time from the perspective of the application, the operating system and the CPU. All I am saying is that with tuning flexibility we can reduce the time wasted and probably spend it in the acquired state, doing useful work.

I don't think I have misconfigured the system. I am just showing that there are ways to cut down some inefficiencies here, and providing test points. I am also showing where it does seem to help performance. It may not help in all cases, but I have given you a test where it performs better than the current code.

And, for the third time: the test users have some latency built into them, which is what is generally exploited to run more users than there are CPUs on the system, and that is exactly the point we want to exploit. Otherwise, if every user did their work with no latency, we would need 6+ billion CPUs to handle all possible users. Typically, as an administrator (system and database), I can only tweak and control latencies within my domain, that is network, disk, CPUs, etc. Those are what I have been tuning to arrive at a *configured* environment, and now I am trying to reduce lock contention and waits in PostgreSQL so that we have an optimized setup.

I am trying another run where I limit the number of woken threads to a preconfigured number, to see how various values pan out in terms of throughput on this server.

Regards,
Jignesh
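The capped wake-up run mentioned at the end of the message can be sketched with the same kind of toy queue. Here `max_wakeups` is a hypothetical tunable, not an existing PostgreSQL setting, and the policy shown (wake the first `max_wakeups` waiters in queue order, leave the rest in order) is only one guess at what such a cap might do: a cap of 1 wakes one waiter per release, and a cap at least as large as the queue behaves like wake-all.

```python
from typing import List, Tuple

# Toy wait queue: (pid, mode), mode "S" (shared) or "X" (exclusive).
Waiter = Tuple[int, str]


def wake_capped(queue: List[Waiter], max_wakeups: int) -> Tuple[List[Waiter], List[Waiter]]:
    """Wake at most max_wakeups waiters from the head of the queue.

    Woken processes only get a chance to retry the lock in this model;
    the rest stay queued in their original order.
    """
    if max_wakeups < 1:
        raise ValueError("max_wakeups must be at least 1")
    return queue[:max_wakeups], queue[max_wakeups:]


def drain(queue: List[Waiter], max_wakeups: int) -> List[List[Waiter]]:
    """Release repeatedly (no new arrivals) and record each wake-up batch."""
    batches: List[List[Waiter]] = []
    while queue:
        woken, queue = wake_capped(queue, max_wakeups)
        batches.append(woken)
    return batches


if __name__ == "__main__":
    waiters = [(pid, "S" if pid % 3 else "X") for pid in range(1, 11)]
    for cap in (1, 4, 100):
        batches = drain(waiters, cap)
        print(f"max_wakeups={cap:>3}: {len(batches)} releases, "
              f"largest batch {max(len(b) for b in batches)}")
```

A larger cap means fewer release calls but more processes runnable at once and competing for the same lock; where the throughput sweet spot lies on a given server is what such an experiment would have to measure.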
