Discussion: User concurrency thresholding: where do I look?
Folks,

I've run into this a number of times with various PostgreSQL users, so we
tested it at Sun.  What seems to be happening is that at some specific number
of connections average throughput drops 30% and response time quadruples or
worse.  The amount seems to vary per machine; I've seen it as variously 95,
1050, 1700 or 2800 connections.  Tinkering with postgresql.conf parameters
doesn't seem to affect this threshold.

As an example of this behavior:

Users    Txn/User    Resp. Time
50       105.38      0.01
100      113.05      0.01
150      114.05      0.01
200      113.51      0.01
250      113.38      0.01
300      112.14      0.01
350      112.26      0.01
400      111.43      0.01
450      110.72      0.01
500      110.44      0.01
550      109.36      0.01
600      107.01      0.02
650      105.71      0.02
700      106.95      0.02
750      107.69      0.02
800      106.78      0.02
850      108.59      0.02
900      106.03      0.02
950      106.13      0.02
1000     64.58       0.15
1050     52.32       0.23
1100     49.79       0.25

Tinkering with shared_buffers has had no effect on this thresholding (the
above was with 3GB to 6GB of shared_buffers).  Any ideas on where we should
look for the source of the bottleneck?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Josh Berkus wrote:
> Folks,
>
> 650    105.71    0.02
> 700    106.95    0.02
> 750    107.69    0.02
> 800    106.78    0.02
> 850    108.59    0.02
> 900    106.03    0.02
> 950    106.13    0.02
> 1000    64.58    0.15
> 1050    52.32    0.23
> 1100    49.79    0.25
>
> Tinkering with shared_buffers has had no effect on this thresholding (the above
> was with 3GB to 6GB of shared_buffers).   Any ideas on where we should look
> for the source of the bottleneck?
I have seen this as well.  I always chalked it up to PG having to
manage so many connections, but there is some interesting evidence to
review.
The amount of memory "each" connection takes up: figure on 4-11 MB per
connection, depending on various things like the number of prepared queries.

The number of CPUs: obviously 500 connections over 4 CPUs isn't the same as
500 connections over 8 CPUs.

That number of connections generally means a higher velocity, and a higher
velocity means more checkpoint segments.  Wrong settings for your checkpoint
segments, bgwriter and checkpoints will cause you to start falling down.
I would also note that in our experience PG falls down a little later,
closer to 2500 connections the last time I checked, but that was likely on
different hardware.
Joshua D. Drake
--
       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Josh Berkus wrote:
> Folks,
>
> I've run into this a number of times with various PostgreSQL users, so we
> tested it at Sun.  What seems to be happening is that at some specific number
> of connections average throughput drops 30% and response time quadruples or
> worse.  The amount seems to vary per machine; I've seen it as variously 95,
> 1050, 1700 or 2800 connections.  Tinkering with postgresql.conf parameters
> doesn't seem to affect this threshold.
>
> As an example of this behavior:
>
> Users    Txn/User    Resp. Time
> 50       105.38      0.01
> 100      113.05      0.01
> 150      114.05      0.01
> 200      113.51      0.01
> 250      113.38      0.01
> 300      112.14      0.01
> 350      112.26      0.01
> 400      111.43      0.01
> 450      110.72      0.01
> 500      110.44      0.01
> 550      109.36      0.01
> 600      107.01      0.02
> 650      105.71      0.02
> 700      106.95      0.02
> 750      107.69      0.02
> 800      106.78      0.02
> 850      108.59      0.02
> 900      106.03      0.02
> 950      106.13      0.02
> 1000     64.58       0.15
> 1050     52.32       0.23
> 1100     49.79       0.25
>
> Tinkering with shared_buffers has had no effect on this thresholding (the
> above was with 3GB to 6GB of shared_buffers).  Any ideas on where we should
> look for the source of the bottleneck?

Have you messed with max_connections and/or max_locks_per_transaction while
testing this?  The lock table is sized to max_locks_per_xact times
max_connections, and shared memory hash tables get slower when they are
full.  Of course, the saturation point would depend on the avg number of
locks acquired per user, which would explain why you are seeing a lower
number for some users and higher for others (simpler/more complex queries).

This is just a guess though.  No profiling or measuring at all, really.

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep."                (Robert Davidson)
http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php
Alvaro,

> Have you messed with max_connections and/or max_locks_per_transaction
> while testing this?  The lock table is sized to max_locks_per_xact times
> max_connections, and shared memory hash tables get slower when they are
> full.  Of course, the saturation point would depend on the avg number of
> locks acquired per user, which would explain why you are seeing a lower
> number for some users and higher for others (simpler/more complex
> queries).

That's an interesting thought.  Let me check lock counts and see if this is
possibly the case.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
> Alvaro,
>> Have you messed with max_connections and/or max_locks_per_transaction
>> while testing this?  The lock table is sized to max_locks_per_xact times
>> max_connections, and shared memory hash tables get slower when they are
>> full.  Of course, the saturation point would depend on the avg number of
>> locks acquired per user, which would explain why you are seeing a lower
>> number for some users and higher for others (simpler/more complex
>> queries).
> That's an interesting thought.  Let me check lock counts and see if this is
> possibly the case.
AFAIK you'd get hard failures, not slowdowns, if you ran out of lock
space entirely; and the fact that you can continue the curve upwards
says that you're not on the edge of running out.  However I agree that
it's worth experimenting with those two parameters to see if the curve
moves around at all.
Another resource that might be interesting is the number of open files.
Also, have you tried watching vmstat or local equivalent to confirm that
the machine's not starting to swap?
            regards, tom lane
Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > Alvaro,
> >> Have you messed with max_connections and/or max_locks_per_transaction
> >> while testing this?  The lock table is sized to max_locks_per_xact times
> >> max_connections, and shared memory hash tables get slower when they are
> >> full.  Of course, the saturation point would depend on the avg number of
> >> locks acquired per user, which would explain why you are seeing a lower
> >> number for some users and higher for others (simpler/more complex
> >> queries).
>
> > That's an interesting thought.  Let me check lock counts and see if this is
> > possibly the case.
>
> AFAIK you'd get hard failures, not slowdowns, if you ran out of lock
> space entirely;

Well, if there still is shared memory available, the lock hash can
continue to grow, but it would slow down according to this comment in
ShmemInitHash:

 * max_size is the estimated maximum number of hashtable entries.  This is
 * not a hard limit, but the access efficiency will degrade if it is
 * exceeded substantially (since it's used to compute directory size and
 * the hash table buckets will get overfull).

For the lock hash tables this max_size is
(MaxBackends+max_prepared_xacts) * max_locks_per_xact.

So maybe this does not make much sense in normal operation, thus not
applicable to what Josh Berkus is reporting.

However I was talking to Josh Drake yesterday and he told me that
pg_dump was spending some significant amount of time in LOCK TABLE when
there are lots of tables (say 300k).

--
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
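As a back-of-the-envelope aid, here is a small standalone C sketch of the
soft limit described in that comment.  The function name and the example
settings are invented for illustration; the formula is simply the one
quoted above.

#include <stdio.h>

/*
 * Soft capacity of the shared lock-table hash, per the ShmemInitHash
 * comment quoted above: exceeding it does not fail outright, it just
 * overfills the hash buckets and slows lookups.
 */
static long
lock_hash_soft_limit(long max_backends, long max_prepared_xacts,
                     long max_locks_per_xact)
{
    return (max_backends + max_prepared_xacts) * max_locks_per_xact;
}

int
main(void)
{
    /* example settings only: 1500 backends, no prepared transactions,
     * 64 locks per transaction */
    printf("lock table sized for about %ld locks\n",
           lock_hash_soft_limit(1500, 0, 64));
    return 0;
}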
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> AFAIK you'd get hard failures, not slowdowns, if you ran out of lock
>> space entirely;
> Well, if there still is shared memory available, the lock hash can
> continue to grow, but it would slow down according to this comment in
> ShmemInitHash:
Right, but there's not an enormous amount of headroom in shared memory
beyond the intended size of the hash tables.  I'd think that you'd start
seeing hard failures not very far beyond the point at which performance
impacts became visible.  Of course this is all speculation; I quite
agree with varying the table-size parameters to see if it makes a
difference.
Josh, what sort of workload is being tested here --- read-mostly,
write-mostly, a mixture?
> However I was talking to Josh Drake yesterday and he told me that
> pg_dump was spending some significant amount of time in LOCK TABLE when
> there are lots of tables (say 300k).
I wouldn't be too surprised if there's some O(N^2) effects when a single
transaction holds that many locks, because of the linked-list proclock
data structures.  This would not be relevant to Josh's case though.
            regards, tom lane
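As a rough illustration of the O(N^2) effect being described here, consider
a toy model (not the actual proclock code): if each new lock acquisition has
to walk a linked list of the locks already held, the total work grows
quadratically with the number of locks held.

#include <stdio.h>

/* Toy model only: assume acquiring lock k requires walking the k locks
 * already on the session's list, so total steps are N*(N-1)/2. */
static long long
total_list_steps(long long nlocks)
{
    long long steps = 0;

    for (long long held = 0; held < nlocks; held++)
        steps += held;
    return steps;
}

int
main(void)
{
    /* e.g. a pg_dump-style transaction locking a few hundred thousand tables */
    long long n = 300000;

    printf("%lld locks held -> ~%lld list-traversal steps in total\n",
           n, total_list_steps(n));
    return 0;
}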
Alvaro Herrera wrote:
> Tom Lane wrote:
>> Josh Berkus <josh@agliodbs.com> writes:
>
> So maybe this does not make much sense in normal operation, thus not
> applicable to what Josh Berkus is reporting.
>
> However I was talking to Josh Drake yesterday and he told me that
> pg_dump was spending some significant amount of time in LOCK TABLE when
> there are lots of tables (say 300k).

Less, 128k

Joshua D. Drake
On Thu, 19 Jul 2007, Josh Berkus wrote:

> What seems to be happening is that at some specific number of
> connections average throughput drops 30% and response time quadruples or
> worse.

Could you characterize what each connection is doing and how you're
generating the load?  I don't know how productive speculating about the
cause here will be until there's a test script available so other people
can see where the tipping point is on their system.

--
* Greg Smith    gsmith@gregsmith.com    http://www.gregsmith.com  Baltimore, MD
Tom, all:

> Also, have you tried watching vmstat or local equivalent to confirm that
> the machine's not starting to swap?

We're not swapping.

> Josh, what sort of workload is being tested here --- read-mostly,
> write-mostly, a mixture?

It's a TPCC-like workload, so heavy single-row updates, and the
updates/inserts are what's being measured.  For that matter, when I've seen
this before it was with heavy-write workloads and we were measuring the
number of updates/inserts and not the number of reads.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
>> Josh, what sort of workload is being tested here --- read-mostly,
>> write-mostly, a mixture?
> It's a TPCC-like workload, so heavy single-row updates, and the
> updates/inserts are what's being measured.  For that matter, when I've seen
> this before it was with heavy-write workloads and we were measuring the
> number of updates/inserts and not the number of reads.
Well, if the load is a lot of short writing transactions then you'd
expect the throughput to depend on how fast stuff can be pushed down to
WAL.  What have you got wal_buffers set to?  Are you using a commit
delay?  What's the I/O system anyway (any BB write cache on the WAL
disk?) and what wal sync method are you using?
While I'm asking questions, exactly what were the data columns you
presented?  Txn/User doesn't make much sense to me, and I'm not sure
what "response time" you were measuring either.
            regards, tom lane
Tom,

> Well, if the load is a lot of short writing transactions then you'd
> expect the throughput to depend on how fast stuff can be pushed down to
> WAL.  What have you got wal_buffers set to?  Are you using a commit
> delay?  What's the I/O system anyway (any BB write cache on the WAL
> disk?) and what wal sync method are you using?

You know, I think Jignesh needs to be on this list so I can stop relaying
questions on a workload I didn't design.  Let me get him.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Josh Berkus <josh@agliodbs.com> writes: > >> That's an interesting thought. Let me check lock counts and see if this is >> possibly the case. > > AFAIK you'd get hard failures, not slowdowns, if you ran out of lock > space entirely I assume you've checked the server logs and are sure that you aren't in fact getting errors. I could, for example, envision a situation where a fraction of the transactions are getting some error and those transactions are therefore not being counted against the txn/s result. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Thu, 19 Jul 2007, Josh Berkus wrote:

> It's a TPCC-like workload, so heavy single-row updates, and the
> updates/inserts are what's being measured.

There's so much going on with a TPC-C kind of workload.  Has anyone ever
looked into quantifying scaling for more fundamental operations?  There
are so many places a complicated workload could get caught up that
starting there is hard.  I've found it's helpful to see the breaking
points for simpler operations, then compare how things change as each new
transaction element is introduced.

As an example, take a look at the MySQL SysBench tool:

http://sysbench.sourceforge.net/docs/

Specifically their "oltp" tests.  Note how you can get a handle on how
simple selects scale, then simple inserts, then updates, and so on.  The
only thing I've thought of they missed is testing a trivial operation that
doesn't even touch the buffer cache ('SELECT 1'?) that would let you
quantify just general connection scaling issues.

It seems to me that you could narrow the list of possible causes here much
more quickly if you had a good handle on the upper concurrency of
lower-level operations.

[Note: it's possible to run SysBench against a PG database, but the code
is very immature.  Last time I tried there were plenty of crashes and
there seemed to be some transaction wrapping issues that caused deadlocks
with some tests.]

--
* Greg Smith    gsmith@gregsmith.com    http://www.gregsmith.com  Baltimore, MD
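A bare-bones way to probe just the connection-scaling piece described above
is to hold many sessions open and then time a trivial statement on each.
The libpq sketch below is one way to do that; the connection string, the
session count and the sequential measurement loop are placeholders to
adapt, and it is a crude probe rather than a load generator.

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <libpq-fe.h>

#define NCONN 100                       /* placeholder: raise toward the
                                         * suspected threshold */

/* Wall-clock seconds, as a double. */
static double
now_sec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int
main(void)
{
    PGconn *conns[NCONN];
    int     i;
    double  start, elapsed;

    /* Open all sessions first, so the timing below happens while the
     * server is carrying NCONN mostly-idle backends. */
    for (i = 0; i < NCONN; i++)
    {
        conns[i] = PQconnectdb("dbname=postgres");      /* adjust to taste */
        if (PQstatus(conns[i]) != CONNECTION_OK)
        {
            fprintf(stderr, "connect %d failed: %s", i,
                    PQerrorMessage(conns[i]));
            return 1;
        }
    }

    /* One trivial round trip per session, issued sequentially. */
    start = now_sec();
    for (i = 0; i < NCONN; i++)
    {
        PGresult *res = PQexec(conns[i], "SELECT 1");

        if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "query on connection %d failed: %s", i,
                    PQerrorMessage(conns[i]));
        PQclear(res);
    }
    elapsed = now_sec() - start;

    printf("%d trivial selects across %d open connections: %.3f s total\n",
           NCONN, NCONN, elapsed);

    for (i = 0; i < NCONN; i++)
        PQfinish(conns[i]);
    return 0;
}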
Greg,

> There's so much going on with a TPC-C kind of workload.  Has anyone ever
> looked into quantifying scaling for more fundamental operations?  There
> are so many places a complicated workload could get caught up that
> starting there is hard.  I've found it's helpful to see the breaking
> points for simpler operations, then compare how things change as each
> new transaction element is introduced.

... eagerly awaiting Michael Doilson's PgUnitTest ....

--Josh
Awww Josh, I was just enjoying the chat on the picket fence! :-)

Anyway the workload is mixed (reads, writes) with simple to medium queries.
The workload is known to scale well.  But in order to provide substantial
input I am still trying to eliminate things that can bottleneck.  Currently
I have eliminated CPU (plenty free), RAM (memory is 48GB RAM in this server
for a 32-bit postgresql instance), IO storage (used the free RAM to put the
database on /tmp to eliminate IO), and am still trying to eliminate any
network bottlenecks to say for sure we have a problem in PostgreSQL.  But
yes, till that final thing is confirmed (network, which can very well be
the case) it could be a problem somewhere else.  However the thing that
worries me is more the big drop instead of throughput remaining constant
out there.

Anyway, more on this within a day or two once I add more network NICs
between the systems to eliminate network problems (even though stats don't
show them as problems right now) and also reduce malloc lock penalties if
any.

As for other questions:

max_locks_per_transaction is set to the default (10 I believe); increasing
it still seems to degrade the overall throughput number.

max_connections is set to 1500 for now, till I get decent scaling up to
1400-1500 users.

There are no hard failures reported anywhere.  Log min durations does show
that queries are now slowing down and taking longer.

The OS is not swapping, and IO is also eliminated by putting the whole
database on /tmp.

So while I finish adding more network connections between the two systems
(need to get cards), do enjoy the following URL :-)

http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070703-00073.html

Of course all of postgresql.conf still remains from the old test so no
flames on that one again :-)

Regards,
Jignesh

Josh Berkus wrote:
> Tom,
>
>> Well, if the load is a lot of short writing transactions then you'd
>> expect the throughput to depend on how fast stuff can be pushed down to
>> WAL.  What have you got wal_buffers set to?  Are you using a commit
>> delay?  What's the I/O system anyway (any BB write cache on the WAL
>> disk?) and what wal sync method are you using?
>
> You know, I think Jignesh needs to be on this list so I can stop relaying
> questions on a workload I didn't design.  Let me get him.
I forgot to add one more piece of information: I also tried the same test
with 64-bit postgresql and 6GB shared_buffers, and the results are the
same; it drops around the same point, which to me sounds like a bottleneck.

More later

-Jignesh

Jignesh K. Shah wrote:
> Awww Josh,
>
> I was just enjoying the chat on the picket fence! :-)
>
> Anyway the workload is mixed (reads, writes) with simple to medium
> queries.  The workload is known to scale well.  But in order to provide
> substantial input I am still trying to eliminate things that can
> bottleneck.  Currently I have eliminated CPU (plenty free), RAM (memory
> is 48GB RAM in this server for a 32-bit postgresql instance), IO storage
> (used the free RAM to put the database on /tmp to eliminate IO), and am
> still trying to eliminate any network bottlenecks to say for sure we
> have a problem in PostgreSQL.  But yes, till that final thing is
> confirmed (network, which can very well be the case) it could be a
> problem somewhere else.  However the thing that worries me is more the
> big drop instead of throughput remaining constant out there.
>
> Anyway, more on this within a day or two once I add more network NICs
> between the systems to eliminate network problems (even though stats
> don't show them as problems right now) and also reduce malloc lock
> penalties if any.
>
> As for other questions:
>
> max_locks_per_transaction is set to the default (10 I believe);
> increasing it still seems to degrade the overall throughput number.
>
> max_connections is set to 1500 for now, till I get decent scaling up to
> 1400-1500 users.
>
> There are no hard failures reported anywhere.  Log min durations does
> show that queries are now slowing down and taking longer.
>
> The OS is not swapping, and IO is also eliminated by putting the whole
> database on /tmp.
>
> So while I finish adding more network connections between the two
> systems (need to get cards), do enjoy the following URL :-)
>
> http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070703-00073.html
>
> Of course all of postgresql.conf still remains from the old test so no
> flames on that one again :-)
>
> Regards,
> Jignesh
>
> Josh Berkus wrote:
>> Tom,
>>
>>> Well, if the load is a lot of short writing transactions then you'd
>>> expect the throughput to depend on how fast stuff can be pushed down to
>>> WAL.  What have you got wal_buffers set to?  Are you using a commit
>>> delay?  What's the I/O system anyway (any BB write cache on the WAL
>>> disk?) and what wal sync method are you using?
>>
>> You know, I think Jignesh needs to be on this list so I can stop
>> relaying questions on a workload I didn't design.  Let me get him.
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
> There are no hard failures reported anywhere. Log min durations does
> show that queries are now slowing down and taking longer.
> OS is not swapping and also eliminated IO  by putting the whole database
> on /tmp
Hmm.  Do you see any evidence of a context swap storm (ie, a drastic
increase in the context swaps/second reading reported by vmstat)?
            regards, tom lane
Yes, I did see an increase in context switches and CPU migrations at that
point using mpstat.

Regards,
Jignesh

Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> There are no hard failures reported anywhere.  Log min durations does
>> show that queries are now slowing down and taking longer.
>> OS is not swapping and also eliminated IO by putting the whole database
>> on /tmp
>
> Hmm.  Do you see any evidence of a context swap storm (ie, a drastic
> increase in the context swaps/second reading reported by vmstat)?
>
>             regards, tom lane
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
> Yes I did see increase in context switches and CPU migrations at that
> point using mpstat.
So follow that up --- try to determine which lock is being contended
for.  There's some very crude code in the sources that you can enable
with -DLWLOCK_STATS, but probably DTrace would be a better tool.
            regards, tom lane
Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>> Yes I did see increase in context switches and CPU migrations at that
>> point using mpstat.
>
> So follow that up --- try to determine which lock is being contended
> for.  There's some very crude code in the sources that you can enable
> with -DLWLOCK_STATS, but probably DTrace would be a better tool.
>
>             regards, tom lane
Using plockstat -A -s 5 -p $pid

On bgwriter: it doesn't report anything.

On one of the many connections (this one is hard to read easily):

By default, plockstat monitors all lock contention events, gathers
frequency and timing data about those events, and displays the data in
decreasing frequency order, so that the most common events appear first.
^Cbash-3.00# plockstat -A -s 5  -p 6401
^C
Mutex hold
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   59   186888 0x10059e280                  libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
     16384 |                        |     1 libumem.so.1`process_free+0x12c
     32768 |@@@@@                   |    14 postgres`AllocSetDelete+0x98
     65536 |@@                      |     5
postgres`MemoryContextDelete+0x78
    131072 |                        |     0 postgres`CommitTransaction+0x240
    262144 |@@@@@@@@@@@@@@@         |    39
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  530    12226 0x10059e280
libumem.so.1`umem_cache_alloc+0x200
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@         |   338
libumem.so.1`umem_cache_alloc+0x200
      8192 |@                       |    24 libumem.so.1`umem_alloc+0x5c
     16384 |@                       |    37 libumem.so.1`malloc+0x40
     32768 |@@@@@                   |   131 postgres`AllocSetAlloc+0x1c4
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  324    10214 0x100578030                  libumem.so.1`vmem_xfree+0x164
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@          |   192 libumem.so.1`vmem_xfree+0x164
      8192 |@@@@                    |    56 libumem.so.1`process_free+0x12c
     16384 |@                       |    26 postgres`AllocSetDelete+0x98
     32768 |@@@                     |    50
postgres`MemoryContextDelete+0x78
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  161    13585 0x10059e280                  libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@       |   118 libumem.so.1`process_free+0x12c
      8192 |                        |     4 postgres`AllocSetDelete+0x98
     16384 |@                       |    10
postgres`MemoryContextDelete+0x78
     32768 |@@@                     |    24 postgres`PortalDrop+0x160
     65536 |                        |     3
    131072 |                        |     0
    262144 |                        |     2
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  326     6081 libumem.so.1`vmem0+0xc38     libumem.so.1`vmem_xalloc+0x630
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@            |   170 libumem.so.1`vmem_xalloc+0x630
      8192 |@@@@@@@@@@@             |   155 libumem.so.1`vmem_alloc+0x1f8
     16384 |                        |     1 libumem.so.1`vmem_xalloc+0x524
                                            libumem.so.1`vmem_alloc+0x1f8
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  326     5867 libumem.so.1`vmem0+0x30      libumem.so.1`vmem_alloc+0x248
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@           |   185 libumem.so.1`vmem_alloc+0x248
      8192 |@@@@@@@@@@              |   141
libumem.so.1`vmem_sbrk_alloc+0x30
                                            libumem.so.1`vmem_xalloc+0x524
                                            libumem.so.1`vmem_alloc+0x1f8
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  318     5873 0x100578030                  libumem.so.1`vmem_alloc+0x1d0
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@       |   228 libumem.so.1`vmem_alloc+0x1d0
      8192 |@@@@@                   |    78 libumem.so.1`umem_alloc+0xec
     16384 |                        |     6 libumem.so.1`malloc+0x40
     32768 |                        |     6 postgres`AllocSetAlloc+0x1c4
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  326     5591 0x100578030                  libumem.so.1`vmem_xalloc+0x630
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@         |   213 libumem.so.1`vmem_xalloc+0x630
      8192 |@@@@@@@@                |   112 libumem.so.1`vmem_alloc+0x1f8
     16384 |                        |     0 libumem.so.1`umem_alloc+0xec
     32768 |                        |     1 libumem.so.1`malloc+0x40
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  324     5208 libumem.so.1`vmem0+0xc38     libumem.so.1`vmem_xfree+0x164
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@       |   236 libumem.so.1`vmem_xfree+0x164
      8192 |@@@@@@                  |    88 libumem.so.1`process_free+0x12c
                                            postgres`AllocSetDelete+0x98
postgres`MemoryContextDelete+0x78
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  326     4108 libumem.so.1`vmem0+0xc38     libumem.so.1`vmem_alloc+0x1d0
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@@ |   325 libumem.so.1`vmem_alloc+0x1d0
      8192 |                        |     1 libumem.so.1`vmem_xalloc+0x524
                                            libumem.so.1`vmem_alloc+0x1f8
                                            libumem.so.1`umem_alloc+0xec
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  326     4108 0x100578030                  libumem.so.1`vmem_xalloc+0x50c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@@ |   325 libumem.so.1`vmem_xalloc+0x50c
      8192 |                        |     1 libumem.so.1`vmem_alloc+0x1f8
                                            libumem.so.1`umem_alloc+0xec
                                            libumem.so.1`malloc+0x40
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  326     4096 libumem.so.1`vmem0+0xc38     libumem.so.1`vmem_xalloc+0x50c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@@@|   326 libumem.so.1`vmem_xalloc+0x50c
                                            libumem.so.1`vmem_alloc+0x1f8
                                            libumem.so.1`vmem_xalloc+0x524
                                            libumem.so.1`vmem_alloc+0x1f8
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  240     5444 libumem.so.1`vmem0+0x30      libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@        |   167 libumem.so.1`process_free+0x12c
      8192 |@@@@@@@                 |    72 postgres`AllocSetDelete+0x98
     16384 |                        |     0
postgres`MemoryContextDelete+0x78
     32768 |                        |     1 postgres`ExecutorEnd+0x40
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  123     9057 0x10059e1d0
libumem.so.1`umem_depot_alloc+0xb8
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@             |    60
libumem.so.1`umem_depot_alloc+0xb8
      8192 |@@@@                    |    24
libumem.so.1`umem_cache_free+0xc4
     16384 |@@@@@@@                 |    37 libumem.so.1`process_free+0x12c
     32768 |                        |     2 postgres`AllocSetDelete+0x98
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  200     4935 0x10059e280                  libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@  |   185 libumem.so.1`process_free+0x12c
      8192 |                        |     4 postgres`AllocSetDelete+0x98
     16384 |@                       |    10
postgres`MemoryContextDelete+0x78
     32768 |                        |     1 postgres`ExecutorEnd+0x40
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  164     5219 0x100595700
libumem.so.1`umem_cache_alloc+0x200
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@       |   121
libumem.so.1`umem_cache_alloc+0x200
      8192 |@@@@@@                  |    42 libumem.so.1`umem_alloc+0x5c
     16384 |                        |     1 libumem.so.1`malloc+0x40
                                            postgres`AllocSetAlloc+0x1c4
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  122     6748 0x10059e1d0
libumem.so.1`umem_depot_alloc+0xb8
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@                |    43
libumem.so.1`umem_depot_alloc+0xb8
      8192 |@@@@@@@@@@@@@@@         |    79
libumem.so.1`umem_cache_alloc+0xa0
                                            libumem.so.1`umem_alloc+0x5c
                                            libumem.so.1`malloc+0x40
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  163     4146 0x100595700                  libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@@ |   161 libumem.so.1`process_free+0x12c
      8192 |                        |     2 postgres`AllocSetDelete+0x98
postgres`MemoryContextDelete+0x78
                                            postgres`PortalDrop+0x160
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   50    12615 0x10059e280                  libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@           |    28 libumem.so.1`process_free+0x12c
      8192 |@                       |     3 postgres`AllocSetDelete+0x98
     16384 |@@@                     |     8
postgres`MemoryContextDelete+0x78
     32768 |@@@@@                   |    11 postgres`FreeExecutorState+0x6c
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  123     4096 0x10059e1d0
libumem.so.1`umem_cache_free+0xec
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@@@|   123
libumem.so.1`umem_cache_free+0xec
                                            libumem.so.1`process_free+0x12c
                                            postgres`AllocSetDelete+0x98
postgres`MemoryContextDelete+0x78
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
  122     4096 0x10059e1d0
libumem.so.1`umem_cache_alloc+0xc4
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@@@|   122
libumem.so.1`umem_cache_alloc+0xc4
                                            libumem.so.1`umem_alloc+0x5c
                                            libumem.so.1`malloc+0x40
                                            postgres`AllocSetAlloc+0x1c4
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   37     7970 libumem.so.1`vmem0+0x30      libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@                       |     2 libumem.so.1`process_free+0x12c
      8192 |@@@@@@@@@@@@@@@@@@@@@@  |    35 postgres`AllocSetDelete+0x98
postgres`MemoryContextDelete+0x78
postgres`exec_parse_message+0x130
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   37     5867 0x10059e280                  libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@   |    33 libumem.so.1`process_free+0x12c
      8192 |@                       |     2 postgres`AllocSetDelete+0x98
     16384 |                        |     0
postgres`MemoryContextDelete+0x78
     32768 |@                       |     2
postgres`exec_parse_message+0x130
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   39     4516 libumem.so.1`vmem0+0x30      libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@   |    35 libumem.so.1`process_free+0x12c
      8192 |@@                      |     4 postgres`AllocSetDelete+0x98
postgres`MemoryContextDelete+0x78
                                            postgres`PortalDrop+0x160
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   37     4428 0x10058b700
libumem.so.1`umem_cache_alloc+0x200
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@  |    34
libumem.so.1`umem_cache_alloc+0x200
      8192 |@                       |     3 libumem.so.1`umem_alloc+0x5c
                                            libumem.so.1`malloc+0x40
postgres`base_yy_scan_buffer+0x38
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   37     4206 0x10058b700                  libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@@ |    36 libumem.so.1`process_free+0x12c
      8192 |                        |     1 postgres`scanner_finish+0x50
                                            postgres`raw_parser+0x3c
                                            postgres`pg_parse_query+0x54
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   11    10426 0x10059e280                  libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@           |     6 libumem.so.1`process_free+0x12c
      8192 |@@@@@@                  |     3 postgres`AllocSetDelete+0x98
     16384 |                        |     0
postgres`MemoryContextDelete+0x78
     32768 |@@@@                    |     2 postgres`ExecEndAgg+0x68
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
    8     5120 libumem.so.1`vmem0+0x30      libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@      |     6 libumem.so.1`process_free+0x12c
      8192 |@@@@@@                  |     2 postgres`AllocSetDelete+0x98
postgres`MemoryContextDelete+0x78
                                            postgres`ExecEndSort+0x24
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
    8     4096 0x10059e280                  libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@@@|     8 libumem.so.1`process_free+0x12c
                                            postgres`AllocSetDelete+0x98
postgres`MemoryContextDelete+0x78
                                            postgres`ExecEndSort+0x24
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
    8     4096 0x100578030                  libumem.so.1`vmem_alloc+0x1d0
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@@@|     8 libumem.so.1`vmem_alloc+0x1d0
                                            libumem.so.1`umem_alloc+0xec
                                            libumem.so.1`malloc+0x40
                                            postgres`AllocSetAlloc+0x314
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
    3     4096 0x10059e280                  libumem.so.1`process_free+0x12c
      nsec ---- Time Distribution --- count Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@@@|     3 libumem.so.1`process_free+0x12c
                                            postgres`AllocSetDelete+0x98
postgres`MemoryContextDelete+0x78
                                            postgres`tbm_free+0x10
bash-3.00#
		"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
> Tom Lane wrote:
>> So follow that up --- try to determine which lock is being contended
>> for.  There's some very crude code in the sources that you can enable
>> with -DLWLOCK_STATS, but probably DTrace would be a better tool.
> Using plockstat -A -s 5 -p $pid
I don't know what that is, but it doesn't appear to have anything to do
with Postgres LWLocks or spinlocks, which are the locks I was thinking of.
Try asking Robert Lor about this --- IIRC he had some dtrace probes to
work with our locks.
            regards, tom lane
Sorry.. these are Solaris mutex locks used by the postgresql process.

What it's saying is that there are holds/waits in trying to get locks which
are locked at Solaris user library level, called from the postgresql
functions.  For example, both of the following functions are hitting the
same mutex lock 0x10059e280 in a Solaris library call:

postgres`AllocSetDelete+0x98
postgres`AllocSetAlloc+0x1c4

I need to enable the DTrace probes on my builds.

-Jignesh

Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> Tom Lane wrote:
>>
>>> So follow that up --- try to determine which lock is being contended
>>> for.  There's some very crude code in the sources that you can enable
>>> with -DLWLOCK_STATS, but probably DTrace would be a better tool.
>
>> Using plockstat -A -s 5 -p $pid
>
> I don't know what that is, but it doesn't appear to have anything to do
> with Postgres LWLocks or spinlocks, which are the locks I was thinking of.
> Try asking Robert Lor about this --- IIRC he had some dtrace probes to
> work with our locks.
>
>             regards, tom lane
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
> What its saying is that there are holds/waits in trying to get locks
> which are locked at Solaris user library levels called from the
> postgresql functions:
> For example both the following functions are hitting on the same mutex
> lock  0x10059e280  in Solaris Library call:
> postgres`AllocSetDelete+0x98
> postgres`AllocSetAlloc+0x1c4
That's a perfect example of the sort of useless overhead that I was
complaining of just now in pgsql-patches.  Having malloc/free use
an internal mutex is necessary in multi-threaded programs, but the
backend isn't multi-threaded.  And yet, apparently you can't turn
that off in Solaris.
(Fortunately, the palloc layer is probably insulating us from malloc's
performance enough that this isn't a huge deal.  But it's annoying.)
            regards, tom lane
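To make that parenthetical concrete: palloc-style memory contexts satisfy
most small requests by carving pieces out of blocks that were malloc'd
once, so steady-state allocation never touches malloc or its mutex.  The
toy sketch below shows only that idea; the names and sizes are invented,
and the real AllocSet code does far more (chunk headers, freelists, block
chaining, power-of-2 rounding).

#include <stdio.h>
#include <stdlib.h>

/* Toy region allocator: grab one big block up front, hand out small
 * pieces by bumping an offset, free everything at once. */
typedef struct
{
    char   *block;
    size_t  size;
    size_t  used;
} ToyContext;

static ToyContext *
toy_create(size_t size)
{
    ToyContext *cxt = malloc(sizeof(ToyContext));   /* one malloc ... */

    cxt->block = malloc(size);                      /* ... plus one block */
    cxt->size = size;
    cxt->used = 0;
    return cxt;
}

static void *
toy_alloc(ToyContext *cxt, size_t n)
{
    n = (n + 7) & ~(size_t) 7;          /* keep results 8-byte aligned */
    if (cxt->used + n > cxt->size)
        return NULL;                    /* real code would add a new block */
    {
        void *p = cxt->block + cxt->used;

        cxt->used += n;
        return p;                       /* no malloc, no mutex */
    }
}

static void
toy_delete(ToyContext *cxt)
{
    free(cxt->block);                   /* the only free calls */
    free(cxt);
}

int
main(void)
{
    ToyContext *cxt = toy_create(8192);

    for (int i = 0; i < 100; i++)
        (void) toy_alloc(cxt, 64);      /* 100 allocations, zero mallocs */
    printf("used %zu of %zu bytes\n", cxt->used, cxt->size);
    toy_delete(cxt);
    return 0;
}

Steady-state palloc/pfree traffic stays inside blocks like this; malloc and
free (and hence the libumem mutexes in the plockstat output) are hit mainly
when blocks or whole contexts are created and torn down, which is consistent
with the AllocSetAlloc/AllocSetDelete frames shown above.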
True, you can't switch off the locks since libthread has been folded into
libc in Solaris 10.

Anyway, just to give you an idea of the increase in context switching at the
break point, here are the mpstat numbers (taken at 10 second intervals) on
this 8-socket Sun Fire V890.  The icsw (involuntary context switches) figure
is low up to about the 950-1000 user mark; above the 1000-1050 user mark a
context switch storm starts and total throughput drops about 30%
instantaneously.  I will try rebuilding postgresql with dtrace probes to get
more clues.

(NOTE: you will see 1 cpu (cpuid 22) doing more system work... that's the one
handling the network interrupts.)

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 57 0 27 108 6 4072 98 1749 416 1 7763 47 13 0 40
1 46 0 24 22 6 4198 11 1826 427 0 7547 45 13 0 42
2 42 0 34 104 8 4103 91 1682 424 1 7797 46 13 0 41
3 51 0 22 21 6 4125 10 1734 435 0 7399 45 13 0 43
4 65 0 27 19 6 4015 8 1706 411 0 7292 44 15 0 41
5 54 0 21 21 6 4297 10 1702 464 0 7708 45 13 0 42
6 36 0 16 66 47 4218 12 1713 426 0 7685 47 11 0 42
7 40 0 100 318 206 3699 10 1534 585 0 6851 45 14 0 41
16 41 0 30 87 5 3780 78 1509 401 1 7604 45 13 0 42
17 39 0 24 22 5 3970 12 1631 408 0 7265 44 12 0 44
18 42 0 24 99 5 3829 89 1519 401 1 7343 45 12 0 43
19 39 0 31 78830 5 3588 8 1509 400 0 6629 43 13 0 44
20 22 0 20 19 6 3925 9 1577 419 0 7364 44 12 0 44
21 38 0 31 23 5 3792 13 1566 407 0 7133 45 12 0 44
22 8 0 110 7053 7045 1641 8 728 838 0 2917 16 50 0 33
23 62 0 29 21 5 3985 10 1579 449 0 7368 44 12 0 44

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 13 0 27 123 6 4228 113 1820 433 1 8084 49 13 0 38
1 16 0 63 26 6 4253 15 1875 420 0 7754 47 14 0 39
2 11 0 31 110 8 4178 97 1741 425 1 8095 48 14 0 38
3 8 0 24 20 6 4257 9 1818 444 0 7807 47 13 0 40
4 13 0 54 28 6 4145 17 1774 426 1 7732 46 16 0 38
5 12 0 35 23 6 4412 12 1775 447 0 8249 48 13 0 39
6 8 0 24 38 15 4323 14 1760 422 0 8016 49 11 0 39
7 8 0 120 323 206 3801 15 1599 635 0 7290 47 15 0 38
16 11 0 44 107 5 3896 98 1582 393 1 7997 47 15 0 39
17 15 0 29 24 5 4120 14 1716 416 0 7648 46 13 0 41
18 9 0 35 113 5 3933 103 1594 399 1 7714 47 13 0 40
19 8 0 34 83271 5 3702 12 1564 403 0 7010 45 14 0 41
20 7 0 28 27 6 3997 16 1624 400 0 7676 46 13 0 41
21 8 0 28 25 5 3997 15 1664 402 0 7658 47 12 0 41
22 4 0 97 7741 7731 1586 11 704 906 0 2933 17 51 0 32
23 13 0 28 25 5 4144 15 1658 437 0 7810 47 12 0 41

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 141 315 6 9262 301 2812 330 0 10905 49 16 0 35
1 1 0 153 199 6 9400 186 2808 312 0 11066 48 16 0 37
2 0 0 140 256 8 8798 242 2592 310 0 10111 47 15 0 38
3 1 0 141 189 6 8803 172 2592 314 0 10171 47 15 0 39
4 0 0 120 214 6 9540 207 2801 322 0 10531 46 17 0 36
5 1 0 152 180 6 8764 161 2564 342 0 9904 47 15 0 38
6 1 0 107 344 148 8180 181 2512 290 0 9314 51 14 0 35
7 0 0 665 443 204 8733 153 2574 404 0 9892 43 21 0 37
16 0 0 113 217 5 6446 201 1975 265 0 7552 45 12 0 44
17 0 0 107 153 5 6568 140 2021 274 0 7586 44 11 0 45
18 0 0 121 215 5 6072 201 1789 276 1 7690 44 12 0 44
19 1 0 102 47142 5 6123 126 1829 262 0 7185 43 12 0 45
20 0 0 102 143 6 6451 129 1939 262 0 7450 43 13 0 44
21 1 0 106 150 5 6538 133 1997 285 0 7425 44 11 0 44
22 0 0 494 5949 5876 3586 73 1040 399 0 4058 26 39 0 34
23 0 0 102 159 5 6393 142 1942 324 0 7226 43 12 0 46

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 217 441 7 10763 426 3234 363 0 12449 47 18 0 35
1 0 0 210 322 7 11113 309 3273 351 0 12527 46 17 0 37
2 1 0 212 387 8 10306 370 2977 354 0 11320 45 16 0 38
3 0 0 230 276 7 10332 257 2947 341 0 11901 43 16 0 40
4 0 0 234 306 7 11324 290 3265 352 0 12805 45 18 0 37
5 0 0 212 284 7 10590 262 3042 388 0 11789 44 17 0 39
6 1 0 154 307 48 9583 241 2903 324 0 10564 50 15 0 35
7 0 0 840 535 206 10354 247 3035 428 0 11700 42 22 0 37
16 0 0 169 303 5 7446 286 2250 290 0 8361 42 13 0 45
17 0 0 173 240 5 7640 225 2288 295 0 8674 41 13 0 47
18 0 0 170 289 5 7445 270 2108 286 0 8167 41 12 0 47
19 0 0 176 51118 5 7365 197 2138 288 0 7934 40 13 0 47
20 1 0 172 222 6 7835 204 2323 298 0 8759 40 14 0 46
21 0 0 167 233 5 7749 218 2339 326 0 8264 42 13 0 46
22 0 0 749 6612 6516 4173 97 1166 421 0 4741 23 44 0 33
23 0 0 181 239 6 7709 219 2258 383 0 8402 41 12 0 47

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 198 439 6 10364 417 3113 327 0 11962 49 17 0 34
1 0 0 210 299 6 10655 282 3135 346 0 12463 47 17 0 36
2 0 0 202 352 8 9960 332 2890 320 0 11261 47 16 0 37
3 0 0 182 276 6 9950 255 2857 334 0 11021 46 16 0 38
4 0 0 200 305 6 10841 286 3127 325 0 12440 48 18 0 35
5 0 0 240 286 6 9983 272 2912 358 0 11450 46 16 0 37
6 0 0 153 323 81 9062 233 2767 300 0 9675 49 18 0 33
7 0 0 850 556 206 10027 271 2910 415 0 11048 43 22 0 35
16 0 0 152 306 5 7261 291 2216 266 0 8055 44 12 0 44
17 0 0 151 236 5 7193 217 2170 283 0 8099 43 12 0 45
18 0 0 170 263 5 7008 246 2009 254 0 7836 43 12 0 46
19 0 0 165 47738 5 6824 197 1989 273 0 7663 42 12 0 46
20 0 0 188 217 6 7496 197 2222 280 0 8435 43 13 0 44
21 0 0 179 248 5 7352 234 2233 309 0 8237 43 12 0 44
22 0 0 813 6041 5963 4006 82 1125 448 0 4442 25 42 0 33
23 0 0 162 241 5 7364 225 2170 355 0 7720 43 11 0 45

Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> What its saying is that there are holds/waits in trying to get locks
>> which are locked at Solaris user library levels called from the
>> postgresql functions:
>> For example both the following functions are hitting on the same mutex
>> lock 0x10059e280 in Solaris Library call:
>> postgres`AllocSetDelete+0x98
>> postgres`AllocSetAlloc+0x1c4
>
> That's a perfect example of the sort of useless overhead that I was
> complaining of just now in pgsql-patches.  Having malloc/free use
> an internal mutex is necessary in multi-threaded programs, but the
> backend isn't multi-threaded.  And yet, apparently you can't turn
> that off in Solaris.
>
> (Fortunately, the palloc layer is probably insulating us from malloc's
> performance enough that this isn't a huge deal.  But it's annoying.)
>
>             regards, tom lane
Tom Lane wrote:
> Having malloc/free use
> an internal mutex is necessary in multi-threaded programs, but the
> backend isn't multi-threaded.

Hmm... confused.  I'm not following why then there is contention for the
mutex.  Surely this has to be some other mutex that is in contention, not a
heap lock?

It'd be handy to see the call stack for the wait state -- if the thing is
spending a significant proportion of its time in contention it should be
easy to get that with a simple tool such as pstack or a debugger.
David Boreham <david_list@boreham.org> writes:
> Tom Lane wrote:
>> Having malloc/free use
>> an internal mutex is necessary in multi-threaded programs, but the
>> backend isn't multi-threaded.
> Hmm...confused. I'm not following why then there is contention for the
> mutex.
There isn't any contention for that mutex; Jignesh's results merely show
that it was taken and released a lot of times.
            regards, tom lane
On Fri, 2007-07-20 at 16:57 -0400, Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
> > What its saying is that there are holds/waits in trying to get locks
> > which are locked at Solaris user library levels called from the
> > postgresql functions:
> > For example both the following functions are hitting on the same mutex
> > lock 0x10059e280 in Solaris Library call:
> > postgres`AllocSetDelete+0x98
> > postgres`AllocSetAlloc+0x1c4
>
> That's a perfect example of the sort of useless overhead that I was
> complaining of just now in pgsql-patches.  Having malloc/free use
> an internal mutex is necessary in multi-threaded programs, but the
> backend isn't multi-threaded.  And yet, apparently you can't turn
> that off in Solaris.
>
> (Fortunately, the palloc layer is probably insulating us from malloc's
> performance enough that this isn't a huge deal.  But it's annoying.)

There is one thing that the palloc layer doesn't handle: EState.  All other
memory contexts have a very well chosen initial allocation that prevents
mallocs during low-medium complexity OLTP workloads.

EState is about 8300 bytes, so just above the large allocation limit.  This
means that every time we request an EState, i.e. at least once per
statement, we need to malloc() and then later free().

Would it be worth a special case in the palloc system to avoid having to
repeatedly issue external memory allocation calls?

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes:
> EState is about 8300 bytes,
What?
(gdb) p sizeof(EState)
$1 = 112
This is on a 32-bit machine, but even on 64-bit it wouldn't be more than
double that.
> Would it be worth a special case in the palloc system to avoid having to
> repeatedly issue external memory allocation calls?
The appropriate hack would be to change the AllocSetContextCreate
initial-size parameter for the containing context.  But I really have
no idea what you're on about.
            regards, tom lane
On Mon, 2007-07-23 at 10:11 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > EState is about 8300 bytes,
>
> What?
>
> (gdb) p sizeof(EState)
> $1 = 112
>
> This is on a 32-bit machine, but even on 64-bit it wouldn't be more than
> double that.
>
> > Would it be worth a special case in the palloc system to avoid having to
> > repeatedly issue external memory allocation calls?
>
> The appropriate hack would be to change the AllocSetContextCreate
> initial-size parameter for the containing context.  But I really have
> no idea what you're on about.

I looked at this last May and my notes say "ExecutorState".  I guess that
was wrong, but my analysis showed there was a single malloc of 8228 bytes
happening once per query during my tests.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes:
> I looked at this last May and my notes say "ExecutorState". I guess that
> was wrong, but my analysis showed there was a single malloc of 8228
> bytes happening once per query during my tests.
Well, if you can track down where it's coming from, we could certainly
hack the containing context's parameters.  But EState's not it.
            regards, tom lane
On Mon, 2007-07-23 at 10:54 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > I looked at this last May and my notes say "ExecutorState". I guess that
> > was wrong, but my analysis showed there was a single malloc of 8228
> > bytes happening once per query during my tests.
>
> Well, if you can track down where it's coming from, we could certainly
> hack the containing context's parameters.  But EState's not it.

Well, I discover there is an allocation of 8232 bytes (inflation...) made
once per statement by a memory context called... ExecutorState.  Still not
sure exactly which allocation this is, but it's definitely once per
statement on pgbench, which should narrow it down.  Plan, query etc?

I don't see a way to hack the allocation, since the max chunk size is 8K.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Well, I discover there is an allocation of 8232 (inflation...) made once
> per statement by a memory context called... ExecutorState. Still not
> sure exactly which allocation this is, but its definitely once per
> statement on pgbench, which should narrow it down. Plan, query etc?
Are you working with stock sources?  The only allocation exceeding 1K
that I can see during pgbench is BTScanOpaqueData, which is 6600 bytes.
(Checked by setting a conditional breakpoint on AllocSetAlloc.)  The
path that allocates a single-chunk block is never taken at all.
            regards, tom lane
On Mon, 2007-07-23 at 16:48 +0100, Simon Riggs wrote:
> On Mon, 2007-07-23 at 10:54 -0400, Tom Lane wrote:
> > "Simon Riggs" <simon@2ndquadrant.com> writes:
> > > I looked at this last May and my notes say "ExecutorState". I guess that
> > > was wrong, but my analysis showed there was a single malloc of 8228
> > > bytes happening once per query during my tests.
> >
> > Well, if you can track down where it's coming from, we could certainly
> > hack the containing context's parameters.  But EState's not it.
>
> Well, I discover there is an allocation of 8232 bytes (inflation...) made
> once per statement by a memory context called... ExecutorState.  Still
> not sure exactly which allocation this is, but it's definitely once per
> statement on pgbench, which should narrow it down.  Plan, query etc?
>
> I don't see a way to hack the allocation, since the max chunk size is
> 8K.

It is the allocation of BTScanOpaqueData, called from btrescan() in
nbtree.c.

currPos and markPos are defined as BTScanPosData, which is an array of
BTScanPosItems.  That makes BTScanOpaqueData up to 8232 bytes, which seems
wasteful since markPos is only ever used during merge joins.  Most of that
space isn't even used during merge joins either; we just do that to
slightly optimise the speed of the restore during merge joins.

Seems like we should allocate the memory when we do the first mark.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
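For readers without the sources handy, the following is a toy,
self-contained C mock-up of the layout described above.  The field and type
names echo the discussion, but they are simplified stand-ins rather than
the real nbtree definitions, and MAX_ITEMS_PER_PAGE is an assumed constant
chosen so the sizes land near the 4104/8232 figures reported in this
thread.

#include <stdio.h>
#include <stdint.h>

#define MAX_ITEMS_PER_PAGE 512          /* assumption for illustration */

typedef struct
{
    uint32_t heap_blkno;                /* stand-in for ItemPointerData */
    uint16_t heap_offset;
    uint16_t index_offset;
} ToyScanPosItem;                       /* 8 bytes */

typedef struct
{
    int   buf;
    int   first_item, last_item, item_index;
    ToyScanPosItem items[MAX_ITEMS_PER_PAGE];   /* the bulk of the space */
} ToyScanPosData;

typedef struct
{
    int   misc[4];                      /* a few small bookkeeping fields */
    ToyScanPosData currPos;             /* always needed */
    ToyScanPosData markPos;             /* only used across mark/restore,
                                         * i.e. merge joins */
} ToyScanOpaqueData;

int
main(void)
{
    printf("sizeof(ToyScanPosData)    = %zu\n", sizeof(ToyScanPosData));
    printf("sizeof(ToyScanOpaqueData) = %zu\n", sizeof(ToyScanOpaqueData));
    return 0;
}

The point it illustrates is simply that markPos doubles the footprint even
though it is only needed across a mark/restore, which is what the
allocate-at-first-mark suggestion addresses.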
On Mon, 2007-07-23 at 12:35 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > Well, I discover there is an allocation of 8232 (inflation...) made once
> > per statement by a memory context called... ExecutorState. Still not
> > sure exactly which allocation this is, but its definitely once per
> > statement on pgbench, which should narrow it down. Plan, query etc?
>
> Are you working with stock sources?  The only allocation exceeding 1K
> that I can see during pgbench is BTScanOpaqueData, which is 6600 bytes.
> (Checked by setting a conditional breakpoint on AllocSetAlloc.)  The
> path that allocates a single-chunk block is never taken at all.

I do have the bitmap patch currently applied, but it doesn't touch that
part of the code.

(gdb) p size
$1 = 8232
(gdb) p sizeof(int)
$2 = 4
(gdb) p sizeof(BTScanPosData)
$3 = 4104

Since my notes say I got 8228 last year, seems reasonable.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes:
> currPos and markPos are defined as BTScanPosData, which is an array of
> BTScanPosItems. That makes BTScanOpaqueData up to 8232 bytes, which
> seems wasteful since markPos is only ever used during merge joins. Most
> of that space isn't even used during merge joins either, we just do that
> to slightly optimise the speed of the restore during merge joins.
Ah.  I was seeing it as 6600 bytes on HPPA and 6608 on x86_64, but
I forgot that both of those architectures have MAXALIGN = 8.  On a
MAXALIGN = 4 machine, MaxIndexTuplesPerPage will be significantly
larger, leading to larger BTScanPosData.
Not sure it's worth fooling with, given that these days almost everyone
who's seriously concerned about performance is probably using 64bit
hardware.  One less malloc cycle per indexscan is never going to be a
measurable savings anyway...
            regards, tom lane
			
On Mon, 2007-07-23 at 14:19 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > currPos and markPos are defined as BTScanPosData, which is an array of
> > BTScanPosItems. That makes BTScanOpaqueData up to 8232 bytes, which
> > seems wasteful since markPos is only ever used during merge joins. Most
> > of that space isn't even used during merge joins either, we just do that
> > to slightly optimise the speed of the restore during merge joins.
>
> Ah.  I was seeing it as 6600 bytes on HPPA and 6608 on x86_64, but
> I forgot that both of those architectures have MAXALIGN = 8.  On a
> MAXALIGN = 4 machine, MaxIndexTuplesPerPage will be significantly
> larger, leading to larger BTScanPosData.
>
> Not sure it's worth fooling with, given that these days almost everyone
> who's seriously concerned about performance is probably using 64bit
> hardware.  One less malloc cycle per indexscan is never going to be a
> measurable savings anyway...

Oh sure, I was thinking to avoid Solaris' mutex by avoiding malloc()
completely.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
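A minimal C sketch of the lazy-mark idea discussed above, using stand-in
types and names rather than the real nbtree.c definitions: the second
BTScanPosData-sized buffer is only obtained on the first mark call, so an
ordinary index scan never pays the extra allocation (or, on Solaris, the
malloc mutex) at all. The 4104-byte size is the one reported for this
build; everything else here is illustrative assumption, not actual code.

#include <stdlib.h>
#include <string.h>

/* Stand-in for the real BTScanPosData; sized to match the gdb output above. */
typedef struct BTScanPosSketch { char data[4104]; } BTScanPosSketch;

typedef struct BTScanOpaqueSketch
{
    BTScanPosSketch  currPos;   /* always needed while the scan runs */
    BTScanPosSketch *markPos;   /* NULL until a mark is requested    */
} BTScanOpaqueSketch;

/* btmarkpos()-style entry point: only merge joins ever mark/restore, so a
 * plain index scan never allocates or copies the second position at all.
 * (Real code would use palloc in a suitable memory context, not malloc.) */
static void
btmarkpos_sketch(BTScanOpaqueSketch *so)
{
    if (so->markPos == NULL)
        so->markPos = malloc(sizeof(BTScanPosSketch));
    memcpy(so->markPos, &so->currPos, sizeof(BTScanPosSketch));
}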
Here is how I got the numbers..
I had about 1600 users logged in to PostgreSQL, then started the run with
500 users. Using DTrace I tracked PostgreSQL locking "as viewed from one
user/connection". The echo statements indicate how many users were active
at that point and how throughput was performing. All I/O is done on /tmp,
which means on a RAM disk.
bash-3.00# echo 500 users - baseline number
500 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
    FirstLockMgrLock       Exclusive               1
    RelCacheInitLock       Exclusive               2
          SInvalLock       Exclusive               2
       WALInsertLock       Exclusive              10
      BufMappingLock       Exclusive              12
      CheckpointLock          Shared              29
 CheckpointStartLock          Shared              29
          OidGenLock       Exclusive              29
          XidGenLock       Exclusive              29
    FirstLockMgrLock          Shared              33
 CheckpointStartLock       Exclusive              78
       FreeSpaceLock       Exclusive             114
          OidGenLock          Shared             126
          XidGenLock          Shared             152
       ProcArrayLock          Shared             482
             Lock Id   Combined Time (ns)
          SInvalLock                29800
    RelCacheInitLock                30300
      BufMappingLock               168800
    FirstLockMgrLock               414300
       FreeSpaceLock              1281700
       ProcArrayLock              7869900
       WALInsertLock             11113200
 CheckpointStartLock             13494700
          OidGenLock             25719100
          XidGenLock             26443300
      CheckpointLock            194267800
bash-3.00# echo 600 users - Throughput rising
600 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
    RelCacheInitLock       Exclusive               1
          SInvalLock       Exclusive               1
      BufMappingLock       Exclusive               2
     CLogControlLock       Exclusive               2
       WALInsertLock       Exclusive              11
    FirstLockMgrLock          Shared              20
      CheckpointLock          Shared              24
 CheckpointStartLock          Shared              24
          OidGenLock       Exclusive              24
          XidGenLock       Exclusive              24
 CheckpointStartLock       Exclusive              72
       FreeSpaceLock       Exclusive             102
          OidGenLock          Shared             106
          XidGenLock          Shared             128
       ProcArrayLock          Shared             394
             Lock Id   Combined Time (ns)
          SInvalLock                15600
    RelCacheInitLock                15700
      BufMappingLock                31000
     CLogControlLock                41500
    FirstLockMgrLock               289000
       FreeSpaceLock              3045400
 CheckpointStartLock              7371800
       WALInsertLock              9383200
       ProcArrayLock             10457900
          OidGenLock             20005900
          XidGenLock             20331500
      CheckpointLock            187067900
bash-3.00# echo 700 users  - Throughput rising
700 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
    RelCacheInitLock       Exclusive               1
          SInvalLock       Exclusive               1
      BufMappingLock       Exclusive               2
       WALInsertLock       Exclusive              17
      CheckpointLock          Shared              33
 CheckpointStartLock          Shared              33
          OidGenLock       Exclusive              33
          XidGenLock       Exclusive              33
    FirstLockMgrLock          Shared              81
 CheckpointStartLock       Exclusive              87
       FreeSpaceLock       Exclusive             124
          OidGenLock          Shared             125
          XidGenLock          Shared             150
       ProcArrayLock          Shared             500
             Lock Id   Combined Time (ns)
    RelCacheInitLock                15100
          SInvalLock                15400
      BufMappingLock                47400
    FirstLockMgrLock              3021000
       FreeSpaceLock              3794300
       WALInsertLock              7567300
          XidGenLock             18427400
       ProcArrayLock             20884000
 CheckpointStartLock             24084900
          OidGenLock             26399500
      CheckpointLock            256549800
bash-3.00# echo 800 users  - Throughput rising
800 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
      BufMappingLock       Exclusive               1
    RelCacheInitLock       Exclusive               1
          SInvalLock       Exclusive               1
        WALWriteLock       Exclusive               1
       WALInsertLock       Exclusive              11
      CheckpointLock          Shared              27
 CheckpointStartLock          Shared              27
          OidGenLock       Exclusive              27
          XidGenLock       Exclusive              27
    FirstLockMgrLock          Shared              32
 CheckpointStartLock       Exclusive              73
       FreeSpaceLock       Exclusive             110
          OidGenLock          Shared             118
          XidGenLock          Shared             140
       ProcArrayLock          Shared             442
             Lock Id   Combined Time (ns)
        WALWriteLock                13900
          SInvalLock                15000
    RelCacheInitLock                15500
      BufMappingLock                18600
    FirstLockMgrLock               391100
       WALInsertLock              3953700
       FreeSpaceLock              4801300
 CheckpointStartLock             13131800
       ProcArrayLock             14480500
          OidGenLock             17736500
          XidGenLock             21723100
      CheckpointLock            206423500
bash-3.00# echo 850 users - SLIGHT DROP in throughput
850 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
    FirstLockMgrLock       Exclusive               1
    RelCacheInitLock       Exclusive               1
          SInvalLock       Exclusive               1
        WALWriteLock       Exclusive               1
      BufMappingLock       Exclusive               3
       WALInsertLock       Exclusive               7
      CheckpointLock          Shared              39
 CheckpointStartLock          Shared              39
          OidGenLock       Exclusive              39
          XidGenLock       Exclusive              39
    FirstLockMgrLock          Shared              47
 CheckpointStartLock       Exclusive             113
       FreeSpaceLock       Exclusive             152
          OidGenLock          Shared             162
          XidGenLock          Shared             194
       ProcArrayLock          Shared             621
             Lock Id   Combined Time (ns)
        WALWriteLock                14200
    RelCacheInitLock                15100
          SInvalLock                15600
      BufMappingLock                64100
       WALInsertLock              2073200
    FirstLockMgrLock              3040300
       FreeSpaceLock              7329500
          OidGenLock             21619100
 CheckpointStartLock             23261300
       ProcArrayLock             23917500
          XidGenLock             24873100
      CheckpointLock            309221200
bash-3.00# echo 900 users - ANOTHER SLIGHT DROP IN THROUGHPUT
900 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
        WALWriteLock       Exclusive               1
       WALInsertLock       Exclusive               7
 CheckpointStartLock          Shared              13
          OidGenLock       Exclusive              13
      CheckpointLock          Shared              14
          XidGenLock       Exclusive              14
    FirstLockMgrLock          Shared              15
       FreeSpaceLock       Exclusive              51
          OidGenLock          Shared              51
          XidGenLock          Shared              62
 CheckpointStartLock       Exclusive             170
       ProcArrayLock          Shared             202
             Lock Id   Combined Time (ns)
        WALWriteLock                16800
    FirstLockMgrLock               170300
       FreeSpaceLock               601500
       ProcArrayLock              3971300
       WALInsertLock              7757200
          OidGenLock              8261900
          XidGenLock             18450900
 CheckpointStartLock             39155100
      CheckpointLock            143751500
bash-3.00# echo 950 users  - BIG DROP IN THROUGHPUT
950 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
       WALInsertLock       Exclusive               3
    FirstLockMgrLock          Shared               4
      CheckpointLock          Shared               7
 CheckpointStartLock          Shared               7
          OidGenLock       Exclusive               7
          XidGenLock       Exclusive               7
       FreeSpaceLock       Exclusive              29
          OidGenLock          Shared              30
          XidGenLock          Shared              36
       ProcArrayLock          Shared             115
 CheckpointStartLock       Exclusive             134
             Lock Id   Combined Time (ns)
    FirstLockMgrLock                64400
       FreeSpaceLock               342300
       WALInsertLock              1759600
          OidGenLock              4276900
       ProcArrayLock              6234300
          XidGenLock              6865000
 CheckpointStartLock             37590800
      CheckpointLock             58994300
bash-3.00# echo 1000 users - STEADY AT PREVIOUS LOW VALUE
1000 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
      BufMappingLock       Exclusive               1
    RelCacheInitLock       Exclusive               1
          SInvalLock       Exclusive               1
       WALInsertLock       Exclusive               3
      CheckpointLock          Shared               9
 CheckpointStartLock          Shared               9
          OidGenLock       Exclusive               9
          XidGenLock       Exclusive               9
    FirstLockMgrLock          Shared              14
       FreeSpaceLock       Exclusive              33
          OidGenLock          Shared              37
          XidGenLock          Shared              44
 CheckpointStartLock       Exclusive             122
       ProcArrayLock          Shared             145
             Lock Id   Combined Time (ns)
    RelCacheInitLock                14300
          SInvalLock                15600
      BufMappingLock                21400
    FirstLockMgrLock               184000
       FreeSpaceLock               366200
       WALInsertLock              1769500
       ProcArrayLock              5076500
          XidGenLock              5898400
          OidGenLock              9244800
 CheckpointStartLock             31077500
      CheckpointLock             91861900
bash-3.00# echo 1050 users - SMALL  INCREASE
1050 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
      BufMappingLock       Exclusive               2
       WALInsertLock       Exclusive               9
      CheckpointLock          Shared              24
          XidGenLock       Exclusive              24
 CheckpointStartLock          Shared              25
          OidGenLock       Exclusive              25
    FirstLockMgrLock          Shared              30
       FreeSpaceLock       Exclusive             100
          OidGenLock          Shared             107
          XidGenLock          Shared             129
 CheckpointStartLock       Exclusive             153
       ProcArrayLock          Shared             400
             Lock Id   Combined Time (ns)
      BufMappingLock                36600
    FirstLockMgrLock               420600
       FreeSpaceLock              2998400
       WALInsertLock              3818300
       ProcArrayLock              8986900
          OidGenLock             18127200
          XidGenLock             18569200
 CheckpointStartLock             44795700
      CheckpointLock            206488400
bash-3.00# echo 1100 users - SMALL DROP AGAIN
1100 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
      BufMappingLock       Exclusive               1
       WALInsertLock       Exclusive               6
      CheckpointLock          Shared              11
          XidGenLock       Exclusive              11
 CheckpointStartLock          Shared              12
          OidGenLock       Exclusive              12
    FirstLockMgrLock          Shared              24
       FreeSpaceLock       Exclusive              39
          OidGenLock          Shared              44
          XidGenLock          Shared              51
 CheckpointStartLock       Exclusive              88
       ProcArrayLock          Shared             171
             Lock Id   Combined Time (ns)
      BufMappingLock                19500
    FirstLockMgrLock               302700
       FreeSpaceLock               511200
       ProcArrayLock              5042300
       WALInsertLock              5592800
 CheckpointStartLock             25009900
          OidGenLock             25231600
          XidGenLock            108045300
      CheckpointLock            379734000
bash-3.00# echo 1150 users  - STEADY AT LOW VALUE
1150 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
        WALWriteLock       Exclusive               1
       WALInsertLock       Exclusive               2
      CheckpointLock          Shared               5
 CheckpointStartLock          Shared               6
          OidGenLock       Exclusive               6
          XidGenLock       Exclusive               6
    FirstLockMgrLock          Shared               8
       FreeSpaceLock       Exclusive              21
          OidGenLock          Shared              26
          XidGenLock          Shared              31
       ProcArrayLock          Shared              93
 CheckpointStartLock       Exclusive             122
             Lock Id   Combined Time (ns)
        WALWriteLock                14900
       WALInsertLock               116900
    FirstLockMgrLock               120600
       FreeSpaceLock              2177800
          XidGenLock              4899200
       ProcArrayLock             20721700
 CheckpointStartLock             27805200
      CheckpointLock             76369300
          OidGenLock            470145800
bash-3.00# echo 1250 users - STEADY AT LOW VALUE
1250 users
bash-3.00# ./3_lwlock_acquires.d 19178
             Lock Id            Mode           Count
      CheckpointLock          Shared               2
 CheckpointStartLock          Shared               2
          OidGenLock       Exclusive               2
       WALInsertLock       Exclusive               2
          XidGenLock       Exclusive               2
       FreeSpaceLock       Exclusive               9
          OidGenLock          Shared              10
          XidGenLock          Shared              12
       ProcArrayLock          Shared              36
 CheckpointStartLock       Exclusive             135
             Lock Id   Combined Time (ns)
       WALInsertLock                39500
       FreeSpaceLock                98600
       ProcArrayLock               318800
          XidGenLock              1379900
          OidGenLock              3437700
      CheckpointLock              9565200
 CheckpointStartLock             56547900
bash-3.00#
			
		"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
> Here is how I got the numbers..
> I had about 1600 users login into postgresql. Then started the run with
> 500 users and using DTrace I started tracking Postgresql  Locking "as
> viewed from one user/connection". Echo statements indicate how many
> users were active at that point and how was throughput performing. All
> IO is done on /tmp which means on a RAM disk.
> bash-3.00# echo 500 users - baseline number
> 500 users
> bash-3.00# ./3_lwlock_acquires.d 19178
>              Lock Id            Mode           Count
>     FirstLockMgrLock       Exclusive               1
>     RelCacheInitLock       Exclusive               2
>           SInvalLock       Exclusive               2
>        WALInsertLock       Exclusive              10
>       BufMappingLock       Exclusive              12
>       CheckpointLock          Shared              29
>  CheckpointStartLock          Shared              29
>           OidGenLock       Exclusive              29
>           XidGenLock       Exclusive              29
>     FirstLockMgrLock          Shared              33
>  CheckpointStartLock       Exclusive              78
>        FreeSpaceLock       Exclusive             114
>           OidGenLock          Shared             126
>           XidGenLock          Shared             152
>        ProcArrayLock          Shared             482
I don't think I believe these numbers.  For one thing, CheckpointLock
is simply not ever taken in shared mode.  The ratios of counts for
different locks seem pretty improbable too, e.g. there is no way on
earth that the LockMgrLocks are taken more often shared than
exclusive (I would expect no shared acquires at all in the sort of
test you are running).  Not to mention that the absolute number of
counts seems way too low.  So I think the counting tool is broken.
>              Lock Id   Combined Time (ns)
>           SInvalLock                29800
>     RelCacheInitLock                30300
>       BufMappingLock               168800
>     FirstLockMgrLock               414300
>        FreeSpaceLock              1281700
>        ProcArrayLock              7869900
>        WALInsertLock             11113200
>  CheckpointStartLock             13494700
>           OidGenLock             25719100
>           XidGenLock             26443300
>       CheckpointLock            194267800
Combined time of what exactly?  It looks like this must be the total
duration the lock is held, at least assuming that the time for
CheckpointLock is correctly reported.  It'd be much more useful to see
the total time spent waiting to acquire the lock.
            regards, tom lane
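To make the distinction concrete, here is a small self-contained C sketch
(ordinary pthreads, not PostgreSQL's lwlock.c) of the two quantities being
discussed: the first script's "Combined Time" appears to correspond to
hold_ns below, while what Tom is asking for is wait_ns. All names here are
illustrative.

#include <stdint.h>
#include <time.h>
#include <pthread.h>

/* A plain pthread mutex stands in for an LWLock. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t wait_ns;        /* total time spent blocked in acquire */
static uint64_t hold_ns;        /* total time the lock was held        */
static uint64_t acquired_at;

static uint64_t
now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}

static void
timed_lock(void)
{
    uint64_t requested_at = now_ns();
    pthread_mutex_lock(&lock);
    acquired_at = now_ns();
    wait_ns += acquired_at - requested_at;   /* time spent waiting to acquire */
}

static void
timed_unlock(void)
{
    hold_ns += now_ns() - acquired_at;       /* time the lock was actually held */
    pthread_mutex_unlock(&lock);
}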
			
		The count is only for a 10-second snapshot.. Plus remember there are
about 1000 users running so the connection  being profiled only gets
0.01 of the period on CPU..  And the count is for that CONNECTION only.
Anyway, using the lock-wait script shows the real picture you requested.
Here the combined time means time "spent waiting" for the lock.
bash-3.00# echo 500 users
500 users
bash-3.00# ./4_lwlock_waits.d 20764
             Lock Id            Mode           Count
          OidGenLock       Exclusive               1
 CheckpointStartLock          Shared               3
          OidGenLock          Shared               4
       WALInsertLock       Exclusive               7
       FreeSpaceLock       Exclusive               8
          XidGenLock       Exclusive              15
 CheckpointStartLock       Exclusive              16
             Lock Id   Combined Time (ns)
          XidGenLock              3825000
          OidGenLock              5307100
       WALInsertLock              6317800
       FreeSpaceLock              7244100
 CheckpointStartLock             22199200
bash-3.00# echo 600 users
600 users
bash-3.00# ./4_lwlock_waits.d 20764
             Lock Id            Mode           Count
          OidGenLock       Exclusive               1
       WALInsertLock       Exclusive               1
 CheckpointStartLock          Shared               4
 CheckpointStartLock       Exclusive              11
          XidGenLock       Exclusive              21
             Lock Id   Combined Time (ns)
          OidGenLock              1728200
       WALInsertLock              2040700
          XidGenLock             19878500
 CheckpointStartLock             24156500
bash-3.00# echo 700 users
700 users
bash-3.00# ./4_lwlock_waits.d 20764
             Lock Id            Mode           Count
          OidGenLock          Shared               1
          XidGenLock          Shared               1
 CheckpointStartLock          Shared               2
       WALInsertLock       Exclusive               4
 CheckpointStartLock       Exclusive               6
       FreeSpaceLock       Exclusive               6
          XidGenLock       Exclusive              13
             Lock Id   Combined Time (ns)
          OidGenLock              1730000
       WALInsertLock              7253400
       FreeSpaceLock             10977700
 CheckpointStartLock             13356800
          XidGenLock             38220500
bash-3.00# echo 800 users
800 users
bash-3.00# ./4_lwlock_waits.d 20764
             Lock Id            Mode           Count
 CheckpointStartLock          Shared               1
       WALInsertLock       Exclusive               2
          XidGenLock          Shared               2
 CheckpointStartLock       Exclusive               5
       FreeSpaceLock       Exclusive               8
          XidGenLock       Exclusive              12
             Lock Id   Combined Time (ns)
       WALInsertLock              3746800
       FreeSpaceLock              7628900
 CheckpointStartLock             11297500
          XidGenLock             16649000
bash-3.00# echo 900 users  - BIG DROP IN THROUGHPUT OCCURS...
900 users
bash-3.00# ./4_lwlock_waits.d 20764
             Lock Id            Mode           Count
          OidGenLock       Exclusive               1
          OidGenLock          Shared               1
          XidGenLock          Shared               1
       FreeSpaceLock       Exclusive               2
       WALInsertLock       Exclusive               2
 CheckpointStartLock          Shared               6
          XidGenLock       Exclusive              12
 CheckpointStartLock       Exclusive             121
             Lock Id   Combined Time (ns)
          OidGenLock              1968100
       FreeSpaceLock              2076300
       WALInsertLock              2190400
          XidGenLock             20259400
 CheckpointStartLock           1407294300
bash-3.00# echo 950 users
950 users
bash-3.00# ./4_lwlock_waits.d 20764
             Lock Id            Mode           Count
          OidGenLock       Exclusive               1
          OidGenLock          Shared               2
 CheckpointStartLock          Shared               3
       WALInsertLock       Exclusive               4
       FreeSpaceLock       Exclusive               5
          XidGenLock       Exclusive              11
 CheckpointStartLock       Exclusive              50
             Lock Id   Combined Time (ns)
       WALInsertLock              5577100
       FreeSpaceLock              9115900
          XidGenLock             13765800
          OidGenLock             50155500
 CheckpointStartLock            759872200
bash-3.00# echo 1000 users
1000 users
bash-3.00# ./4_lwlock_waits.d 20764
             Lock Id            Mode           Count
          OidGenLock          Shared               1
       WALInsertLock       Exclusive               1
          XidGenLock       Exclusive               5
 CheckpointStartLock          Shared               6
 CheckpointStartLock       Exclusive             102
             Lock Id   Combined Time (ns)
          OidGenLock                21900
       WALInsertLock                82500
          XidGenLock              3313400
 CheckpointStartLock           1448289900
bash-3.00# echo 1050 users
1050 users
bash-3.00# ./4_lwlock_waits.d 20764
             Lock Id            Mode           Count
       FreeSpaceLock       Exclusive               1
 CheckpointStartLock          Shared               3
          XidGenLock       Exclusive               3
 CheckpointStartLock       Exclusive             146
             Lock Id   Combined Time (ns)
       FreeSpaceLock                18400
          XidGenLock              1900900
 CheckpointStartLock           2392893700
bash-3.00#
-Jignesh
Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> Here is how I got the numbers..
>> I had about 1600 users login into postgresql. Then started the run with
>> 500 users and using DTrace I started tracking Postgresql  Locking "as
>> viewed from one user/connection". Echo statements indicate how many
>> users were active at that point and how was throughput performing. All
>> IO is done on /tmp which means on a RAM disk.
>>
>
>
>> bash-3.00# echo 500 users - baseline number
>> 500 users
>> bash-3.00# ./3_lwlock_acquires.d 19178
>>
> I don't think I believe these numbers.  For one thing, CheckpointLock
> is simply not ever taken in shared mode.  The ratios of counts for
> different locks seems pretty improbable too, eg there is no way on
> earth that the LockMgrLocks are taken more often shared than
> exclusive (I would expect no shared acquires at all in the sort of
> test you are running).  Not to mention that the absolute number of
> counts seems way too low.  So I think the counting tool is broken.
>
>
> Combined time of what exactly?  It looks like this must be the total
> duration the lock is held, at least assuming that the time for
> CheckpointLock is correctly reported.  It'd be much more useful to see
> the total time spent waiting to acquire the lock.
>
>             regards, tom lane
>
			
		"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
> The count is only for a 10-second snapshot.. Plus remember there are
> about 1000 users running so the connection  being profiled only gets
> 0.01 of the period on CPU..  And the count is for that CONNECTION only.
OK, that explains the low absolute levels of the counts, but if the
counts are for a regular backend then there's still a lot of bogosity
here.  Backends won't be taking the CheckpointLock at all, nor do they
take CheckpointStartLock in exclusive mode.  The bgwriter would do that
but it'd not be taking most of these other locks.  So I think the script
is mislabeling the locks somehow.
Also, elementary statistics should tell you that a sample taken as above
is going to have enormous amounts of noise.  You should be sampling over
a much longer period, say on the order of a minute of runtime, to have
numbers that are trustworthy.
            regards, tom lane
			
On Thu, 2007-07-26 at 10:29 -0400, Jignesh K. Shah wrote:
> The count is only for a 10-second snapshot.. Plus remember there are
> about 1000 users running so the connection being profiled only gets
> 0.01 of the period on CPU..  And the count is for that CONNECTION only.

Is that for one process, or all processes aggregated in some way?

>  CheckpointStartLock          Shared               6
>  CheckpointStartLock       Exclusive             102

That's definitely whacked. Surely we didn't start 102 checkpoints yet
attempt to commit 6 times?

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
I will look for runs with longer samples..
The script could also have mislabeled lock names. Anyway, digging into
the one that seems to increase over time, I did stack profiles on how
it grows... and here are some numbers.
For 600-850 users, that potentially mislabeled CheckpointStartLock, or
LockID==12, is acquired from various call paths; while the system is
still doing great, the top source is:
              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage_ReadOnly+0xc
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesSnapshot+0x234
              postgres`heap_release_fetch+0x1a8
              postgres`index_getnext+0xf4
              postgres`IndexNext+0x7c
              postgres`ExecScan+0x8c
              postgres`ExecProcNode+0xb4
              postgres`ExecutePlan+0xdc
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
          8202100
              postgres`LWLockAcquire+0x1c8
              postgres`TransactionIdSetStatus+0x1c
              postgres`RecordTransactionCommit+0x2a8
              postgres`CommitTransaction+0xc8
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
         30822100
However, at 900 users, where the big drop in throughput occurs,
it gives a different top "consumer" of time:
              postgres`LWLockAcquire+0x1c8
              postgres`TransactionIdSetStatus+0x1c
              postgres`RecordTransactionCommit+0x2a8
              postgres`CommitTransaction+0xc8
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
        406601300
              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage+0x1ac
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesUpdate+0x360
              postgres`heap_lock_tuple+0x27c
              postgres`ExecutePlan+0x33c
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
        444523100
              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage+0x1ac
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesSnapshot+0x234
              postgres`heap_release_fetch+0x1a8
              postgres`index_getnext+0xf4
              postgres`IndexNext+0x7c
              postgres`ExecScan+0x8c
              postgres`ExecProcNode+0xb4
              postgres`ExecutePlan+0xdc
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
       1661300000
Maybe you all will understand better than I do what it is doing here..
It looks to me like IndexNext has a problem at a high number of users..
but I could be wrong..
-Jignesh
Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> The count is only for a 10-second snapshot.. Plus remember there are
>> about 1000 users running so the connection  being profiled only gets
>> 0.01 of the period on CPU..  And the count is for that CONNECTION only.
>>
>
> OK, that explains the low absolute levels of the counts, but if the
> counts are for a regular backend then there's still a lot of bogosity
> here.  Backends won't be taking the CheckpointLock at all, nor do they
> take CheckpointStartLock in exclusive mode.  The bgwriter would do that
> but it'd not be taking most of these other locks.  So I think the script
> is mislabeling the locks somehow.
>
> Also, elementary statistics should tell you that a sample taken as above
> is going to have enormous amounts of noise.  You should be sampling over
> a much longer period, say on the order of a minute of runtime, to have
> numbers that are trustworthy.
>
>             regards, tom lane
>
			
		"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
> For  600-850 Users: that potential mislabeled CheckPointStartLock  or
> LockID==12 comes from various sources where the top source (while system
> is still doing great) comes from:
>               postgres`LWLockAcquire+0x1c8
>               postgres`SimpleLruReadPage_ReadOnly+0xc
>               postgres`TransactionIdGetStatus+0x14
That path would be taking CLogControlLock ... so you're off by at least
one.  Compare the script to src/include/storage/lwlock.h.
            regards, tom lane
			
		On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote:
> However at 900 Users where the big drop in throughput occurs:
> It gives a different top "consumer" of time:
>                 postgres`LWLockAcquire+0x1c8
>               postgres`SimpleLruReadPage+0x1ac
>               postgres`TransactionIdGetStatus+0x14
>               postgres`TransactionLogFetch+0x58
TransactionIdGetStatus doesn't directly call SimpleLruReadPage().
Presumably the compiler has been rearranging things??
Looks like you're out of clog buffers. It seems like the clog buffers
aren't big enough to hold clog pages for long enough and the SELECT FOR
SHARE processing is leaving lots of additional read locks that are
increasing the number of clog requests for older xids.
Try the enclosed patch.
--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
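A rough C sketch of why the number of clog buffers matters here, under my
own assumption (not a statement about the real slru.c) that the cache acts
like a small linearly scanned page array: once the working set of
transaction-status pages exceeds the buffer count, lookups for older xids
keep forcing page replacement while the control lock is held. All names
and the structure below are illustrative.

#include <stddef.h>

#define CLOG_BUFFERS_SKETCH 8   /* the stock NUM_CLOG_BUFFERS value */

typedef struct ClogCacheSketch
{
    int   page_number[CLOG_BUFFERS_SKETCH];  /* which clog page each slot holds */
    char *page_data[CLOG_BUFFERS_SKETCH];
} ClogCacheSketch;

static char *
clog_read_page_sketch(ClogCacheSketch *cache, int pageno)
{
    for (int slot = 0; slot < CLOG_BUFFERS_SKETCH; slot++)
    {
        if (cache->page_number[slot] == pageno)
            return cache->page_data[slot];   /* hit: cheap, lock held briefly */
    }

    /* Miss: a victim slot must be evicted and the page read back in while
     * the control lock is held.  Many SELECT FOR SHARE row locks referencing
     * older xids widen the page working set, so with only 8 slots this slow
     * path is taken far too often.  (Eviction and I/O elided in this sketch.) */
    return NULL;
}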
			
BEAUTIFUL!!!
Using the patch I can now go up to 1300 users without dropping.. But the
drop still repeats at 1300-1350 users..
I corrected the lock descriptions based on what I got from lwlock.h and
reran the whole scalability test with profiling again.. This time it
looks like ProcArrayLock:
bash-3.00# echo 600 users
600 users
bash-3.00# ./4_lwlock_waits.d 7056
             Lock Id            Mode           Count
          XidGenLock       Exclusive               1
     CLogControlLock          Shared               2
          XidGenLock          Shared               2
        WALWriteLock       Exclusive               4
       WALInsertLock       Exclusive               8
     CLogControlLock       Exclusive               9
       ProcArrayLock       Exclusive               9
             Lock Id   Combined Time (ns)
        WALWriteLock              2842300
          XidGenLock              4951000
     CLogControlLock             11151800
       WALInsertLock             13035600
       ProcArrayLock             20040000
bash-3.00# echo 700 users
700 users
bash-3.00# ./4_lwlock_waits.d 7056
             Lock Id            Mode           Count
          XidGenLock       Exclusive               1
        WALWriteLock       Exclusive               2
          XidGenLock          Shared               2
     CLogControlLock          Shared               3
     CLogControlLock       Exclusive               8
       WALInsertLock       Exclusive               9
       ProcArrayLock       Exclusive              22
             Lock Id   Combined Time (ns)
          XidGenLock              4093300
        WALWriteLock              4914800
       WALInsertLock              7389100
       ProcArrayLock             10248200
     CLogControlLock             11989400
bash-3.00# echo 800 users
800 users
bash-3.00# ./4_lwlock_waits.d 7056
             Lock Id            Mode           Count
        WALWriteLock       Exclusive               1
          XidGenLock          Shared               2
     CLogControlLock          Shared               3
     CLogControlLock       Exclusive               7
       WALInsertLock       Exclusive               7
       ProcArrayLock       Exclusive              31
             Lock Id   Combined Time (ns)
        WALWriteLock               319100
          XidGenLock              5388700
       WALInsertLock              9901400
     CLogControlLock             13465000
       ProcArrayLock             42979700
bash-3.00# echo 900 users
900 users
bash-3.00# ./4_lwlock_waits.d 7056
             Lock Id            Mode           Count
     CLogControlLock          Shared               1
          XidGenLock       Exclusive               1
        WALWriteLock       Exclusive               2
     CLogControlLock       Exclusive               6
       WALInsertLock       Exclusive               9
       ProcArrayLock       Exclusive              25
             Lock Id   Combined Time (ns)
          XidGenLock              3197700
        WALWriteLock              3887100
     CLogControlLock             15774500
       WALInsertLock             38268700
       ProcArrayLock            162073100
bash-3.00# ./6_lwlock_stack.d 4 7056
             Lock Id            Mode           Count
       ProcArrayLock          Shared               1
       ProcArrayLock       Exclusive              67
             Lock Id   Combined Time (ns)
       ProcArrayLock            216773800
             Lock Id   Combined Time (ns)
              postgres`LWLockAcquire+0x1c8
              postgres`GetSnapshotData+0x118
              postgres`GetTransactionSnapshot+0x5c
              postgres`PortalStart+0x150
              postgres`exec_bind_message+0x81c
              postgres`PostgresMain+0x12b8
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
          2779000
              postgres`LWLockAcquire+0x1c8
              postgres`CommitTransaction+0xe0
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
        213994800
bash-3.00# echo 1000 users - SLIGHT DROP
1000 users
bash-3.00# ./4_lwlock_waits.d 7056
             Lock Id            Mode           Count
        WALWriteLock       Exclusive               1
     CLogControlLock       Exclusive               2
          XidGenLock          Shared               2
     CLogControlLock          Shared               3
       WALInsertLock       Exclusive               4
       ProcArrayLock       Exclusive              26
             Lock Id   Combined Time (ns)
        WALWriteLock              1807400
          XidGenLock              2024000
       WALInsertLock              2177500
     CLogControlLock              9064700
       ProcArrayLock            199216000
bash-3.00# ./6_lwlock_stack.d 4 7056
             Lock Id            Mode           Count
       ProcArrayLock          Shared               3
       ProcArrayLock       Exclusive              38
             Lock Id   Combined Time (ns)
       ProcArrayLock            858238600
             Lock Id   Combined Time (ns)
              postgres`LWLockAcquire+0x1c8
              postgres`TransactionIdIsInProgress+0x50
              postgres`HeapTupleSatisfiesVacuum+0x2ec
              postgres`_bt_check_unique+0x2a0
              postgres`_bt_doinsert+0x98
              postgres`btinsert+0x54
              postgres`FunctionCall6+0x44
              postgres`index_insert+0x90
              postgres`ExecInsertIndexTuples+0x1bc
              postgres`ExecUpdate+0x500
              postgres`ExecutePlan+0x704
              postgres`ExecutorRun+0x60
              postgres`PortalRunMulti+0x2a0
              postgres`PortalRun+0x310
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
           167100
              postgres`LWLockAcquire+0x1c8
              postgres`GetSnapshotData+0x118
              postgres`GetTransactionSnapshot+0x5c
              postgres`PortalRunMulti+0x22c
              postgres`PortalRun+0x310
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
          7125900
              postgres`LWLockAcquire+0x1c8
              postgres`CommitTransaction+0xe0
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
        850945600
bash-3.00# echo 1100 users - DROP ....
1100 users
bash-3.00# ./4_lwlock_waits.d 7056
             Lock Id            Mode           Count
     CLogControlLock          Shared               1
        WALWriteLock       Exclusive               1
          XidGenLock       Exclusive               1
       ProcArrayLock          Shared               2
       WALInsertLock       Exclusive               2
          XidGenLock          Shared               2
     CLogControlLock       Exclusive               4
       ProcArrayLock       Exclusive              20
             Lock Id   Combined Time (ns)
        WALWriteLock              4179500
          XidGenLock              6249400
     CLogControlLock             20411100
       WALInsertLock             29707600
       ProcArrayLock            207923700
bash-3.00# ./6_lwlock_stack.d 4 7056
             Lock Id            Mode           Count
       ProcArrayLock       Exclusive              40
             Lock Id   Combined Time (ns)
       ProcArrayLock            692242100
             Lock Id   Combined Time (ns)
              postgres`LWLockAcquire+0x1c8
              postgres`CommitTransaction+0xe0
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
        692242100
bash-3.00#
The LockID for ProcArrayLock is 4, i.e. the 5th entry in lwlock.h, which
seems to confirm the labeling now matches lwlock.h.
Any tweaks for that?
-Jignesh
Simon Riggs wrote:
> On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote:
>
>
>> However at 900 Users where the big drop in throughput occurs:
>> It gives a different top "consumer" of time:
>>
>
>
>                 postgres`LWLockAcquire+0x1c8
>
>>               postgres`SimpleLruReadPage+0x1ac
>>               postgres`TransactionIdGetStatus+0x14
>>               postgres`TransactionLogFetch+0x58
>>
>
> TransactionIdGetStatus doesn't directly call SimpleLruReadPage().
> Presumably the compiler has been rearranging things??
>
> Looks like you're out of clog buffers. It seems like the clog buffers
> aren't big enough to hold clog pages for long enough and the SELECT FOR
> SHARE processing is leaving lots of additional read locks that are
> increasing the number of clog requests for older xids.
>
> Try the enclosed patch.
>
>
> ------------------------------------------------------------------------
>
> Index: src/include/access/clog.h
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/include/access/clog.h,v
> retrieving revision 1.19
> diff -c -r1.19 clog.h
> *** src/include/access/clog.h    5 Jan 2007 22:19:50 -0000    1.19
> --- src/include/access/clog.h    26 Jul 2007 15:44:58 -0000
> ***************
> *** 29,35 ****
>
>
>   /* Number of SLRU buffers to use for clog */
> ! #define NUM_CLOG_BUFFERS    8
>
>
>   extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
> --- 29,35 ----
>
>
>   /* Number of SLRU buffers to use for clog */
> ! #define NUM_CLOG_BUFFERS    64
>
>
>   extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
>
> ------------------------------------------------------------------------
>
>
			
Tom Lane wrote:
>
> That path would be taking CLogControlLock ... so you're off by at least
> one.  Compare the script to src/include/storage/lwlock.h.
>

Indeed, the indexing was off by one due to the removal of BufMappingLock
in src/include/storage/lwlock.h between 8.1 and 8.2, which was not
updated in the DTrace script.

Thanks,
Robert
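For anyone else post-processing DTrace output this way, the failure mode
is a positional name table in the script drifting out of sync with the
LWLockId enum of the server version being traced. The C sketch below uses
placeholder names (deliberately not the real 8.2 ordering) just to show
the hazard.

/* Illustration with made-up lock names: the table must mirror the enum in
 * src/include/storage/lwlock.h for the exact server version being traced.
 * Removing or adding one enum member (as happened between 8.1 and 8.2)
 * silently shifts every later label by one slot. */
static const char *lwlock_names_sketch[] = {
    "LockA",    /* enum value 0 in this hypothetical build */
    "LockB",    /* 1 */
    "LockC",    /* 2 */
    "LockD",    /* 3 */
};

static const char *
lwlock_name_sketch(int lockid)
{
    int n = (int) (sizeof(lwlock_names_sketch) / sizeof(lwlock_names_sketch[0]));

    if (lockid < 0 || lockid >= n)
        return "unknown";
    return lwlock_names_sketch[lockid];
}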
On Thu, 2007-07-26 at 15:44 -0400, Jignesh K. Shah wrote:
> BEAUTIFUL!!!
>
> Using the Patch I can now go upto 1300 users without dropping.. But now
> it still repeats at 1300-1350 users..

OK, can you try again with 16 and 32 buffers please?  We need to know
how many is enough and whether this number needs to be variable via a
parameter, or just slightly larger by default. Thanks.

> I corrected the Lock Descriptions based on what I got from lwlock.h and
> retried the whole scalability again with profiling again.. This time it
> looks like the ProcArrayLock

That's what I would expect with that many users.

>              Lock Id            Mode           Count
>           XidGenLock       Exclusive               1
>      CLogControlLock          Shared               2
>           XidGenLock          Shared               2
>         WALWriteLock       Exclusive               4
>        WALInsertLock       Exclusive               8
>      CLogControlLock       Exclusive               9
>        ProcArrayLock       Exclusive               9

...but as Tom mentioned, we need to do longer runs now so these counts
get to somewhere in the hundreds so we have some statistical validity.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
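One possible shape for "variable via a parameter", purely as a sketch:
derive the buffer count from the configured backend count instead of
hard-coding 8 or 64. The divisor and bounds below are guesses for
illustration, not anything measured or agreed in this thread, and
max_backends simply stands for whatever the server derives from
max_connections.

/* Sketch only: scale clog buffers with the number of backends rather than
 * using a fixed constant.  8 and 64 are illustrative bounds. */
static int
clog_buffers_sketch(int max_backends)
{
    int n = max_backends / 32;

    if (n < 8)
        n = 8;
    if (n > 64)
        n = 64;
    return n;
}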
Will try 16 and 32 CLOG buffers tomorrow.
But here is the lock data again, with a longer profiling window (about
2 minutes) for the connection, with about 2000 users:
bash-3.00# time ./4_lwlock_waits.d 13583
^C
             Lock Id            Mode           Count
       ProcArrayLock          Shared               5
          XidGenLock       Exclusive              13
     CLogControlLock          Shared              14
          XidGenLock          Shared              15
     CLogControlLock       Exclusive              21
       WALInsertLock       Exclusive              77
        WALWriteLock       Exclusive             175
       ProcArrayLock       Exclusive             275
             Lock Id   Combined Time (ns)
          XidGenLock            194966200
       WALInsertLock            517955000
     CLogControlLock            679665100
        WALWriteLock           2838716200
       ProcArrayLock          44181002600
Top Wait time   seems to come from the following code path for
ProcArrayLock:
             Lock Id            Mode           Count
       ProcArrayLock       Exclusive              21
             Lock Id   Combined Time (ns)
       ProcArrayLock           5255937500
             Lock Id   Combined Time (ns)
              postgres`LWLockAcquire+0x1f0
              postgres`CommitTransaction+0x104
              postgres`CommitTransactionCommand+0xbc
              postgres`finish_xact_command+0x78
              postgres`exec_execute_message+0x42c
              postgres`PostgresMain+0x1838
              postgres`BackendRun+0x2f8
              postgres`ServerLoop+0x680
              postgres`PostmasterMain+0xda8
              postgres`main+0x3d0
              postgres`_start+0x17c
       5255937500
Regards,
Jignesh
Simon Riggs wrote:
> On Thu, 2007-07-26 at 15:44 -0400, Jignesh K. Shah wrote:
>
>> BEAUTIFUL!!!
>>
>> Using the Patch I can now go upto 1300 users without dropping.. But now
>> it still repeats at 1300-1350 users..
>>
>
> OK, can you try again with 16 and 32 buffers please?  We need to know
> how many is enough and whether this number needs to be variable via a
> parameter, or just slightly larger by default. Thanks.
>
>
>> I corrected the Lock Descriptions based on what I got from lwlock.h and
>> retried the whole scalability again with profiling again.. This time it
>> looks like the ProcArrayLock
>>
>
> That's what I would expect with that many users.
>
>
>>              Lock Id            Mode           Count
>>           XidGenLock       Exclusive               1
>>      CLogControlLock          Shared               2
>>           XidGenLock          Shared               2
>>         WALWriteLock       Exclusive               4
>>        WALInsertLock       Exclusive               8
>>      CLogControlLock       Exclusive               9
>>        ProcArrayLock       Exclusive               9
>>
>
> ...but as Tom mentioned, we need to do longer runs now so these counts
> get to somewhere in the hundreds so we have some statistical validity.
>
>
			
Jignesh K. Shah wrote:
> Top Wait time   seems to come from the following code path for
> ProcArrayLock:
>
>              Lock Id            Mode           Count
>        ProcArrayLock       Exclusive              21
>
>              Lock Id   Combined Time (ns)
>        ProcArrayLock           5255937500
>
>              Lock Id   Combined Time (ns)
>
>
>               postgres`LWLockAcquire+0x1f0
>               postgres`CommitTransaction+0x104

Yeah, ProcArrayLock is pretty contended. I think it would be kinda neat
if we could split it up in partitions. This lock is quite particular
though.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, 2007-07-27 at 04:58 -0400, Alvaro Herrera wrote:
> Jignesh K. Shah wrote:
>
> > Top Wait time   seems to come from the following code path for
> > ProcArrayLock:
> >
> >              Lock Id            Mode           Count
> >        ProcArrayLock       Exclusive              21
> >
> >              Lock Id   Combined Time (ns)
> >        ProcArrayLock           5255937500
> >
> >              Lock Id   Combined Time (ns)
> >
> >
> >               postgres`LWLockAcquire+0x1f0
> >               postgres`CommitTransaction+0x104
>
> Yeah, ProcArrayLock is pretty contended. I think it would be kinda neat
> if we could split it up in partitions. This lock is quite particular
> though.

Maybe, if we did we should set the partitions according to numbers of
users, so lower numbers of users are all in one partition.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
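A minimal C sketch of the partitioning idea being floated here, not an
actual proposal or PostgreSQL code: each backend slot would belong to one
partition, so commit touches a single partition exclusively while a
snapshot sweeps all partitions in shared mode. Whether the mapping should
be modulo or range-based, as Simon suggests so that low user counts stay
in one partition, is exactly the open question; the constants and names
below are assumptions.

#define PROCARRAY_PARTITIONS_SKETCH 16
#define BACKENDS_PER_PARTITION_SKETCH 128    /* assumed range size */

/* Range-based mapping per Simon's suggestion: the first 128 backends all
 * share partition 0, the next 128 share partition 1, and so on, capped at
 * the last partition.  A modulo mapping would spread them evenly instead. */
static int
procarray_partition_sketch(int backend_id)
{
    int part = backend_id / BACKENDS_PER_PARTITION_SKETCH;

    if (part >= PROCARRAY_PARTITIONS_SKETCH)
        part = PROCARRAY_PARTITIONS_SKETCH - 1;
    return part;
}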
On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote:
>              Lock Id   Combined Time (ns)
>           XidGenLock            194966200
>        WALInsertLock            517955000
>      CLogControlLock            679665100
>         WALWriteLock           2838716200
>        ProcArrayLock          44181002600
Is this the time the lock is held for or the time that we wait for that
lock? It would be good to see the breakdown of time separately for
shared and exclusive.
Can we have a table like this:
    LockId,LockMode,SumTimeLockHeld,SumTimeLockWait
> Top Wait time   seems to come from the following code path for
> ProcArrayLock:
>
>              Lock Id            Mode           Count
>        ProcArrayLock       Exclusive              21
>
>              Lock Id   Combined Time (ns)
>        ProcArrayLock           5255937500
>
>              Lock Id   Combined Time (ns)
>
>
>               postgres`LWLockAcquire+0x1f0
>               postgres`CommitTransaction+0x104
>               postgres`CommitTransactionCommand+0xbc
>               postgres`finish_xact_command+0x78
Well that's pretty weird. That code path clearly only happens once per
transaction and ought to be fast. The other code paths that take
ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
ought to spend more time holding the lock. Presumably you are running
with a fair number of SERIALIZABLE transactions?
Are you running with commit_delay > 0? It's possible that the call to
CountActiveBackends() is causing pinging of the procarray by other
backends while we're trying to read it during CommitTransaction(). If
so, try the attached patch.
--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
			
Yes, I can try to break up the shared and exclusive time.. Also yes, I use
commit_delay = 10; it helps out a lot in reducing IO load.. I will try
out the patch soon.

-Jignesh

Simon Riggs wrote:
> On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote:
>
>>              Lock Id   Combined Time (ns)
>>           XidGenLock            194966200
>>        WALInsertLock            517955000
>>      CLogControlLock            679665100
>>         WALWriteLock           2838716200
>>        ProcArrayLock          44181002600
>>
>
> Is this the time the lock is held for or the time that we wait for that
> lock? It would be good to see the break down of time separately for
> shared and exclusive.
>
> Can we have a table like this:
>     LockId,LockMode,SumTimeLockHeld,SumTimeLockWait
>
>
>> Top Wait time   seems to come from the following code path for
>> ProcArrayLock:
>>
>>              Lock Id            Mode           Count
>>        ProcArrayLock       Exclusive              21
>>
>>              Lock Id   Combined Time (ns)
>>        ProcArrayLock           5255937500
>>
>>              Lock Id   Combined Time (ns)
>>
>>
>>               postgres`LWLockAcquire+0x1f0
>>               postgres`CommitTransaction+0x104
>>               postgres`CommitTransactionCommand+0xbc
>>               postgres`finish_xact_command+0x78
>>
>
> Well thats pretty weird. That code path clearly only happens once per
> transaction and ought to be fast. The other code paths that take
> ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
> ought to spend more time holding the lock. Presumably you are running
> with a fair number of SERIALIZABLE transactions?
>
> Are you running with commit_delay > 0? Its possible that the call to
> CountActiveBackends() is causing pinging of the procarray by other
> backends while we're trying to read it during CommitTransaction(). If
> so, try the attached patch.
>
>
I tried 32 CLOG buffers and the performance is as good as with 64. (I
haven't tried 16 yet though..) I am going to try your second patch now..
Also, here is the breakdown by mode. The combined time is the total time
it waits across all counts.
             Lock Id            Mode           Count
       ProcArrayLock          Shared               1
     CLogControlLock       Exclusive               4
     CLogControlLock          Shared               4
          XidGenLock          Shared               4
          XidGenLock       Exclusive               7
       WALInsertLock       Exclusive              21
        WALWriteLock       Exclusive              62
       ProcArrayLock       Exclusive              79
             Lock Id            Mode    Combined Time (ns)
     CLogControlLock        Exclusive               325200
     CLogControlLock           Shared              4509200
          XidGenLock        Exclusive             11839600
       ProcArrayLock           Shared             40506600
          XidGenLock           Shared            119013700
       WALInsertLock        Exclusive            148063100
        WALWriteLock        Exclusive            347052100
       ProcArrayLock        Exclusive           1054780600
Here is another one at a higher user count (1600):
bash-3.00# ./4_lwlock_waits.d 9208
             Lock Id            Mode           Count
     CLogControlLock       Exclusive               1
     CLogControlLock          Shared               2
          XidGenLock          Shared               7
       WALInsertLock       Exclusive              12
        WALWriteLock       Exclusive              50
       ProcArrayLock       Exclusive              82
             Lock Id            Mode   Combined Time (ns)
     CLogControlLock        Exclusive                27300
          XidGenLock           Shared             14689300
     CLogControlLock           Shared             72664900
       WALInsertLock        Exclusive            101431300
        WALWriteLock        Exclusive            534357400
       ProcArrayLock        Exclusive           4110350300
Now I will try with your second patch.
Regards,
Jignesh
Simon Riggs wrote:
> On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote:
>
>>              Lock Id   Combined Time (ns)
>>           XidGenLock            194966200
>>        WALInsertLock            517955000
>>      CLogControlLock            679665100
>>         WALWriteLock           2838716200
>>        ProcArrayLock          44181002600
>>
>
> Is this the time the lock is held for or the time that we wait for that
> lock? It would be good to see the break down of time separately for
> shared and exclusive.
>
> Can we have a table like this:
>     LockId,LockMode,SumTimeLockHeld,SumTimeLockWait
>
>
>> Top Wait time   seems to come from the following code path for
>> ProcArrayLock:
>>
>>              Lock Id            Mode           Count
>>        ProcArrayLock       Exclusive              21
>>
>>              Lock Id   Combined Time (ns)
>>        ProcArrayLock           5255937500
>>
>>              Lock Id   Combined Time (ns)
>>
>>
>>               postgres`LWLockAcquire+0x1f0
>>               postgres`CommitTransaction+0x104
>>               postgres`CommitTransactionCommand+0xbc
>>               postgres`finish_xact_command+0x78
>>
>
> Well, that's pretty weird. That code path clearly only happens once per
> transaction and ought to be fast. The other code paths that take
> ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
> ought to spend more time holding the lock. Presumably you are running
> with a fair number of SERIALIZABLE transactions?
>
> Are you running with commit_delay > 0? It's possible that the call to
> CountActiveBackends() is causing pinging of the procarray by other
> backends while we're trying to read it during CommitTransaction(). If
> so, try the attached patch.
>
>
> ------------------------------------------------------------------------
>
> Index: src/backend/access/transam/xact.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v
> retrieving revision 1.245
> diff -c -r1.245 xact.c
> *** src/backend/access/transam/xact.c    7 Jun 2007 21:45:58 -0000    1.245
> --- src/backend/access/transam/xact.c    27 Jul 2007 09:09:08 -0000
> ***************
> *** 820,827 ****
>                * are fewer than CommitSiblings other backends with active
>                * transactions.
>                */
> !             if (CommitDelay > 0 && enableFsync &&
> !                 CountActiveBackends() >= CommitSiblings)
>                   pg_usleep(CommitDelay);
>
>               XLogFlush(recptr);
> --- 820,826 ----
>                * are fewer than CommitSiblings other backends with active
>                * transactions.
>                */
> !             if (CommitDelay > 0 && enableFsync)
>                   pg_usleep(CommitDelay);
>
>               XLogFlush(recptr);
>
> ------------------------------------------------------------------------
>
>
>
			
Using 32 CLOG buffers and the commit-siblings check patch, I still see a
drop at 1200-1300 users.
bash-3.00# ./4_lwlock_waits.d 18250
             Lock Id            Mode           Count
          XidGenLock          Shared               1
     CLogControlLock          Shared               2
       ProcArrayLock          Shared               2
          XidGenLock       Exclusive               4
     CLogControlLock       Exclusive              15
       WALInsertLock       Exclusive              18
        WALWriteLock       Exclusive              38
       ProcArrayLock       Exclusive              77
             Lock Id            Mode   Combined Time (ns)
          XidGenLock          Shared                88700
       WALInsertLock       Exclusive             69556000
       ProcArrayLock          Shared             95656800
          XidGenLock       Exclusive            139634100
     CLogControlLock       Exclusive            148822200
     CLogControlLock          Shared            161630000
        WALWriteLock       Exclusive            332781800
       ProcArrayLock       Exclusive           5688265500
bash-3.00# ./4_lwlock_waits.d 18599
             Lock Id            Mode           Count
       ProcArrayLock          Shared               2
          XidGenLock       Exclusive               3
          XidGenLock          Shared               4
     CLogControlLock          Shared               5
       WALInsertLock       Exclusive              10
     CLogControlLock       Exclusive              21
        WALWriteLock       Exclusive              28
       ProcArrayLock       Exclusive              54
             Lock Id            Mode   Combined Time (ns)
          XidGenLock       Exclusive              5688800
       WALInsertLock       Exclusive             11424700
     CLogControlLock          Shared             55589100
       ProcArrayLock          Shared            135220400
        WALWriteLock       Exclusive            177906900
          XidGenLock          Shared            524146500
     CLogControlLock       Exclusive            524563900
       ProcArrayLock       Exclusive           5828744500
bash-3.00#
bash-3.00# ./6_lwlock_stack.d 4 18599
             Lock Id            Mode           Count
       ProcArrayLock          Shared               1
       ProcArrayLock       Exclusive              52
             Lock Id            Mode   Combined Time (ns)
       ProcArrayLock          Shared             41428300
       ProcArrayLock       Exclusive           3858386500
             Lock Id   Combined Time (ns)
              postgres`LWLockAcquire+0x1f0
              postgres`GetSnapshotData+0x120
              postgres`GetTransactionSnapshot+0x80
              postgres`PortalStart+0x198
              postgres`exec_bind_message+0x84c
              postgres`PostgresMain+0x17f8
              postgres`BackendRun+0x2f8
              postgres`ServerLoop+0x680
              postgres`PostmasterMain+0xda8
              postgres`main+0x3d0
              postgres`_start+0x17c
  Shared                                                     41428300
              postgres`LWLockAcquire+0x1f0
              postgres`CommitTransaction+0x104
              postgres`CommitTransactionCommand+0xbc
              postgres`finish_xact_command+0x78
              postgres`exec_execute_message+0x42c
              postgres`PostgresMain+0x1838
              postgres`BackendRun+0x2f8
              postgres`ServerLoop+0x680
              postgres`PostmasterMain+0xda8
              postgres`main+0x3d0
              postgres`_start+0x17c
  Exclusive                                                3858386500
-Jignesh
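
As an aside for readers, the two stacks above correspond to the two ways
ProcArrayLock gets taken: GetSnapshotData() holds it in shared mode while
it scans every backend's slot, while CommitTransaction() has to take it in
exclusive mode to clear its own entry, so at high connection counts each
commit queues behind a stream of snapshot readers. Below is a toy
standalone model of that split (not PostgreSQL source; a pthread
read-write lock stands in for the LWLock).

/*
 * Toy model only (not PostgreSQL source).  take_snapshot() plays the role
 * of GetSnapshotData(): shared lock, read every slot.  commit_xact() plays
 * the role of the commit-time cleanup: exclusive lock, clear one slot.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NSLOTS 1024   /* think max_connections */

static pthread_rwlock_t proc_array_lock = PTHREAD_RWLOCK_INITIALIZER;
static uint32_t xids[NSLOTS];   /* 0 = no transaction in progress */

static uint32_t take_snapshot(void)
{
    uint32_t xmin = UINT32_MAX;

    pthread_rwlock_rdlock(&proc_array_lock);     /* shared */
    for (int i = 0; i < NSLOTS; i++)
        if (xids[i] != 0 && xids[i] < xmin)
            xmin = xids[i];
    pthread_rwlock_unlock(&proc_array_lock);
    return xmin;
}

static void commit_xact(int slot)
{
    pthread_rwlock_wrlock(&proc_array_lock);     /* exclusive */
    xids[slot] = 0;                              /* clear our entry */
    pthread_rwlock_unlock(&proc_array_lock);
}

int main(void)
{
    for (int i = 0; i < NSLOTS; i++)
        xids[i] = 1000 + (uint32_t) i;

    printf("xmin before commit: %u\n", (unsigned) take_snapshot());
    commit_xact(0);
    printf("xmin after commit:  %u\n", (unsigned) take_snapshot());
    return 0;
}

(Build with cc file.c -lpthread. The point of the model is only the
locking pattern: many cheap shared scans plus one mandatory exclusive
acquisition per transaction, which matches the wait profile above.)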
			
With 16 CLOG buffers the drop comes at about 1150 users, with the
following lock stats:
bash-3.00# ./4_lwlock_waits.d 16404
             Lock Id            Mode           Count
       ProcArrayLock          Shared               2
          XidGenLock       Exclusive               2
          XidGenLock          Shared               4
       WALInsertLock       Exclusive               7
     CLogControlLock          Shared               8
        WALWriteLock       Exclusive              46
       ProcArrayLock       Exclusive              64
     CLogControlLock       Exclusive             263
             Lock Id            Mode   Combined Time (ns)
          XidGenLock       Exclusive               528300
       ProcArrayLock          Shared               968800
       WALInsertLock       Exclusive              4090900
          XidGenLock          Shared             73987600
        WALWriteLock       Exclusive             86200700
       ProcArrayLock       Exclusive            130756000
     CLogControlLock          Shared            240471000
     CLogControlLock       Exclusive           4115158500
So I think 32 is the better option for the CLOG buffers before
ProcArrayLock becomes the bottleneck, though I haven't yet seen what we
can do about the ProcArrayLock problem.
Regards,
Jignesh
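
To illustrate why the buffer count matters, here is a toy standalone model
(not PostgreSQL source; sizes are illustrative). Each transaction's commit
status lives on a clog page, and a lookup that misses the small in-memory
pool has to evict and read a page with CLogControlLock held exclusively;
the wider the spread of xids still being checked (more concurrent users,
plus the SELECT FOR SHARE leftovers from Simon's earlier diagnosis), the
sooner 8 or 16 buffers start thrashing where 32 still fits.

/*
 * Toy model only (not PostgreSQL source).  clog_lookup() maps an xid to a
 * clog page (2 bits per xid on an 8 KB page gives 32768 xids per page) and
 * checks a tiny buffer pool.  A miss stands in for the expensive case:
 * exclusive CLogControlLock plus a page read, then a crude replacement.
 */
#include <stdio.h>

#define XIDS_PER_PAGE 32768L
#define MAX_BUFFERS   64

static int  num_clog_buffers;           /* the constant being tuned */
static long cached_pages[MAX_BUFFERS];
static long misses;

static void clog_lookup(long xid)
{
    long page = xid / XIDS_PER_PAGE;

    for (int i = 0; i < num_clog_buffers; i++)
        if (cached_pages[i] == page)
            return;                      /* hit: cheap in the real path  */

    misses++;                            /* miss: exclusive lock + I/O   */
    cached_pages[misses % num_clog_buffers] = page;   /* crude eviction  */
}

int main(void)
{
    int settings[] = {8, 16, 32};

    for (int s = 0; s < 3; s++)
    {
        num_clog_buffers = settings[s];
        misses = 0;
        for (int i = 0; i < MAX_BUFFERS; i++)
            cached_pages[i] = -1;

        /* Lookups scattered over 24 pages' worth of xids. */
        for (long n = 0; n < 200000; n++)
            clog_lookup((n * 7919) % (24L * XIDS_PER_PAGE));

        printf("NUM_CLOG_BUFFERS = %2d  ->  %ld misses\n",
               num_clog_buffers, misses);
    }
    return 0;
}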
Jignesh K. Shah wrote:
> Using CLOG Buffers 32 and the commit sibling check patch I still see a
> drop at 1200-1300 users..
>
>
>
> bash-3.00# ./4_lwlock_waits.d 18250
>
>             Lock Id            Mode           Count
>          XidGenLock          Shared               1
>     CLogControlLock          Shared               2
>       ProcArrayLock          Shared               2
>          XidGenLock       Exclusive               4
>     CLogControlLock       Exclusive              15
>       WALInsertLock       Exclusive              18
>        WALWriteLock       Exclusive              38
>       ProcArrayLock       Exclusive              77
>
>             Lock Id            Mode   Combined Time (ns)
>          XidGenLock          Shared                88700
>       WALInsertLock       Exclusive             69556000
>       ProcArrayLock          Shared             95656800
>          XidGenLock       Exclusive            139634100
>     CLogControlLock       Exclusive            148822200
>     CLogControlLock          Shared            161630000
>        WALWriteLock       Exclusive            332781800
>       ProcArrayLock       Exclusive           5688265500
>
> bash-3.00# ./4_lwlock_waits.d 18599
>
>             Lock Id            Mode           Count
>       ProcArrayLock          Shared               2
>          XidGenLock       Exclusive               3
>          XidGenLock          Shared               4
>     CLogControlLock          Shared               5
>       WALInsertLock       Exclusive              10
>     CLogControlLock       Exclusive              21
>        WALWriteLock       Exclusive              28
>       ProcArrayLock       Exclusive              54
>
>             Lock Id            Mode   Combined Time (ns)
>          XidGenLock       Exclusive              5688800
>       WALInsertLock       Exclusive             11424700
>     CLogControlLock          Shared             55589100
>       ProcArrayLock          Shared            135220400
>        WALWriteLock       Exclusive            177906900
>          XidGenLock          Shared            524146500
>     CLogControlLock       Exclusive            524563900
>       ProcArrayLock       Exclusive           5828744500
>
> bash-3.00#
> bash-3.00# ./6_lwlock_stack.d 4 18599
>
>             Lock Id            Mode           Count
>       ProcArrayLock          Shared               1
>       ProcArrayLock       Exclusive              52
>
>             Lock Id            Mode   Combined Time (ns)
>       ProcArrayLock          Shared             41428300
>       ProcArrayLock       Exclusive           3858386500
>
>             Lock Id   Combined Time (ns)
>
>
>              postgres`LWLockAcquire+0x1f0
>              postgres`GetSnapshotData+0x120
>              postgres`GetTransactionSnapshot+0x80
>              postgres`PortalStart+0x198
>              postgres`exec_bind_message+0x84c
>              postgres`PostgresMain+0x17f8
>              postgres`BackendRun+0x2f8
>              postgres`ServerLoop+0x680
>              postgres`PostmasterMain+0xda8
>              postgres`main+0x3d0
>              postgres`_start+0x17c
>  Shared                                                     41428300
>
>              postgres`LWLockAcquire+0x1f0
>              postgres`CommitTransaction+0x104
>              postgres`CommitTransactionCommand+0xbc
>              postgres`finish_xact_command+0x78
>              postgres`exec_execute_message+0x42c
>              postgres`PostgresMain+0x1838
>              postgres`BackendRun+0x2f8
>              postgres`ServerLoop+0x680
>              postgres`PostmasterMain+0xda8
>              postgres`main+0x3d0
>              postgres`_start+0x17c
>  Exclusive                                                3858386500
>
>
> -Jignesh
>
>
			
Simon,

> Well, that's pretty weird. That code path clearly only happens once per
> transaction and ought to be fast. The other code paths that take
> ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
> ought to spend more time holding the lock. Presumably you are running
> with a fair number of SERIALIZABLE transactions?

Given that this is a TPC-C analog, I'd assume that we are.

Jignesh?

--Josh

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Yep, quite a few of the transactions are. But the piece that's slow is the
cleanup it does in CommitTransaction(). I am not sure how ProcArrayLock is
designed to work, so it's not clear to me whether what we are seeing is
what we should expect. Do we have a design doc on ProcArrayLock that
explains its purpose?

Thanks.

Regards,
Jignesh

Josh Berkus wrote:
> Simon,
>
>> Well, that's pretty weird. That code path clearly only happens once per
>> transaction and ought to be fast. The other code paths that take
>> ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData()
>> ought to spend more time holding the lock. Presumably you are running
>> with a fair number of SERIALIZABLE transactions?
>
> Given that this is a TPC-C analog, I'd assume that we are.
>
> Jignesh?
Hi Simon,

This patch seems to work well (with both the 32 and 64 values, but not
with 16 or the default 8). Is there a way we can integrate this into 8.3?
It will improve out-of-the-box performance quite a bit for a high number
of users (at least 30% in my OLTP test).

Regards,
Jignesh

Simon Riggs wrote:
> On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote:
>
>> However at 900 Users where the big drop in throughput occurs:
>> It gives a different top "consumer" of time:
>>
>>               postgres`LWLockAcquire+0x1c8
>>               postgres`SimpleLruReadPage+0x1ac
>>               postgres`TransactionIdGetStatus+0x14
>>               postgres`TransactionLogFetch+0x58
>
> TransactionIdGetStatus doesn't directly call SimpleLruReadPage().
> Presumably the compiler has been rearranging things??
>
> Looks like you're out of clog buffers. It seems like the clog buffers
> aren't big enough to hold clog pages for long enough, and the SELECT FOR
> SHARE processing is leaving lots of additional read locks that are
> increasing the number of clog requests for older xids.
>
> Try the enclosed patch.
>
> ------------------------------------------------------------------------
>
> Index: src/include/access/clog.h
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/include/access/clog.h,v
> retrieving revision 1.19
> diff -c -r1.19 clog.h
> *** src/include/access/clog.h    5 Jan 2007 22:19:50 -0000    1.19
> --- src/include/access/clog.h    26 Jul 2007 15:44:58 -0000
> ***************
> *** 29,35 ****
>
>
>   /* Number of SLRU buffers to use for clog */
> ! #define NUM_CLOG_BUFFERS    8
>
>
>   extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
> --- 29,35 ----
>
>
>   /* Number of SLRU buffers to use for clog */
> ! #define NUM_CLOG_BUFFERS    64
>
>
>   extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
>
> ------------------------------------------------------------------------
On Fri, 2007-08-03 at 16:09 -0400, Jignesh K. Shah wrote:

> This patch seems to work well (with both the 32 and 64 values, but not
> with 16 or the default 8).

Could you also test at 24, please? Tom has pointed out the additional
cost of setting this higher, even in workloads that don't benefit from
the I/O-induced contention reduction.

> Is there a way we can integrate this into 8.3?

I just replied to Josh's thread on -hackers about this.

> It will improve out-of-the-box performance quite a bit for a high number
> of users (at least 30% in my OLTP test).

Yes, that's good. Will this have a dramatic effect on a particular
benchmark, or for what reason might we need this? Tom has questioned the
use case here, so I think it would be good to explain a little more for
everyone. Thanks.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
I tried with 24 CLOG buffers also, and I got linear performance up to 1250
users, after which it started to tank; 32 got us to 1350 users before some
other bottleneck overtook it. Based on what Tom said earlier, it might then
make sense to make it a tunable, with the default of 8 but something one
can raise for a high number of users.

Thanks.

Regards,
Jignesh

Simon Riggs wrote:
> On Fri, 2007-08-03 at 16:09 -0400, Jignesh K. Shah wrote:
>
>> This patch seems to work well (with both the 32 and 64 values, but not
>> with 16 or the default 8).
>
> Could you also test at 24, please? Tom has pointed out the additional
> cost of setting this higher, even in workloads that don't benefit from
> the I/O-induced contention reduction.
>
>> Is there a way we can integrate this into 8.3?
>
> I just replied to Josh's thread on -hackers about this.
>
>> It will improve out-of-the-box performance quite a bit for a high number
>> of users (at least 30% in my OLTP test).
>
> Yes, that's good. Will this have a dramatic effect on a particular
> benchmark, or for what reason might we need this? Tom has questioned the
> use case here, so I think it would be good to explain a little more for
> everyone. Thanks.
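
Purely as a hypothetical sketch of the "make it a tunable" idea (none of
these names are real PostgreSQL settings or functions; the actual buffer
count is a compile-time #define at this point): the change would amount to
reading the value from configuration at startup, clamping it to a sane
range, and keeping 8 as the default.

/*
 * Hypothetical sketch only -- the setting name, range and reader below are
 * invented for illustration and do not exist in PostgreSQL.
 */
#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_CLOG_BUFFERS 8
#define MIN_CLOG_BUFFERS     4
#define MAX_CLOG_BUFFERS     128

static int num_clog_buffers = DEFAULT_CLOG_BUFFERS;

/* Stand-in for reading a "clog_buffers" setting; an env var is used here. */
static void read_clog_buffers_setting(void)
{
    const char *val = getenv("CLOG_BUFFERS");

    if (val != NULL)
    {
        int n = atoi(val);

        if (n >= MIN_CLOG_BUFFERS && n <= MAX_CLOG_BUFFERS)
            num_clog_buffers = n;
        else
            fprintf(stderr, "clog_buffers %d out of range, keeping %d\n",
                    n, DEFAULT_CLOG_BUFFERS);
    }
}

int main(void)
{
    read_clog_buffers_setting();
    /* The shared-memory sizing for the clog SLRU would use this value. */
    printf("sizing clog SLRU with %d buffers\n", num_clog_buffers);
    return 0;
}

For example, running it as CLOG_BUFFERS=32 ./a.out would mimic an admin
raising the setting for a high-connection workload.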
On Fri, 2007-08-10 at 13:54 -0400, Jignesh K. Shah wrote:

> I tried with 24 CLOG buffers also, and I got linear performance up to
> 1250 users, after which it started to tank; 32 got us to 1350 users
> before some other bottleneck overtook it.

Jignesh,

Thanks for testing that. It's not very clear to everybody why an extra
100 users is useful, and it would certainly help your case if you could
explain.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com