On Tue, Dec 30, 2008 at 4:02 AM, Iñigo Martinez Lasala
<imartinez@vectorsf.com> wrote:
> Hi everybody.
>
> Recently our company has been granted with a contract for an on-line store
> mainteinance.
> The website has been developed under J2EE and Postgres 8.1 as database
> backend. The system has been working without problem for several month,
> but with Christmas access to web portal has raised a lot.
> The database suffers of a performance problem on high load. Lot of context
> switch happens reaching up to 200.000 cs per second.
> This system is a 16GB, 4 CPU intel Xeon MP with HT enabled and a RAID10
> iSCSI storage, kernel 2.4.21 (RHAS 3).
>
> Half of CPU power is lost on system time, as you can see.
>
> Vmstat on high load
> 19 0 0 281852 150316 13732396 0 0 32 80 1071 128209 41
> 43 16 0
> 75 0 0 282040 150316 13732396 0 0 0 0 719 148023 40
> 38 22 0
> 3 0 0 284208 150324 13732412 0 0 16 484 728 145371 39
> 40 21 0
> 12 0 0 278364 150324 13732508 0 0 80 56 660 157533 35
> 42 23 1
> 6 0 0 284972 150324 13732580 0 0 32 200 685 142014 39
> 41 20 0
> 8 0 0 296424 150324 13732624 0 0 40 136 554 139601 41
> 39 20 0
> 85 0 0 265004 150324 13732664 0 0 32 48 642 142437 48
> 32 20 0
> 32 0 0 267432 150324 13732680 0 0 0 788 1003 144409 37
> 42 21 0
> 13 0 0 270468 150324 13732676 0 0 0 24 724 146663 42
> 40 19
>
> Vmstat after 20 seconds after stopping portal:
> 8 0 0 962388 206744 13771548 0 0 0 0 131 199784 11
> 38 51 0
> 3 0 0 970212 206744 13771548 0 0 0 1856 305 203639 12
> 40 48 0
> 10 0 0 975036 206744 13771588 0 0 0 128 212 201899 11
> 36 52 0
> 3 0 0 970272 206744 13771652 0 0 16 232 685 202672 14
> 41 44 0
> 6 0 0 1008320 206744 13771656 0 0 0 40 198 196298 14
> 46 39 0
> 3 0 0 1034836 206744 13771656 0 0 0 0 147 202731 12
> 39 50 0
> 3 0 0 1037764 206752 13771656 0 0 0 952 202 202933 11
> 39 50 0
> 5 0 0 1078132 206752 13771656 0 0 0 0 154 203408 18
> 35 47 0
> 6 0 0 1110572 206752 13771656 0 0 0 0 153 196864 18
> 41 41 0
> 4 0 0 1105440 206752 13771824 0 0 16 592 461 207538 12
> 37 51 1
>
>
> I've read about this problem with version prior 8.2. However at this
> moment is not possible to migrate to 8.2 due to the amount of stored
> procedures and we don't have time enough to test ALL procedures in order
> to migrate to 8.2 (or 8.3).
> However we have performed light tests with 8.2 on high load an this
> problem has been solved or mitigated.
Are you using connection pooling, or do you have a whole bunch of
connections at once? How many connections do you have that are idle
versus active?
> Now the question. Is there any backport patch for 8.1 that solves
> context-switch storm?
It's far more likely that a back ported 8.1.x would have problems than
you'd run into issues with 8.2 or 8.3 with stored procs. I'd skip 8.2
and go straight to testing on 8.3. We upgraded from 8.1 to 8.3 on our
production database. The only issue we had was that a lot of implicit
casts had been removed, and some older code relied on an explicit date
:: text cast that had gone away. Since relying on date being a text
string is bad form anyway, we fixed the code and went on from there.
Usually when something like this doesn't get back patched, it's
because the code base was so different in that area that backporting
it represents a real danger to the code stability.
If you upgrade to 8.3 you're upgrading to a stable release that solves
your problems. If you backport the patch to 8.1 you're running a
version tested only by you.