Discussion: Connection pooling for a mixture of lightweight and heavyweight jobs?
I have a question that may be related to connection pooling. We create a bunch of high-performance lightweight Postgres clients that serve up images (via mod_perl and Apache::DBI). We have roughly ten web sites, with ten mod_perl instances each, so we always have around 100 Postgres backends sitting around all the time waiting. When a lightweight request comes in, it's a single query on a primary key with no joins, so it's very fast.

We also have a very heavyweight process (our primary search technology) that can take many seconds, even minutes, to do a search and generate a web page.

The lightweight backends are mostly idle, but when a heavyweight search finishes, it causes a burst on the lightweight backends, which must be very fast. (They provide all of the images in the results page.)

This mixture seems to make it hard to configure Postgres with the right amount of memory and such. The primary query needs some elbow room to do its work, but the lightweight queries all get the same resources.

I figured that having these lightweight Postgres backends sitting around was harmless -- they allocate shared memory and other resources, but they never use them, so what's the harm? But recent discussions about connection pooling seem to suggest otherwise, that merely having 100 backends sitting around might be a problem.

Craig
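For concreteness, here is a minimal sketch of what one of these lightweight image handlers might look like; the table, column, and connection names are made up for illustration, since the actual schema isn't shown in the thread:

    # Hypothetical mod_perl 2 handler. With Apache::DBI loaded at server
    # startup (e.g. "use Apache::DBI;" in startup.pl), DBI->connect returns
    # a cached, persistent connection, so each Apache child keeps one idle
    # Postgres backend open between requests -- the ~100 backends described
    # above.
    package My::ImageHandler;
    use strict;
    use warnings;
    use DBI;
    use Apache2::RequestRec ();
    use Apache2::Const -compile => qw(OK);

    sub handler {
        my $r = shift;
        my ($id) = ($r->args =~ /id=(\d+)/);   # crude query-string parsing

        # Connection parameters are illustrative.
        my $dbh = DBI->connect('dbi:Pg:dbname=images;host=dbhost',
                               'webuser', 'secret',
                               { RaiseError => 1, AutoCommit => 1 });

        # The entire "lightweight" request: one primary-key lookup, no joins.
        my ($mime, $bytes) = $dbh->selectrow_array(
            'SELECT mime_type, image_data FROM images WHERE image_id = ?',
            undef, $id);

        $r->content_type($mime);
        $r->print($bytes);
        return Apache2::Const::OK;
    }
    1;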
Craig James <craig_james@emolecules.com> wrote:

> We create a bunch of high-performance lightweight Postgres clients that serve up images (via mod_perl and Apache::DBI). We have roughly ten web sites, with ten mod_perl instances each, so we always have around 100 Postgres backends sitting around all the time waiting. When a lightweight request comes in, it's a single query on a primary key with no joins, so it's very fast.
>
> We also have a very heavyweight process (our primary search technology) that can take many seconds, even minutes, to do a search and generate a web page.
>
> The lightweight backends are mostly idle, but when a heavyweight search finishes, it causes a burst on the lightweight backends, which must be very fast. (They provide all of the images in the results page.)
>
> This mixture seems to make it hard to configure Postgres with the right amount of memory and such. The primary query needs some elbow room to do its work, but the lightweight queries all get the same resources.
>
> I figured that having these lightweight Postgres backends sitting around was harmless -- they allocate shared memory and other resources, but they never use them, so what's the harm? But recent discussions about connection pooling seem to suggest otherwise, that merely having 100 backends sitting around might be a problem.

Well, the "if it ain't broke, don't fix it" rule might come into play here. The current configuration might leave you vulnerable to occasional less-than-optimal performance if two or more heavyweight processes finish at the same time and cause a "thundering herd" of lightweight processes. Having the lightweight requests go through a connection pool could mitigate that problem, but it introduces its own overhead on every request. So I would advise keeping an eye out for problems which might match the above, but not taking hasty action in the absence of evidence. You might buy back 400MB of RAM for caching (which you may or may not need) at the cost of extra latency and CPU per request.

-Kevin
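For a sense of scale, that 400MB is consistent with assuming roughly 4MB of private, non-shared memory per idle backend: 100 idle backends x ~4MB each is about 400MB. The per-backend figure is an assumption here, not something measured in this thread; actual overhead varies by version and workload, but the point is that pooling down to 10-20 connections would free whatever the other 80-90 mostly idle backends are holding.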
On 7/30/10 8:57 AM, Kevin Grittner wrote:
> Craig James <craig_james@emolecules.com> wrote:
>
>> We create a bunch of high-performance lightweight Postgres clients that serve up images (via mod_perl and Apache::DBI). We have roughly ten web sites, with ten mod_perl instances each, so we always have around 100 Postgres backends sitting around all the time waiting. When a lightweight request comes in, it's a single query on a primary key with no joins, so it's very fast.
>>
>> We also have a very heavyweight process (our primary search technology) that can take many seconds, even minutes, to do a search and generate a web page.
>>
>> The lightweight backends are mostly idle, but when a heavyweight search finishes, it causes a burst on the lightweight backends, which must be very fast. (They provide all of the images in the results page.)
>>
>> This mixture seems to make it hard to configure Postgres with the right amount of memory and such. The primary query needs some elbow room to do its work, but the lightweight queries all get the same resources.
>>
>> I figured that having these lightweight Postgres backends sitting around was harmless -- they allocate shared memory and other resources, but they never use them, so what's the harm? But recent discussions about connection pooling seem to suggest otherwise, that merely having 100 backends sitting around might be a problem.
>
> Well, the "if it ain't broke, don't fix it" rule might come into play here.

I should have given one more detail here: We've been the victim of persistent "CPU spikes" that were discussed extensively in postgres-performance. Tom suggested upgrading to 8.4.4, but that can't happen for a couple more months (we're working on it).

http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php

Craig

> The current configuration might leave you vulnerable to occasional less-than-optimal performance if two or more heavyweight processes finish at the same time and cause a "thundering herd" of lightweight processes. Having the lightweight requests go through a connection pool could mitigate that problem, but it introduces its own overhead on every request. So I would advise keeping an eye out for problems which might match the above, but not taking hasty action in the absence of evidence. You might buy back 400MB of RAM for caching (which you may or may not need) at the cost of extra latency and CPU per request.
>
> -Kevin
Craig James <craig_james@emolecules.com> wrote:

>> Well, the "if it ain't broke, don't fix it" rule might come into play here.
>
> I should have given one more detail here: We've been the victim of persistent "CPU spikes" that were discussed extensively in postgres-performance. Tom suggested upgrading to 8.4.4, but that can't happen for a couple more months (we're working on it).
>
> http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php

Ah, I hadn't connected that thread with this. After rereading that thread with the information from this thread in mind, I think what you describe on the other thread could well be the "thundering herd" problem. Some form of connection pooling could well help.

BTW, I hope you've updated to the latest 8.3.x by now. If not, you should expedite that.

-Kevin
On 7/30/10 10:37 AM, Kevin Grittner wrote:
> Craig James <craig_james@emolecules.com> wrote:
>
>>> Well, the "if it ain't broke, don't fix it" rule might come into play here.
>>
>> I should have given one more detail here: We've been the victim of persistent "CPU spikes" that were discussed extensively in postgres-performance. Tom suggested upgrading to 8.4.4, but that can't happen for a couple more months (we're working on it).
>>
>> http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php
>
> Ah, I hadn't connected that thread with this. After rereading that thread with the information from this thread in mind, I think what you describe on the other thread could well be the "thundering herd" problem. Some form of connection pooling could well help.
>
> BTW, I hope you've updated to the latest 8.3.x by now. If not, you should expedite that.

Yes, I updated to 8.3.10, partly to see if it would solve this problem.

I'm not clear on how connection pooling would help this problem. I would have 100 lightweight backends, whether they were pooled or not, always sitting around. Or are you suggesting that I not use Apache::DBI to maintain persistent connections, and instead rely on the connection pooler to provide fast connect/disconnect from Postgres?

Thanks,
Craig
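For concreteness, "rely on the connection pooler" would usually mean pointing the existing DBI connect string at a pooler such as pgbouncer rather than at Postgres directly; the host and port below are pgbouncer defaults and the credentials are illustrative:

    # Sketch: the same handler code as before, but connecting through a local
    # pgbouncer instead of directly to Postgres (6432 is pgbouncer's default
    # listen port).
    my $dbh = DBI->connect('dbi:Pg:dbname=images;host=127.0.0.1;port=6432',
                           'webuser', 'secret',
                           { RaiseError => 1, AutoCommit => 1 });
    # Whether Apache::DBI keeps this client-side handle open or each request
    # reconnects, the pooler decides how many real Postgres backends sit
    # behind it.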
Craig James <craig_james@emolecules.com> wrote:

> I'm not clear on how connection pooling would help this problem. I would have 100 lightweight backends, whether they were pooled or not, always sitting around.

Not if you use a good pooler with the right configuration. You should have 10 to 20 database connections which the lightweight query requests share. You want something which will queue requests above that maximum, and immediately release one more request when a previous request completes. We have this built in to our application framework, so I'm not familiar with how to configure pgbouncer or pgpool, but from what I've heard I would expect both of these to support such behavior. (Someone with more experience with one or both of these products, please jump in here....)

At this point some people protest that they can't let requests queue up like that because they need fast response time. Trust me; if you set up a good connection pool to the right size, this *will* give you much better throughput *and* response time than letting the thundering herd loose to compete for resources.

-Kevin
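Neither pgbouncer nor pgpool configuration is spelled out in this thread, but a minimal pgbouncer.ini along the lines Kevin describes might look something like the sketch below; the host, paths, and database name are illustrative:

    ; Sketch only: up to 200 client connections (the ~100 mod_perl children,
    ; with headroom) share at most 20 server connections to Postgres; clients
    ; beyond that limit queue until a server connection frees up.
    [databases]
    images = host=dbhost port=5432 dbname=images

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction
    default_pool_size = 20
    max_client_conn = 200

Transaction pooling gives the tightest multiplexing, but it assumes the application doesn't depend on session state (prepared statements, temporary tables, session GUCs) surviving between transactions; session pooling is the safer default if that's unclear.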