Discussion: Slow concurrent update of same row in a given table
Hi
While doing some stress testing for updates in a small table we found
the following results. We are not too happy about the speed of the
updates, particularly at high concurrency (10 clients).
Initially we get 119 updates/sec, but it drops to 10 updates/sec
as concurrency is increased.
PostgreSQL: 8.0.3
-------------------------------
TABLE STRUCTURE: general.stress
-------------------------------
|      Column      |           Type           | Modifiers |
+------------------+--------------------------+-----------+
| dispatch_id      | integer                  | not null  |
| query_id         | integer                  |           |
| generated        | timestamp with time zone |           |
| unsubscribes     | integer                  |           |
| read_count       | integer                  |           |
| status           | character varying(10)    |           |
| bounce_tracking  | boolean                  |           |
| dispatch_hour    | integer                  |           |
| dispatch_date_id | integer                  |           |
+------------------+--------------------------+-----------+
Indexes:
"stress_pkey" PRIMARY KEY, btree (dispatch_id)
UPDATE STATEMENT:
update general.stress set read_count=read_count+1 where dispatch_id=114
TOOL USED: Perl/DBI, with prepared statement handles
CONCURRENCY METHOD: executing multiple copies of the same program
from different shells (Linux environment)
CLIENT-SERVER LINK: 10/100 Mbit LAN
CLIENT CODE: stress.pl
-------------------------------------------------------------------------
#!/opt/perl/bin/perl -I/usr/local/masonapache/lib/perl
################################################
# Overview: update the table as fast as possible (while(1){}).
# On every 100th commit, print the average update frequency
# of the last 100 updates.
################################################
use strict;
use Time::HiRes qw(gettimeofday tv_interval);
use Utils;

my $dbh = Utils::db_connect();
my $sth = $dbh->prepare(
    "update general.stress set read_count=read_count+1 where dispatch_id=114");

my $cnt = 0;
my $t0  = [ gettimeofday ];
while (1) {
    $sth->execute();
    $dbh->commit();
    $cnt++;
    if ($cnt % 100 == 0) {
        my $t1      = [ gettimeofday ];
        my $elapsed = tv_interval($t0, $t1);
        $t0 = $t1;
        printf "Rate: %d updates / sec\n", 100.0 / $elapsed;
    }
}

# Never reached: the loop above runs until the client is killed.
$sth->finish();
$dbh->disconnect();
--------------------------------------------------------------------------------------
RESULTS:
--------------------------------------------------------------------------------------
Number of Copies | Updates per Sec
               1 |             119
               2 |              59
               3 |              38
               4 |              28
               5 |              22
               6 |              19
               7 |              16
               8 |              14
               9 |              11
              10 |              11
              11 |              10
-------------------------------------------------------------------------------------
Note that the table was VACUUM ANALYZEd during the tests.
Total number of records in the table: 93
-------------------------------------------------------------------------------------
Regds
Rajesh Kumar Mallah.
On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:

> Hi
>
> While doing some stress testing for updates in a small table we found
> the following results. We are not too happy about the speed of the
> updates, particularly at high concurrency (10 clients).
>
> Initially we get 119 updates/sec, but it drops to 10 updates/sec
> as concurrency is increased.
>
> UPDATE STATEMENT:
> update general.stress set read_count=read_count+1 where dispatch_id=114

This means you are updating only one row, correct?

> Number of Copies | Updates per Sec
>                1 |             119
>                2 |              59
>                3 |              38
>                4 |              28
>                5 |              22
>                6 |              19
>                7 |              16
>                8 |              14
>                9 |              11
>               10 |              11
>               11 |              10

So, 11 instances result in 10 updated rows per second: database wide or
per instance? If it is per instance, then 11 * 10 is close to the
performance for one connection.

That being said, when you've got 10 connections fighting over one row, I
wouldn't be surprised if you had bad performance.

Also, at 119 updates a second, you're more than doubling the table's
initial size (in dead tuples) each second. How often are you vacuuming,
and are you using vacuum or vacuum full?

Gavin
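
(For illustration only: a minimal sketch of how the dead-tuple growth Gavin
describes could be observed and kept in check on 8.0. These are standard
commands, not something posted in the thread; the table name is the one used
in the test above.)

-- Report how many dead row versions have accumulated in the test table.
VACUUM VERBOSE general.stress;

-- Plain VACUUM (optionally with ANALYZE) reclaims dead tuples without the
-- exclusive lock that VACUUM FULL takes while rewriting the table.
VACUUM ANALYZE general.stress;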
On 9/28/05, Gavin Sherry <swm@alcove.com.au> wrote:
>
> > UPDATE STATEMENT:
> > update general.stress set read_count=read_count+1 where dispatch_id=114
>
> This means you are updating only one row, correct?

Correct.

> So, 11 instances result in 10 updated rows per second: database wide or
> per instance? If it is per instance, then 11 * 10 is close to the
> performance for one connection.

Sorry, I do not understand the difference between "database wide" and
"per instance".

> That being said, when you've got 10 connections fighting over one row, I
> wouldn't be surprised if you had bad performance.
>
> Also, at 119 updates a second, you're more than doubling the table's
> initial size (in dead tuples) each second. How often are you vacuuming,
> and are you using vacuum or vacuum full?

Yes, I realize the obvious phenomenon now (and the uselessness of the
script); we should not consider it a performance degradation.

I am having a performance issue in my live database, which is why I tried
to simulate the situation (maybe the script was an over-stresser).

My original problem is that I send hundreds of thousands of emails carrying
a readership-tracking beacon every Tuesday, and on Wednesday I see a lot of
the said query in pg_stat_activity. Each of these queries updates the SAME
row, the one that corresponds to the previous day's dispatch, and that is
when I face the performance problem.

I think I can only post further details next Wednesday. Please let me know
how I should deal with the situation if each of these updates takes 100
times longer than a normal update.

Best Regards
Mallah.
On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:

> > So, 11 instances result in 10 updated rows per second: database wide or
> > per instance?
>
> Sorry, I do not understand the difference between "database wide" and
> "per instance".

Per instance.

> My original problem is that I send hundreds of thousands of emails carrying
> a readership-tracking beacon every Tuesday, and on Wednesday I see a lot of
> the said query in pg_stat_activity. Each of these queries updates the SAME
> row, the one that corresponds to the previous day's dispatch, and that is
> when I face the performance problem.

I see. These problems regularly come up in database design. The best thing
you can do is modify your database design/application so that instead of
incrementing a count in a single row, you insert a row into a table,
recording the 'dispatch_id'. Counting the number of rows for a given
dispatch_id will give you your count.

Thanks,

Gavin
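
(A minimal sketch, in SQL, of the insert-only design Gavin describes. The
table name "read_events" and its columns are assumptions made here for
illustration; they are not part of the original schema.)

-- Hypothetical event table: one row per beacon hit, no shared counter row.
CREATE TABLE general.read_events (
    dispatch_id integer NOT NULL,
    hit_at      timestamp with time zone NOT NULL DEFAULT now()
);

-- Each tracking hit becomes an insert, so concurrent clients do not
-- contend on a single tuple.
INSERT INTO general.read_events (dispatch_id) VALUES (114);

-- The read count for a dispatch is derived whenever it is needed.
SELECT count(*) FROM general.read_events WHERE dispatch_id = 114;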
On 9/29/05, Gavin Sherry <swm@alcove.com.au> wrote:
>
> I see. These problems regularly come up in database design. The best thing
> you can do is modify your database design/application so that instead of
> incrementing a count in a single row, you insert a row into a table,
> recording the 'dispatch_id'. Counting the number of rows for a given
> dispatch_id will give you your count.

Sorry, but I will be accumulating a huge number of rows in a separate
table with no extra info, when I really want just the count. Do you have
a better database design in mind?

Also, I encounter the same problem when implementing read counts of
articles on sites and when counting banner impressions, where the same
row gets updated frequently by multiple processes.

Thanks & Regds
mallah.
On Thu, 29 Sep 2005, Rajesh Kumar Mallah wrote:

> Sorry, but I will be accumulating a huge number of rows in a separate
> table with no extra info, when I really want just the count. Do you have
> a better database design in mind?
>
> Also, I encounter the same problem when implementing read counts of
> articles on sites and when counting banner impressions, where the same
> row gets updated frequently by multiple processes.

As I said in private email, accumulating large numbers of rows is not a
problem. In your current application, you are write bound, not read bound.
I've designed many similar systems which have hundreds of millions of rows.
It takes a while to generate the count, but you just do it periodically
during non-busy periods. With 8.1, constraint exclusion will give you
significantly better performance with this system as well.

Thanks,

Gavin
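
(A rough sketch of the 8.1 constraint-exclusion idea Gavin mentions, using
inheritance-based partitioning. The table and column names continue the
hypothetical "read_events" example above and are assumptions, not part of
the original setup.)

-- One child table per day; the CHECK constraint lets the 8.1 planner skip
-- partitions that cannot contain the requested date range.
CREATE TABLE general.read_events_2005_09_27 (
    CHECK (hit_at >= '2005-09-27' AND hit_at < '2005-09-28')
) INHERITS (general.read_events);

SET constraint_exclusion = on;

-- A count restricted to one day only scans the matching child table(s).
SELECT count(*)
  FROM general.read_events
 WHERE dispatch_id = 114
   AND hit_at >= '2005-09-27' AND hit_at < '2005-09-28';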
On Thu, Sep 29, 2005 at 07:59:34AM +0530, Rajesh Kumar Mallah wrote:
> > I see. These problems regularly come up in database design. The best thing
> > you can do is modify your database design/application so that instead of
> > incrementing a count in a single row, you insert a row into a table,
> > recording the 'dispatch_id'. Counting the number of rows for a given
> > dispatch_id will give you your count.
>
> Sorry, but I will be accumulating a huge number of rows in a separate
> table with no extra info, when I really want just the count. Do you have
> a better database design in mind?
>
> Also, I encounter the same problem when implementing read counts of
> articles on sites and when counting banner impressions, where the same
> row gets updated frequently by multiple processes.
Databases like to work on *sets* of data, not individual rows. Something
like this would probably perform much better than what you've got now,
and would prevent having a huge table lying around:
INSERT INTO holding_table ...  -- done for every incoming connection/what-have-you
CREATE OR REPLACE FUNCTION summarize() RETURNS void AS $$
DECLARE
    v_rows int;
BEGIN
    DELETE FROM holding_table;
    GET DIAGNOSTICS v_rows = ROW_COUNT;
    UPDATE count_table
       SET count = count + v_rows;
END;
$$ LANGUAGE plpgsql;
Periodically (say, once a minute):

SELECT summarize();
VACUUM holding_table;
VACUUM count_table;
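
(The function above references two tables that are not defined anywhere in
the thread. A minimal layout consistent with Jim's single-counter sketch
might look like the following; the column definitions are assumptions.)

-- One row per event; summarize() deletes these and folds them into the total.
CREATE TABLE holding_table (
    recorded_at timestamp with time zone NOT NULL DEFAULT now()
);

-- A single running total, seeded with one row so the UPDATE has something
-- to increment.
CREATE TABLE count_table (
    count bigint NOT NULL
);
INSERT INTO count_table (count) VALUES (0);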
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461