Discussion: My server is oddly very slow
I have two servers, one for development (called dev) and one for production (called prod).

The development server is very basic: a dual-core 2.1 GHz CPU, a basic 300 GB hard drive and 2 GB of memory. This server is not dedicated to Postgres; it also runs PHP, MapServer and so on. I assigned the memory as follows:

  max_connections       40
  shared_buffers        258MB
  work_mem              8MB
  maintenance_work_mem  32MB
  wal_buffers           8MB
  checkpoint_segments   16
  autovacuum            on
  effective_cache_size  744MB

Dev is installed on CentOS 4 32-bit (on an OpenVZ virtual server).

My production server is a quad-core Xeon E5310 (1.6 GHz) with 3 GB of RAM, and the storage is a very fast SAN over Fibre Channel: copying files to the SAN is about 3 times faster than to the development hard drive. This server is fully dedicated to Postgres. I assigned the memory as follows:

  max_connections       40
  shared_buffers        512MB
  work_mem              16MB
  maintenance_work_mem  128MB
  wal_buffers           8MB
  checkpoint_segments   16
  autovacuum            on
  effective_cache_size  1536MB

Prod is installed on Red Hat Enterprise Linux ES release 4 (Nahant Update 6) 32-bit (not on a virtual server).

The two distributions are very similar, and I use the same RPM packages on both systems. Both databases are PostgreSQL 8.4.1 and hold the same data: prod is restored from a dump of dev once a week. The dump is very big, about 16 GB compressed. Once it is restored, I run a full ANALYZE on the database before using it.

Given the hardware, I would expect dev to be slower than prod, but almost all queries are slower on prod, sometimes 10 times slower. Today I ran the same query with EXPLAIN ANALYZE on both servers.

On dev I got:

Limit  (cost=396689.94..399801.92 rows=2000 width=512) (actual time=8696.467..10028.179 rows=2000 loops=1)
  ->  GroupAggregate  (cost=396689.94..424470.56 rows=17854 width=512) (actual time=8696.463..10024.632 rows=2000 loops=1)
        Filter: (sum(effectif_max) >= 10)
        ->  Sort  (cost=396689.94..397088.34 rows=159360 width=512) (actual time=8693.396..8745.436 rows=8344 loops=1)
              Sort Key: implantation_company_id, virtual_building_id
              Sort Method:  external merge  Disk: 125928kB
              ->  Bitmap Heap Scan on gen_establishment_search  (cost=8156.66..346425.20 rows=159360 width=512) (actual time=2545.813..6688.078 rows=152200 loops=1)
                    Recheck Cond: (gis_departement_id = '75'::bpchar)
                    Filter: (((telephone IS NOT NULL) OR (telephone_siege IS NOT NULL)) AND (NOT etablissement_masque) AND (ref_zone_permission_id = ANY ('{2,1}'::integer[])) AND (created_by_user_group = 1) AND (ref_establishment_type_id = ANY ('{1,2,3}'::integer[])))
                    ->  Bitmap Index Scan on gen_establishment_search_gis_departement_id  (cost=0.00..8116.82 rows=498680 width=0) (actual time=2435.952..2435.952 rows=502304 loops=1)
                          Index Cond: (gis_departement_id = '75'::bpchar)
Total runtime: 10109.194 ms

so about 10 seconds.

On prod I got:

Limit  (cost=397787.02..400793.19 rows=2000 width=513) (actual time=94188.493..97457.042 rows=2000 loops=1)
  ->  GroupAggregate  (cost=397787.02..425288.95 rows=18297 width=513) (actual time=94188.489..97452.268 rows=2000 loops=1)
        Filter: (sum(effectif_max) >= 10)
        ->  Sort  (cost=397787.02..398179.69 rows=157069 width=513) (actual time=94187.660..94530.617 rows=8312 loops=1)
              Sort Key: implantation_company_id, virtual_building_id
              Sort Method:  external merge  Disk: 124192kB
              ->  Bitmap Heap Scan on gen_establishment_search  (cost=8231.64..347992.15 rows=157069 width=513) (actual time=836.994..91818.746 rows=149851 loops=1)
                    Recheck Cond: (gis_departement_id = '75'::bpchar)
                    Filter: (((telephone IS NOT NULL) OR (telephone_siege IS NOT NULL)) AND (NOT etablissement_masque) AND (ref_zone_permission_id = ANY ('{2,1}'::integer[])) AND (created_by_user_group = 1) AND (ref_establishment_type_id = ANY ('{1,2,3}'::integer[])))
                    ->  Bitmap Index Scan on gen_establishment_search_gis_departement_id  (cost=0.00..8192.37 rows=502886 width=0) (actual time=699.530..699.530 rows=507213 loops=1)
                          Index Cond: (gis_departement_id = '75'::bpchar)
Total runtime: 97496.574 ms

so more than 90 seconds. For some other queries it is worse; none of the queries I have tested is faster on prod.

So I checked the memory on prod during the query execution:

vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 0  1   1280  23320   6356 2975956    0    0    29    45    1    0  2  0 95  3

The swap is not used, but something is a bit odd: the cache is using about 98% of the RAM. If I run SHOW shared_buffers; I get:

 shared_buffers
----------------
 512MB

as I would expect from my configuration.

I have run out of ideas. Any suggestions?

--
Adrien DUCOS
Analyste développeur
aducos@hbs-research.com
www.hbs-research.com
+33 (0)9 70 44 64 10
24 rue de l'Est 75020 Paris
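For reference, the settings listed above can be checked directly from SQL and compared between dev and prod; this is only a sketch covering the handful of parameters mentioned in the message, assuming psql access to both servers:

    -- Run on both dev and prod, then diff the two outputs.
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('max_connections', 'shared_buffers', 'work_mem',
                   'maintenance_work_mem', 'wal_buffers',
                   'checkpoint_segments', 'autovacuum',
                   'effective_cache_size')
    ORDER BY name;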
adrien ducos <aducos@hbs-research.com> wrote:

[rearranged somewhat]

> The version of both databases is postgres 8.4.1

[sigh]  You really should upgrade.

http://www.postgresql.org/support/versioning
http://www.postgresql.org/docs/8.4/static/release.html

> So I checked the memory on prod during my query execution:
> vmstat
>
> procs -----------memory----------
>  r  b   swpd   free   buff   cache
>  0  1   1280  23320   6356 2975956
>
> The swap is not used but something is a bit odd: the cache is
> using 98% of the RAM

The first line in vmstat (which is all you get if you run it without
a number) is averages since the OS was booted.  That's not usually
very helpful.  What does this show?:

free -m

or run the query while watching the output from:

vmstat 1

> the copy of files on the SAN is about 3 times faster than on the
> development hard drive.

Copy time doesn't always correlate real well with database
performance.  Have you tried running bonnie++ or similar?

-Kevin
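If running bonnie++ is not convenient, a crude in-database cross-check is to time a query that has to read the whole table on both servers; it is only a rough proxy for sequential read speed, and the table name here is simply taken from the plans earlier in the thread:

    -- Forces a full read of the table; compare runtimes on dev and prod
    -- as a rough sequential-read comparison (run it twice to see the
    -- effect of the OS cache).
    EXPLAIN ANALYZE SELECT count(*) FROM gen_establishment_search;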
Kevin Grittner wrote:
> adrien ducos <aducos@hbs-research.com> wrote:
>
> [rearranged somewhat]
>
>> The version of both databases is postgres 8.4.1
>
> [sigh]  You really should upgrade.
>
> http://www.postgresql.org/support/versioning
> http://www.postgresql.org/docs/8.4/static/release.html

I know I should, but only my system administrator is allowed to do so,
and we will move to PostgreSQL 9.1 soon anyway.

>> So I checked the memory on prod during my query execution:
>> vmstat
>>
>> procs -----------memory----------
>>  r  b   swpd   free   buff   cache
>>  0  1   1280  23320   6356 2975956
>>
>> The swap is not used but something is a bit odd: the cache is
>> using 98% of the RAM
>
> The first line in vmstat (which is all you get if you run it without
> a number) is averages since the OS was booted.  That's not usually
> very helpful.  What does this show?:
>
> free -m
>
> or run the query while watching the output from:
>
> vmstat 1

So I did that as well, and I get the same results:

free -m
             total       used       free     shared    buffers     cached
Mem:          3042       3000         41          0          4       2896
-/+ buffers/cache:         99       2942
Swap:         1983          1       1982

Same for vmstat 1 while running the query:

vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 0  0   1280  45752   4496 2963460    0    0    29    45    0    1  2  0 95  3

Almost everything is "cached". I have read in some documents that this
means Linux uses otherwise free memory to cache disk data (to avoid
wasting that memory). On the other hand, when I run ps aux while the
query is running, I see the query using 17% of the RAM, so I guess it is
using the RAM fine. The question is rather: why does Linux report that
95% of the memory is cache while at the same time ps aux shows my query
using 17% of the RAM?

>> the copy of files on the SAN is about 3 times faster than on the
>> development hard drive.
>
> Copy time doesn't always correlate real well with database
> performance.  Have you tried running bonnie++ or similar?

OK, we didn't check that, but we will. I'll look at it with my system
admin next week (he is on holiday right now). On paper, though, it is a
SAN composed of several VelociRaptors; it is supposed to be faster than
a normal and rather old hard drive, both in throughput and in access
time.

Thanks,
Adrien

--
Adrien DUCOS
aducos@hbs-research.com
www.hbs-research.com
adrien ducos <aducos@hbs-research.com> wrote:
> Kevin Grittner wrote:
>> What does this show?:
>>
>> free -m
>
>              total       used       free     shared    buffers     cached
> Mem:          3042       3000         41          0          4       2896
> -/+ buffers/cache:         99       2942
> Swap:         1983          1       1982

This was while the PostgreSQL server was running and showing
shared_buffers = 512MB?  If so, there's something seriously wrong
with your OS.

>> The first line in vmstat (which is all you get if you run it
>> without a number) is averages since the OS was booted.  That's
>> not usually very helpful.  What does this show?:
>>
>> vmstat 1
>
> Same for vmstat 1 while running the query:
>
> vmstat 1
> procs -----------memory---------- ---swap-- -----io---- --system--
>  r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs
>  0  0   1280  45752   4496 2963460    0    0    29    45    0    1

Those look like the sort of useless numbers you get on the *first*
line of vmstat, even with a number.  What did a few of the *other*
lines look like while the query was running?

-Kevin
Kevin Grittner wrote:
> This was while the PostgreSQL server was running and showing
> shared_buffers = 512MB?  If so, there's something seriously wrong
> with your OS.
>
> [snip]
>
> Those look like the sort of useless numbers you get on the *first*
> line of vmstat, even with a number.  What did a few of the *other*
> lines look like while the query was running?
>
> -Kevin

They look exactly the same as the first one, with cache changing from
2963460 to 2961000, so not much difference at all... So what you say is
what I thought: there is something wrong with the OS. We will try to
install CentOS (the same OS as in development) instead of the Red Hat
installed by the hosting provider, and see how it goes.

Thanks for your feedback,
Adrien

--
Adrien DUCOS
aducos@hbs-research.com
www.hbs-research.com
I have postgres 8.4.

I have a table "foo" with 16 million rows; 99% of those rows have the
column "bar" = 1.

I had an index on this table:

CREATE INDEX index_foo_bar ON foo USING btree (bar);

The query

select count(*) from foo where bar = 1;

does not use the index (which is fine: using the index here would be
slower than not using it).

The query

select count(*) from foo where bar = 2;

uses the index and I get the answer in 20 ms. With a prepared statement,

PREPARE fooplan (int) AS select count(*) from foo where bar = $1;
EXECUTE fooplan (2);

it also takes a few milliseconds.

Now, to optimise this last query, I dropped the first index and created
a partial index instead:

CREATE INDEX index_foo_bar ON foo USING btree (bar) WHERE bar <> 1;

since the query only uses the index in that case anyway.

With the plain query

explain analyze select count(*) from foo where bar = 2;

it uses the new index:

Aggregate  (cost=8.29..8.30 rows=1 width=0) (actual time=0.119..0.119 rows=1 loops=1)
  ->  Index Scan using index_foo_bar on foo  (cost=0.00..8.29 rows=1 width=0) (actual time=0.017..0.084 rows=63 loops=1)
        Index Cond: (bar = 2)
Total runtime: 0.144 ms

a great improvement, from 20 ms to 0.144 ms.

But with the prepared statement things become very bad:

PREPARE fooplan (int) AS select count(*) from foo where bar = $1;
explain analyze execute fooplan (2);

Aggregate  (cost=627406.08..627406.09 rows=1 width=0) (actual time=11627.315..11627.316 rows=1 loops=1)
  ->  Seq Scan on foo  (cost=0.00..603856.80 rows=9419712 width=0) (actual time=7070.334..11627.266 rows=63 loops=1)
        Filter: (bar = $1)
Total runtime: 11627.357 ms

No index is used and the time becomes very bad.

This is probably because the prepared statement could receive "1" as
input, so the planner avoids the partial index completely. But this is
not very nice for software optimisation, since I am using PDO, which
uses prepared statements all the time and so cannot use any of the
partial indexes.

The problem is that I have 90 GB of indexes in the database, and partial
indexes could help me save some of that space on my server, in addition
to improving the speed of the queries.

Adrien DUCOS

--
Adrien DUCOS
Analyste développeur
aducos@hbs-research.com
www.hbs-research.com
+33 (0)9 70 44 64 10
11-15 quai du Président Paul Doumer 92400 Courbevoie
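One possible workaround on 8.4, sketched here rather than taken from the thread: wrap the query in a PL/pgSQL function that builds the statement with dynamic SQL, so the planner sees the actual literal and can use the partial index. The function name is made up, and this gives up the plan caching that prepared statements provide:

    -- Hypothetical helper: the query is planned at call time with the
    -- literal value inlined, so the partial index on (bar) WHERE bar <> 1
    -- can be used when the value is not 1.
    CREATE OR REPLACE FUNCTION count_foo_bar(p_bar int) RETURNS bigint AS $$
    DECLARE
        result bigint;
    BEGIN
        EXECUTE 'SELECT count(*) FROM foo WHERE bar = '
                || quote_literal(p_bar::text)
            INTO result;
        RETURN result;
    END;
    $$ LANGUAGE plpgsql;

    -- Usage, instead of the prepared SELECT:
    -- SELECT count_foo_bar(2);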
On Mon, Jun 20, 2011 at 3:01 PM, adrien ducos <aducos@hbs-research.com> wrote:

> I have postgres 8.4.
>
> I have a table "foo" with 16 million rows; 99% of those rows have the
> column "bar" = 1.
>
> [snip]
>
> This is probably because the prepared statement could receive "1" as
> input, so the planner avoids the partial index completely. But this is
> not very nice for software optimisation, since I am using PDO, which
> uses prepared statements all the time and so cannot use any of the
> partial indexes.
>
> The problem is that I have 90 GB of indexes in the database, and partial
> indexes could help me save some of that space on my server, in addition
> to improving the speed of the queries.

Unfortunately, prepared statements do act in the way you have seen.

I have a patch into 9.2 under discussion to improve upon this
situation, but don't hold your breath for that.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs wrote:
> Unfortunately, prepared statements do act in the way you have seen.
>
> I have a patch into 9.2 under discussion to improve upon this
> situation, but don't hold your breath for that.

OK, thanks for your answer.

I think this should at least be written in the documentation at
http://www.postgresql.org/docs/9.0/static/indexes-partial.html
as a known limitation, to avoid people spending hours searching for why
their application has become slow.

--
Adrien DUCOS
aducos@hbs-research.com
www.hbs-research.com
On 06/20/2011 10:55 AM, adrien ducos wrote:
> I think this should at least be written in the documentation at
> http://www.postgresql.org/docs/9.0/static/indexes-partial.html
> as a known limitation, to avoid people spending hours searching for why
> their application has become slow.

It's documented in the "Notes" section of
http://www.postgresql.org/docs/9.0/static/sql-prepare.html
because it's not specific to partial indexes.

Prepared statements are very risky when you run them against queries
with a very skewed distribution, like your case where 99% of the values
for a column are the same. They help to prevent SQL injection and can
reduce parsing/planning overhead, but prepared statements can have
terrible performance in some unfortunately common situations. Just
recently I had one client end up switching to another ORM specifically
because they couldn't get the one they started with to stop using
prepared statements, and those gave them terrible query plans.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
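For what it's worth, the skew described above is visible directly in the planner's statistics; a quick sketch using the foo/bar names from earlier in the thread:

    -- With 99% of rows at bar = 1, most_common_vals should list 1 and
    -- most_common_freqs should show a frequency close to 0.99.
    SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'foo' AND attname = 'bar';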
On Mon, Jun 20, 2011 at 7:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Unfortunately, prepared statements do act in the way you have seen.
>
> I have a patch into 9.2 under discussion to improve upon this
> situation, but don't hold your breath for that.

I haven't used Oracle in years, but I seem to recall that they had "bind
variable peeking" to work around this type of problem. That would be
interesting to have in PG.