Poor performance o

Поиск
Список
Период
Сортировка
От Craig A. James
Тема Poor performance o
Дата
Msg-id 442080D1.5090803@modgraph-usa.com
обсуждение исходный текст
Ответ на Re: qsort again (was Re: Strange Create Index behaviour)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Poor performance o  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
I'm reposting this -- I sent this out a month ago but never got a response, and hope someone can shed some light on
this.

Thanks,
Craig

--------------------------

This is a straightforward query that should be fairly quick, but takes about 30 minutes.  It's a query across three
tables,call them A, B, and C.  The tables are joined on indexed columns. 

Here's a quick summary:

Table A -----> Table B -----> Table C
 A_ID           B_ID           C_ID
                A_ID           NAME
                C_ID

Tables A and B have 6 million rows each.  Table C is small: 67 names, no repeats.  All columns involved in the join are
indexed. The database has been full-vacuumed and analyzed. 

Summary:

1. Query B only:    2.7 seconds, 302175 rows returned
2. Join B and C:    4.3 seconds, exact same answer
3. Join A and B:    7.2 minutes, exact same answer
4. Join A, B, C:    32.7 minutes, exact same answer

Looking at these:

Query #1 is doing the real work: finding the rows of interest.

Queries #1 and #2 ought to be virtually identical, since Table C has
just one row with C_ID = 9, but the time almost doubles.

Query #3 should take a bit longer than Query #1 because it has to join
300K rows, but the indexes should make this take just a few seconds,
certainly well under a minute.

Query #4 should be identical to Query #3, again because there's only
one row in Table C.  32 minutes is pretty horrible for such a
straightforward query.

It looks to me like the problem is the use of nested loops when a hash join should be used, but I'm no expert at query
planning.

This is psql 8.0.3.  Table definitions are at the end.  Hardware is a Dell, 2-CPU Xeon, 4 GB memory, database is on a
singleSATA 7200RPM disk. 

These table and column names are altered to protect the guilty, otherwise these are straight from Postgres.


QUERY #1:
---------

explain analyze select B.A_ID from B where B.B_ID = 9;

Index Scan using i_B_B_ID on B  (cost=0.00..154401.36 rows=131236 width=4) (actual time=0.158..1387.251 rows=302175
loops=1)
Index Cond: (B_ID = 9)
Total runtime: 2344.053 ms


QUERY #2:
---------

explain analyze select B.A_ID from B join C on (B.C_ID = C.C_ID) where C.name = 'Joe';

Nested Loop  (cost=0.00..258501.92 rows=177741 width=4) (actual time=0.349..3392.532 rows=302175 loops=1)
->  Seq Scan on C  (cost=0.00..12.90 rows=1 width=4) (actual time=0.232..0.336 rows=1 loops=1)
     Filter: ((name)::text = 'Joe'::text)
->  Index Scan using i_B_C_ID on B  (cost=0.00..254387.31 rows=328137 width=8) (actual time=0.102..1290.002 rows=302175
loops=1)
     Index Cond: (B.C_ID = "outer".C_ID)
Total runtime: 4373.916 ms


QUERY #3:
---------

explain analyze
select A.A_ID from A
 join B on (A.A_ID = B.A_ID)    where B.B_ID = 9;

Nested Loop  (cost=0.00..711336.41 rows=131236 width=4) (actual time=37.118..429419.347 rows=302175 loops=1)
->  Index Scan using i_B_B_ID on B  (cost=0.00..154401.36 rows=131236 width=4) (actual time=27.344..8858.489
rows=302175loops=1) 
     Index Cond: (B_ID = 9)
->  Index Scan using pk_A_test on A  (cost=0.00..4.23 rows=1 width=4) (actual time=1.372..1.376 rows=1 loops=302175)
     Index Cond: (A.A_ID = "outer".A_ID)
Total runtime: 430467.686 ms


QUERY #4:
---------
explain analyze
select A.A_ID from A
 join B on (A.A_ID = B.A_ID)
 join C on (B.B_ID = C.B_ID)
 where C.name = 'Joe';

Nested Loop  (cost=0.00..1012793.38 rows=177741 width=4) (actual time=70.184..1960112.247 rows=302175 loops=1)
->  Nested Loop  (cost=0.00..258501.92 rows=177741 width=4) (actual time=52.114..17753.638 rows=302175 loops=1)
     ->  Seq Scan on C  (cost=0.00..12.90 rows=1 width=4) (actual time=0.109..0.176 rows=1 loops=1)
           Filter: ((name)::text = 'Joe'::text)
     ->  Index Scan using i_B_B_ID on B  (cost=0.00..254387.31 rows=328137 width=8) (actual time=51.985..15566.896
rows=302175loops=1) 
           Index Cond: (B.B_ID = "outer".B_ID)
->  Index Scan using pk_A_test on A  (cost=0.00..4.23 rows=1 width=4) (actual time=6.407..6.412 rows=1 loops=302175)
     Index Cond: (A.A_ID = "outer".A_ID)
Total runtime: 1961200.079 ms


TABLE DEFINITIONS:
------------------

xxx => \d a
            Table "xxx.a"
      Column       |          Type          | Modifiers
 ------------------+------------------------+-----------
  a_id             | integer                | not null
  ... more columns

 Indexes:
  "pk_a_id" PRIMARY KEY, btree (a_id)
    ... more indexes on other columns

xxx => \d b
                Table "xxx.b"
          Column          |          Type          | Modifiers
 -------------------------+------------------------+-----------
   b_id                   | integer                | not null
   a_id                   | integer                | not null
   c_id                   | integer                | not null
   ... more columns

 Indexes:
  "b_pkey" PRIMARY KEY, btree (b_id)
  "i_b_a_id" btree (a_id)
  "i_b_c_id" btree (c_id)


xxx=> \d c
        Table "xxx.c"
    Column     |          Type          | Modifiers
 --------------+------------------------+-----------
   c_id        | integer                | not null
   name        | character varying(200) |
   ... more columns

 Indexes:
  "c_pkey" PRIMARY KEY, btree (c_id)






В списке pgsql-performance по дате отправления:

Предыдущее
От: Scott Marlowe
Дата:
Сообщение: Re: Migration study, step 1: bulk write performance
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Poor performance o