Query improvement

Поиск
Список
Период
Сортировка
От Mark
Тема Query improvement
Дата
Msg-id 1304245432864-4362578.post@n5.nabble.com
обсуждение исходный текст
Ответы Re: Query improvement  (Claudio Freire <klaussfreire@gmail.com>)
Список pgsql-performance
Hi I have 3 tables
page - revision - pagecontent

CREATE TABLE mediawiki.page
(
  page_id serial NOT NULL,
  page_namespace smallint NOT NULL,
  page_title text NOT NULL,
  page_restrictions text,
  page_counter bigint NOT NULL DEFAULT 0,
  page_is_redirect smallint NOT NULL DEFAULT 0,
  page_is_new smallint NOT NULL DEFAULT 0,
  page_random numeric(15,14) NOT NULL DEFAULT random(),
  page_touched timestamp with time zone,
  page_latest integer NOT NULL,
  page_len integer NOT NULL,
  titlevector tsvector,
  page_type integer NOT NULL DEFAULT 0,
  CONSTRAINT page_pkey PRIMARY KEY (page_id)
);

CREATE TABLE mediawiki.revision
(
  rev_id serial NOT NULL,
  rev_page integer,
  rev_text_id integer,
  rev_comment text,
  rev_user integer NOT NULL,
  rev_user_text text NOT NULL,
  rev_timestamp timestamp with time zone NOT NULL,
  rev_minor_edit smallint NOT NULL DEFAULT 0,
  rev_deleted smallint NOT NULL DEFAULT 0,
  rev_len integer,
  rev_parent_id integer,
  CONSTRAINT revision_rev_page_fkey FOREIGN KEY (rev_page)
      REFERENCES mediawiki.page (page_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT revision_rev_id_key UNIQUE (rev_id)
)

CREATE TABLE mediawiki.pagecontent
(
  old_id integer NOT NULL DEFAULT
nextval('mediawiki.text_old_id_seq'::regclass),
  old_text text,
  old_flags text,
  textvector tsvector,
  CONSTRAINT pagecontent_pkey PRIMARY KEY (old_id)
)

where i have query
SELECT pa.page_id, pa.page_title,
ts_rank(pc.textvector,(to_tsquery('fotbal')))+ts_rank(pa.titlevector,(to_tsquery('fotbal')))*10
as totalrank
    from mediawiki.page pa, mediawiki.revision re, mediawiki.pagecontent pc
    WHERE pa.page_id in
        (SELECT page_id FROM mediawiki.page WHERE page_id IN
        (SELECT page_id FROM mediawiki.page
             WHERE (titlevector @@ (to_tsquery('fotbal'))))
        OR page_id IN
        (SELECT p.page_id from mediawiki.page p,mediawiki.revision r,
        (SELECT old_id FROM mediawiki.pagecontent
        WHERE (textvector @@ (to_tsquery('fotbal')))) ss
        WHERE (p.page_id=r.rev_page AND r.rev_id=ss.old_id)))
    AND (pa.page_id=re.rev_page AND re.rev_id=pc.old_id)
    ORDER BY totalrank LIMIT 100;

This query find out titles of pages in page and content in page content by
full text search - @@
afterwards i count for the resulted id  by ts_rank the relevance.

Now the problem.
When I try ANALYZE it shows:
"Limit  (cost=136568.00..136568.25 rows=100 width=185)"
"  ->  Sort  (cost=136568.00..137152.26 rows=233703 width=185)"
"        Sort Key: ((ts_rank(pc.textvector, to_tsquery('fotbal'::text)) +
(ts_rank(pa.titlevector, to_tsquery('fotbal'::text)) * 10::double
precision)))"
"        ->  Hash Join  (cost=61707.99..127636.04 rows=233703 width=185)"
"              Hash Cond: (re.rev_id = pc.old_id)"
"              ->  Merge Join  (cost=24098.90..71107.48 rows=233703
width=66)"
"                    Merge Cond: (pa.page_id = re.rev_page)"
"                    ->  Merge Semi Join  (cost=24096.98..55665.69
rows=233703 width=66)"
"                          Merge Cond: (pa.page_id =
mediawiki.page.page_id)"
"                          ->  Index Scan using page_btree_id on page pa
(cost=0.00..13155.20 rows=311604 width=62)"
"                          ->  Index Scan using page_btree_id on page
(cost=24096.98..38810.19 rows=233703 width=4)"
"                                Filter: ((hashed SubPlan 1) OR (hashed
SubPlan 2))"
"                                SubPlan 1"
"                                  ->  Bitmap Heap Scan on page
(cost=10.41..900.33 rows=270 width=4)"
"                                        Recheck Cond: (titlevector @@
to_tsquery('fotbal'::text))"
"                                        ->  Bitmap Index Scan on gin_index
(cost=0.00..10.34 rows=270 width=0)"
"                                              Index Cond: (titlevector @@
to_tsquery('fotbal'::text))"
"                                SubPlan 2"
"                                  ->  Nested Loop  (cost=1499.29..23192.08
rows=1558 width=4)"
"                                        ->  Nested Loop
(cost=1499.29..15967.11 rows=1558 width=4)"
"                                              ->  Bitmap Heap Scan on
pagecontent  (cost=1499.29..6448.12 rows=1558 width=4)"
"                                                    Recheck Cond:
(textvector @@ to_tsquery('fotbal'::text))"
"                                                    ->  Bitmap Index Scan
on gin_index2  (cost=0.00..1498.90 rows=1558 width=0)"
"                                                          Index Cond:
(textvector @@ to_tsquery('fotbal'::text))"
"                                              ->  Index Scan using
page_btree_rev_content_id on revision r  (cost=0.00..6.10 rows=1 width=8)"
"                                                    Index Cond: (r.rev_id =
pagecontent.old_id)"
"                                        ->  Index Scan using page_btree_id
on page p  (cost=0.00..4.62 rows=1 width=4)"
"                                              Index Cond: (p.page_id =
r.rev_page)"
"                    ->  Index Scan using page_btree_rev_page_id on revision
re  (cost=0.00..11850.52 rows=311604 width=8)"
"              ->  Hash  (cost=27932.04..27932.04 rows=311604 width=127)"
"                    ->  Seq Scan on pagecontent pc  (cost=0.00..27932.04
rows=311604 width=127)"


I there some posibility to speed up the hash join which takes a lot of time?
I have tried to find some solution, but it was not successfull.
Thanks a lot.--
View this message in context: http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4362578.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Justin Pitts
Дата:
Сообщение: Re: FUSION-IO io cards
Следующее
От: Claudio Freire
Дата:
Сообщение: Re: Query improvement