Re: Mnogosearch (Was: Re: website doc search is ... )

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: Mnogosearch (Was: Re: website doc search is ... )
Дата	1 января 2004 г. 15:56:11
Msg-id	29892.1072986918@sss.pgh.pa.us обсуждение исходный текст
Ответ на	Mnogosearch (Was: Re: website doc search is ... ) ("Marc G. Fournier" <scrappy@postgresql.org>)
Ответы	Re: Mnogosearch (Was: Re: website doc search is ... )
Список	pgsql-general

Дерево обсуждения

"Marc G. Fournier" <scrappy@postgresql.org> writes:
> The full first query:

> SELECT ndict8.url_id,ndict8.intag
>   FROM ndict8, url
>  WHERE ndict8.word_id=417851441
>    AND url.rec_id=ndict8.url_id
>    AND ((url.url || '') LIKE 'http://archives.postgresql.org/%%');

> returns 13415 rows, and explain analyze shows:

>  Nested Loop  (cost=0.00..30199.82 rows=17 width=8) (actual time=0.312..1459.504 rows=13415 loops=1)
>    ->  Index Scan using n8_word on ndict8  (cost=0.00..12616.09 rows=3219 width=8) (actual time=0.186..387.673
rows=15532loops=1) 
>          Index Cond: (word_id = 417851441)
>    ->  Index Scan using url_rec_id on url  (cost=0.00..5.45 rows=1 width=4) (actual time=0.029..0.050 rows=1
loops=15532)
>          Index Cond: (url.rec_id = "outer".url_id)
>          Filter: ((url || ''::text) ~~ 'http://archives.postgresql.org/%%'::text)
>  Total runtime: 1520.145 ms
> (7 rows)

The more I look at it, the more it seems that this is the best plan for
the query.  Since the URL condition is very unselective (and will
probably be so in most all variants of this query), it just doesn't pay
to try to apply it before doing the join.  What we want is to make the
join happen quickly, and not even bother applying the URL test until
after we have a joinable url entry.

(In the back of my mind here is the knowledge that mnogosearch is
optimized for mysql, which is too stupid to do the query in any way
other than a plan like the above.)

I think Bruce's original suggestion of clustering was right on, except
he guessed wrong about what to cluster.  The slow part is the scan on
ndict8, as we saw in the later message:

   ->  Index Scan using n8_word on ndict8  (cost=0.00..12616.09 rows=3219 width=8) (actual time=113.645..79163.431
rows=15533loops=1) 
         Index Cond: (word_id = 417851441)

Presumably, the first EXPLAIN shows the behavior when this portion of
ndict8 and its index have been cached, while the second EXPLAIN shows
what happens when they're not in cache.  So my suggestion is to CLUSTER
ndict8 on n8_word.  It might also help to CLUSTER url on url_rec_id.
Make sure the plan goes back to the nested indexscan as above (you might
need to undo the statistics-target changes).

            regards, tom lane

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Mnogosearch (Was: Re: website doc search is ... )