Re: Random Page Cost and Planner
| From | Alexey Klyukin |
|---|---|
| Subject | Re: Random Page Cost and Planner |
| Date | |
| Msg-id | 1B94E0EC-D665-4059-83DC-5D1B1E2B1009@commandprompt.com |
| In response to | Re: Random Page Cost and Planner (David Jarvis <thangalin@gmail.com>) |
| Responses | Re: Random Page Cost and Planner |
| List | pgsql-performance |
On May 26, 2010, at 6:50 AM, David Jarvis wrote:

> That said, when using the following condition, the query is fast (1 second):
>
> extract(YEAR FROM sc.taken_start) >= 1963 AND
> extract(YEAR FROM sc.taken_end) <= 2009 AND
>
> "  ->  Index Scan using measurement_013_stc_idx on measurement_013 m  (cost=0.00..511.00 rows=511 width=15) (actual time=0.018..3.601 rows=3356 loops=104)"
> "        Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 7))"
>
> This condition makes it slow (13 seconds on first run, 8 seconds thereafter):
>
> extract(YEAR FROM sc.taken_start) >= 1900 AND
> extract(YEAR FROM sc.taken_end) <= 2009 AND
>
> "        Filter: (category_id = 7)"
> "  ->  Seq Scan on measurement_013 m  (cost=0.00..359704.80 rows=18118464 width=15) (actual time=0.008..4025.692 rows=18118395 loops=1)"
>
> At this point, I'm tempted to write a stored procedure that iterates over each station category for all the years of each station. My guess is that the planner's estimate for the number of rows that will be returned by extract(YEAR FROM sc.taken_start) >= 1900 is incorrect and so it chooses a full table scan for all rows.

Nope, it appears that the planner's estimate is correct (it estimates 18118464 rows vs. 18118395 real rows). I think what's happening there is that 18M rows is a large enough share of the total table that it makes sense to scan them sequentially, eliminating random-access costs. Try SET enable_seqscan = false and repeat the query; there is a chance that the index scan would be even slower.

> The part I am having trouble with is convincing PG to use the index for the station ID and the date range for when the station was active. Each station has a unique ID; the data in the measurement table is ordered by measurement date, then by station.
>
> Should I add a clustered index by station, then by date?
>
> Any other suggestions are very much appreciated.

Is it necessary to get the data as far back as 1900 all the time?
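The enable_seqscan check suggested above can be run in a single session like this (a sketch; the actual SELECT is the poster's original query, which is not shown in full in this thread):

```sql
-- Disable sequential scans for this session only, so the planner is forced
-- to consider the index scan, then compare timings with EXPLAIN ANALYZE.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT ...;  -- the original query with the 1900..2009 range

-- Restore the default afterwards; enable_seqscan should stay on in production,
-- since it is a diagnostic knob, not a tuning knob.
RESET enable_seqscan;
```

If the index-scan plan turns out slower, that confirms the planner's choice of the sequential scan was the right one for this row fraction.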
Maybe there is a possibility to aggregate results from the past years if they are constant.

Regards,
--
Alexey Klyukin <alexk@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
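The pre-aggregation idea could look roughly like the following. This is a hypothetical sketch: the column names (measurement, station_id, category_id, taken) come from the quoted plans, but the measured value column (here called "amount") and the cutoff date are assumptions, and the summary statistics would have to match whatever the real queries compute:

```sql
-- One row per station/category/year for closed, constant years,
-- so queries over 1900..2008 read the small summary table instead
-- of rescanning ~18M measurement rows every time.
CREATE TABLE measurement_yearly AS
SELECT station_id,
       category_id,
       extract(YEAR FROM taken) AS year,
       avg(amount)              AS avg_amount,
       count(*)                 AS n
FROM   measurement
WHERE  taken < DATE '2009-01-01'   -- historical years only
GROUP  BY station_id, category_id, extract(YEAR FROM taken);
```

Queries would then union the summary table for past years with the live measurement table for the current year; the summary only needs to be refreshed when a closed year's data changes.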