Re: Random Page Cost and Planner
| From | Alexey Klyukin |
|---|---|
| Subject | Re: Random Page Cost and Planner |
| Date | |
| Msg-id | 1B94E0EC-D665-4059-83DC-5D1B1E2B1009@commandprompt.com |
| In response to | Re: Random Page Cost and Planner (David Jarvis <thangalin@gmail.com>) |
| Responses | Re: Random Page Cost and Planner |
| List | pgsql-performance |
On May 26, 2010, at 6:50 AM, David Jarvis wrote:

> That said, when using the following condition, the query is fast (1 second):
>
> extract(YEAR FROM sc.taken_start) >= 1963 AND
> extract(YEAR FROM sc.taken_end) <= 2009 AND
>
> "  ->  Index Scan using measurement_013_stc_idx on measurement_013 m  (cost=0.00..511.00 rows=511 width=15) (actual time=0.018..3.601 rows=3356 loops=104)"
> "        Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 7))"
>
> This condition makes it slow (13 seconds on first run, 8 seconds thereafter):
>
> extract(YEAR FROM sc.taken_start) >= 1900 AND
> extract(YEAR FROM sc.taken_end) <= 2009 AND
>
> "        Filter: (category_id = 7)"
> "  ->  Seq Scan on measurement_013 m  (cost=0.00..359704.80 rows=18118464 width=15) (actual time=0.008..4025.692 rows=18118395 loops=1)"
>
> At this point, I'm tempted to write a stored procedure that iterates over each station category for all the years of each station. My guess is that the planner's estimate for the number of rows that will be returned by extract(YEAR FROM sc.taken_start) >= 1900 is incorrect and so it chooses a full table scan for all rows.

Nope, it appears that the planner's estimate is correct (it estimates 18118464 rows vs. 18118395 real rows). I think what's happening there is that 18M rows is a large enough share of the total table that it makes sense to scan them sequentially, eliminating random-access costs. Try SET enable_seqscan = false and repeat the query; there is a chance that the index scan would be even slower.

> The part I am having trouble with is convincing PG to use the index for the station ID and the date range for when the station was active. Each station has a unique ID; the data in the measurement table is ordered by measurement date, then by station.
>
> Should I add a clustered index by station, then by date?
>
> Any other suggestions are very much appreciated.

Is it necessary to get the data as far back as 1900 all the time?
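The enable_seqscan check suggested above can be run in a single session like this (a sketch; the actual SELECT is the poster's original query, which is not shown in full in this thread):

```sql
-- Disable sequential scans for this session only, so the planner is forced
-- to consider the index scan, then compare timings with EXPLAIN ANALYZE.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT ...;  -- the original query with the 1900..2009 range

-- Restore the default afterwards; enable_seqscan should stay on in production,
-- since it is a diagnostic knob, not a tuning knob.
RESET enable_seqscan;
```

If the index-scan plan turns out slower, that confirms the planner's choice of the sequential scan was the right one for this row fraction.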
Maybe there is a possibility to aggregate results from the past years if they are constant.

Regards,
--
Alexey Klyukin <alexk@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
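The pre-aggregation idea could look roughly like the following. This is a hypothetical sketch: the column names (measurement, station_id, category_id, taken) come from the quoted plans, but the measured value column (here called "amount") and the cutoff date are assumptions, and the summary statistics would have to match whatever the real queries compute:

```sql
-- One row per station/category/year for closed, constant years,
-- so queries over 1900..2008 read the small summary table instead
-- of rescanning ~18M measurement rows every time.
CREATE TABLE measurement_yearly AS
SELECT station_id,
       category_id,
       extract(YEAR FROM taken) AS year,
       avg(amount)              AS avg_amount,
       count(*)                 AS n
FROM   measurement
WHERE  taken < DATE '2009-01-01'   -- historical years only
GROUP  BY station_id, category_id, extract(YEAR FROM taken);
```

Queries would then union the summary table for past years with the live measurement table for the current year; the summary only needs to be refreshed when a closed year's data changes.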