Re: Optimizing suggestions
От | Tom Lane |
---|---|
Тема | Re: Optimizing suggestions |
Дата | |
Msg-id | 9233.1024852902@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: Optimizing suggestions (David Ford <david+cert@blue-labs.org>) |
Список | pgsql-general |
David Ford <david+cert@blue-labs.org> writes: > bmilter=> select count(*) from history where timestamp::date > > (now()::date - '7 days'::interval) group by timestamp::date ; > Now.. :) How can I further improve on this? The date-restriction clause is poorly phrased: you're taking a timestamp from now(), coercing to date, coercing back to timestamp (because there is no date - interval operator, only timestamp - interval), subtracting an interval to give a timestamp ... and then on the other side of the comparison, coercing to date and then back to timestamp to do the comparison (unless there's a date > timestamp operator, which I doubt). Aside from all the wasted datatype conversion operators, there is no hope of using this clause with an index on the timestamp column. What I'd try is where timestamp >= (current_date - 6) (which I think has the same boundary-condition behavior as your given expression; adjust to taste). As of current development sources (7.3 to be) the above WHERE clause is directly indexscannable. In existing releases you have to play some games to get the planner to realize that (current_date - 6) can be treated as a constant for the purposes of making an indexscan. One way is to make a function called, say, "ago(int)" that returns current_date minus the parameter, and mark it isCachable. > In the near future I'm going to split this table up into many sub tables >> >> Why? Aren't you just going to have a need for lots of joins, if you >> do that? > No, actually most of this data will be drawn into focused reporting for > graphing metrics. Only when I drill down to a specific (small) set of > message serial numbers, will I be doing a join of all this data. > In my naivety, which is more performance effective? Most of the time I > will be searching for a count of statistics on a single column. I'd guess that you're better off with the single table anyway, just on grounds of simplicity. I really doubt that the split-up could provide enough performance gain to be worth the trouble... > p.s. I am assuming count(*) v.s. count(column) is always faster because > it's doing a generic row count v.s. accumulating stats on a column > inside a row? Marginally faster, yes --- the count(column) form has to take the time to see whether the column is null or not. regards, tom lane
В списке pgsql-general по дате отправления: