Help with query: indexes on timestamps

Поиск
Список
Период
Сортировка
От Keith C. Perry
Тема Help with query: indexes on timestamps
Дата
Msg-id 1089219845.40ec2d0503fb7@webmail.vcsn.com
обсуждение исходный текст
Ответы Re: Help with query: indexes on timestamps  (Richard Huxton <dev@archonet.com>)
Список pgsql-general
Ok, I've tried a number of things here and I know I'm missing something but at
this point my head is spinning (i.e. lack of sleep, too much coffee, etc...)

My environment is PG 7.4.3 on Linux with 512Mb of ram and swap.  This was just
upgraded from 7.4 (just to make sure I'm current).  Some of my settings in
postgresql are giving fatal errors but I don't think my issue is related to my
query problems.  I also have a laptop running with the same basic specs (no
raid, slower processor).

I use a recent pgadmin-III as my client.

We're also running this query in MS-SQL.

I have a table with with 1 million records in it.  Here is the definition

CREATE TABLE report
(
  match int4,
  action varchar(16),
  stamp timestamptz,
  account varchar(32),
  ipaddress inet,
  profile varchar(16),
  rating text,
  url text
)
WITHOUT OIDS;

The is one index:

CREATE INDEX stamp_idx
  ON report
  USING btree
  (stamp);

That query I'm running is:

 SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
   FROM report
  GROUP BY date_part('hour'::text, report.stamp)
  ORDER BY date_part('hour'::text, report.stamp);



Here is the plan I get:

                                 QUERY PLAN
----------------------------------------------------------------------------
 Sort  (cost=47420.64..47421.14 rows=200 width=8)
   Sort Key: date_part('hour'::text, stamp)
   ->  HashAggregate  (cost=47412.00..47413.00 rows=200 width=8)
         ->  Seq Scan on report  (cost=0.00..42412.00 rows=1000000 width=8)
(4 rows)


Now from from I understand that, the index I created would not be used since I
would be looking at every row to do the date part.  The query under 7.4 ran in
about 8 seconds.  In 7.4.3, its taking 37 seconds for the same plan (which is
fine for the system not be tuned yet).  On my laptop its taking 6 seconds.
MS-SQL is taking 8 seconds.  These runs are after I do vacuum full, vacuum
analyse and reindex on the database and table respectively

My question:  How can I get this query to use an index build on the date_part
function.  On the MS-SQL side, creating a computed column with the date part and
then don't an index on that column bring the query done to 2 seconds.

I tried creating this function:

CREATE OR REPLACE FUNCTION whathour(timestamptz)
  RETURNS int4 AS
'begin
  return date_part(\'hour\',$1);
end;'
  LANGUAGE 'plpgsql' IMMUTABLE;

and then and index:

CREATE INDEX hour_idx
  ON report
  USING btree
  (stamp)
  WHERE whathour(stamp) >= 0 AND whathour(stamp) <= 23;

but I get the same plan- which makes sense to me because I'm again inspect
quiet a few row.  I'm sure I'm missing something...

I couldn't see from the docs how to make a column equal a function (like
MS-SQL's computed column) but to me it seems like I should not have to do
something like that since it really is wasting space in the table.  I hoping a
partial index or a function index will solve this and be just as efficient.
However, that method **does** work.  Is there a better way?

Thanks to all in advance.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

В списке pgsql-general по дате отправления:

Предыдущее
От: Bruno Wolff III
Дата:
Сообщение: Re: Grant Select privileges for all tables in schema
Следующее
От: Richard Huxton
Дата:
Сообщение: Re: Help with query: indexes on timestamps