Help with query: indexes on timestamps

Поиск

Список

Период

Сортировка

От	Keith C. Perry
Тема	Help with query: indexes on timestamps
Дата	7 июля 2004 г. 14:04:25
Msg-id	1089219845.40ec2d0503fb7@webmail.vcsn.com обсуждение исходный текст
Ответы	Re: Help with query: indexes on timestamps
Список	pgsql-general

Дерево обсуждения

Ok, I've tried a number of things here and I know I'm missing something but at
this point my head is spinning (i.e. lack of sleep, too much coffee, etc...)

My environment is PG 7.4.3 on Linux with 512Mb of ram and swap.  This was just
upgraded from 7.4 (just to make sure I'm current).  Some of my settings in
postgresql are giving fatal errors but I don't think my issue is related to my
query problems.  I also have a laptop running with the same basic specs (no
raid, slower processor).

I use a recent pgadmin-III as my client.

We're also running this query in MS-SQL.

I have a table with with 1 million records in it.  Here is the definition

CREATE TABLE report
(
  match int4,
  action varchar(16),
  stamp timestamptz,
  account varchar(32),
  ipaddress inet,
  profile varchar(16),
  rating text,
  url text
)
WITHOUT OIDS;

The is one index:

CREATE INDEX stamp_idx
  ON report
  USING btree
  (stamp);

That query I'm running is:

 SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
   FROM report
  GROUP BY date_part('hour'::text, report.stamp)
  ORDER BY date_part('hour'::text, report.stamp);



Here is the plan I get:

                                 QUERY PLAN
----------------------------------------------------------------------------
 Sort  (cost=47420.64..47421.14 rows=200 width=8)
   Sort Key: date_part('hour'::text, stamp)
   ->  HashAggregate  (cost=47412.00..47413.00 rows=200 width=8)
         ->  Seq Scan on report  (cost=0.00..42412.00 rows=1000000 width=8)
(4 rows)


Now from from I understand that, the index I created would not be used since I
would be looking at every row to do the date part.  The query under 7.4 ran in
about 8 seconds.  In 7.4.3, its taking 37 seconds for the same plan (which is
fine for the system not be tuned yet).  On my laptop its taking 6 seconds.
MS-SQL is taking 8 seconds.  These runs are after I do vacuum full, vacuum
analyse and reindex on the database and table respectively

My question:  How can I get this query to use an index build on the date_part
function.  On the MS-SQL side, creating a computed column with the date part and
then don't an index on that column bring the query done to 2 seconds.

I tried creating this function:

CREATE OR REPLACE FUNCTION whathour(timestamptz)
  RETURNS int4 AS
'begin
  return date_part(\'hour\',$1);
end;'
  LANGUAGE 'plpgsql' IMMUTABLE;

and then and index:

CREATE INDEX hour_idx
  ON report
  USING btree
  (stamp)
  WHERE whathour(stamp) >= 0 AND whathour(stamp) <= 23;

but I get the same plan- which makes sense to me because I'm again inspect
quiet a few row.  I'm sure I'm missing something...

I couldn't see from the docs how to make a column equal a function (like
MS-SQL's computed column) but to me it seems like I should not have to do
something like that since it really is wasting space in the table.  I hoping a
partial index or a function index will solve this and be just as efficient.
However, that method **does** work.  Is there a better way?

Thanks to all in advance.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Help with query: indexes on timestamps