-------------- Original message ----------------------
From: William Garrison <postgres@mobydisk.com>
> I am looking for records with duplicate keys, so I am running this query:
>
> SELECT
> fileid, COUNT(*)
> FROM
> file
> GROUP BY
> fileid
> HAVING
> COUNT(*)>1
>
> The table has an index on fileid (non-unique index) so I am surprised
> that postgres is doing a table scan. This database is >15GB, and there
> are a number of fairly large string columns in the table. I am very
> surprised that scanning the index is not faster than scanning the
> table. Any thoughts on that? Is scanning the table faster than
> scanning the index? Is there a reason that it needs anything other than
> the index?
>
I may be missing something, but it would have to scan the entire table to get all the occurrences of each fileid in
orderto do the count(*).
--
Adrian Klaver
aklaver@comcast.net