Re: SQL - Indexing for performance on uniquness check...
От | Josh Berkus |
---|---|
Тема | Re: SQL - Indexing for performance on uniquness check... |
Дата | |
Msg-id | 200407181309.23862.josh@agliodbs.com обсуждение исходный текст |
Ответ на | SQL - Indexing for performance on uniquness check... ("Loftis, Charles E" <charles.loftis@eds.com>) |
Ответы |
Re: SQL - Indexing for performance on uniquness check...
|
Список | pgsql-novice |
Charles, > Assume the table has more attributes than those attributes being GROUPed > on. Also, assume all attributes are of type varchar. > > Sample query to return non-uniqueness > SELECT A1, A2, A3, ..., An > FROM Table > GROUP BY A1, A2, A3, ..., An > HAVING Count(*)>1 In order for it to be even possible to use an index (a hashaggregate operation, actually) on this table, you'd have to include *all* of the GROUP BY columns in a single, multi-column index. However, it would be unlikely for PG to use any kind of an index in the operation above, because of the number of columns, the unlikelyness of grouping (i.e. there will only be a minority of rows with count(*) > 1) and the fact that you're running this against the whole table. So any kind of an index is liable to be useless. If the table is so large that you *have* to use an index or it takes absolutely forever to run, then you may wish to try different operations to detect duplicate rows. For example, if it could be assumed that there was a column Ax which was not included as a unique identifier, and could be counted on to be both (a) not null and (b) different for duplicate rows (a timestamp for example), then you could do: SELECT A1, A2, A3, ... An FROM table WHERE EXISTS (SELECT 1 FROM table t2 WHERE t2.A1 = table.A1 AND t2.A2 = table.A2 ... AND t2.An = table.An AND t2.Ax <> table.Ax ); This can be a much better structure for indexed searches because an index on some-but-not-all of the columns A1 ... An can be used for the EXISTS join, such as an index on A1, A2, A3. Of course, if this is an import table, where some of the rows are *exact* duplicates, the above doesn't help. On the other hand, if the rows *are* exact duplicates, why do you care? Just do SELECT DISTINCT on transfer and eliminate them. -- Josh Berkus Aglio Database Solutions San Francisco
В списке pgsql-novice по дате отправления: