Re: How slow is DISTINCT?
От | Michael Contzen |
---|---|
Тема | Re: How slow is DISTINCT? |
Дата | |
Msg-id | 3CB2BF53.77D41E79@dohle.com обсуждение исходный текст |
Ответ на | How slow is DISTINCT? (Wei Weng <wweng@kencast.com>) |
Ответы |
Re: How slow is DISTINCT?
Re: How slow is DISTINCT? |
Список | pgsql-sql |
Hello, our problem to 'select distinct' is similar: We have a big table of about 320.000.000 rows of an datawarehouse application. 55 GB, yes, it works fine! Createing an index on it quite fast (nearly as fast than oracle on the same data and same machine) 1GHz P4, 4GB-Ram -> 30 Minutes. Then we tried 'select distinct one_field' which would result to about 200.000 different values. Postgres needed 6 hours while Oracle managed it in about 30 minutes. Looking into the pgsql_tmp directory of this db while doing this selection showed me a lot of tempfiles nearly as big as the table. Does postgres sort the whole table without projection to one column an performs a unique on this whole table? This would explain the big amount of disk usage in pgsql_tmp and the big amount of time. The statement, something could be wrong with the data, is not very useful: This is data of our electronic cash-desks. Unfortunately our customers buy every day nearly the same articles - therefor the repeatition of data :-) Kind regards M.Contzen Developer Dohle Systemberatung Germany Some facts of our test: Table "warenausg_ges" Column | Type | Modifiers -----------+---------------+-----------ean | numeric(13,0) | menge | numeric(13,3) | lvkumsatz | numeric(15,3)| vkumsatz | numeric(15,3) | ekumsatz | numeric(15,3) | rabatt | numeric(12,0) | kdnr | numeric(10,0)| artnr | numeric(10,0) | lfnr | numeric(10,0) | wg | integer | aktion | character(1) | datum | date | status | integer | Indexes: warenausg_ges_inx Index "warenausg_ges_inx"Column | Type --------+------datum | date btree explain select distinct artnr from warenausg_ges; NOTICE: QUERY PLAN: Unique (cost=224522801.22..225315849.86 rows=31721946 width=12) -> Sort (cost=224522801.22..224522801.22 rows=317219456width=12) -> Seq Scan on warenausg_ges (cost=0.00..165793667.00 rows=317219456 width=12)
В списке pgsql-sql по дате отправления: