Re: BUG #4462: Adding COUNT to query causes massive slowdown
From: Jussi Pakkanen
Subject: Re: BUG #4462: Adding COUNT to query causes massive slowdown
Date:
Msg-id: 42d23b2e0810101255p59d5d1eyce79a0c917bd72ed@mail.gmail.com
In response to: Re: BUG #4462: Adding COUNT to query causes massive slowdown (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: BUG #4462: Adding COUNT to query causes massive slowdown
List: pgsql-bugs
On Thu, Oct 9, 2008 at 6:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> Given that PostgreSQL does the scan even with the huge seqscan
>> penalty, I can think of only two different causes:
>> 1) some sort of a bug in the query analyzer
>> 2) SELECT COUNT(DISTINCT x) for some reason requires information that
>> is not available in the index.
>
> Try (3) COUNT(DISTINCT x) ... or any DISTINCT aggregate for that matter
> ... is implemented by a sort-and-uniq step inside the aggregate function
> itself. You can't see it in the plan.

Does this mean that the sort-and-uniq will always lead to a full table scan? It would seem so, because I could not force PostgreSQL to use the index even with enable_seqscan set to off. I understand that in some cases the table scan is faster, but it is very strange if the query optimizer refuses to use the index no matter what.

A quick calculation says that the table scan needs to access 32 million elements (and sort them, and uniq them). An index scan needs only 2 million (or 4 million, I suppose, if you account for the higher levels of the B-tree). That is an order of magnitude difference in disk reads alone.

> I wouldn't actually think that this approach would be slower than an
> indexscan, btw, unless maybe the index were very nearly correlated with
> physical order --- but that would make the sort more efficient, too.

I have ordered the data table by the "code" column, by first importing it into a temp table and then building the actual "log" table with CREATE TABLE log AS SELECT ... ORDER BY code;. In this case the index is faster. A lot faster. SELECT DISTINCT using the index and counting the rows takes 4 minutes. Building a view with distinct "code"s and counting that also takes 4 minutes. But the aggregate scan takes 11 minutes. Worst of all, COUNT(DISTINCT ...) is the way all SQL books tell you to write these kinds of queries.

> Perhaps you need to raise work_mem enough to allow the sort to take
> place without spilling to disk?
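For what it's worth, the rewrite I used to get the index-based timing is the subquery form (a sketch, using the "log" table and "code" column from above). Because the DISTINCT is a planner-visible node here rather than hidden inside the aggregate, the planner is free to consider an index scan for the de-duplication:

```sql
-- Semantically equivalent to SELECT COUNT(DISTINCT code) FROM log,
-- but the de-duplication is an explicit plan node, so an index on
-- "code" can be used instead of the in-aggregate sort-and-uniq.
SELECT COUNT(*)
FROM (SELECT DISTINCT code FROM log) AS distinct_codes;
```

Running EXPLAIN on both forms makes the difference visible: the subquery form shows a Unique (or HashAggregate) node above the scan, while COUNT(DISTINCT ...) shows only a plain Aggregate.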
> (Turning on trace_sort should let you see what's happening there.)

But isn't this just hiding the root cause? Having working sets larger than available memory is not that uncommon.
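For reference, both of the settings Tom suggests can be changed per-session (a sketch; the 256MB figure is an arbitrary example for this test, not a recommendation):

```sql
SET work_mem = '256MB';  -- let this session's sort run in memory
SET trace_sort = on;     -- log sort begin/end and memory usage to the server log
SELECT COUNT(DISTINCT code) FROM log;
```

If the server log then shows the sort finishing as an internal (in-memory) sort rather than an external merge, spilling to disk was indeed the bottleneck.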