Re: Our CLUSTER implementation is pessimal
От | Heikki Linnakangas |
---|---|
Тема | Re: Our CLUSTER implementation is pessimal |
Дата | |
Msg-id | 48BB9C6C.2000202@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Our CLUSTER implementation is pessimal (Martijn van Oosterhout <kleptog@svana.org>) |
Список | pgsql-hackers |
Martijn van Oosterhout wrote: > On Mon, Sep 01, 2008 at 12:25:26AM +0100, Gregory Stark wrote: >> The problem is that it does a full index scan and looks up each tuple in the >> order of the index. That means it a) is doing a lot of random i/o and b) has >> to access the same pages over and over again. > > <snip> > >> a) We need some way to decide *when* to do a sort and when to do an index >> scan. The planner has all this machinery but we don't really have all the >> pieces handy to use it in a utility statement. This is especially important >> for the case where we're doing a cluster operation on a table that's already >> clustered. In that case an index scan could conceivably actually win (though I >> kind of doubt it). I don't really have a solution for this. > > The case I had recently was a table that was hugely bloated. 300MB data > and only 110 live rows. A cluster was instant, a seqscan/sort would > probably be much slower. A VACUUM FULL probably worse :) > > Isn't there some compromise. Like say scanning the index to collect a > few thousand records and then sort them the way a bitmap index scan > does. Should be much more efficient that what we have now. Ideally we would use the planner, and the planner would choose the best plan for a bloated table like that (it probably does, I'm not sure) as well. However, I'm not sure how much we can trust the statistics for a table we're about to CLUSTER. Often you run CLUSTER on a table you've just loaded or mass-updated. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: