Re: Equivalent praxis to CLUSTERED INDEX?
От | J. Andrew Rogers |
---|---|
Тема | Re: Equivalent praxis to CLUSTERED INDEX? |
Дата | |
Msg-id | 1093544071.349.106.camel@vulture.corp.neopolitan.com обсуждение исходный текст |
Ответ на | Equivalent praxis to CLUSTERED INDEX? (Mischa Sandberg <mischa.sandberg@telus.net>) |
Ответы |
Re: Equivalent praxis to CLUSTERED INDEX?
|
Список | pgsql-performance |
On Tue, 2004-08-24 at 22:28, Mischa Sandberg wrote: > I see that PG has a one-shot CLUSTER command, but doesn't support > continuously-updated clustered indexes. > > What I infer from newsgroup browsing is, such an index is impossible, > given the MVCC versioning of records (happy to learn I'm wrong). It is possible to have MVCC and ordered/indexed heaps, but it isn't something you can just tack onto the currently supported types -- I looked into this myself. It would take substantial additional code infrastructure to support it, basically an alternative heap system and adding support for tables with odd properties to many parts of the system. Pretty non-trivial. This is probably my #1 "I wish postgres had this feature" feature. It is a serious scalability enhancer for big systems and a pain to work around not having. > I'd be curious to know what other people, who've crossed this same > bridge from MSSQL or Oracle or Sybase to PG, have devised, > faced with the same kind of desired performance gain for retrieving > blocks of rows with the same partial key. The CLUSTER command is often virtually useless for precisely the kinds of tables that need to be clustered. My databases are on-line 24x7, and the tables that are ideal candidates for clustering are in the range of 50-100 million rows. I can afford to lock these tables up for no more than 5-10 minutes during off-peak in the hopes that no one notices, and CLUSTER does not work remotely in the ballpark of that fast for tables of that size. People who can run CLUSTER in a cron job must either have relatively small tables or regular large maintenance windows. My solution, which may or may not work for you, was to write a table partitioning system using the natural flexibility and programmability of postgresql (e.g. table inheritance). From this I automatically get a roughly ordered heap according to the index I would cluster on, with only slightly funky SQL access. The end result works much better with CLUSTER too, though CLUSTER is much less necessary at that point because, at least for my particular purposes, the rows are mostly ordered due to how the data was partitioned. So there are ways to work around CLUSTER, but you'll have to be clever and it will require tailoring the solution to your particular requirements. J. Andrew Rogers
В списке pgsql-performance по дате отправления: