Range-based clustering?
От | Hans Guijt |
---|---|
Тема | Range-based clustering? |
Дата | |
Msg-id | 85978993146E1A4AA9CC4B2945CAFC940474E0D5@exch-be.terma.com обсуждение исходный текст |
Список | pgsql-admin |
Hi, I have an application that gathers data in a table (who doesn't around here? ;-) ). The behaviour is roughly as follows: - on any single day, a lot of information is inserted. Inserts are more or less in order of increasing timestamp; there may be inserts for data as far back as an hour or two, but not further than that. - inserted data may be updated several times, but only for data from the last couple of hours. Older data is always left alone. - data is inserted on a 24-hour basis (there is no downtime). - to improve read-access speed, the table is clustered on timestamp. - read access is over the entire set, and always returns all records from a timerange (this is why clustering is useful to me). - clustering and vacuuming takes place every 24 hours. Over time, this table has grown considerably, making the cluster-operation take longer and longer. This seems inefficient, since most of the data (basically everything that is older than 24 hours) is already perfectly fine as it is; it is only the last 24 hours, i.e. the bit that has seen numerous inserts and updates, that needs to be straightened out. Is there some way to do limit the cluster-operation to only a subset of the table? In my case that would be "all data with a timestamp in the last 24 hours". Alternatively, is there a way to add this as a feature to a new version of PostgreSQL? I believe the use-case I have presented is in fact fairly common, and might be interesting as a feature to others as well. Regards, Hans Guijt TERMA B.V. Schuttersveld 9 NL-2316 XG, Leiden The Netherlands Phone: +31 (0)71 52 40 835 Telefax: +31 (0)71 51 43 277 E-mail: hg@terma.com
В списке pgsql-admin по дате отправления: