Re: [HACKERS] Maintaining cluster order on insert
От | Ron Mayer |
---|---|
Тема | Re: [HACKERS] Maintaining cluster order on insert |
Дата | |
Msg-id | 44DD13C6.2020609@cheapcomplexdevices.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Maintaining cluster order on insert (Heikki Linnakangas <heikki@enterprisedb.com>) |
Список | pgsql-patches |
Heikki Linnakangas wrote: > Ron Mayer wrote: >> In my case my biggest/slowest tables are clustered by zip-code (which >> does a reasonable job at keeping counties/cities/etc on the >> same pages too).... > > No deletes? If the tables grow over time, you probably would need to run > CLUSTER every now and then to get the best performance, though the patch > would alleviate that quite a lot. Yup, pretty much no deletes; since it's a historical archive of some government documents with address info. Though I the live system may periodically expunge data, say, 10+years old. > Do you have a development environment where you could test what effect > the patch would have? It would be interesting to have a real-world use > case, since I don't have one myself at the moment. I have a development environment, but it doesn't have the same real-time-growing behavior, and only a small region of the country. I suppose I could pre-load N-1 years and cluster it, and then incrementally insert the last year of data to simulate the effect. But sure, I'll attempt to try the patch; but don't really have any good benchmarking environment to give any definitive results. If an anecdotal "this is how it feels to me" is useful, I can give one of those. >> > Your best bet might be to partition the table into two subtables, one >> > with "stable" data and one with the fresh data. >> >> Hmm... that should work well for me too.... > > Umm, if your inserts are uniformly distributed across the country, you > wouldn't have a stable part, right? Hmm. Maybe. I was thinking when archiving to the large table an "order by" clause when inserting from the new partition to the stable partition could at least make the big table "piecewise" clustered so most records for a zip code fit in the same few disk pages, even though those pages would still end up lying around far apart on the disk. I wonder what part of "CLUSTER" gives the most benefit - that most records of a type fit on a few blocks; or that those blocks are next to each other so can be read sequentially?
В списке pgsql-patches по дате отправления: