Re: Dynamic Partitioning using Segment Visibility Maps
От | Richard Huxton |
---|---|
Тема | Re: Dynamic Partitioning using Segment Visibility Maps |
Дата | |
Msg-id | 477E119F.1090102@archonet.com обсуждение исходный текст |
Ответ на | Re: Dynamic Partitioning using Segment Visibility Maps (Simon Riggs <simon@2ndquadrant.com>) |
Список | pgsql-hackers |
Simon Riggs wrote: > On Fri, 2008-01-04 at 10:22 +0000, Richard Huxton wrote: >> Simon Riggs wrote: >>> We would keep a dynamic visibility map at *segment* level, showing which >>> segments have all rows as 100% visible. No freespace map data would be >>> held at this level. >> Small dumb-user question. >> >> I take it you've considered some more flexible consecutive-run-of-blocks >> unit of flagging rather than file-segments. That obviously complicates >> the tracking but means you can cope with infrequent updates as well as >> mark most of the "most recent" segment for log-style tables. > > I'm writing the code to abstract that away, so yes. > > Now you mention it, it does seem straightforward to have a table storage > parameter for partition size, which defaults to 1GB. The partition size > is simply a number of consecutive blocks, as you say. > > The smaller the partition size the greater the overhead of managing it. Oh, obviously, but with smaller partition sizes this also becomes useful for low-end systems as well as high-end ones. Skipping 80% of a seq-scan on a date-range query is a win for even small (by your standards) tables. I shouldn't be surprised if the sensible-number-of-partitions remained more-or-less constant as you scaled the hardware, but the partition size grew. > Also I've been looking at read-only tables and compression, as you may > know. My idea was that in the future we could mark segments as either > - read-only > - compressed > - able to be shipped off to hierarchical storage > > Those ideas work best if the partitioning is based around the physical > file sizes we use for segments. I can see why you've chosen file segments. It certainly makes things easier. Hmm - thinking about the date-range scenario above, it occurs to me that for seq-scan purposes the correct partition sizedepends upon the data value you are interested in. What I want to know is what blocks Jan 07 covers (or rather what blocks it doesn't) rather than knowing blocks 1-9999999 cover 2005-04-12 to 2007-10-13. Of course that means that you'd eventually want different partition sizes tracking visibility for different columns (e.g. id, timestamp). I suspect the same would be true for read-only/compressed/archived flags, but I can see how they are tightly linked to physical files (particularly the last two). -- Richard Huxton Archonet Ltd
В списке pgsql-hackers по дате отправления: