Re: [GENERAL] CLUSTER command
От | Stephan Szabo |
---|---|
Тема | Re: [GENERAL] CLUSTER command |
Дата | |
Msg-id | 20021212154146.T13718-100000@megazone23.bigpanda.com обсуждение исходный текст |
Ответ на | Re: [GENERAL] CLUSTER command (johnnnnnn <john@phaedrusdeinus.org>) |
Ответы |
Re: [GENERAL] CLUSTER command
|
Список | pgsql-performance |
On Thu, 12 Dec 2002, johnnnnnn wrote: > On Thu, Dec 12, 2002 at 05:39:44PM -0500, Jean-Luc Lachance wrote: > > Let's create a new command: > > > > PARTITION <table> ON <attribute> > <snip> > > Because the table file is already extended (2G limit) using > > different files extension (.N) > > how complicated (modifying the code) would it be to have the table > > files split according to the cluster key? > > I think the code changes would be complicated. Just at a 30-second > consideration, this would need to touch: > - all sql (selects, inserts, updates, deletes) > - vacuuming > - indexing > - statistics gathering > - existing clustering I think his idea was to treat it similarly to the way that the system treats tables >2G with .N files. The only thing is that I believe the code that deals with that wouldn't be particularly easy to change to do it though, but I've only taken a cursory look at what I think is the place that does that(storage/smgr/md.c). Some sort of good partitioning system would be nice though. > create table u1 (...); > create table u2 (...); > create table u3 (...); > > create view uv as (select "A" as partition_key, ... from u1 > union all > select "B" as partition_key, ... from u2 > union all > select "C" as partition_key, ... from u3); > > That keeps the tables in different files on-disk while still allowing > you to query against all of them. You need to index them separately > and logic is necessary when changing data. Unfortunately, I think that the optimizer isn't going to do what you'd hope here and scan only the appropriate table if you were to say partition_key='A' and foo='bar'. I'd love to be shown that I'm wrong, but the best I could see hoping for would be that if partition_key was part of u1-u3 and there was an index on partition_key,foo that it could use that and do minimal work on the other tables. In addition, doing something like the above is a nightmare if you don't know beforehand what the partitions should be (for example if you know there aren't alot of distinct values, but you don't know what they are) or for that matter even with 10-15 partitions, writing the rules and such would probably be really error prone.
В списке pgsql-performance по дате отправления: