Re: many tables in db
От | Tom Lane |
---|---|
Тема | Re: many tables in db |
Дата | |
Msg-id | 24822.995567160@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | many tables in db (Kenneth Been <kennethb@telocity.com>) |
Список | pgsql-general |
Kenneth Been <kennethb@telocity.com> writes: > I am considering a db schema with many tables (in the > thousands). Just out of curiosity, why would you need so many tables? Couldn't you represent the same info in fewer tables, maybe with more key columns? If every table has a different set of columns, that wouldn't work, but I'm wondering what the organizing principle really is. > 2. One of the reasons I am considering breaking the data > into multiple tables is that I want to cluster a large table > (on an rtree index) This isn't really relevant to your main point, but: since an rtree doesn't have an associated sort order, it's not clear to me that this operation makes any sense. Have you determined that you'll actually get any performance improvement as a result of the clustering? I suspect you may find that you're just rearranging the table into a different random order. > 3. Another reason for splitting up the data is that I am > worried about the size of the rtree. I have some suspicion > that the rtree size grows more than linearly with the size > of the table. Is that the case? Dunno. The rtree code isn't very well maintained, because none of the current crop of developers uses it or knows much about it. It could well have some bugs that cause it to waste space. (I realized only a few days ago that it probably behaves completely horribly when asked to store NULLs, for example. Do you have any NULLs in the column you're indexing?) > 4. Another possible reason is query speed. Any guesses on > which would be faster, two or three queries on tables of > about 40K records, or one query on a table of about 10M > records? (All queries are geometric range searches on an > rtree index.) Given the size difference, I'd suppose that the 10M table would actually be the equivalent of a materialized join of the smaller tables? I'd guess that recomputing the join on the fly is better because it takes so much less I/O, but that's just a guess. You'd be well advised to test both ways and see. regards, tom lane
В списке pgsql-general по дате отправления: