Re: Declarative partitioning - another take
От | Amit Langote |
---|---|
Тема | Re: Declarative partitioning - another take |
Дата | |
Msg-id | 2587c47e-7ff7-80db-953f-fb532c215219@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: Declarative partitioning - another take (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: Declarative partitioning - another take
|
Список | pgsql-hackers |
On 2016/11/18 4:14, Robert Haas wrote: > On Thu, Nov 17, 2016 at 6:27 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> Meanwhile, here are updated patch that address most of the following comments. > > OK, I have re-reviewed 0005 and it looks basically fine to me, modulo > a few minor nitpicks. "This is called, *iff*" shouldn't have a comma > there, Fixed. > and I think the entire comment block that starts with "NOTE: > SQL specifies that a NULL" and ends with "it's unlikely that NULL > would result." should be changed to say something like /* As for > catalogued constraints, we treat a NULL result as a success, not a > failure. */ rather than duplicating an existing comment that doesn't > quite apply here. Ah, you're right that the comment does not apply as it is. I rewrote that comment. Oh but wait, that means I can insert rows with NULLs in the range partition key if I choose to insert it directly into the partition, whereas I have been thinking all this while that there could never be NULLs in the partition key of a range partition. What's more, get_qual_for_partbound() (patch 0003) emits a IS NOT NULL constraint for every partition key column in case of a range partition. Is that wrongheaded altogether? (also see my reply to your earlier message about NULLs in the range partition key) > Finally, ExecConstraints() contains a new if-block > whose sole contents are another if-block. Perhaps if (this && that) > would be better. Agreed, should have noticed that. > Regarding 0006 and 0007, I think the PartitionTreeNodeData structure > you've chosen is awfully convoluted and probably not that efficient. > For example, get_partition_for_tuple() contains this loop: > > + prev = parent; > + node = parent->downlink; > + while (node != NULL) > + { > + if (node->index >= cur_idx) > + break; > + > + prev = node; > + node = node->next; > + } > > Well, it looks to me like that's an O(n) way to find the n'th > partition, which seems like a pretty bad idea in performance-critical > code, which this is. I think this whole structure needs a pretty > heavy overhaul. Here's a proposal: Thanks for the idea below! > 1. Forget the idea of a tree. Instead, let the total number of tables > in the partitioning hierarchy be N and let the number of those that > are partitioned be K. Assign each partitioned table in the hierarchy > an index between 0 and K-1. Make your top level data structure (in > lieu of PartitionTreeNodeData) be an array of K PartitionDispatch > objects, with the partitioning root in entry 0 and the rest in the > remaining entries. > > 2. Within each PartitionDispatch object, store (a) a pointer to a > PartitionDesc and (b) an array of integers of length equal to the > PartitionDesc's nparts value. Each integer i, if non-negative, is the > final return value for get_partition_for_tuple. If i == -1, tuple > routing fails. If i < -1, we must next route using the subpartition > whose PartitionDesc is at index -(i+1). Arrange for the array to be > in the same order the PartitionDesc's OID list. > > 3. Now get_partition_for_tuple looks something like this: > > K = 0 > loop: > pd = PartitionDispatch[K] > idx = list/range_partition_for_tuple(pd->partdesc, ...) > if (idx >= -1) > return idx > K = -(idx + 1) > > No recursion, minimal pointer chasing, no linked lists. The whole > thing is basically trivial aside from the cost of > list/range_partition_for_tuple itself; optimizing that is a different > project. I might have some details slightly off here, but hopefully > you can see what I'm going for: you want to keep the computation that > happens in get_partition_for_tuple() to an absolute minimum, and > instead set things up in advance so that getting the partition for a > tuple is FAST. And you want the data structures that you are using in > that process to be very compact, hence arrays instead of linked lists. This sounds *much* better. Here is a quick attempt at coding the design you have outlined above in the attached latest set of patches. PS: I haven't run the patches through pgindent yet. Thanks, Amit
Вложения
- 0001-Catalog-and-DDL-for-partitioned-tables-16.patch
- 0002-psql-and-pg_dump-support-for-partitioned-tables-16.patch
- 0003-Catalog-and-DDL-for-partitions-16.patch
- 0004-psql-and-pg_dump-support-for-partitions-16.patch
- 0005-Teach-a-few-places-to-use-partition-check-quals-16.patch
- 0006-Introduce-a-PartitionDispatch-data-structure-16.patch
- 0007-Tuple-routing-for-partitioned-tables-16.patch
- 0008-Update-DDL-Partitioning-chapter-to-reflect-new-devel-16.patch
В списке pgsql-hackers по дате отправления: