Re: [HACKERS] path toward faster partition pruning
От | Amit Langote |
---|---|
Тема | Re: [HACKERS] path toward faster partition pruning |
Дата | |
Msg-id | 159153b4-0360-f696-769a-0e0d6e1d6a5b@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: [HACKERS] path toward faster partition pruning (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: [HACKERS] path toward faster partition pruning
|
Список | pgsql-hackers |
Thanks for the review. On 2018/02/02 7:38, Robert Haas wrote: > +/* > + * PartitionBoundCmpArg - Caller-defined argument to be passed to > + * partition_bound_cmp() > + * > + * The first (fixed) argument involved in a comparison is the partition bound > + * found in the catalog, while an instance of the following struct describes > + * either a new partition bound being compared against existing bounds > + * (caller should set is_bound to true and set bound), or a new tuple's > + * partition key specified in datums (caller should set ndatums to the number > + * of valid datums that are passed in the array). > + */ > +typedef struct PartitionBoundCmpArg > +{ > + bool is_bound; > + union > + { > + PartitionListValue *lbound; > + PartitionRangeBound *rbound; > + PartitionHashBound *hbound; > + } bound; > + > + Datum *datums; > + int ndatums; > +} PartitionBoundCmpArg; > > This is a pretty strange definition. datums/ndatums are never valid > at the same time as any of lbound/rbound/hbound, but are not included > in the union. Also, is_bound doesn't tell you which of > rbound/lbound/hbound is actually valid. Granted, the current calling > convention looks like a mess, too. Apparently, the argument to > partition_bound_cmp is a PartitionBoundSpec if using hash > partitioning, a Datum if list partitioning, and either a > PartitionRangeBound or a Datum * if range partitioning depending on > the value of probe_is_bound, and I use the word "apparently" because > there are zero words of comments explaining what the argument to > partition_bound_cmp of type "void *" is supposed to mean. I really > should have noticed that and insisted that it be fixed before > partitioning got committed. Yeah, I was trying to fix the status quo by introducing that new struct, but I agree it's much better to modify the functions around a bit like the way you describe below. > Looking a bit further, there are exactly two calls to > partition_bound_cmp(). One is in partition_bound_bsearch() and the > other is in check_new_partition_bound(). Now, looking at this, both > the call to partition_bound_cmp() and every single call to > partition_bound_bsearch() are inside a switch branch where we've > dispatched on the partitioning type, which means that from code that > is already specialized by partitioning type we are calling generic > code which then turns back around and calls code that is specialized > by partitioning type. Now, that could make sense if the generic code > is pretty complex, but here's it's basically just the logic to do a > bsearch. It seems to me that a cleaner solution here would be to > duplicate that logic. Then we could have... > > static int partition_list_bsearch(PartitionKey key, PartitionBoundInfo > boundinfo, > Datum value, bool *is_equal); > static int partition_range_bsearch(PartitionKey key, > PartitionBoundInfo boundinfo, > PartitionRangeBound *probe); > static int partition_range_datum_bsearch(PartitionKey key, > PartitionBoundInfo boundinfo, > int nvalues, Datum *values); > static int partition_hash_bsearch(PartitionKey key, PartitionBoundInfo > boundinfo, > int modulus, int remainder, bool *is_equal); > > ...which would involve fewer branches at runtime and more type-safety > at compile time. partition_hash_bsearch could directly call > partition_hbound_cmp, partition_list_bsearch could directly invoke > FunctionCall2Coll, partition_range_bsearch could directly call > partition_rbound_cmp, and partition_range_datum_bsearch could directly > call partition_rbound_datum_cmp. > > All-in-all that seems a lot nicer to me than what we have here now. > IIUC, the purpose of this patch is to let you search on a prefix of > the partition keys, but I think that's really only possible for range > partitioning, and it seems like the proposed nvalues argument to > partition_range_datum_bsearch would give you what you need. Your proposed cleanup sounds much better, so I implemented it in the attached updated 0001, while dropping the previously proposed PartitionBoundCmpArg approach. Updated set of patches attached (patches 0002 onward mostly unchanged, except I incorporated the delta patch posted by David upthread). Thanks, Amit
Вложения
- v24-0001-Refactor-code-for-partition-bound-searching.patch
- v24-0002-Introduce-a-get_partitions_from_clauses.patch
- v24-0003-Move-some-code-of-set_append_rel_size-to-separat.patch
- v24-0004-More-refactoring-around-partitioned-table-Append.patch
- v24-0005-Teach-planner-to-use-get_partitions_from_clauses.patch
В списке pgsql-hackers по дате отправления: