Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
From | Jeff Davis |
---|---|
Subject | Re: GiST for range types (was Re: Range Types - typo + NULL string constructor) |
Date | |
Msg-id | 1324540006.7608.79.camel@jdavis |
In response to | Re: GiST for range types (was Re: Range Types - typo + NULL string constructor) (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses | Re: GiST for range types (was Re: Range Types - typo + NULL string constructor) |
List | pgsql-hackers |
On Wed, 2011-12-14 at 01:04 +0400, Alexander Korotkov wrote:
> Hi!
>

Thank you! Attached a few changes:

* Change "ordinal" to "normal" for clarity (at least to me).
* Some comment cleanup
* Change classes_groups to be an enum of SPLIT_LEFT and SPLIT_RIGHT, rather than using 1 and 2.
* Changed the "bounds_lower" and "bounds_upper" variables into "by_lower" and "by_upper" to indicate that the arrays are distinguished by sort order. It was confusing me to read it otherwise.

A few comments:

* In range_gist_picksplit, it would be nice to have a little bit more intuitive description of what's going on with the nonEmptyCount and nonInfCount numbers. For instance, it appears to depend on the fact that a range must either be in nonEmptyCount or in nonInfCount. Also, can you describe the reason you're multiplying by two and taking the absolute value? It seems to work, but I am missing the intuition behind those operations.

* The penalty function is fairly hard to read still. At a high level, I think we're trying to accomplish a few things (in order from most to least important):
  - Keep normal ranges separate.
  - Avoid broadening the class of the original predicate (e.g. turning single-infinite into double-infinite).
  - Avoid broadening (as determined by subtype_diff) the original predicate.
  - Favor adding ranges to narrower original predicates.

  Do you agree? If so, perhaps we can attack those more directly and it might be a little more readable.

  Additionally, the arbitrary numbers might become a problem. Can we choose better constants there? They would still be arbitrary when compared with real numbers derived from subtype_diff, but maybe we can still do better than what's there.

* Regarding the leftover "common" entries that can go to either side: what is the "delta" measure trying to accomplish? When I work through some examples, it seems to favor putting larger common ranges on the left (small delta) and smaller common ranges on the right (smaller delta). Why is that good? Or did I misread the code?

  Intuitively, I would think that we'd want to push the ranges with lower upper bounds to the left and higher lower bounds to the right -- in other words, recurse. Obviously, we'd need to make sure it terminated at some point, but splitting the common entries does seem like a smaller version of the original problem. Thoughts?

Thank you for the helpful comments! It took me a while to work through the logic, but I would have been lost completely without the comments around the double sorting split.

Regards,
	Jeff Davis
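
One way to read the four penalty goals listed above is as a lexicographic comparison: a candidate predicate is judged on the first goal, and later goals only break ties. The following is a minimal Python sketch of that reading, not the patch's C code. The names `Range`, `inf_flags`, `broadening`, `penalty_key` and `choose_subtree` are hypothetical, a plain numeric difference stands in for subtype_diff, a bound of `None` means infinite, and empty ranges are left out for brevity.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Range:
    lower: Optional[float]  # None stands for an infinite lower bound
    upper: Optional[float]  # None stands for an infinite upper bound


def inf_flags(r: Range) -> frozenset:
    """Which bounds are infinite; the empty set means a 'normal' range."""
    return frozenset(
        name for name, bound in (("lower", r.lower), ("upper", r.upper))
        if bound is None
    )


def broadening(orig: Range, new: Range) -> float:
    """How far orig's finite bounds must move to cover new.

    Stand-in for subtype_diff; infinite bounds are handled by the
    class checks in penalty_key, so they contribute nothing here.
    """
    ext = 0.0
    if orig.lower is not None and new.lower is not None:
        ext += max(0.0, orig.lower - new.lower)
    if orig.upper is not None and new.upper is not None:
        ext += max(0.0, new.upper - orig.upper)
    return ext


def width(r: Range) -> float:
    """Width of the predicate; infinite if either bound is infinite."""
    if r.lower is None or r.upper is None:
        return math.inf
    return r.upper - r.lower


def penalty_key(orig: Range, new: Range) -> Tuple[int, int, float, float]:
    """Tuple ordered from most to least important goal; smaller is better."""
    of, nf = inf_flags(orig), inf_flags(new)
    mixes_normal = int((not of) != (not nf))  # goal 1: keep normal ranges separate
    broadens_class = int(not nf <= of)        # goal 2: new adds an infinite bound
    return (mixes_normal, broadens_class,
            broadening(orig, new),            # goal 3: avoid broadening
            width(orig))                      # goal 4: prefer narrower predicates


def choose_subtree(candidates: List[Range], new: Range) -> Range:
    """Pick the candidate predicate with the lexicographically smallest key."""
    return min(candidates, key=lambda orig: penalty_key(orig, new))
```

A GiST penalty function has to return a single float, so a tuple like this cannot be used directly; it would have to be flattened into one number, which is presumably where the arbitrary constants come from. The sketch is only meant to make the intended ordering of the goals explicit.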