Re: Double sorting split patch
От | Heikki Linnakangas |
---|---|
Тема | Re: Double sorting split patch |
Дата | |
Msg-id | 4E8AD603.80506@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Double sorting split patch (Alexander Korotkov <aekorotkov@gmail.com>) |
Ответы |
Re: Double sorting split patch
|
Список | pgsql-hackers |
On 04.10.2011 11:51, Alexander Korotkov wrote: > On Tue, Oct 4, 2011 at 12:12 PM, Heikki Linnakangas< > heikki.linnakangas@enterprisedb.com> wrote: > >> Can you elaborate the consider-split algorithm? The criteria to select the >> new split over the previously selected one is this: >> >>> ! /* >>> ! * If ratio is acceptable, we should compare current split >>> with >>> ! * previously selected one. If no split was selected then >>> we select >>> ! * current anyway. Between splits of one dimension we >>> search for >>> ! * minimal overlap (allowing negative values) and minimal >>> ration >>> ! * (between same overlaps. We switch dimension if find >>> less overlap >>> ! * (non-negative) or less range with same overlap. >>> ! */ >>> ! range = diminfo->upper - diminfo->lower; >>> ! overlap = ((leftUpper) - (rightLower)) / range; >>> ! if (context->first || >>> ! (context->dim == dimNum&& >>> ! (overlap< context->overlap || >>> ! (overlap == context->overlap&& ratio> >>> context->ratio))) || >>> ! (context->dim != dimNum&& >>> ! ((range> context->range&& >>> ! non_negative(overlap)<= >>> non_negative(context->overlap)**) || >>> ! non_negative(overlap)< >>> non_negative(context->overlap)**)) >>> ! ) >>> ! { >>> >> >> Why are negative overlaps handled differently across dimensions and within >> the same dimension? Your considerSplit algorithm in the SYRCoSE 2011 paper >> doesn't seem to make that distinction. > > Yes, I've changed this behaviour after experiments on real-life datasets. On > the datasets where data don't overlap themselfs (points are always such > datasets), non-overlapping splits are frequently possible. In this case > splits tends to be along one dimension, because most distant non-overlapping > splits (i.e. having lowest negative overlap) appears to be in the same > dimension as previous split. Therefore MBRs appear to be very prolonged > along another dimension, that's bad for search. In order to prevent this > behavour I've made transition to another dimension by non-nagative part of > overlap and range. Using range as the split criteria makes MBRs more > quadratic, and using non-negative part of overlap as priority criteria give > to range chance to matter. Ok. Could you phrase that as a code comment? Here's a version of the patch I've been working on. There's no functional changes, just a lot of moving things around, comment changes, etc. to hopefully make it more readable. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления: