Re: [GENERAL] Creation of tsearch2 index is very
От | Ron |
---|---|
Тема | Re: [GENERAL] Creation of tsearch2 index is very |
Дата | |
Msg-id | 7.0.1.0.2.20060121064930.03aba810@earthlink.net обсуждение исходный текст |
Ответ на | Re: [GENERAL] Creation of tsearch2 index is very slow (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: [GENERAL] Creation of tsearch2 index is very
Re: [GENERAL] Creation of tsearch2 index is very |
Список | pgsql-performance |
At 07:23 PM 1/20/2006, Tom Lane wrote: >"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes: > > On Fri, Jan 20, 2006 at 06:52:37PM -0500, Tom Lane wrote: > >> It's also worth considering that the entire approach is a heuristic, > >> really --- getting the furthest-apart pair of seeds doesn't guarantee > >> an optimal split as far as I can see. Maybe there's some totally > >> different way to do it. > > > For those of us who don't know what tsearch2/gist is trying to accomplish > > here, could you provide some pointers? :-) > >Well, we're trying to split an index page that's gotten full into >two index pages, preferably with approximately equal numbers of items in >each new page (this isn't a hard requirement though). Maybe we are over thinking this. What happens if we do the obvious and just make a new page and move the "last" n/2 items on the full page to the new page? Various forms of "move the last n/2 items" can be tested here: 0= just split the table in half. Sometimes KISS works. O(1). 1= the one's with the highest (or lowest) "x" value. 2= the one's with the highest sum of coordinates (x+y+...= values in the top/bottom n/2 of entries). 3= split the table so that each table has entries whose size_waste values add up to approximately the same value. 4= I'm sure there are others. 1-5 can be done in O(n) time w/o auxiliary data. They can be done in O(1) if we've kept track of the appropriate metric as we've built the current page. >I think the true figure of merit for a split is how often will >subsequent searches have to descend into *both* of the resulting >pages as opposed to just one >--- the less often that is true, the better. I'm not very clear on >what tsearch2 is doing with these bitmaps, but it looks like an >upper page's downlink has the union (bitwise OR) of the one-bits in >the values on the lower page, and you have to visit the lower page >if this union has a nonempty intersection with the set you are >looking for. If that's correct, what you really want is to divide >the values so that the unions of the two sets have minimal overlap >... which seems to me to have little to do with what the code does at present. I'm not sure what "upper page" and "lower page" mean here? >Teodor, Oleg, can you clarify what's needed here? Ditto. Guys what is the real motivation and purpose for this code? Ron
В списке pgsql-performance по дате отправления: